Digital Document Flow: DjVu vs PDF

Most companies sooner or later need to implement a digital document flow system. One of the most important questions at that point is the choice of format in which documents will be submitted, stored and sent to other firms or to remote divisions of the same company. Here we compare two such formats: PDF and DjVu.

A Few Words About DjVu

The history of DjVu began in 1996, when one of the divisions of AT&T started work on a new format. The main task set before the developers was to create a technology for storing and sending scanned documents, that is, paper documents converted to electronic form. At that time, few people could predict that companies would one day exchange agreements and contracts over the global network, so "documents" were understood to mean scanned magazines, newspapers, books and technical documentation. In 1998, the first plug-ins appeared for viewing DjVu documents in browsers.

In 1999, a new version of the DjVu format was published. It added the ability to combine several images in one file with page-by-page browsing, as well as so-called “hot spots”, which worked like hyperlinks. The next year, the third version of the DjVu format appeared. At about the same time, AT&T sold its development to the firm LizardTech, which began to use it in its own commercial products. The format itself nevertheless remained free to use. As a result, many free software products for working with the DjVu format exist today.

In itself, the DjVu format is a very interesting solution: it is a lossy image compression technology. Essentially, it divides a document into three layers: a background, a foreground and a black and white mask. Each of these is processed in its own way. In addition, a whole range of technologies is used: an algorithm for separating text from the background in a scanned image; IW44, a wavelet-based compression algorithm; JB2, an algorithm for compressing black and white images; ZP, a general-purpose compression algorithm; and an algorithm for unpacking on request. This approach achieves a very high degree of compression with a minimum of distortion. One scanned page from a book in the DjVu format has a size of only 10-25 kilobytes.
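The three-layer idea can be illustrated with a minimal sketch. It is not the DjVu decoder: it assumes all three layers have the same resolution and are already decompressed, and the names compose_page, INK and PAPER are made up for this illustration. It only shows how a black and white mask selects, pixel by pixel, between a foreground color and a background color.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # one RGB color


def compose_page(mask: List[List[int]],
                 foreground: List[List[Pixel]],
                 background: List[List[Pixel]]) -> List[List[Pixel]]:
    """Rebuild a page: mask value 1 takes the foreground pixel, 0 the background."""
    return [
        [fg if m else bg for m, fg, bg in zip(mask_row, fg_row, bg_row)]
        for mask_row, fg_row, bg_row in zip(mask, foreground, background)
    ]


# Hypothetical 3x2 page: black ink on yellowed paper.
INK = (0, 0, 0)
PAPER = (245, 240, 225)

mask = [[0, 1, 0],
        [1, 1, 1]]
foreground = [[INK] * 3 for _ in range(2)]
background = [[PAPER] * 3 for _ in range(2)]

page = compose_page(mask, foreground, background)
```

Because the mask carries the sharp outlines of the text, the background and foreground layers can be compressed much more aggressively than the page as a whole, which is where most of the space saving comes from.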

A Few Words About PDF

The public first heard about PDF (Portable Document Format) in 1991, when Adobe Systems announced it as a new technology for presenting any printable product in electronic form. The new format turned out to be very successful. In the following year, it received a prize at the prestigious international exhibition Comdex Fall. Adobe Systems devoted a great deal of attention to developing its invention. In 1994, version 1.1 of the PDF format appeared, adding support for external links, device-independent color, article threads, comments and some security features. The next version of the format appeared in 1996. It introduced support for the OPI 1.3 specification and the CMYK color space.

PDF 1.3 was released in 1999. It differed from the previous version in its support for 2-byte CID fonts and the OPI 2.0 specification, and in a technology for smooth shading and gradients. The next version of the format was published in 2001. It introduced several important features: support for transparency, 128-bit encryption and the option of setting printing quality. Finally, the latest version at the time of writing is PDF 1.7, which appeared in 2006.

The main distinguishing feature of the PDF format is how widespread it is. Today, many manuals, periodicals, technical documents and other kinds of documents are published in this format. What is more, software for viewing PDF documents is installed on practically every computer.

DjVu and PDF Formats: Various Uses

When considering DjVu and PDF for a digital document flow system, it is worth paying attention to the particular strengths of each. The main benefit of the DjVu format is its small file size. This is especially useful for documents that contain many elements which cannot be recognized as text: pictures, plans and formulas. In addition, DjVu works very well when it is necessary to convey not only the text itself but also its appearance: the colors and texture of the original, as well as existing defects and marks left by other objects. DjVu is therefore well suited for storing technical documentation such as instructions and manuals, as well as historical and simply rare documents. A large library of literature of every kind recorded in this format already exists on the global network.
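To put the file-size benefit into numbers, here is a small back-of-the-envelope calculation based on the 10-25 kilobytes per scanned page quoted earlier. The 300-page book is a made-up example, and the function name archive_size_mb is chosen only for this illustration.

```python
def archive_size_mb(pages: int, kb_per_page: float) -> float:
    """Approximate size in megabytes, taking 1 MB = 1024 KB."""
    return pages * kb_per_page / 1024


pages = 300  # hypothetical book length
low = archive_size_mb(pages, 10)   # 10 KB per page
high = archive_size_mb(pages, 25)  # 25 KB per page

print(f"{pages} pages: about {low:.1f} to {high:.1f} MB")
# prints: 300 pages: about 2.9 to 7.3 MB
```

In other words, under these assumptions a whole scanned book fits into a file small enough to send as an ordinary e-mail attachment.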

Note, however, that whenever we speak of files in the DjVu format, we mean scanned documents that actually exist on paper or another medium. This is no accident. The format was originally created for storing scanned copies of documents, and today there are virtually no tools for creating DjVu files in any other way, nor is there any need for them.

The DjVu format has several disadvantages for digital document flow systems. First, it compresses data lossily, which is undesirable for contracts, acts and other legal documents. Secondly, because DjVu is not widely distributed, companies cannot freely use it for exchanging information with partners and clients: most corporate and home computers simply have no software for viewing such documents. Thirdly, the format completely lacks any means of ensuring the security and confidentiality of documents.

PDF is better suited for organizing digital document flow systems. The most important point is that PDF is widely distributed. As already mentioned, practically every personal computer, irrespective of its operating system, has software installed not only for reading PDF documents but also for “materializing” them on any printing device, in exactly the same form in which they were created. This eliminates the problem of compatibility, one of the most serious problems in organizing a digital document flow system.

The second advantage of the PDF format is its built-in protection against unauthorized access. With its aid, the user can protect any document from being viewed or used by other people. The protection relies on cryptographic technology that has been tested many times by both software developers and information security experts.

Finally, the third advantage of PDF is that it is standardized by the ISO (International Organization for Standardization). The format already has the status of a standard for archival storage of documents and for the exchange of information between companies in electronic form. And that is only the beginning. The developers of the format plan to pass its specification on to the public organization AIIM (the Association for Information and Image Management). In that case, it is highly likely that PDF will become the international standard for digital document flow systems.

What conclusions can be drawn from the above? The DjVu and PDF formats cannot really be considered rivals. They are designed for different tasks, and therefore complement each other rather than compete. As the basis for a digital document flow system, it is clearly better to choose PDF, which has become the worldwide de facto standard. Some companies already have digital DjVu archives of technical and other documentation created by scanning paper documents. However, this is not a reason to choose DjVu. It is better to build a modern, working digital document flow system and simply convert the preexisting DjVu files into the PDF format.

How to Convert Documents from DjVu into PDF

A digital document flow system can be implemented with ordinary software for working with files of the chosen format, including software that is distributed free of charge. However, there is one task that such software cannot handle: converting documents from the DjVu format into PDF. This task requires additional software such as Universal Document Converter, a general-purpose product for converting documents into various formats that suits our case well.

Universal Document Converter works as a virtual printer. During installation, it creates an additional printing device in the operating system that is accessible to any program. By printing to this printer, the user receives a file in the required format. This approach is convenient in practice. Firstly, conversion is quick and needs practically no preparatory work. Secondly, it is so simple that end users do not require any training.

The procedure for converting a file from the DjVu format into a PDF document with Universal Document Converter is as follows. First, the user opens the required DjVu file. This can be done in Internet Explorer with a special plug-in installed beforehand. After this, the user only needs to press the “Print” button and choose Universal Document Converter as the printing device.
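When a whole archive has to be converted rather than a single file, it helps to list in advance which DjVu files exist and where the resulting PDF files should go. The following sketch only plans the work and performs no conversion itself; the function name plan_conversions and the folder names archive_djvu and archive_pdf are assumptions made for this illustration.

```python
from pathlib import Path
from typing import List, Tuple


def plan_conversions(src_dir: Path, dst_dir: Path) -> List[Tuple[Path, Path]]:
    """Pair every .djvu file under src_dir with a .pdf target path under dst_dir."""
    pairs = []
    for src in sorted(src_dir.rglob("*.djvu")):
        # Keep the same subfolder layout in the target archive.
        relative = src.relative_to(src_dir)
        pairs.append((src, dst_dir / relative.with_suffix(".pdf")))
    return pairs


if __name__ == "__main__":
    # Hypothetical folder names; replace them with the real archive locations.
    for src, dst in plan_conversions(Path("archive_djvu"), Path("archive_pdf")):
        print(f"{src} -> {dst}")
```

Each pair in the resulting list corresponds to one pass through the steps described above: open the source file, print it to the virtual printer, and save the result under the target name.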