PACKBITS – Lossless Compression Method in TIFF Files
A while ago Vladimir, a regular customer, sent the following message to Universal Document Converter’s technical support: “Right now we have a lot of Xerox 6030/6050 machines that support TIFFs that are either uncompressed or compressed using the PACKBITS compression scheme. The large file size of an uncompressed TIFF is unacceptable to us, but PACKBITS compression was removed from Universal Document Converter 5. Would it be possible to restore PACKBITS compression? And how much would that cost?”
This article focuses on PACKBITS, an obsolete data compression technique.
The PACKBITS algorithm relies on one of the oldest and simplest forms of compression: run-length encoding (RLE). RLE removes redundancy by replacing each run of repeated values with a count and a single copy of the value.
Given that Universal Document Converter works with raster images, it makes sense to illustrate the PACKBITS algorithm on an image.
Every raster image file is a set of pixels of various colors along with headers that store information such as the image’s dimensions, color depth, palette, etc. Pixel colors are written sequentially as rows from left to right, top to bottom. In fact, the main information stored in a raster image file consists of a single string of pixels. That’s what takes up most of the memory, and that’s what image compression algorithms are designed to shrink.
The following figure represents a 4-pixel by 4-pixel, black-and-white image. Suppose each pixel in the image is described by a single byte, i.e. 8-bit grayscale. Uncompressed, the image is written to the file as the following sequence: WWWWWBBWWBBWWWWW, where W is white and B is black. This string clearly contains redundancy in the form of repeated symbols.
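The row-by-row ordering can be made concrete with a short Python sketch. The 4×4 grid below is an assumption reconstructed from the pixel sequence quoted above (a black 2×2 square centered on a white background); the names `rows` and `flatten` are illustrative only.

```python
# Assumed layout of the 4x4 example image, one string per row
# (W = white, B = black), reconstructed from the sequence in the text.
rows = [
    "WWWW",
    "WBBW",
    "WBBW",
    "WWWW",
]

def flatten(rows):
    """Concatenate rows left to right, top to bottom (row-major order)."""
    return "".join(rows)

pixels = flatten(rows)
print(pixels)       # WWWWWBBWWBBWWWWW
print(len(pixels))  # 16 pixels, i.e. 16 bytes at one byte per pixel
```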
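The PACKBITS algorithm encodes the redundant information by storing series (runs) of pixels of identical color. A signed control byte N marks the beginning of each run. If N is negative (between -1 and -127), the single symbol that follows the control byte is repeated; if N is between 0 and 127, the symbols that follow are literal data and are copied uncompressed. For readability, this article writes the length of the run in place of the control byte, with a minus sign marking a repeated run: the run WWWWWW is written as -6W, and WBWBWBWBWBWB is written as 12WBWBWBWBWBWB. In the actual byte stream the control byte is offset by one: a repeated run of k symbols is stored with N = 1 - k, a literal run of k symbols with N = k - 1, and the value -128 is a no-op that decoders skip.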
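The decoding rule can be sketched in a few lines of Python. This sketch follows the control-byte convention of the PackBits scheme as specified for TIFF, where the stored value is offset by one from the run length (a repeated run of k bytes is stored as 1 - k, a literal run of k bytes as k - 1, and -128 is skipped), rather than the simplified notation used in the examples in this article. The name `packbits_decode` is illustrative, and the sketch does no validation of truncated input.

```python
def packbits_decode(data: bytes) -> bytes:
    """Expand a PackBits-compressed byte string (illustrative sketch)."""
    out = bytearray()
    i = 0
    while i < len(data):
        n = data[i]
        if n > 127:
            n -= 256          # reinterpret the control byte as signed
        i += 1
        if n == -128:
            continue          # no-op control byte, skip it
        if n >= 0:
            count = n + 1     # literal run: copy the next n + 1 bytes
            out += data[i:i + count]
            i += count
        else:
            count = 1 - n     # repeated run: next byte occurs 1 - n times
            out += data[i:i + 1] * count
            i += 1
    return bytes(out)

# 0xFB is -5, so "W" is repeated 6 times; 0x02 copies 3 literal bytes.
encoded = bytes([0xFB]) + b"W" + bytes([0x02]) + b"WBW"
print(packbits_decode(encoded))  # b'WWWWWWWBW'
```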
Because the length of a run has to fit into a single control byte, one run can cover at most 128 bytes; anything longer is split into several runs.
Thus, after being compressed with PACKBITS the image presented above (WWWWWBBWWBBWWWWW) becomes -5W-2B-2W-2B-5W. We now have just 10 bytes (five control bytes and five color bytes) instead of 16, so the data has shrunk by 6 bytes.
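The result for this image can be reproduced with a minimal encoder sketch in Python. It is a simple greedy variant, not a reference implementation: any two or more identical bytes become a repeated run, and isolated bytes are gathered into literal runs. Other encoders may make different choices. It writes the control bytes of the actual format (1 - k for a repeated run of k bytes, k - 1 for a literal run of k bytes), so the bytes differ from the simplified notation above, but the total is the same 10 bytes. The name `packbits_encode` is illustrative.

```python
def packbits_encode(data: bytes) -> bytes:
    """Greedy PackBits encoder (illustrative sketch)."""
    out = bytearray()
    literal = bytearray()   # pending bytes that are not part of a run

    def flush_literal():
        # Emit pending literal bytes in chunks of at most 128.
        for start in range(0, len(literal), 128):
            chunk = literal[start:start + 128]
            out.append(len(chunk) - 1)
            out.extend(chunk)
        literal.clear()

    i, n = 0, len(data)
    while i < n:
        # Measure the run of identical bytes starting at i (capped at 128).
        run = 1
        while i + run < n and run < 128 and data[i + run] == data[i]:
            run += 1
        if run >= 2:
            flush_literal()
            out.append((1 - run) & 0xFF)  # negative control byte, as unsigned
            out.append(data[i])
        else:
            literal.append(data[i])
        i += run
    flush_literal()
    return bytes(out)

encoded = packbits_encode(b"WWWWWBBWWBBWWWWW")
print(encoded.hex(" "))  # fc 57 ff 42 ff 57 ff 42 fc 57
print(len(encoded))      # 10
```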
Moreover, the more consecutive pixels of identical color an image has, the more effective the PACKBITS algorithm will be. For example, the string WWWWWWWWWWWWWWWW compresses to -16W, i.e. only 2 bytes rather than 16 (assuming the color white is encoded by a single byte). However, the algorithm is useless if the data to be encoded contains few repeated symbols. For example, PACKBITS compression would be entirely inappropriate for the image given below:
Assuming 8-bit color depth, the image would be represented as RGGBGGBYYBYYBYRR and take up 16 bytes of memory. After being “compressed” by the PACKBITS algorithm, with every run encoded separately, the string would become 1R-2G1B-2G1B-2Y1B-2Y1B1Y-2R. That’s 11 control bytes plus 11 color bytes, or 22 bytes in total. In this case, the compressed image will occupy more disk space than the uncompressed image, no matter how paradoxical that sounds.
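The byte counts quoted for the two example images can be checked with a short Python sketch. The first function assumes, as the encoded strings in this article do, that every run (even a single pixel) gets its own control byte. The second reflects the fact that an encoder is free to store data purely as literal runs, which costs one extra control byte per 128 bytes, so a careful encoder would need no more than 17 bytes for this 16-pixel image. Both function names are illustrative.

```python
from itertools import groupby
from math import ceil

def per_run_cost(pixels: str) -> int:
    """Bytes used when every run, even a single pixel, gets a control byte."""
    total = 0
    for _, group in groupby(pixels):
        length = len(list(group))
        # A run longer than 128 must be split; each piece costs 2 bytes.
        total += 2 * ceil(length / 128)
    return total

def literal_only_cost(n: int) -> int:
    """Bytes used if n bytes are stored purely as literal runs of up to 128."""
    return n + ceil(n / 128)

print(per_run_cost("WWWWWBBWWBBWWWWW"))  # 10
print(per_run_cost("RGGBGGBYYBYYBYRR"))  # 22
print(literal_only_cost(16))             # 17
```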
Two images are presented below: one is black-and-white (1-bit) and the other is color (24-bit).
Using the PACKBITS algorithm, the first image compresses from 685 KB to 79 KB while preserving its other properties, such as color depth and dimensions. PACKBITS compression of the second image increases its size from 2882 KB to 2901 KB.
Clearly, RLE-based compression, and PACKBITS specifically, is only effective for files that contain a large number of runs of repeated data. Among images, black-and-white charts and documents benefit the most. This sort of compression does not work well when an image contains lots of color transitions, hues, or gradients.
The advantage of this algorithm over more complex (and more effective) techniques lies in its simplicity and speed. Implementing the PACKBITS algorithm requires just 256 bytes of memory and very little processing power.
PACKBITS was one of the first compression algorithms, so it is supported by most hardware and software tools regardless of their capabilities. For that reason, and despite its limited popularity today, we added the ability to compress TIFFs using PACKBITS to Universal Document Converter 5.5.
And because we value our customers and want our software to meet their needs, Vladimir received the latest version of Universal Document Converter completely free of charge.