Online Book Reader

Home Category

Choose a category
All
Classic-Fiction

Mastering Algorithms With C - Kyle Loudon [170]

By Root 1472 0

three pieces of information: the offset in the sliding window where the match begins, the number of symbols in the match, and the first symbol in the look-ahead buffer after the match. When there is no match, we encode the unmatched symbol as a symbol token . Symbol tokens simply contain the unmatched symbol itself, so no compression is accomplished. In fact, we will see that symbol tokens actually contain one bit more than the symbol itself, so a slight expansion occurs.

Once the appropriate token has been generated that encodes some number of symbols n, we shift n symbols out one end of the sliding window and replace them at the other end by the same number of symbols shifted out of the look-ahead buffer. Next, we refill the look-ahead buffer. This process keeps the sliding window up to date with only the most recent phrases. The exact number of phrases maintained by the sliding window and look-ahead buffer depends on their size.

Figure 14.2 illustrates the compression of a string using LZ77 with a sliding window of 8 bytes and a look-ahead buffer of 4 bytes. In practice, typical sizes for sliding windows are around 4K (4096 bytes). Look-ahead buffers are generally less than 100 bytes.

Figure 14.2. Compressing the string ABABCBABABCAD using LZ77

We uncompress data by decoding tokens and keeping the sliding window updated in a manner analogous to the compression process. As we decode each token, we copy the symbols that the token encodes into the sliding window. Whenever we encounter a phrase token, we consult the appropriate offset in the sliding window and look up the phrase of the specified length that we find there. Whenever we encounter a symbol token, we generate the single symbol stored in the token itself. Figure 14.3 illustrates uncompressing the data compressed in Figure 14.2.

Figure 14.3. Uncompressing the string compressed in Figure 14.2 using LZ77

Effectiveness of LZ77

The amount of compression achieved using LZ77 depends on a number of factors, such as the size chosen for the sliding window, the size set for the look-ahead buffer, and the entropy of the data itself. Ultimately, the amount of compression depends on the number of phrases we are able to match and their lengths. In most cases, LZ77 results in better compression ratios than Huffman coding, but compression times are considerably slower.

Compressing data with LZ77 is time-consuming because we spend a lot of time searching the sliding window for matching phrases. However, in general, uncompressing data with LZ77 is even faster than ucompressing data with Huffman coding. Uncompressing data with LZ77 is fast because each token tells us exactly where to read symbols out of the buffer. In fact, we end up reading from the sliding window only as many symbols as in the original data.

Interface for LZ77

Name

lz77_compress

Synopsis

int lz77_compress(const unsigned char *original, unsigned char **compressed,

int size);

Return Value

Number of bytes in the compressed data if compressing the data is successful, or -1 otherwise.

Description

Uses LZ77 to compress a buffer of data specified by original, which contains size bytes. The compressed data is written to a buffer returned in compressed. Since the amount of storage required in compressed is unknown to the caller, lz77_compress dynamically allocates the necessary storage using malloc. It is the responsibility of the caller to free this storage using free when it is no longer needed.

Complexity

O (n), where n is the number of symbols in the original data.

Name

lz77_uncompress

Synopsis

int lz77_uncompress(const unsigned char *compressed, unsigned char **original);

Return Value

Number of bytes in the restored data if uncompressing the data is successful, or -1 otherwise.

Description

Uses LZ77 to uncompress a buffer of data specified by compressed. It is assumed that the buffer contains data previously compressed with lz77_compress. The restored data is written to a buffer returned in original. Since the amount of storage required in original may not be known

Online Book Reader

Mastering Algorithms With C - Kyle Loudon [170]

®Online Book Reader