Online Book Reader

Home Category

Mastering Algorithms With C - Kyle Loudon [169]

By Root 1543 0

free(compressed);

return 0;

}

/*****************************************************************************

* *

* ------------------------------- recv_comp ------------------------------ *

* *

*****************************************************************************/

int recv_comp(int s, unsigned char **data, int *size, int flags) {

unsigned char *compressed;

int size_comp;

/*****************************************************************************

* *

* Receive the compressed data preceded by its size. *

* *

*****************************************************************************/

if (recv(s, (char *)&size_comp, sizeof(int), flags) != sizeof(int))

return -1;

if ((compressed = (unsigned char *)malloc(size_comp)) == NULL)

return -1;

if (recv(s, (char *)compressed, size_comp, flags) != size_comp) {

free(compressed);

return -1;

}

/*****************************************************************************

* *

* Uncompress the data. *

* *

*****************************************************************************/

if ((*size = huffman_uncompress(compressed, data)) < 0)

return -1;

/*****************************************************************************

* *

* Free the buffer of compressed data. *

* *

*****************************************************************************/

free(compressed);

return 0;

}

Description of LZ77


LZ77 ( Lempel-Ziv-1977) is a simple but surprisingly effective form of data compression that takes an entirely different approach from Huffman coding. LZ77 is a dictionary-based method, which means that it tries to compress data by encoding long strings of symbols, called phrases, as small tokens that reference entries in a dictionary. Compression is achieved by using relatively small tokens in place of longer phrases that appear several times in the data. As with Huffman coding, it is important to realize that a symbol is not necessarily a character of text: a symbol can be any amount of data we choose, but it is often one byte's worth.

Maintaining a Dictionary of Phrases


Different dictionary-based compression methods use various approaches for maintaining their dictionaries. LZ77 uses a look-ahead buffer and a sliding window . LZ77 works by first loading a portion of the data into the look-ahead buffer. To understand how the look-ahead buffer stores phrases that effectively form a dictionary, picture the buffer as a sequence of symbols s 1, . . . , sn , and Pb as a set of phrases constructed from the symbols. From the sequence s 1, . . . , sn , we form n phrases, defined as:

Pb = {(s1), (s1, s2), . . . ,(s1, . . . ,sn )}

This means that if the look-ahead buffer contains the symbols (A, B, D), for example, the phrases in the buffer are {(A), (A, B), (A, B, D)}. Once data passes through the look-ahead buffer, it moves into the sliding window and becomes part of the dictionary. To understand how phrases are represented in the sliding window, consider the window to be a sequence of symbols s 1, . . ., sm , and Pw to be a set of phrases constructed from these symbols. From the sequence s 1, . . ., sm , we form the set of phrases as follows:

Pw = {p1, p2, . . . , pm}, where pi = {(si), (si, si+1), . . . , si, si+1, . . . , sm)}

Thus, if the sliding window contains the symbols (A, B, C), the phrases in the window, and hence the dictionary, are {(A), (A, B), (A, B, C), (B), (B, C), (C)}. The main idea behind LZ77 is to look continually for the longest phrase in the look-ahead buffer that matches a phrase currently in the dictionary. In the look-ahead buffer and sliding window just described, the longest match is the two-symbol phrase (A, B).

Compressing and Uncompressing Data


As we compress the data, two situations can exist between the look-ahead buffer and the sliding window at any given moment: there can either be a phrase of some length that matches, or there may be no match at all. When there is at least one match, we encode the longest match as a phrase token . Phrase tokens contain

Return Main Page Previous Page Next Page

®Online Book Reader