Online Book Reader

Home Category

Mastering Algorithms With C - Kyle Loudon [162]

By Root 1488 0
dynamically allocates the necessary storage using malloc. It is the responsibility of the caller to free this storage using free when it is no longer needed.

Complexity

O (n), where n is the number of symbols in the original data.

Name


huffman_uncompress

Synopsis

int huffman_uncompress(const unsigned char *compressed, unsigned

char **original);

Return Value

Number of bytes in the restored data if uncompressing the data is successful, or -1 otherwise.

Description

Uses Huffman coding to uncompress a buffer of data specified by compressed. It is assumed that the buffer contains data previously compressed with huffman_compress. The restored data is written to a buffer returned in original. Since the amount of storage required in original may not be known to the caller, huffman_uncompress dynamically allocates the necessary storage using malloc. It is the responsibility of the caller to free this storage using free when it is no longer needed.

Complexity

O (n), where n is the number of symbols in the original data.

Implementation and Analysis of Huffman Coding


With Huffman coding, we try to compress data by encoding symbols as Huffman codes generated in a Huffman tree. To uncompress the data, we rebuild the Huffman tree used in the compression process and convert each code back to the symbol it represents. In the implementation presented here, a symbol in the original data is one byte.

huffman_compress


The huffman_compress operation (see Example 14.3) compresses data using Huffman coding. It begins by scanning the data to determine the frequency of each symbol. The frequencies are placed in an array, freqs. After scanning the data, the frequencies are scaled so that each can be represented in a single byte. This is done by determining the maximum number of times any symbol occurs in the data and adjusting the other frequencies accordingly. Since symbols that do not occur in the data should be the only ones with frequencies of 0, we perform a simple test to ensure that any nonzero frequencies that scale to less than 1 end up being set to 1 instead of 0.

Once we have determined and scaled the frequencies, we call build_tree to build the Huffman tree. The build_tree function begins by inserting into a priority queue one binary tree for each symbol occurring at least once in the data. Nodes in the trees are HuffNode structures (see Example 14.1). This structure consists of two members: symbol is a symbol from the data (used only in leaf nodes), and freq is a frequency. Each tree initially contains only a single node, which stores one symbol and its scaled frequency as recorded and scaled in the freqs array.

To build the Huffman tree, we use a loop to perform size - 1 merges of the trees within the priority queue. On each iteration, we call pqueue_extract twice to extract the two binary trees whose root nodes have the smallest frequencies. We then sum the frequencies, merge the trees into a new one, store the sum of the frequencies in the new tree's root, and insert the new tree back into the priority queue. We continue this process until, after size - 1 iterations, the only tree remaining in the priority queue is the final Huffman tree.

Using the Huffman tree built in the previous step, we call build_table to build a table of the Huffman codes assigned to every symbol. Each entry in the table is a HuffCode structure. This structure consists of three members: used is a flag set to 1 or indicating whether the entry has a code stored in it, code is the Huffman code stored in the entry, and size is the number of bits the code contains. Each code is a short integer because it can be proven (although this is not shown here) that when all frequencies are scaled to fit within one byte, no code will be longer than 16 bits.

We build the table by traversing the Huffman tree using a preorder traversal (see Chapter 9). In each activation of build_table, code keeps track of the current Huffman code being generated, and size maintains the number of bits it contains. As we traverse the tree, each time we move

Return Main Page Previous Page Next Page

®Online Book Reader