Main Content

Huffman Coding

Huffman coding offers a way to compress data. The average length of a Huffman code depends on the statistical frequency with which the source produces each symbol from its alphabet. A Huffman code dictionary, which associates each data symbol with a codeword, has the property that no codeword in the dictionary is a prefix of any other codeword in the dictionary. The huffmandict, huffmanenco, and huffmandeco functions support Huffman coding and decoding.

Note

For long sequences from sources that have skewed distributions and small alphabets, arithmetic coding compresses better than Huffman coding. For more information, see Arithmetic Coding.

Create Huffman Code Dictionary Using MATLAB

Huffman coding requires statistical information about the source of the data being encoded. This example shows how to create a Huffman code dictionary using the huffmandict function and then shows the codeword vector associated with a particular value from the data source.

Create a vector of data symbols and assign a probability to each. Consider a data source that produces 1s with probability 0.1, 2s with probability 0.1, and 3s with probability 0.8.

symbols = [1 2 3];
prob = [0.1 0.1 0.8];

The main computational step in encoding data from this source using a Huffman code is to create a dictionary that associates each data symbol with a codeword. The second input argument to the huffmandict function lists the probability with which the source produces each symbol in its alphabet. Create a Huffman code dictionary. The most probable data symbol, 3, is associated with a one-digit codeword, while less probable data symbols are associated with two-digit codewords.

dict = huffmandict(symbols,prob)
dict=3×2 cell array
    {[1]}    {[1 1]}
    {[2]}    {[1 0]}
    {[3]}    {[  0]}

Display the second row of the dictionary. The output shows that a Huffman encoder receiving the data symbol 2 substitutes the sequence 1 0.

dict{2,:}
ans = 
2
ans = 1×2

     1     0

Create and Decode Huffman Code Using MATLAB

This example performs Huffman encoding and decoding using a source whose alphabet has three symbols. The huffmanenco and huffmandeco functions use the dictionary created by huffmandict.

Generate a data sequence to encode.

sig = repmat([3 3 1 3 3 3 3 3 2 3],1,50);

Define the set of data symbols, the probability associated with each element, and then create the Huffman code dictionary.

symbols = [1 2 3];
p = [0.1 0.1 0.8];
dict = huffmandict(symbols,p);

Encode and decode the data. Verify that the original data, sig, and the decoded data, dhsig, are identical.

hcode = huffmanenco(sig,dict);
dhsig = huffmandeco(hcode,dict);
isequal(sig,dhsig)
ans = logical
   1

See Also

Functions

Related Topics