Huffman Coding
Huffman coding offers a way to compress data. The average length of a Huffman code depends
on the statistical frequency with which the source produces each symbol from its alphabet. A
Huffman code dictionary, which associates each data symbol with a codeword, has the property
that no codeword in the dictionary is a prefix of any other codeword in the dictionary. The
huffmandict
, huffmanenco
, and huffmandeco
functions support Huffman coding and decoding.
Note
For long sequences from sources that have skewed distributions and small alphabets, arithmetic coding compresses better than Huffman coding. For more information, see Arithmetic Coding.
Create Huffman Code Dictionary Using MATLAB
Huffman coding requires statistical information about the source of the data being encoded. This example shows how to create a Huffman code dictionary using the huffmandict
function and then shows the codeword vector associated with a particular value from the data source.
Create a vector of data symbols and assign a probability to each. Consider a data source that produces 1s with probability 0.1, 2s with probability 0.1, and 3s with probability 0.8.
symbols = [1 2 3]; prob = [0.1 0.1 0.8];
The main computational step in encoding data from this source using a Huffman code is to create a dictionary that associates each data symbol with a codeword. The second input argument to the huffmandict
function lists the probability with which the source produces each symbol in its alphabet. Create a Huffman code dictionary. The most probable data symbol, 3, is associated with a one-digit codeword, while less probable data symbols are associated with two-digit codewords.
dict = huffmandict(symbols,prob)
dict=3×2 cell array
{[1]} {[1 1]}
{[2]} {[1 0]}
{[3]} {[ 0]}
Display the second row of the dictionary. The output shows that a Huffman encoder receiving the data symbol 2
substitutes the sequence 1 0
.
dict{2,:}
ans = 2
ans = 1×2
1 0
Create and Decode Huffman Code Using MATLAB
This example performs Huffman encoding and decoding using a source whose alphabet has three symbols. The huffmanenco
and huffmandeco
functions use the dictionary created by huffmandict
.
Generate a data sequence to encode.
sig = repmat([3 3 1 3 3 3 3 3 2 3],1,50);
Define the set of data symbols, the probability associated with each element, and then create the Huffman code dictionary.
symbols = [1 2 3]; p = [0.1 0.1 0.8]; dict = huffmandict(symbols,p);
Encode and decode the data. Verify that the original data, sig
, and the decoded data, dhsig
, are identical.
hcode = huffmanenco(sig,dict); dhsig = huffmandeco(hcode,dict); isequal(sig,dhsig)
ans = logical
1