Arithmetic Coding
Arithmetic coding offers a way to compress data and can be useful for data sources that
have a small alphabet. The length of an arithmetic code, instead of being fixed relative to
the number of symbols being encoded, depends on the statistical frequency with which the
source produces each symbol from its alphabet. For long sequences from sources that have
skewed distributions and small alphabets, arithmetic coding compresses better than Huffman
coding. The arithenco and arithdeco functions support arithmetic coding and decoding.
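The comparison with Huffman coding can be illustrated with a minimal sketch. It assumes a two-symbol source that emits symbol 1 with probability 0.99, and it uses a hand-built 1000-symbol sequence (an illustrative choice, not data from a real source). The variable names acode, hcode, and idealBits are introduced here for illustration only.

```matlab
% Assumed source model: symbol 1 occurs 99% of the time, symbol 2 occurs 1%.
counts = [99 1];
p = counts/sum(counts);

% Deterministic 1000-symbol test sequence containing ten 2s (illustrative).
seq = ones(1,1000);
seq(100:100:1000) = 2;

% Arithmetic code for the sequence.
acode = arithenco(seq,counts);

% Huffman code for the same source model.
dict = huffmandict([1 2],p);
hcode = huffmanenco(seq,dict);

% Ideal code length, in bits, for this sequence under the model.
idealBits = -sum(log2(p(seq)));

fprintf('Ideal: %.1f bits, arithmetic: %d bits, Huffman: %d bits\n', ...
    idealBits,numel(acode),numel(hcode));
```

A Huffman code spends at least one whole bit on every symbol, so hcode has 1000 elements here. The ideal length for this sequence is roughly 81 bits, and the arithmetic code can approach that figure because it is not tied to a whole number of bits per symbol.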
Represent Arithmetic Coding Parameters
Arithmetic coding requires statistical information about the source of the data being
encoded. In particular, the counts input argument in the arithenco and arithdeco functions lists the
frequency with which the source produces each symbol in its alphabet. You can determine the
frequencies by studying a set of test data from the source. The set of test data can have
any size you choose, as long as each symbol in the alphabet has a nonzero frequency.
For example, before encoding data from a source that produces 10 x's, 10 y's, and 80 z's in a typical 100-symbol set of test data, define:
counts = [10 10 80];
Alternatively, if a larger set of test data from the source contains 22 x's, 23 y's, and 185 z's, then define:
counts = [22 23 185];
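One way to obtain such a vector is to tally the symbols in the test data directly. The following is a minimal sketch, assuming the symbols are already represented as the integers 1 through 3. The names testData and alphabetSize are hypothetical and chosen for illustration.

```matlab
% Assumed alphabet size; symbols are the integers 1..alphabetSize.
alphabetSize = 3;

% Illustrative test data: 10 ones, 10 twos, and 80 threes.
testData = [ones(1,10) 2*ones(1,10) 3*ones(1,80)];

% Tally how often each symbol occurs. The explicit size keeps a slot for
% every symbol, even one that never appears in the test data.
counts = accumarray(testData(:),1,[alphabetSize 1]).';

% Every symbol in the alphabet needs a nonzero frequency.
assert(all(counts > 0),'Each symbol must appear at least once.');

% Relative frequencies implied by the two example count vectors.
p1 = counts/sum(counts);
p2 = [22 23 185]/sum([22 23 185]);
```

Here p1 is [0.1 0.1 0.8] and p2 is approximately [0.096 0.100 0.804], so both count vectors describe nearly the same source model even though the test sets differ in size.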
Create and Decode Arithmetic Code Using MATLAB
In this example, you encode and decode a sequence from a source that has three symbols by using an arithmetic code.
Create a sequence vector containing symbols from the set of {1,2,3}.
seq = [3 3 1 3 3 3 3 3 2 3];
Set the counts vector to describe a source that produces 10 ones, 20 twos, and 70 threes in a typical 100-symbol set of test data.
counts = [10 20 70];
Apply the arithmetic encoder and decoder functions.
code = arithenco(seq,counts);
dseq = arithdeco(code,counts,length(seq));
Verify that the decoder output matches the original input sequence.
isequal(seq,dseq)
ans = logical
   1
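The same round trip works for an alphabet that is not numeric, provided each symbol is first mapped to its index in the alphabet. The sketch below assumes a three-character alphabet 'xyz' and a made-up message. The names alphabet, msg, and decoded are hypothetical.

```matlab
% Assumed alphabet and source statistics (x, y, z in that order).
alphabet = 'xyz';
counts = [10 10 80];

% Illustrative message drawn from the alphabet.
msg = 'zzxzzzzzyz';

% Map each character to its position in the alphabet (1, 2, or 3).
[found,seq] = ismember(msg,alphabet);
assert(all(found),'Message contains symbols outside the alphabet.');

% Encode, then decode with the same counts and the original length.
code = arithenco(seq,counts);
dseq = arithdeco(code,counts,numel(seq));

% Map the decoded indices back to characters.
decoded = alphabet(dseq);
isequal(decoded,msg)
```

The decoder needs the same counts vector that the encoder used, together with the number of symbols to recover, because the code itself does not carry that information.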