Data Compression


Criteria

Survey Formats

Basics

Compression Methods

Data Formats


Arithmetic Coding (AC)

Principle of the AC

Dividing into Intervals

Assignment to Codes

Sub-Intervals

Codes for 2 Symbols

Scheme Intervals "aaaa"

Scheme Intervals "abcd"

General Algorithm

AC versus Huffman

Data with high Redundancy

Adaptive AC

Implementations


Glossary

Index


Download


www.BinaryEssence.com

Principle of the Arithmetic Coding


The entire data set is represented by a single rational number, which is placed between 0 and 1. This range is divided into sub-intervals each representing a certain symbol. The number of sub-intervals is identical to the number of symbols in the current set of symbols and the size is proportional to their probability of appearance. For each symbol in the original data a new interval division takes place, on the basis of the last sub-interval.


Example Sub-Interval:


   Probability  Initial Intervals
a  0.5   (50%)   [0.0; 0.5)
b  0.3   (30%)   [0.5; 0.8)
c  0.2   (20%)   [0.8; 1.0)
Size Current Interval: 1.0

  Start      'b'              'a'
a [0.0; 0.5)     [0.50; 0.65) --> [0.500; 0.575)
b [0.5; 0.8) --> [0.65; 0.74)     [0.575; 0.620)
c [0.8; 1.0)     [0.74; 0.80)     [0.620; 0.650)
Size Current Interval:  0.30              0.150

Starting with the initial sub-division the symbol 'b' is coded with a number, which is greater or equal than 0.5 and less than 0.8. If the symbol 'b' would stand alone, any number from this interval could be selected.


The encoding of the following symbol bases on the current interval [0.5; 0.8), which will be divided again into sub-intervals. If the second symbol is an 'a', then any number could be coded from the interval [0.50; 0.65). With any further symbol the last current interval have to be divided again and again, the number of significant digits of the code word increases continuously.


Even if the accuracy of the number is increased permanently, it is not necessary to process the entire one. With the help of suitable algorithms parts that are no longer relevant can be eliminated.


 <   ^   > 

Arithmetic Coding Arithmetic Coding Dividing into Intervals