Principle of the Arithmetic Coding
The entire data set is represented by a single rational number, which is placed between 0 and 1. This range is divided into subintervals each representing a certain symbol. The number of subintervals is identical to the number of symbols in the current set of symbols and the size is proportional to their probability of appearance. For each symbol in the original data a new interval division takes place, on the basis of the last subinterval.
Example SubInterval:
Probability Initial Intervals
a 0.5 (50%) [0.0; 0.5)
b 0.3 (30%) [0.5; 0.8)
c 0.2 (20%) [0.8; 1.0)
Size Current Interval: 1.0
Start 'b' 'a'
a [0.0; 0.5) [0.50; 0.65) > [0.500; 0.575)
b [0.5; 0.8) > [0.65; 0.74) [0.575; 0.620)
c [0.8; 1.0) [0.74; 0.80) [0.620; 0.650)
Size Current Interval: 0.30 0.150
Starting with the initial subdivision the symbol 'b' is coded with a number, which is greater or equal than 0.5 and less than 0.8. If the symbol 'b' would stand alone, any number from this interval could be selected.
The encoding of the following symbol bases on the current interval [0.5; 0.8), which will be divided again into subintervals. If the second symbol is an 'a', then any number could be coded from the interval [0.50; 0.65). With any further symbol the last current interval have to be divided again and again, the number of significant digits of the code word increases continuously.
Even if the accuracy of the number is increased permanently, it is not necessary to process the entire one. With the help of suitable algorithms parts that are no longer relevant can be eliminated.
< ^ >
