Data Compression


Criteria

Survey Formats

Basics

Compression Methods

Shannon-Fano

Huffman

Lempel-Ziv (LZ)

Arithmetic Coding

Run Length Encoding

Burrows-Wheeler (BWT)

Encoding

Transformed Data

Decoding

Characteristics

Applications

Implementations

Data Formats


Glossary

Index


Download


www.BinaryEssence.com

BWT: Decoding


The real innovation of the BWT is the reversible transformation. Only the symbols of the last column and the number of the row carrying the original data within the encoder matrix are necessary. Additionally, no sorting of the entire matrix is needed which would require massive computing performance. So the decoding is considerably faster than the encoding process.


Principle of the decoding


The coded data contain the last column (Last) of the encoder matrix, from which the first column (First) can be reconstructed. One takes advantage of the circumstance that the first column (F) contains the same symbols, however, in a sorted order.


   Encoder                 Decoder
   F                   L   L F
0  a a b r a c a d a b r   r a
1  a b r a a b r a c a d   d a
2  a b r a c a d a b r a   a a
3  a d a b r a a b r a c   c a
4  a c a d a b r a a b r   r a
5  b r a a b r a c a d a   a b
6  b r a c a d a b r a a   a b
7  d a b r a a b r a c a   a d
8  c a d a b r a a b r a   a c
9  r a a b r a c a d a b   b r
10 r a c a d a b r a a b   b r

The reconstructed columns (L) and (F) correspond to the first two columns of a matrix, which can be derived from the encoder matrix. For this purpose the encoder matrix must be rotated by a column to the right.


   Encoder                 Decoder
   F                   L   L F
0  a a b r a c a d a b r   r a a b r a c a d a b
1  a b r a a b r a c a d   d a b r a a b r a c a
2  a b r a c a d a b r a   a a b r a c a d a b r
3  a d a b r a a b r a c   c a d a b r a a b r a
4  a c a d a b r a a b r   r a c a d a b r a a b
5  b r a a b r a c a d a   a b r a a b r a c a d
6  b r a c a d a b r a a   a b r a c a d a b r a
7  d a b r a a b r a c a   a d a b r a a b r a c
8  c a d a b r a a b r a   a c a d a b r a a b r
9  r a a b r a c a d a b   b r a a b r a c a d a
10 r a c a d a b r a a b   b r a c a d a b r a a

The resulting "virtual" matrix contains the same contents as the encoder matrix, i.e. a direct relation exists between the two matrices. For real decoding both matrices are not necessary as demonstrated later. They shall only illustrate the approach.


Provided that the assignment of the two identical rows would be known, then the original data could be reconstructed. This is possible by the help of a transformation vector addressing the row of the encoder matrix belonging to the identical row of the virtual matrix. Two characteristics are utilised to find this assignment:


The first symbol of a row must be identical in both matrices, i.e. for each symbol in L an identical symbol will be available in F.


If there is more than one identical symbol in L, they have to be assigned in the order of their occurrence. This results from the sorting of both matrices. If the first symbol is identical, only the remaining columns have to be regarded, which are identical with the sorted encoder matrix.


These relations lead to the transformation vector T. The assignments for the symbol 'r' are emphasized by colour.


   Encoder                Decoder
   F                   L  L F                    T
0  a a b r a c a d a b r  r a a b r a c a d a b  9
1  a b r a a b r a c a d  d a b r a a b r a c a  7
2  a b r a c a d a b r a  a a b r a c a d a b r  0
3  a d a b r a a b r a c  c a d a b r a a b r a  8
4  a c a d a b r a a b r  r a c a d a b r a a b 10
5  b r a a b r a c a d a  a b r a a b r a c a d  1
6  b r a c a d a b r a a  a b r a c a d a b r a  2
7  d a b r a a b r a c a  a d a b r a a b r a c  3
8  c a d a b r a a b r a  a c a d a b r a a b r  4
9  r a a b r a c a d a b  b r a a b r a c a d a  5
10 r a c a d a b r a a b  b r a c a d a b r a a  6

Starting with the primary index, the transformation vector T indicates the previous symbol. The last symbol will be addressed by the primary index as stored together with the transformed data. In this example the index 2 addresses the symbol L(2) = 'a'. The corresponding element of the vector T(2) = 0 identifies the previous symbol L(T(2)) = L(0) = 'r', and so on.


Reconstruction with the transformation vector:


arranged according to the structure of L:

 Row   L   T           S
  0    r   9  step  2  ra
  1    d   7  step  5  dabra
  2    a   0  step  1  a
  3    c   8  step  7  cadabra
  4    r  10  step  9  racadabra
  5    a   1  step  4  abra
  6    a   2  step 11  abracadabra
  7    a   3  step  6  adabra
  8    a   4  step  8  acadabra
  9    b   5  step  3  bra
 10    b   6  step 10  bracadabra


arranged step-by-step:

 Row   L   T           S
  2    a   0  step  1  a
  0    r   9  step  2  ra
  9    b   5  step  3  bra
  5    a   1  step  4  abra
  1    d   7  step  5  dabra
  7    a   3  step  6  adabra
  3    c   8  step  7  cadabra
  8    a   4  step  8  acadabra
  4    r  10  step  9  racadabra
 10    b   6  step 10  bracadabra
  6    a   2  step 11  abracadabra

In order to reconstruct these relationships neither the original matrix nor the virtual matrix is necessary. Only the knowledge of L and F is sufficient.


Equivalent procedures may be used starting the reconstruction of the data in their original order instead of backward. That is possible for example by the help of modifications regarding the sort sequence, the primary index and the procedure for the transformation vector.


 <   ^   > 

Burrows-Wheeler-Transformation (BWT) BWT: Characteristics of Transformed Data Characteristics of BWT