BWT: Decoding
The real innovation of the BWT is the reversible transformation. Only the symbols of the last column and the number of the row carrying the original data within the encoder matrix are necessary. Additionally, no sorting of the entire matrix is needed which would require massive computing performance. So the decoding is considerably faster than the encoding process.
Principle of the decoding
The coded data contain the last column (Last) of the encoder matrix, from which the first column (First) can be reconstructed. One takes advantage of the circumstance that the first column (F) contains the same symbols, however, in a sorted order.
Encoder Decoder
F L L F
0 a a b r a c a d a b r r a
1 a b r a a b r a c a d d a
2 a b r a c a d a b r a a a
3 a d a b r a a b r a c c a
4 a c a d a b r a a b r r a
5 b r a a b r a c a d a a b
6 b r a c a d a b r a a a b
7 d a b r a a b r a c a a d
8 c a d a b r a a b r a a c
9 r a a b r a c a d a b b r
10 r a c a d a b r a a b b r
The reconstructed columns (L) and (F) correspond to the first two columns of a matrix, which can be derived from the encoder matrix. For this purpose the encoder matrix must be rotated by a column to the right.
Encoder Decoder
F L L F
0 a a b r a c a d a b r r a a b r a c a d a b
1 a b r a a b r a c a d d a b r a a b r a c a
2 a b r a c a d a b r a a a b r a c a d a b r
3 a d a b r a a b r a c c a d a b r a a b r a
4 a c a d a b r a a b r r a c a d a b r a a b
5 b r a a b r a c a d a a b r a a b r a c a d
6 b r a c a d a b r a a a b r a c a d a b r a
7 d a b r a a b r a c a a d a b r a a b r a c
8 c a d a b r a a b r a a c a d a b r a a b r
9 r a a b r a c a d a b b r a a b r a c a d a
10 r a c a d a b r a a b b r a c a d a b r a a
The resulting "virtual" matrix contains the same contents as the encoder matrix, i.e. a direct relation exists between the two matrices. For real decoding both matrices are not necessary as demonstrated later. They shall only illustrate the approach.
Provided that the assignment of the two identical rows would be known, then the original data could be reconstructed. This is possible by the help of a transformation vector addressing the row of the encoder matrix belonging to the identical row of the virtual matrix. Two characteristics are utilised to find this assignment:
The first symbol of a row must be identical in both matrices, i.e. for each symbol in L an identical symbol will be available in F.
If there is more than one identical symbol in L, they have to be assigned in the order of their occurrence. This results from the sorting of both matrices. If the first symbol is identical, only the remaining columns have to be regarded, which are identical with the sorted encoder matrix.
These relations lead to the transformation vector T. The assignments for the symbol 'r' are emphasized by colour.
Encoder Decoder
F L L F T
0 a a b r a c a d a b r r a a b r a c a d a b 9
1 a b r a a b r a c a d d a b r a a b r a c a 7
2 a b r a c a d a b r a a a b r a c a d a b r 0
3 a d a b r a a b r a c c a d a b r a a b r a 8
4 a c a d a b r a a b r r a c a d a b r a a b 10
5 b r a a b r a c a d a a b r a a b r a c a d 1
6 b r a c a d a b r a a a b r a c a d a b r a 2
7 d a b r a a b r a c a a d a b r a a b r a c 3
8 c a d a b r a a b r a a c a d a b r a a b r 4
9 r a a b r a c a d a b b r a a b r a c a d a 5
10 r a c a d a b r a a b b r a c a d a b r a a 6
Starting with the primary index, the transformation vector T indicates the previous symbol. The last symbol will be addressed by the primary index as stored together with the transformed data. In this example the index 2 addresses the symbol L(2) = 'a'. The corresponding element of the vector T(2) = 0 identifies the previous symbol L(T(2)) = L(0) = 'r', and so on.
Reconstruction with the transformation vector:
arranged according to the structure of L:
Row L T S
0 r 9 step 2 ra
1 d 7 step 5 dabra
2 a 0 step 1 a
3 c 8 step 7 cadabra
4 r 10 step 9 racadabra
5 a 1 step 4 abra
6 a 2 step 11 abracadabra
7 a 3 step 6 adabra
8 a 4 step 8 acadabra
9 b 5 step 3 bra
10 b 6 step 10 bracadabra
arranged stepbystep:
Row L T S
2 a 0 step 1 a
0 r 9 step 2 ra
9 b 5 step 3 bra
5 a 1 step 4 abra
1 d 7 step 5 dabra
7 a 3 step 6 adabra
3 c 8 step 7 cadabra
8 a 4 step 8 acadabra
4 r 10 step 9 racadabra
10 b 6 step 10 bracadabra
6 a 2 step 11 abracadabra
In order to reconstruct these relationships neither the original matrix nor the virtual matrix is necessary. Only the knowledge of L and F is sufficient.
Equivalent procedures may be used starting the reconstruction of the data in their original order instead of backward. That is possible for example by the help of modifications regarding the sort sequence, the primary index and the procedure for the transformation vector.
< ^ >
