BWT: Encoding
The transformation is made by a matrix, in which a block of the original data is arranged. The first row contains the unchanged original data. Each row following represent a rotation to the left of the preceding row:
Matrix for "abracadabra":
0 1 2 3 4 5 6 7 8 9 10
0 a b r a c a d a b r a a
/ / / / / / / / / / /
1 b r a c a d a b r a a
2 r a c a d a b r a a b
3 a c a d a b r a a b r
4 c a d a b r a a b r a
5 a d a b r a a b r a c
6 d a b r a a b r a c a
7 a b r a a b r a c a d
8 b r a a b r a c a d a
9 r a a b r a c a d a b
10 a a b r a c a d a b r
Afterwards this matrix will be sorted "lexicographically", i.e. in the order commonly used by encyclopedias. The most significant symbol is represented by the first column in this example:
0 1 2 3 4 5 6 7 8 9 10
0 a a b r a c a d a b r
1 a b r a a b r a c a d
2 a b r a c a d a b r a
3 a d a b r a a b r a c
4 a c a d a b r a a b r
5 b r a a b r a c a d a
6 b r a c a d a b r a a
7 d a b r a a b r a c a
8 c a d a b r a a b r a
9 r a a b r a c a d a b
10 r a c a d a b r a a b
The last column of the matrix and information about the row originally containing source data will be stored. In this example the row 2 represents the original string "abracadabra". The BurrowsWheelerTransformation result in the following data stream:
r d a c r a a a a b b 2
The BWT exists in several variants which insignificantly differ from each other but are identical from the principle. Here the algorithm is described following the original publication by Michael Burrows and David Wheeler.
< ^ >
