Data Compression


Criteria

Survey Formats

Basics

Compression Methods

Shannon-Fano

Huffman

Lempel-Ziv (LZ)

Arithmetic Coding

Run Length Encoding

Burrows-Wheeler (BWT)

Encoding

Transformed Data

Decoding

Characteristics

Applications

Implementations

Data Formats


Glossary

Index


Download


www.BinaryEssence.com

BWT: Characteristics of Transformed Data


Data converted by BWT consist of identical symbols, but their order is determined by the characteristics of the original data. The following examples shall demonstrate this.


The transformed symbols represent the predecessors of the data in the first column because of the rotation of the original data.


   a b r a c a d a b r a

   a a b r a c a d a b r    ra
   a b r a a b r a c a d    da
   a b r a c a d a b r a    aa
   a d a b r a a b r a c    ca
   a c a d a b r a a b r    ra
   b r a a b r a c a d a    ab
   b r a c a d a b r a a    ab
   d a b r a a b r a c a    ad
   c a d a b r a a b r a    ac
   r a a b r a c a d a b    br
   r a c a d a b r a a b    br

Thus the context is represented, in which certain symbols appear. The exemplary data set starts with a collection of all symbols {'r', 'd', 'a', 'k', 'r'} preceding the symbol 'a'. This characteristic is emphasized particularly in the next example.


On the left side the original data consist of three identical, consecutive sequences ("abcd"). Therefore always a 'd' precedes an 'a', an 'a' always a 'b', etc. Unlike this the original data for the right example does not provide redundant structures.


a b c d a b c d a b c d   a b c d a a c c b d d b

a b c d a b c d a b c d   a a c c b d d b a b c d
a b c d a b c d a b c d   a b c d a a c c b d d b
a b c d a b c d a b c d   a c c b d d b a b c d a
b c d a b c d a b c d a   b a b c d a a c c b d d
b c d a b c d a b c d a   b c d a a c c b d d b a
b c d a b c d a b c d a   b d d b a b c d a a c c
c d a b c d a b c d a b   c b d d b a b c d a a c
c d a b c d a b c d a b   c c b d d b a b c d a a
c d a b c d a b c d a b   c d a a c c b d d b a b
d a b c d a b c d a b c   d a a c c b d d b a b c
d a b c d a b c d a b c   d b a b c d a a c c b d
d a b c d a b c d a b c   d d b a b c d a a c c b

After the Burrows-Wheeler-Transformation uniform symbol combinations lead to repetitions within transformed data. The arising redundancy may be exploited by simple statistic codes, which react in high measures to such changes in symbol distribution, e.g. the adaptive Huffman coding.


 <   ^   > 

Adaptive Huffman Code []

Burrows-Wheeler-Transformation (BWT) BWT: Encoding BWT: Decoding