Data Compression


Criteria

Survey Formats

Basics

Compression Methods

Shannon-Fano

Huffman

Lempel-Ziv (LZ)

Arithmetic Coding

Run Length Encoding

Burrows-Wheeler (BWT)

Encoding

Transformed Data

Decoding

Characteristics

Applications

Implementations

Data Formats


Glossary

Index


Download


www.BinaryEssence.com

Burrows-Wheeler-Transformation (BWT)


The BWT is named after Michael Burrows and David J. Wheeler, who had published this procedure in their article "A Block-sorting Lossless Data Compression Algorithm" in 1994. The fundamental algorithm had been developed by David J. Wheeler in 1983.


Principle of the algorithm


A block of original data will be converted into a certain form and will be sorted afterwards. The result is a sequence of ordered data that reflect frequently appearing symbol combinations by repetitions.


Example: transformed text data


PeterキPiperキpickedキaキpeckキofキpickledキpeppers.
キAキpeckキofキpickledキpeppersキPeterキPiperキpicked.

                v
  覧覧覧覧覧覧覧覧覧覧覧覧覧覧覧
  Burrows-Wheeler-Transformation
  覧覧覧覧覧覧覧覧覧覧覧覧覧覧覧
                v

.dkkAaddsrrffrrsdキキeeiiiieeeeppkllkppppttppPP
ooppppPPcccccckkキキキキキキiipp.キキキキキキキeeeeeeeerree

Both data blocks consist of identical symbols, only varying in their order. The transformed data may be encoded substantially better, e.g. by adaptive Huffman or arithmetic coding.


The total amount of symbols slightly increases, because additional information are required allowing the reconstruction of the original data.


In real application the number of symbols in a block is however very large and the additional expenditure very small. A typical block size amounts to 900 KByte.


 <   ^   > 

Huffman Code []

Adaptive Huffman Code []

Arithmetic Coding []

Survey Compression Methods Run Length Encoding (RLE) BWT: Encoding