Burrows–Wheeler Transform

Question 1

What is Burrows–Wheeler Transform? (Definition)

Answer

The Burrows–Wheeler transform (BWT) is a technique for rearranging/reordering the characters of a text message. It is mainly used in data compression. BWT tends to group identical characters together, which makes it a useful pre-processing step before further compression (e.g. by RLE coding).

Question 2

How to encrypt using the BWT cipher?

Answer

Step 1: list all possible rotations of the message (of the character string).

Example:

DECODE

EDECOD

DEDECO

ODEDEC

CODEDE

ECODED
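Step 1 can be sketched in Python as follows. The helper name `rotations` is hypothetical and is not dCode's code. Moving the last i characters to the front reproduces the list above in the same order.

```python
def rotations(text):
    """Return every rotation of text, moving the last i characters to the front."""
    n = len(text)
    return [text[n - i:] + text[:n - i] for i in range(n)]

for row in rotations("DECODE"):
    print(row)  # DECODE, EDECOD, DEDECO, ODEDEC, CODEDE, ECODED (one per line)
```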

Step 2: sort this list in alphabetic/lexicographic order.

Example:

1	CODEDE
2	DECODE
3	DEDECO
4	ECODED
5	EDECOD
6	ODEDEC

Step 3: extract the last character of each line/rotation. The encrypted message consists of these characters, read from top to bottom.

Example: The encoded message is EEODDC

In the original article describing BWT, a numeric key is associated with this message. This key value is the rank of the original message in the sorted list.

Example: the key is 2 (DECODE, the original text, is on row 2 of the table).
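Steps 1 to 3 and the key can be combined into one short sketch. The function name `bwt_encode` is hypothetical. It sorts complete rotations, which is simple but uses memory quadratic in the message length, so it is only meant for short messages.

```python
from typing import Tuple

def bwt_encode(text: str) -> Tuple[str, int]:
    """Naive BWT: return (last column, 1-based key). Assumes text is non-empty."""
    if not text:
        raise ValueError("text must not be empty")
    n = len(text)
    # Step 1: all rotations (their order does not matter, they are sorted next)
    rows = [text[i:] + text[:i] for i in range(n)]
    # Step 2: lexicographic sort (Python compares strings by code point,
    # which is ASCII order for ASCII text)
    rows.sort()
    # Step 3: the encoded message is the last character of each sorted row
    encoded = "".join(row[-1] for row in rows)
    # Key: rank of the original message in the sorted list, counted from 1
    key = rows.index(text) + 1
    return encoded, key

print(bwt_encode("DECODE"))  # ('EEODDC', 2)
```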

In practice, it is common for the string to end with a special character such as null (00), ETX (End of Text) or EOF (End of File). Because this additional character/byte marks the end of the message, no key is needed. It is often represented by the $ character.

dCode only accepts ASCII characters, and the $ character is not treated as an EOF marker by default: it is sorted like any other dollar sign (ASCII code 36).
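Here is a minimal sketch of the end-marker variant, assuming $ as the marker. The name `bwt_encode_with_marker` is hypothetical. The sketch sorts by ASCII code, as described above for dCode, so $ (36) comes before digits and letters. It would, however, come after a space (32) or ! (33). Many textbook presentations instead define the marker as smaller than every other character.

```python
def bwt_encode_with_marker(text, marker="$"):
    """Append an end marker, then apply the naive BWT (no key needed)."""
    if marker in text:
        raise ValueError("the marker must not occur in the text")
    s = text + marker
    # Rotations sorted by character code (ASCII order for ASCII input)
    rows = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(row[-1] for row in rows)

print(bwt_encode_with_marker("DECODE"))  # EEO$DDC
```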

Question 3

How to decrypt the BWT cipher?

Answer

Decryption (the inverse Burrows–Wheeler transform) requires knowing the key and the ciphered message (of N characters).

Example: the ciphertext EODC (4 characters) and the key 1.

Step A: initialize an empty array with N rows and N columns.

Step B: write the encrypted message, one character per row from top to bottom, in the last empty column of the table (the rightmost column that is still empty).

Step C: sort the rows of the table in alphabetical order

Repeat steps B and C as many times as there are letters in the message.

Example: State of the table after each step:

Row	A	B₁	C₁	B₂	C₂	B₃	C₃	B₄	C₄
1	----	---E	---C	--EC	--CO	-ECO	-COD	ECOD	CODE
2	----	---O	---D	--OD	--DE	-ODE	-DEC	ODEC	DECO
3	----	---D	---E	--DE	--EC	-DEC	-ECO	DECO	ECOD
4	----	---C	---O	--CO	--OD	-COD	-ODE	CODE	ODEC

Once the algorithm is complete, the plaintext is in the row whose number is the key.

Example: after the last step of the algorithm, row 1 contains the plain message: CODE
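Steps A to C translate almost literally into code. In this sketch, the function name `bwt_decode_table` is hypothetical. Writing the ciphertext into the last empty column is done by prepending one character to each row, because every column to the right of it is already filled. The sketch rebuilds the whole table, so it is only practical for short messages.

```python
def bwt_decode_table(encoded, key):
    """Inverse BWT by repeated 'write column + sort'; key is 1-based."""
    n = len(encoded)
    table = [""] * n  # Step A: n empty rows
    for _ in range(n):
        # Step B: the new column sits to the left of the filled ones,
        # so each row gets one ciphertext character prepended
        table = [c + row for c, row in zip(encoded, table)]
        # Step C: sort the rows alphabetically
        table.sort()
    return table[key - 1]  # the plaintext is in row number `key`

print(bwt_decode_table("EODC", 1))  # CODE
```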

If the text was encoded with a special character at the end (like null or EOF), then the key is not needed, because the original message (among all the rotations) is the one with this special character at the end.

Question 4

How to decipher BWT without a key?

Answer

For intelligible text, the key is not essential. During decryption, every line of the final table is a rotation of the original text, so the plaintext can usually be recognized by reading the rows.

If a special character, such as null or EOF, has been added to the end of the text before encoding, it is not necessary to use a key to decode. The original message can be identified directly: it is the one, among all possible rotations, that ends with this special character.
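Without a key, the same table reconstruction yields every rotation of the plaintext. The sketch below uses hypothetical names (`bwt_all_rows`, `bwt_decode_with_marker`) and assumes a single-character marker, $. It lists the candidate rows and, when an end marker was used, picks the one row that ends with it.

```python
def bwt_all_rows(encoded):
    """Rebuild the final table: every row is a rotation of the original text."""
    n = len(encoded)
    table = [""] * n
    for _ in range(n):
        table = sorted(c + row for c, row in zip(encoded, table))
    return table

def bwt_decode_with_marker(encoded, marker="$"):
    """Return the rotation ending with the (single-character) marker, marker removed."""
    for row in bwt_all_rows(encoded):
        if row.endswith(marker):
            return row[:-1]
    raise ValueError("no row ends with the marker")

print(bwt_all_rows("EODC"))               # ['CODE', 'DECO', 'ECOD', 'ODEC']
print(bwt_decode_with_marker("EEO$DDC"))  # DECODE
```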

dCode can automatically calculate the most probable key when the text is in English.

Question 5

How to choose the compression key?

Answer

The BWT key cannot be chosen. It is calculated automatically as the rank of the original message in the sorted list of rotations.

Question 6

Why is BWT used in data compression?

Answer

The encoded message tends to contain runs of identical letters, which makes it easier to compress with algorithms such as Run-Length Encoding (RLE).
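Here is a small illustration based on the DECODE example. It uses a simple (character, count) form of RLE chosen for this sketch, with hypothetical helper names. The transformed text has fewer, longer runs. On a six-letter word this does not actually save space; the benefit appears on longer texts with repeated contexts.

```python
from itertools import groupby

def bwt_encode(text):
    """Naive BWT: last column of the sorted rotations (key omitted here)."""
    rows = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(row[-1] for row in rows)

def rle(text):
    """Simple run-length encoding as (character, run length) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(text)]

print(rle("DECODE"))              # 6 runs, each of length 1
print(rle(bwt_encode("DECODE")))  # [('E', 2), ('O', 1), ('D', 2), ('C', 1)]
```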

Question 7

How to recognize a BWT ciphertext?

Answer

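The ciphered message contains many repeated letters, often adjacent to each other. Its index of coincidence is the usual one for the plaintext language, because BWT only reorders the characters and does not substitute them.

The message is sometimes further encoded with RLE.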
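Both clues can be checked with a short sketch. The function names are hypothetical. Because BWT is a permutation of the characters, the index of coincidence is unchanged, while adjacent repeated letters become more frequent.

```python
from collections import Counter

def index_of_coincidence(text):
    """Probability that two characters drawn without replacement are equal."""
    n = len(text)
    if n < 2:
        return 0.0
    counts = Counter(text)
    return sum(k * (k - 1) for k in counts.values()) / (n * (n - 1))

def adjacent_repeats(text):
    """Number of positions where a character equals the next one."""
    return sum(1 for a, b in zip(text, text[1:]) if a == b)

plain, cipher = "DECODE", "EEODDC"  # example from this page
print(index_of_coincidence(plain) == index_of_coincidence(cipher))  # True
print(adjacent_repeats(plain), adjacent_repeats(cipher))            # 0 2
```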

Question 8

What are the variants of the BWT cipher?

Answer

BWT can be used without a key. In that case, the original text needs a unique character at a known position, such as an EOF or null character placed in the last position.

Question 9

What is the complexity of the BWT algorithm?

Answer

Several implementations are possible. The best ones run in O(n) time and use O(n log σ) memory (or even less), where n is the input size and σ the size of the alphabet.
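As an illustration of an inverse that is faster than the table method, the sketch below uses the LF-mapping: the k-th occurrence of a character in the last column corresponds to its k-th occurrence in the first column. With Python's `sorted`, it runs in O(n log n). Replacing that sort with a counting sort over the alphabet would bring it to O(n + σ). The function name is hypothetical. The linear-time forward transform, which is usually built from a suffix array, is not shown.

```python
def bwt_decode_lf(encoded, key):
    """Inverse BWT via LF-mapping; key is the 1-based row of the original text."""
    n = len(encoded)
    # Stable sort of positions by character gives the first column F;
    # order[row] is the position in the last column L that lands on that row of F
    order = sorted(range(n), key=lambda i: (encoded[i], i))
    lf = [0] * n
    for row, i in enumerate(order):
        lf[i] = row
    out = []
    p = key - 1  # row holding the original text
    for _ in range(n):
        out.append(encoded[p])  # last char of row p precedes its first char in the text
        p = lf[p]               # jump to the row that starts with that character
    return "".join(reversed(out))

print(bwt_decode_lf("EODC", 1))  # CODE
```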

Question 10

When was BWT invented?

Answer

The Burrows–Wheeler transform was invented in 1994 by Michael Burrows and David Wheeler.
