Tools that apply Burrows-Wheeler algorithm. Burrows-Wheeler transform (BWT) is an algorithm maximizing repeated letters in a text, which is useful in data compression.
Burrows–Wheeler Transform - dCode
Tag(s) : Compression
dCode is free and its tools are a valuable help in games, maths, geocaching, puzzles and problems to solve every day!
A suggestion ? a feedback ? a bug ? an idea ? Write to dCode!
The Burrows-Wheeler transform (BWT) is a technique for rearranging/reordering characters in a text message. Mainly used in data compression, BWT tends to bring identical characters closer together, this property is used as a pre-processing that then allows for further compression (e.g. by RLE coding).
Step 1: list all possible rotations of the message (of the character string).
Example:
DECODE |
EDECOD |
DEDECO |
ODEDEC |
CODEDE |
ECODED |
Step 2: sort this list in alphabetic/lexicographic order.
Example:
1 | CODEDE |
---|---|
2 | DECODE |
3 | DEDECO |
4 | ECODED |
5 | EDECOD |
6 | ODEDEC |
Step 3: Extract the last characters of each line/rotation. The encrypted message consists of these letters/characters.
Example: The encoded message is EEODDC
In the original version of the article describing BWT, a numeric key is associated with this message. This key value is the rank of the original message once the list is sorted.
Example: The key is 2 (DECODE, the original text, is on the row 2 if the table).
In practice, it is common for the string to end with a special character like null (00) or ETX (End of Text) or EOF (End of File). This additional character/byte allows you to do without a key because it indicates the end of the message. It is often represented by the $ character.
dCode only accepts ASCII characters, and the $ character is not an EOF marker by default, it will be sorted like the dollar sign (ASCII code 36).
Decryption/Burrows Wheeler Inverse Transformation requires to know the key and the ciphered message (with N characters).
Example: The ciphertext EODC (4 characters) and the key 1
Step A: initialize an empty array with N rows and N columns.
Step B: write the encrypted message in the last empty column of the table
Step C: sort the rows of the table in alphabetical order
Repeat steps B and C as many times as there are letters in the message.
Example: State of the table after each step:
A | B₁ | C₁ | B₂ | C₂ | B₃ | C₃ | B₄ | C₄ | ||||||||||||||||||||||||||||||||||||
|
|
|
|
|
|
|
|
|
Once the algorithm is completed, the plaintext is at the row number key of the table.
Example: At the row 1, after the last step of the algorithm, is the plain message: CODE
If the text was encoded with a special character at the end (like null or EOF), then the key is not needed, because the original message (among all the rotations) is the one with this special character at the end.
The key is actually unimportant for intelligible text because when decrypting all the lines of the final table are actually rotations of the original text.
If a special character, such as null or EOF, has been added to the end of the text before encoding, it is not necessary to use a key to decode. The original message can be identified directly: it is the one, among all possible rotations, that ends with this special character.
dCode offers to calculate the most probable key automatically when the text is in English.
The BWT key is calculated automatically and cannot be chosen.
The encoded message tends to have identical sequences of letters that are repeated, which facilitates their compression (via algorithms like Run Length Encoding - RLE).
The ciphered message has a high number of repeated letters and a classic index of coincidence.
The message is sometimes overencrypted with a RLE encoding.
BWT can be used without a key, but in this case, a unique character of the original text and its position are needed, such as EOF character or null placed in last position.
Several implementations are possible but the best ones are in O(n) for the duration and O(n log(σ) (or even better) for the memory. With n the input size and σ the size of the alphabet.
Burrows-Wheeler Transform was invented in 1994 by Michael Burrows and David Wheeler
dCode retains ownership of the "Burrows–Wheeler Transform" source code. Except explicit open source licence (indicated Creative Commons / free), the "Burrows–Wheeler Transform" algorithm, the applet or snippet (converter, solver, encryption / decryption, encoding / decoding, ciphering / deciphering, breaker, translator), or the "Burrows–Wheeler Transform" functions (calculate, convert, solve, decrypt / encrypt, decipher / cipher, decode / encode, translate) written in any informatic language (Python, Java, PHP, C#, Javascript, Matlab, etc.) and all data download, script, or API access for "Burrows–Wheeler Transform" are not public, same for offline use on PC, mobile, tablet, iPhone or Android app!
Reminder : dCode is free to use.
The copy-paste of the page "Burrows–Wheeler Transform" or any of its results, is allowed (even for commercial purposes) as long as you credit dCode!
Exporting results as a .csv or .txt file is free by clicking on the export icon
Cite as source (bibliography):
Burrows–Wheeler Transform on dCode.fr [online website], retrieved on 2025-01-21,