Unicode Coding

Question 1

What is the Unicode standard? (Definition)

Answer

Unicode is a character encoding standard that aims to unify text exchange at the international level. With Unicode, each character is described by a name and a numeric code (its code point), which identifies it uniquely regardless of the platform or software used. Unicode already lists well over 100,000 characters.
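
As an illustration of the name/code pairing, here is a minimal Python sketch that prints the code point and official name of a few characters. It relies on the standard `unicodedata` module; the helper name `describe` is only an assumption made for this example.

```python
import unicodedata


def describe(char: str) -> str:
    """Return 'U+XXXX NAME' for a single character (hypothetical helper)."""
    code_point = ord(char)                       # numeric code of the character
    name = unicodedata.name(char, "<unnamed>")   # official Unicode name, if any
    return f"U+{code_point:04X} {name}"


for ch in "DΦ€":
    print(describe(ch))
# U+0044 LATIN CAPITAL LETTER D
# U+03A6 GREEK CAPITAL LETTER PHI
# U+20AC EURO SIGN
```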

Among the first Unicode characters are the 128 ASCII codes (including the basic Latin alphabet), followed by accented and extended Latin letters, the International Phonetic Alphabet, then other alphabets (Greek, Cyrillic, etc.), then symbols and many others.

A message encoded with Unicode is a sequence of numbers (code points), stored as bytes by an encoding such as UTF-8 or UTF-16, which software automatically translates into characters that can be displayed to the user.
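
The path from stored bytes to displayed characters can be sketched in Python as follows. The byte values are an illustrative sample chosen for this example (the UTF-8 form of the three characters A, Φ and €), not data from a real file.

```python
# Bytes as they might be stored in a file or sent over a network (UTF-8).
raw = bytes([0x41, 0xCE, 0xA6, 0xE2, 0x82, 0xAC])

# Decoding turns the bytes back into characters.
text = raw.decode("utf-8")
print(text)          # AΦ€

# Each character corresponds to one Unicode code point.
code_points = [ord(ch) for ch in text]
print(code_points)   # [65, 934, 8364]

# The same text stored as UTF-16 (big-endian, without byte order mark).
print(text.encode("utf-16-be").hex(" "))   # 00 41 03 a6 20 ac
```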

Question 2

How to encrypt a text with a Unicode cipher?

Answer

Unicode encryption consists of replacing each character of the message with its Unicode code.

Example: The message DCΦD€ (the word DCODE with the Greek letter phi Φ and the euro sign €)

Each character is encoded as follows:

Displayed Character	Unicode Code (decimal)	Unicode Code (hexadecimal)
D	68	44
C	67	43
Φ	934	03A6
D	68	44
€	8364	20AC

Unicode numeric identifiers, like ASCII codes, are often written in hexadecimal for a more compact notation.
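
The encoding step can be sketched in Python as below. This is only a minimal illustration: the function name `unicode_encode`, its parameters, and the choice to pad hexadecimal codes to 4 digits are assumptions made for the example.

```python
def unicode_encode(message: str, base: int = 10, sep: str = ",") -> str:
    """Replace each character with its Unicode code (hypothetical helper)."""
    if base == 10:
        codes = [str(ord(ch)) for ch in message]
    elif base == 16:
        # Pad to at least 4 hexadecimal digits, as in the U+XXXX notation.
        codes = [format(ord(ch), "04X") for ch in message]
    else:
        raise ValueError("base must be 10 or 16")
    return sep.join(codes)


print(unicode_encode("DCΦD€"))            # 68,67,934,68,8364
print(unicode_encode("DCΦD€", base=16))   # 0044,0043,03A6,0044,20AC
```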

The full code charts are available on the official website of the Unicode Consortium.

Question 3

How to decrypt a text with a Unicode cipher?

Answer

To translate a Unicode message, replace each numeric code with the Unicode character it identifies.

Example: The message 68,67,934,68,8364 is translated number by number: 68 => D, 67 => C, and so on, to obtain DCΦD€.
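
The decoding step can be sketched in Python as follows. The function name `unicode_decode` and the accepted separators (commas and whitespace) are assumptions made for this example.

```python
import re


def unicode_decode(ciphertext: str, base: int = 10) -> str:
    """Turn a list of Unicode codes back into text (hypothetical helper)."""
    # Split on commas and/or whitespace, ignoring empty tokens.
    tokens = [t for t in re.split(r"[,\s]+", ciphertext.strip()) if t]
    # chr() maps a code point to its character.
    return "".join(chr(int(t, base)) for t in tokens)


print(unicode_decode("68,67,934,68,8364"))             # DCΦD€
print(unicode_decode("44 43 03A6 44 20AC", base=16))   # DCΦD€
```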

Question 4

How to recognize Unicode ciphertext?

Answer

The message is composed of numbers (in decimal or hexadecimal format, more rarely binary). For text written with letters of the Latin alphabet, the numbers between 65 and 90 (letters A-Z) and between 97 and 122 (letters a-z), which are both their ASCII and Unicode codes, will be the most frequent.
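
This criterion can be turned into a rough heuristic, sketched here in Python. The function name `looks_like_unicode_codes` and the 50% threshold are arbitrary assumptions for the example, and only decimal codes are handled.

```python
import re


def looks_like_unicode_codes(text: str, threshold: float = 0.5) -> bool:
    """Guess whether text is a list of decimal Unicode codes (rough heuristic)."""
    values = [int(t) for t in re.findall(r"\d+", text)]
    if not values:
        return False
    # Every number must at least be a valid code point.
    if any(v > 0x10FFFF for v in values):
        return False
    # Share of numbers falling in the A-Z (65-90) or a-z (97-122) ranges.
    latin = sum(1 for v in values if 65 <= v <= 90 or 97 <= v <= 122)
    return latin / len(values) >= threshold


print(looks_like_unicode_codes("72,101,108,108,111"))   # True
print(looks_like_unicode_codes("3,14,15,92,65"))        # False
```

A message written mostly in another script (Greek, Cyrillic, etc.) would not pass this test, so a negative result does not rule out a Unicode cipher.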

Question 5

What is UTF-8? (Definition)

Answer

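UTF-8 is an encoding that can represent every Unicode character, using 1 to 4 bytes per character, and has the advantage of being backward compatible with ASCII. UTF-8 is used on more than 90% of websites.

UTF-16 is a variant based on 16-bit units (one or two per character), used by Windows.

UTF-32 is another variant, which uses a fixed 32 bits per character and is still little used.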
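
To compare the three encodings, the following Python sketch counts the bytes needed for a few characters. The helper name `encoded_sizes` is an assumption for this example; the little-endian codec names are used only to avoid counting a byte order mark.

```python
def encoded_sizes(char: str) -> dict:
    """Number of bytes used by one character in each encoding (hypothetical helper)."""
    return {
        "utf-8": len(char.encode("utf-8")),
        "utf-16": len(char.encode("utf-16-le")),
        "utf-32": len(char.encode("utf-32-le")),
    }


# D, Φ, € and an emoji (U+1F600), which lies outside the first 65536 code points.
for ch in "DΦ€\U0001F600":
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
# U+0044 {'utf-8': 1, 'utf-16': 2, 'utf-32': 4}
# U+03A6 {'utf-8': 2, 'utf-16': 2, 'utf-32': 4}
# U+20AC {'utf-8': 3, 'utf-16': 2, 'utf-32': 4}
# U+1F600 {'utf-8': 4, 'utf-16': 4, 'utf-32': 4}

# Backward compatibility: pure ASCII text has identical bytes in UTF-8.
assert "DCODE".encode("utf-8") == "DCODE".encode("ascii")
```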

Question 6

Where can I find the complete Unicode classification?

Answer

The complete classification is maintained by the Unicode Consortium and published on its official website.
