Tool to detect the language of a text document (language recognition). There are thousands of languages in the world; this tool detects which one a given text is written in.
Language Recognition - dCode
A language is a system of signs (characters) that enables communication between people. Language recognition (also called language or dialect detection) is a process that aims to determine the language in which a text is written. It uses algorithms and statistical models to analyze the linguistic characteristics of the text and assign the most likely language to it.
Language recognition is mainly based on the analysis of the frequencies of words, characters and n-grams (combinations of n consecutive characters) present in the text. These frequencies are then compared with reference profiles of the candidate languages to determine the most probable one.
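A minimal Python sketch of this n-gram comparison, assuming character trigrams and cosine similarity as the scoring measure (one common choice, not necessarily the method dCode uses). The reference sentences in the hypothetical `SAMPLES` table are far too short for real use; practical profiles are built from large corpora.

```python
import math
import re
from collections import Counter

# Hypothetical one-sentence reference texts (assumption); real profiles need large corpora.
SAMPLES = {
    "en": "the children are playing with the other animals in the garden of the old house",
    "fr": "les enfants jouent avec les autres animaux dans le jardin de la vieille maison",
    "es": "los niños juegan con los otros animales en el jardín de la casa vieja",
}

def ngram_profile(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams of the lowercased, space-padded text."""
    words = re.findall(r"[^\W\d_]+", text.lower())  # letter runs only, Unicode-aware
    padded = " " + " ".join(words) + " "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram frequency vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())  # missing grams count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Reference profiles are computed once and reused for every query.
PROFILES = {lang: ngram_profile(text) for lang, text in SAMPLES.items()}

def detect(text: str) -> str:
    """Return the language whose reference profile is most similar to the text."""
    query = ngram_profile(text)
    return max(PROFILES, key=lambda lang: cosine(query, PROFILES[lang]))
```

With these samples, `detect("les enfants sont dans le jardin")` returns `"fr"`, because most of its trigrams (" le", "ant", "dan", "jar", "din", ...) also occur in the French reference.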
- A quick method is to identify words that are common in a language.
Example: in English, a, to, of, etc.; in French, de, la, le, un, et.
- Count rare (or missing) characters to tell some languages apart.
Example: in English, the letter j is rare; in French, the letter w is rare and mostly limited to words borrowed from other languages.
- Look for language-specific characters and signs such as diacritics.
Example: in Spanish, the ñ (n with a tilde) is characteristic; in French, accented letters such as â, ç, è, é, ê, î, ô, û, ù are used, etc.
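The three heuristics above can be combined into a rough scoring function. In this sketch the word lists, character sets and weights (`COMMON_WORDS`, `DISTINCTIVE_CHARS`, `RARE_CHARS`, and the factors 2, 3 and 1) are illustrative assumptions, much too small for real use, and not dCode's data.

```python
from __future__ import annotations

import re
from collections import Counter

# Illustrative hint tables (assumptions); real tools use far larger lists.
COMMON_WORDS = {
    "en": {"a", "to", "of", "the", "and", "is", "in"},
    "fr": {"de", "la", "le", "un", "et", "les", "des"},
    "es": {"el", "la", "de", "que", "y", "los", "en"},
}
DISTINCTIVE_CHARS = {
    "en": set(),
    "fr": set("âçèéêîôûù"),
    "es": set("ñáíóú¿¡"),
}
RARE_CHARS = {
    "en": set("j"),
    "fr": set("w"),
    "es": set("kw"),
}

def heuristic_scores(text: str) -> dict[str, int]:
    """Score each language from common words, distinctive and rare characters."""
    lowered = text.lower()
    words = re.findall(r"[^\W\d_]+", lowered)
    chars = Counter(lowered)
    scores = {}
    for lang, common_words in COMMON_WORDS.items():
        common = sum(1 for word in words if word in common_words)
        distinctive = sum(chars[c] for c in DISTINCTIVE_CHARS[lang])
        rare = sum(chars[c] for c in RARE_CHARS[lang])
        # Arbitrary weights: common words +2, distinctive chars +3, rare chars -1.
        scores[lang] = 2 * common + 3 * distinctive - rare
    return scores

def guess_language(text: str) -> str | None:
    """Return the best-scoring language, or None when no hint matched."""
    scores = heuristic_scores(text)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

Such rules are fast but fragile on short texts; statistical n-gram profiles are usually more robust.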
The dCode tool is limited to about a hundred languages worldwide (only contemporary, non-fictional languages).
dCode also offers word-counting and character-frequency analysis tools.
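Word counting and character frequency analysis can be sketched in a few lines of Python. The function names `word_counts` and `letter_frequencies` are hypothetical; this is a generic illustration, not dCode's implementation.

```python
import re
from collections import Counter

def word_counts(text: str) -> Counter:
    """Count how many times each lowercased word occurs."""
    return Counter(re.findall(r"[^\W\d_]+", text.lower()))

def letter_frequencies(text: str) -> dict:
    """Relative frequency of each letter, most frequent first."""
    letters = [c for c in text.lower() if c.isalpha()]
    if not letters:
        return {}
    counts = Counter(letters)
    return {letter: n / len(letters) for letter, n in counts.most_common()}
```

Comparing the output of `letter_frequencies` with letter-frequency tables of known languages is the single-character version of the n-gram comparison used for language recognition.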
The languages with the most speakers in the world include English, Mandarin Chinese, Spanish, Arabic, Hindi/Hindustani and French.
Language recognition processes written texts to determine the language used, while spoken language identification aims to identify the language of audio or speech, possibly in real time. It is a very different process that requires speech recognition technologies. dCode does not offer this feature.
dCode retains ownership of the "Language Recognition" source code. Except where an explicit open-source licence is indicated (Creative Commons / free), the "Language Recognition" algorithm, the applet or snippet (converter, solver, encryption / decryption, encoding / decoding, ciphering / deciphering, breaker, translator), or the "Language Recognition" functions (calculate, convert, solve, decrypt / encrypt, decipher / cipher, decode / encode, translate) written in any programming language (Python, Java, PHP, C#, Javascript, Matlab, etc.), as well as all data downloads, scripts, or API access for "Language Recognition", are not public; the same applies to offline use on PC, mobile, tablet, iPhone or Android app.
Copying and pasting the page "Language Recognition" or any of its results is allowed (even for commercial purposes) as long as you credit dCode.
Cite as source (bibliography):
Language Recognition on dCode.fr [online website], retrieved on 2024-11-21,