PHP Text Language Detection Library: Detect the language of a given text string

Version: language-detection 1.3
License: Custom (specified...
PHP version: 7.1
Categories: Localization, Algorithms, Text proces..., A..., P...

Author

language_detection - github.com

Description

This package can detect the language of a given text string.

It parses training text in many different languages into sequences of n-grams and builds a database file in JSON format for use in the detection phase.

The package can then take a given text and detect its language using the database previously generated in the training phase.

The package comes with text samples used for training and detecting text in 73 languages.

Name: AccountKiller <contact>

Documentation

language_detection

Detects the language of a given text. To do so, it generates a language profile based on N-grams for every file in the etc directory. It then generates the same kind of profile for the unknown text and compares it against the previously generated language profiles.
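
The profile comparison described above can be illustrated with a small, self-contained sketch. It assumes a ranked n-gram profile and an "out-of-place" rank distance, which is a common approach for N-gram language detection; the library's actual scoring may differ. The functions buildProfile and distance, the 1 to 3 character n-gram range, and the limit and penalty of 300 are all hypothetical choices made for this illustration and are not part of the package.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: builds a ranked n-gram profile for a text.
 * Returns a map of n-gram => rank, where rank 0 is the most frequent n-gram.
 */
function buildProfile(string $text, int $minN = 1, int $maxN = 3, int $limit = 300): array
{
    $counts = [];
    // Lowercase the text and split it on anything that is not a letter.
    $words = preg_split('/[^\p{L}]+/u', mb_strtolower($text), -1, PREG_SPLIT_NO_EMPTY) ?: [];
    foreach ($words as $word) {
        // Pad each word so that word boundaries become part of the n-grams.
        $padded = '_' . $word . '_';
        $length = mb_strlen($padded);
        for ($n = $minN; $n <= $maxN; $n++) {
            for ($i = 0; $i + $n <= $length; $i++) {
                $gram = mb_substr($padded, $i, $n);
                $counts[$gram] = ($counts[$gram] ?? 0) + 1;
            }
        }
    }
    // Sort by frequency (descending) and keep only the top $limit n-grams.
    arsort($counts);
    return array_flip(array_keys(array_slice($counts, 0, $limit, true)));
}

/**
 * Hypothetical helper: out-of-place distance between two ranked profiles.
 * N-grams missing from the known profile cost a fixed penalty.
 */
function distance(array $unknown, array $known, int $penalty = 300): int
{
    $sum = 0;
    foreach ($unknown as $gram => $rank) {
        $sum += isset($known[$gram]) ? abs($rank - $known[$gram]) : $penalty;
    }
    return $sum;
}

// Toy training data; real profiles would be built from much larger samples.
$profiles = [
    'en' => buildProfile('the quick brown fox jumps over the lazy dog and then runs away'),
    'de' => buildProfile('der schnelle braune fuchs springt über den faulen hund und läuft weg'),
];

$unknown = buildProfile('the dog runs over the fox');
$scores = [];
foreach ($profiles as $lang => $profile) {
    $scores[$lang] = distance($unknown, $profile);
}
asort($scores); // smallest distance first
echo 'Closest profile: ', array_keys($scores)[0], PHP_EOL;
```

The smaller the distance, the more similar the unknown text is to a language profile. With such short samples the result is unreliable, which is why longer input texts are recommended.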

Requirements:

The only requirement is PHP 7.1 or later.

Note: language_detection requires the Multibyte String extension in order to work.
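
Both requirements can be checked up front with standard PHP functions. The following is a minimal sketch; checkRequirements is a hypothetical helper written for this illustration and is not provided by the package.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: returns a list of unmet requirements.
 * An empty array means the environment looks suitable.
 */
function checkRequirements(string $phpVersion, bool $hasMbstring): array
{
    $problems = [];
    if (version_compare($phpVersion, '7.1.0', '<')) {
        $problems[] = "PHP 7.1 or later is required, found $phpVersion";
    }
    if (!$hasMbstring) {
        $problems[] = 'The Multibyte String (mbstring) extension is not loaded';
    }
    return $problems;
}

// Check the running interpreter and report anything that is missing.
foreach (checkRequirements(PHP_VERSION, extension_loaded('mbstring')) as $problem) {
    echo $problem, PHP_EOL;
}
```

Passing the version and the extension state as arguments keeps the check easy to test without changing the environment.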

Install via Composer

composer require patrick-schur/language-detection

Or add the following to composer.json

{
  "require": {
     "patrick-schur/language-detection": "*"
  }
}

Basic Usage

Before we can recognize the language of a given text, we have to generate a language profile for each language. The package ships with a pre-trained language profile (etc/_langs.json). You can also add new files to etc or change existing ones.

First we have to generate a language profile.

<?php
require_once 'vendor/autoload.php';

use LanguageDetector\Trainer;

$t = new Trainer;

// Generate the language profiles from the training files in etc.
$t->learn();
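
The training step stores its result as a JSON database (etc/_langs.json in this package). The exact layout of that file is not shown here, so the following is only a sketch of how profiles could be written to and read back from JSON, assuming a simple mapping of language code to a list of n-grams. saveProfiles and loadProfiles are hypothetical helpers, not functions of the library.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: writes language profiles to a JSON file.
 * Assumed layout: language code => list of n-grams, most frequent first.
 */
function saveProfiles(array $profiles, string $path): void
{
    // Keep non-ASCII n-grams readable instead of escaping them as \uXXXX.
    $json = json_encode($profiles, JSON_UNESCAPED_UNICODE | JSON_PRETTY_PRINT);
    if ($json === false || file_put_contents($path, $json) === false) {
        throw new RuntimeException("Could not write profiles to $path");
    }
}

/**
 * Hypothetical helper: reads language profiles back from a JSON file.
 */
function loadProfiles(string $path): array
{
    if (!is_readable($path)) {
        throw new RuntimeException("Profile file $path is not readable");
    }
    $data = json_decode((string) file_get_contents($path), true);
    if (!is_array($data)) {
        throw new RuntimeException("Profile file $path does not contain valid JSON");
    }
    return $data;
}

// Round trip with toy data in a temporary file.
$path = tempnam(sys_get_temp_dir(), 'ld');
saveProfiles(['de' => ['en_', 'er_', '_de'], 'en' => ['_th', 'the', 'he_']], $path);
print_r(array_keys(loadProfiles($path)));
unlink($path);
```

Storing the profiles once and loading them at detection time avoids re-reading every training file for each request.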

Once we have our language profile, we can classify texts by their language. To detect the language correctly, the input text should be at least a few sentences long.

<?php
require_once 'vendor/autoload.php';

use LanguageDetector\LanguageDetector;

$ld = new LanguageDetector;

// Detect the language of a German sample sentence.
var_dump($ld->detect('Das ist ein deutscher Satz.')); // de

Supported languages:

It currently supports 73 languages. If your language is not supported, feel free to add your own language files.

  • ab (abkhaz)
  • af (afrikaans)
  • am (amharic)
  • ar (arabic)
  • az (azerbaijani)
  • be (belarusian)
  • bg (bulgarian)
  • bn (bengali)
  • co (corsican)
  • cs (czech)
  • cy (welsh)
  • de (german)
  • dk (danish)
  • el (greek)
  • en (english)
  • eo (esperanto)
  • es (spanish)
  • et (estonian)
  • eu (basque)
  • fa (persian)
  • fi (finnish)
  • fj (fijian)
  • fo (faroese)
  • fr (french)
  • ga (irish)
  • gd (scottish)
  • gl (galician)
  • gn (guarani)
  • ha (hausa)
  • he (hebrew)
  • hi (hindi)
  • hr (croatian)
  • hu (hungarian)
  • hy (armenian)
  • ia (interlingua)
  • ig (igbo)
  • io (ido)
  • is (icelandic)
  • it (italian)
  • iu (inuktitut)
  • jp (japanese)
  • jv (javanese)
  • ka (georgian)
  • ko (korean)
  • ku (kurdish)
  • la (latin)
  • lg (ganda)
  • lo (lao)
  • lt (lithuanian)
  • lv (latvian)
  • mh (marshallese)
  • mn (mongolian)
  • ms (malay)
  • mt (maltese)
  • nl (dutch)
  • no (norwegian)
  • nv (navajo)
  • pl (polish)
  • pt (portuguese)
  • ro (romanian)
  • ru (russian)
  • sk (slovak)
  • sl (slovene)
  • so (somali)
  • sv (swedish)
  • th (thai)
  • tr (turkish)
  • ty (tahitian)
  • ug (uyghur)
  • uk (ukrainian)
  • uz (uzbek)
  • vi (vietnamese)
  • zh (chinese)

Files

File            Role   Description
etc             (74 files)
src             (1 directory)
tests           (3 files)
.travis.yml     Data   Auxiliary data
composer.json   Data   Auxiliary data
LICENSE.md      Lic.   License text
phpunit.xml     Data   Auxiliary data
README.md       Doc.   Documentation

Files / etc

The following plain text files each have the role Doc. (Documentation):

ab.txt, af.txt, am.txt, ar.txt, az.txt, be.txt, bg.txt, bn.txt, co.txt, cs.txt,
cy.txt, de.txt, dk.txt, el.txt, en.txt, eo.txt, es.txt, et.txt, eu.txt, fa.txt,
fi.txt, fj.txt, fo.txt, fr.txt, ga.txt, gd.txt, gl.txt, gn.txt, ha.txt, he.txt,
hi.txt, hr.txt, hu.txt, hy.txt, ia.txt, ig.txt, io.txt, is.txt, it.txt, iu.txt,
jp.txt, jv.txt, ka.txt, ko.txt, ku.txt, la.txt, lg.txt, lo.txt, lt.txt, lv.txt,
mh.txt, mn.txt, ms.txt, mt.txt, nl.txt, no.txt, nv.txt, pl.txt, pt.txt, ro.txt,
ru.txt, sk.txt, sl.txt, so.txt, sv.txt, th.txt, tr.txt, ty.txt, ug.txt, uk.txt,
uz.txt, vi.txt, zh.txt

File          Role   Description
_langs.json   Data   Auxiliary data

Files / src

File               Role   Description
LanguageDetector   (3 files, 1 directory)

Files / src / LanguageDetector

File                   Role    Description
Tokenizer              (3 files)
LanguageDetector.php   Class   Class source
NGramParser.php        Class   Class source
Trainer.php            Class   Class source

Files / src / LanguageDetector / Tokenizer

File                     Role    Description
Tokenizer.php            Class   Class source
TokenizerInterface.php   Class   Class source
WordTokenizer.php        Class   Class source

Files / tests

File                       Role    Description
LanguageDetectorTest.php   Class   Class source
NGramParserTest.php        Class   Class source
TrainerTest.php            Class   Class source
