Hi,
does anyone know a Ruby library/module to detect which natural language
a given input text is written in?
Cheers
Thomas
You can use the zlib library for this, as deplate does:
from: Deplate 0.8.5 – convert wiki-like markup to latex, docbook, html, or “html-slides” -- Modules
The algorithm of this plugin is based on D. Benedetto, E. Caglioti
& V. Loreto, “Language Trees and Zipping”[1]. It’s a direct port of
Dirk Holtwick’s “Guess language of text using ZIP”[2].
[1] arXiv:cond-mat/0108530
[2] http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/355807
Ruby code:
Works quite well for me.
Thomas.
On May 7, 1:54 pm, micathom [email protected] wrote:
Ruby code: CVS Info for project deplate…
Works quite well for me.
Thomas.
thx, I don’t have a clue how it works, but it’s great
You need a base corpus/sample for each language (I use the GPL license)
and calculate an index number for it on initialization.
Then you compare a sample text’s index with each base index.
An example for how to use this can be found at:
First call #register(language_name, text) for each corpus, then call
#guess_with_diff(text) to guess a text’s language.
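
In case it helps to see the idea in code, here is a minimal sketch of the zip-based approach, assuming the "index number" is simply the zlib-compressed size of each corpus. This is not deplate's actual code: the class name ZipLanguageGuesser and the [name, diff] return value are my own assumptions, and only the method names #register and #guess_with_diff are taken from the description above.

```ruby
require 'zlib'

# Hypothetical class, for illustration only (not the deplate module).
class ZipLanguageGuesser
  def initialize
    @corpora = {}
  end

  # Store the corpus together with its compressed size (the base index).
  def register(language_name, text)
    @corpora[language_name] = [text, Zlib::Deflate.deflate(text).bytesize]
  end

  # Compress corpus + sample for every language and measure how many
  # extra bytes the sample costs. Returns [language_name, extra_bytes]
  # for the cheapest language, or nil if nothing was registered.
  def guess_with_diff(text)
    @corpora.map { |name, (corpus, base_size)|
      [name, Zlib::Deflate.deflate(corpus + text).bytesize - base_size]
    }.min_by { |_name, diff| diff }
  end
end

guesser = ZipLanguageGuesser.new
guesser.register("en", "All human beings are born free and equal in dignity and rights.")
guesser.register("de", "Alle Menschen sind frei und gleich an Rechten geboren.")
p guesser.guess_with_diff("human beings are born free and equal")
```

The intuition: a sample in the same language as the corpus repeats many of the corpus's character sequences, so the compressor can encode it with fewer additional bytes. Real corpora should be much longer than these one-liners, and all strings should share one encoding.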
Sorry, I meant I have no idea how the algorithm works. The code
itself works like a charm.
Actually I’m using the “Declaration of Human Rights”
Cheers,
Thomas