Natural language detection library

Hi,

does anyone know a Ruby library/module that detects which natural language
a given input text is written in?

Cheers

Thomas

You can use the zlib library for this, as deplate does:

from: Deplate 0.8.5 – convert wiki-like markup to latex, docbook, html, or “html-slides” -- Modules

The algorithm of this plugin is based on D Benedetto & E Caglioti
& V Loreto “Language Trees and Zipping”[1]. It’s a direct port of
Dirk Holtwick’s “Guess language of text using ZIP”[2].

[1] arXiv cond-mat/0108530
[2] http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/355807

Ruby code: CVS Info for project deplate

Works quite well for me.

Thomas.

On May 7, 1:54 pm, micathom wrote:

> Ruby code: CVS Info for project deplate
>
> Works quite well for me.
>
> Thomas.

thx, I don’t have a clue how it works, but it’s great ;)

> thx, I don’t have a clue how it works, but it’s great ;)

You need a base corpus/sample for each language (I use the GPL License)
and calculate an index number for it on initialization.
Then you compare a sample text’s index with the base index.
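
The idea can be sketched in a few lines with Ruby's standard zlib bindings. This is only an illustration of the zip-based approach, not deplate's actual code: the helper names zipped_size and extra_bytes are made up here, the "index number" is assumed to be the compressed size of the corpus, and the two tiny corpora are toy stand-ins for real sample texts.

```ruby
require 'zlib'

# Size in bytes of the deflate-compressed string.
def zipped_size(text)
  Zlib::Deflate.deflate(text).bytesize
end

# Extra compressed bytes a sample costs when appended to a corpus.
# A smaller value suggests the sample resembles the corpus.
def extra_bytes(corpus, sample)
  zipped_size(corpus + sample) - zipped_size(corpus)
end

# Toy corpora (assumption: real use would take much longer texts).
english = "the quick brown fox jumps over the lazy dog and the cat sat on the mat " * 20
german  = "der schnelle braune fuchs springt ueber den faulen hund und die katze " * 20
sample  = "the dog and the cat"

puts "english: +#{extra_bytes(english, sample)} bytes"
puts "german:  +#{extra_bytes(german, sample)} bytes"
```

The compressor can reuse strings it has already seen in the corpus, so a sample in the same language should add fewer bytes than one in a different language.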

An example of how to use this can be found at:

First call #register(language_name, text) for each corpus, then call
#guess_with_diff(text) to guess a text’s language.
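
A rough sketch of what such an interface might look like is given here. Only the two method names, register and guess_with_diff, come from this post. The class name ZipLanguageGuesser, the [language, diff] return value, and the toy corpora are assumptions for illustration and may differ from what deplate really does.

```ruby
require 'zlib'

# Hypothetical stand-in for the deplate plugin; only the two public
# method names are taken from the post, the rest is assumed.
class ZipLanguageGuesser
  def initialize
    @corpora = {} # language name => [corpus text, compressed size of corpus]
  end

  # Store a corpus and precompute its compressed size (the base index).
  def register(language_name, text)
    @corpora[language_name] = [text, zipped_size(text)]
  end

  # Return [language_name, diff] for the corpus whose compressed size grows
  # least when the text is appended, or nil if nothing is registered.
  def guess_with_diff(text)
    scored = @corpora.map do |name, (corpus, base_size)|
      [name, zipped_size(corpus + text) - base_size]
    end
    scored.min_by { |_name, diff| diff }
  end

  private

  def zipped_size(text)
    Zlib::Deflate.deflate(text).bytesize
  end
end

guesser = ZipLanguageGuesser.new
# Toy corpora (assumption: real use would register much longer texts).
guesser.register('en', "the quick brown fox jumps over the lazy dog and the cat sat on the mat " * 20)
guesser.register('de', "der schnelle braune fuchs springt ueber den faulen hund und die katze " * 20)

language, diff = guesser.guess_with_diff("the dog and the cat")
puts "#{language} (+#{diff} bytes)"
```

Precomputing the compressed size in register means each guess only has to compress corpus plus text once per language.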

Sorry, I meant I have no idea how the algorithm works. The code
itself works like a charm.
Actually I’m using the “Declaration of Human Rights” ;)

Cheers,

Thomas