Natural language detection library

Hi,

Does anyone know of a Ruby library/module that detects which natural
language a given input text is written in?

Cheers

Thomas

You can use the zlib library for this, as deplate does:

from: http://deplate.sourceforge.net/Modules.html#hd008001001

The algorithm of this plugin is based on D Benedetto & E Caglioti
& V Loreto "Language Trees and Zipping"[1]. It's a direct port of
Dirk Holtwick's "Guess language of text using ZIP"[2].

[1] http://xxx.uni-augsburg.de/format/cond-mat/0108530
[2] http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/355807

Ruby code:
http://deplate.cvs.sourceforge.net/deplate/deplate/lib/deplate/guesslanguage.rb?view=markup

Works quite well for me.

Thomas.

On May 7, 1:54 pm, micathom wrote:

> Ruby code:http://deplate.cvs.sourceforge.net/deplate/deplate/lib/deplate/guessl
>
> Works quite well for me.
>
> Thomas.

thx, I don’t have a clue how it works, but it’s great ;)

> thx, I don’t have a clue how it works, but it’s great ;)

You need a base corpus/sample for each language (I use the GPL license
text) and calculate an index number for it on initialization.
Then you compare a sample text’s index with the base index.

An example of how to use this can be found at:
http://deplate.cvs.sourceforge.net/deplate/deplate/lib/deplate/mod/guesslanguage.rb?view=markup

First call #register(language_name, text) for each corpus, then call
#guess_with_diff(text) to guess a text’s language.
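
To illustrate the idea behind those two calls, here is a minimal sketch in
plain Ruby that uses only the standard zlib binding. It is not the deplate
source: the class name LanguageGuesser, the return shape of guess_with_diff
and the newline separator are assumptions made for illustration; only the
method names register and guess_with_diff come from the post above. The
"index number" is taken here to be the deflated size of each corpus, and a
sample is scored by how many extra compressed bytes it adds when appended to
that corpus. A text in the same language shares words and letter sequences
with the corpus, so it should add fewer bytes.

```ruby
require 'zlib'

# Hypothetical stand-in for the deplate module. Only the method names
# register and guess_with_diff are taken from the post; the rest is assumed.
class LanguageGuesser
  def initialize
    # language name => [corpus text, compressed size of that corpus]
    @corpora = {}
  end

  # Store a reference corpus and precompute its compressed size
  # (the "index number" calculated on initialization).
  def register(language_name, text)
    @corpora[language_name] = [text, deflated_size(text)]
  end

  # Returns [best_language, diffs]. diffs maps each language to the number
  # of extra compressed bytes the sample costs when appended to that
  # language's corpus. Fewer extra bytes means a closer match.
  def guess_with_diff(text)
    diffs = {}
    @corpora.each do |language_name, (corpus, base_size)|
      diffs[language_name] = deflated_size(corpus + "\n" + text) - base_size
    end
    best = diffs.min_by { |_, diff| diff }
    [best && best.first, diffs]
  end

  private

  # Size in bytes of the zlib-deflated text.
  def deflated_size(text)
    Zlib::Deflate.deflate(text).bytesize
  end
end

# Toy corpora for illustration only; real ones should be much longer.
guesser = LanguageGuesser.new
guesser.register('en', 'All human beings are born free and equal in dignity and rights.')
guesser.register('de', 'Alle Menschen sind frei und gleich an Rechten geboren.')
language, diffs = guesser.guess_with_diff('born free and equal')
puts language
puts diffs.inspect
```

With corpora this small the result is fragile; a few kilobytes per language
(a license text, for example) should give the compressor enough material to
separate languages more reliably.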

Sorry, I meant I have no idea how the algorithm works. The code
itself works like a charm.
Actually I’m using the “Declaration of Human Rights” as my corpus ;)

Cheers,

Thomas
