Hi, does anyone know a Ruby library/module to detect which natural language a given input text is written in? Cheers Thomas
on 2007-05-07 13:18
on 2007-05-07 13:56
You can use the zlib library for this, as deplate does. From http://deplate.sourceforge.net/Modules.html#hd008001001: > The algorithm of this plugin is based on D Benedetto & E Caglioti > & V Loreto "Language Trees and Zipping". It's a direct port of > Dirk Holtwick's "Guess language of text using ZIP".  http://xxx.uni-augsburg.de/format/cond-mat/0108530  http://aspn.activestate.com/ASPN/Cookbook/Python/R... Ruby code: http://deplate.cvs.sourceforge.net/deplate/deplate... Works quite well for me. Thomas.
on 2007-05-08 16:50
On May 7, 1:54 pm, micathom <micat...@gmail.com> wrote: > > Ruby code: http://deplate.cvs.sourceforge.net/deplate/deplate... > > Works quite well for me. > > Thomas. thx, I don't have a clue how it works, but it's great ;-)
on 2007-05-09 10:10
> thx, I don't have a clue how it works, but it's great ;-) You need a base corpus/sample for each language (I use the GPL license text) and calculate an index number for it on initialization. Then you compare a sample text's index with each base index. An example of how to use this can be found at: http://deplate.cvs.sourceforge.net/deplate/deplate... First call #register(language_name, text) for each corpus, then call #guess_with_diff(text) to guess a text's language.
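For anyone who wants to see the idea in a few lines, here is a rough sketch of the zip-based approach. It is not deplate's actual code: the class name ZipLanguageGuesser and the methods #rank and #guess are made up for illustration, and it assumes the "index number" is simply the zlib-compressed size of the corpus. The guess is the language whose corpus needs the fewest extra compressed bytes when the sample text is appended to it. The two corpora here are tiny sample sentences; a real setup would want much longer texts.

```ruby
require "zlib"

# Hypothetical class for illustration only; not deplate's implementation.
class ZipLanguageGuesser
  def initialize
    # language name => [corpus text, compressed size of the corpus alone]
    @corpora = {}
  end

  # Store a reference corpus together with its compressed size
  # (assumed here to play the role of the "index number").
  def register(name, corpus)
    @corpora[name] = [corpus, Zlib::Deflate.deflate(corpus).bytesize]
  end

  # Return [name, diff] pairs sorted by diff, where diff is the number of
  # extra compressed bytes needed once the text is appended to the corpus.
  def rank(text)
    @corpora.map { |name, (corpus, base_size)|
      [name, Zlib::Deflate.deflate(corpus + text).bytesize - base_size]
    }.sort_by { |_, diff| diff }
  end

  # Name of the language with the smallest diff, or nil if nothing is registered.
  def guess(text)
    best = rank(text).first
    best && best.first
  end
end

guesser = ZipLanguageGuesser.new
guesser.register("en", "All human beings are born free and equal in dignity and rights. " \
                       "Everyone has the right to life, liberty and security of person.")
guesser.register("de", "Alle Menschen sind frei und gleich an Rechten geboren. " \
                       "Jeder hat das Recht auf Leben, Freiheit und Sicherheit der Person.")
puts guesser.guess("the right to life, liberty and security of person")
```

The reasoning is that deflate replaces repeated substrings with back-references, so text sharing words and letter sequences with the corpus compresses into fewer additional bytes than text in another language.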
on 2007-05-09 11:36
Sorry, I meant I have no idea how the algorithm works. The code itself works like a charm. Actually I'm using the "Declaration of Human Rights" ;-) Cheers, Thomas