[0.10.0] Index#add_document bug with strange value?

I think I’ve found where my problem (during a big import) comes from.
Why does this silly (really silly :)) example crash?

http://pastie.caboo.se/10357

    /usr/lib/ruby/site_ruby/1.8/ferret/index.rb:211:in `add_document': IO Error occured at <except.c>:79 in xraise (IOError)
    Error occured in fs_store.c:225 - fso_flush_i
    flushing src of length -2

    from /usr/lib/ruby/site_ruby/1.8/ferret/index.rb:211:in `<<'
    from /usr/lib/ruby/1.8/monitor.rb:229:in `synchronize'
    from /usr/lib/ruby/site_ruby/1.8/ferret/index.rb:186:in `<<'
    from test.rb:13
    from test.rb:8
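
(The failing script itself is only in the pastie above; what follows is a hypothetical reconstruction based on the backtrace and the discussion below — it simply pushes random binary bytes through Index#<< with the default analyzer. The field name and index path are made up.)

    # test.rb -- hypothetical reconstruction, not the original pastie
    require 'rubygems'
    require 'ferret'

    # the default Index uses the locale-sensitive StandardAnalyzer
    index = Ferret::Index::Index.new(:path => '/tmp/crash_test')

    10.times do
      # random bytes stand in for the badly encoded file contents
      garbage = (1..100).map { rand(256).chr }.join
      index << {:content => garbage}
    end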

Hi Florent,

This is something that I still need to work on. The locale-sensitive
analyzers aren’t as robust as they could be. Try using the
AsciiStandardAnalyzer instead. Or better yet, don’t index binary data.
You can store binary data, but indexing it doesn’t usually make a lot
of sense, at least not without a custom analyzer. Having said that, I
will try to fix this.
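
For example (a minimal sketch, assuming Ferret 0.10’s Index, FieldInfos and AsciiStandardAnalyzer APIs; the field names and file path here are made up):

    require 'rubygems'
    require 'ferret'

    # use the ASCII analyzer instead of the locale-sensitive default
    analyzer = Ferret::Analysis::AsciiStandardAnalyzer.new

    # mark the binary :data field as stored but not indexed
    field_infos = Ferret::Index::FieldInfos.new
    field_infos.add_field(:data, :store => :yes, :index => :no)

    index = Ferret::Index::Index.new(:path => '/tmp/my_index',
                                     :analyzer => analyzer,
                                     :field_infos => field_infos)

    # hypothetical binary payload
    binary_blob = File.open('/path/to/binary_file', 'rb') { |f| f.read }
    index << {:title => 'a plain text title', :data => binary_blob}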

Cheers,
Dave

I totally agree with you; it’s just an example. The real data is a file
with encoding bugs, and the result is the same.

Thanks for your answer.

Just an update on this issue. I’ve now made the StandardAnalyzer more
robust so it won’t crash as easily (hopefully not at all) on bad
data. While fixing this I also changed the StandardTokenizer so that
it now tokenizes negative numbers, i.e. it will parse
“-23” as “-23” instead of just “23”.
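
A quick way to check the new behaviour (a sketch, assuming Ferret’s token_stream API; the field name and sample text are made up):

    require 'rubygems'
    require 'ferret'

    analyzer = Ferret::Analysis::StandardAnalyzer.new
    stream = analyzer.token_stream(:content, "it dropped to -23 overnight")

    # print each token the analyzer produces
    while token = stream.next
      puts token.text
    end
    # with this fix the stream should yield "-23" rather than "23"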

Cheers,
Dave

Cool! And as usual, great job, Dave!