Two repeatable crash bugs in Ferret proper

Hi guys! Been reading this list for a while.

I have two repeatable Ferret crash bugs, both seg faults.

  1. The first bug appears to seg fault Ferret when you use quotes in a
     search argument (e.g. 'file_name:"file name"').
  2. The second bug appears to seg fault Ferret when you attempt to index
     text with very long tokens (above 256 chars). It may have something
     to do with URL characters and the default analyzer, since other very
     long tokens parse successfully.

The code and my system specs are below.

I've sent the first one to David, but not the second. He recommended I
talk to you guys.

They're both relatively easy to work around, so don't worry about me.
I'd fix them in the C code myself, but I'm not really geared up for that
environment. I figure someone here is better equipped to handle this.

Schnitz

--- First bug: quotes in search terms

#!/usr/bin/ruby

require 'rubygems'
require 'ferret'

# Strangely, the omit_norms is required to exercise the bug.

field_infos = Ferret::Index::FieldInfos.new(:index => :omit_norms)
field_infos.add_field( :phile_id )
field_infos.add_field( :file_name )

index = Ferret::Index::Index.new(
  :field_infos => field_infos,
  :path => './exercisequotebugindex',
  :create => true )

index << { :file_name => "[new] Yo La Tengo - Beanbag Chair.mp3",
           :phile_id => "428570" }

# Works: single-quoted phrase, and a double-quoted single word

docs = index.search( "file_name:'yo la'" )
puts index[docs.hits[0].doc][:phile_id]
docs = index.search( "file_name:\"yo\"" )
puts index[docs.hits[0].doc][:phile_id]

# Does not work; will seg fault (double-quoted phrase of two words)

docs = index.search( "file_name:\"yo la\"" )

# This doesn't either

docs = index.search( %Q!file_name:"yo la"! )
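
A possible stopgap for the first bug, sketched under assumptions: since the single-quoted phrase above does not crash, a query could be rewritten before it reaches index.search so that double-quoted phrases containing whitespace become single-quoted. The helper name requote_phrases is hypothetical and not part of Ferret, and whether a single-quoted phrase matches exactly the same documents as a double-quoted one has not been verified here.

```ruby
# Hypothetical helper, not part of Ferret: rewrite double-quoted phrases
# that contain whitespace into single-quoted ones, which did not crash
# in the script above. Double-quoted single words are left untouched.
def requote_phrases(query)
  query.gsub(/"([^"]*)"/) do |quoted|
    inner = quoted[1..-2]            # text between the double quotes
    inner =~ /\s/ ? "'#{inner}'" : quoted
  end
end

puts requote_phrases('file_name:"yo la"')
puts requote_phrases('file_name:"yo"')
```

With the index from the script above, the call would then look like index.search( requote_phrases( user_query ) ), where user_query is whatever string came from the user.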

--- Second bug: long tokens (?)

#!/usr/bin/ruby

require 'rubygems'
require 'ferret'

# Unlike the first script, :omit_norms is not set here.

field_infos = Ferret::Index::FieldInfos.new()
field_infos.add_field( :comment_id )
field_infos.add_field( :comment_body )

index = Ferret::Index::Index.new(
  :field_infos => field_infos,
  :path => './exercisequotebugindex',
  :create => true )

index << { :comment_id => 1, :comment_body => "weird URL, huh? [a
href="
http://www.hotelbogotaberlin.com/bogota_e/index_e.html/index_e.html/index_e.html/index_e.html/index_e.html/index_e.html/index_e.html/index_e.html/index_e.html/index_e.html/index_e.html/index_e.html/index_e.html/index_e.html/index_e.html/index_e.html/index_e.html\“]”
}
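
A possible stopgap for the second bug, again only a sketch: clip any overly long run of non-whitespace characters before the text is handed to the index. The helper clip_long_tokens and the 255-character limit are assumptions based on the roughly 256-character threshold described above. Whitespace-delimited runs are only an approximation of what the default analyzer treats as a token, so this may clip more than strictly necessary.

```ruby
# Hypothetical pre-indexing filter, not part of Ferret.
# The limit is an assumption taken from the ~256-char threshold seen above.
MAX_TOKEN_LENGTH = 255

# Truncate every run of non-whitespace characters longer than max_len.
# Shorter runs and all whitespace are left exactly as they were.
def clip_long_tokens(text, max_len = MAX_TOKEN_LENGTH)
  text.gsub(/\S{#{max_len + 1},}/) { |run| run[0, max_len] }
end

sample = "weird URL, huh? " + ("x" * 300)
puts clip_long_tokens(sample).length
```

In the script above this would mean indexing clip_long_tokens( body ) for :comment_body instead of the raw string. The clipped tail of the URL is lost from the index, which seemed an acceptable trade for not crashing.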

--- My system specs

Ubuntu 6.06.1 LTS (under VMWare)
ruby 1.8.4 (2005-12-24) [i486-linux]
ferret-0.10.12
gem_plugin-0.2.1
rubygems-update-0.9.0
rails-1.1.6