[Ferret] Serious memory leak on Joyent / TextDrive / Solaris

There is a serious memory leak bug in Ferret. I'm seeing this error on a
TextDrive Container (aka Joyent Accelerator) running OpenSolaris with
Ferret 0.11.4.

It happens while searching for terms with accented or special
characters. Such a query makes Ferret allocate lots of memory (usually
reaching 3+ GB) and fail if another query like it is executed.

Any ideas on that? Could this be the locale or some other system
setting?

Any suggestions on how to debug that?

Look at the console session below, where the error is reproduced even on
a very simple index:

require 'ferret'
include Ferret
index = Index::Index.new()
index << "My document"
index.search_each("bonô") { |id, score| puts score }
/opt/csw/lib/ruby/gems/1.8/gems/ferret-0.11.4/lib/ferret/index.rb:749:in `parse': failed to allocate memory (NoMemoryError)
        from /opt/csw/lib/ruby/gems/1.8/gems/ferret-0.11.4/lib/ferret/index.rb:749:in `do_process_query'
        from /opt/csw/lib/ruby/gems/1.8/gems/ferret-0.11.4/lib/ferret/index.rb:382:in `search_each'
        from /opt/csw/lib/ruby/1.8/monitor.rb:229:in `synchronize'
        from /opt/csw/lib/ruby/gems/1.8/gems/ferret-0.11.4/lib/ferret/index.rb:380:in `search_each'
        from (irb):10:in `irb_binding'
        from /opt/csw/lib/ruby/1.8/irb/workspace.rb:52:in `irb_binding'
        from /opt/csw/lib/ruby/1.8/irb/workspace.rb:52
[92140-AA:~/web/labs/blogblogs/trunk] pocscom$

On 4/14/07, Manoel L. [email protected] wrote:

/opt/csw/lib/ruby/gems/1.8/gems/ferret-0.11.4/lib/ferret/index.rb:749:in `search_each'
        from (irb):10:in `irb_binding'
        from /opt/csw/lib/ruby/1.8/irb/workspace.rb:52:in `irb_binding'
        from /opt/csw/lib/ruby/1.8/irb/workspace.rb:52
[92140-AA:~/web/labs/blogblogs/trunk] pocscom$

Hi Manoel,

I really think this has something to do with OpenSolaris. Just to
narrow the problem down further, could you try this:

require 'rubygems'
require 'ferret'

tokenizer = Ferret::Analysis::StandardAnalyzer.new().token_stream(:field, "bon\303\264")
while token = tokenizer.next
  puts token
end

I suspect this will cause the same problem. If it does, I’ll try
writing a simple C program to test your locale library.

Cheers,
Dave

David,

I did that and it entered an infinite loop; see:

>> require 'ferret'
=> false
>> tokenizer = Ferret::Analysis::StandardAnalyzer.new().token_stream(:field, "bon\303\264")
=> #<Ferret::Analysis::TokenStream:0x98967d0>
>> while token = tokenizer.next
?>   puts token
>> end
token["bon":0:3:1]
token["":4:4:1]
token["":4:4:1]
token["":4:4:1]
token["":4:4:1]
token["":4:4:1]
token["":4:4:1]
token["":4:4:1]
token["":4:4:1]


Regards,

On 4/15/07, Manoel L. [email protected] wrote:

while token = tokenizer.next
token["":4:4:1]


Hi Manoel,

I finally managed to work out a fix for this after working on it for
hours. It appears that OpenSolaris has a bug in its isdigit
implementation, although I can't be sure: isdigit(-76) returns true.
(-76 is what you get when the second UTF-8 byte of 'ô', 0xB4, is
sign-extended from a signed char.) I'm not sure which character
encoding that would be correct for, however.

Anyway, I've made it so that you won't get this infinite loop anymore,
but I haven't really fixed your problem. Your main issue seems to be
that you don't have a UTF-8 locale installed on your system. You'll
need to install one before you will be able to analyze UTF-8 data.

So, having said all that, I don’t think there is any point in me
putting out a quick release now (to give you the fix) as you will need
to set up your locale to handle UTF-8 and that will already fix your
problem.

Hope that helps,
Dave

Same here.
I'm running Windows XP SP2, also with the mswin32 Ferret gem. The
problem occurs when I try to search for a word with a special
character, specifically 'ä'.

Hello.

I just want to add that I'm having exactly the same problem on my
desktop running a Brazilian edition of Windows XP Pro SP2, with the
latest mswin32 Ferret gem. I could not log anything, since I have only
1.5 GB of installed memory and the system freezes if I don't kill the
ruby.exe process. But testing that snippet in the console, I got the
same infinite loop.

It also happens whenever I search for any of a variety of accented
words.

On my Linux box it runs with no problem.