Forum: Ferret [Ferret] Serious memory leak on Joyent / TextDrive / Solaris

Manoel L. (Guest)
on 2007-04-13 22:18
There is a serious memory leak bug in Ferret. I'm hitting this error on
a TextDrive Container (a.k.a. Joyent Accelerator) running OpenSolaris
with Ferret 0.11.4.

It happens when searching for terms with accented or special
characters.
This makes Ferret allocate large amounts of memory (usually reaching
3+ GB) and fail if another query like this is executed.

Any ideas on what causes this? Could it be the locale or some other
system setting?

Any suggestions on how to debug that?

See the console session below, where the error is reproduced even with
a very simple index:

>> require 'ferret'
>> include Ferret
>> index = Index::Index.new()
>> index << "My document"
>> index.search_each("bonô") { |id,score| puts score }
/opt/csw/lib/ruby/gems/1.8/gems/ferret-0.11.4/lib/ferret/index.rb:749:in `parse': failed to allocate memory (NoMemoryError)
        from /opt/csw/lib/ruby/gems/1.8/gems/ferret-0.11.4/lib/ferret/index.rb:749:in `do_process_query'
        from /opt/csw/lib/ruby/gems/1.8/gems/ferret-0.11.4/lib/ferret/index.rb:382:in `search_each'
        from /opt/csw/lib/ruby/1.8/monitor.rb:229:in `synchronize'
        from /opt/csw/lib/ruby/gems/1.8/gems/ferret-0.11.4/lib/ferret/index.rb:380:in `search_each'
        from (irb):10:in `irb_binding'
        from /opt/csw/lib/ruby/1.8/irb/workspace.rb:52:in `irb_binding'
        from /opt/csw/lib/ruby/1.8/irb/workspace.rb:52
[92140-AA:~/web/labs/blogblogs/trunk] pocscom$
David B. (Guest)
on 2007-04-15 03:28
(Received via mailing list)
On 4/14/07, Manoel L. <removed_email_address@domain.invalid> wrote:
>
> /opt/csw/lib/ruby/gems/1.8/gems/ferret-0.11.4/lib/ferret/index.rb:749:in
> `search_each'
>         from (irb):10:in `irb_binding'
>         from /opt/csw/lib/ruby/1.8/irb/workspace.rb:52:in `irb_binding'
>         from /opt/csw/lib/ruby/1.8/irb/workspace.rb:52
> [92140-AA:~/web/labs/blogblogs/trunk] pocscom$

Hi Manoel,

I really think this has something to do with OpenSolaris. Just to
narrow the problem down further, could you try this:


    require 'rubygems'
    require 'ferret'

    tokenizer = Ferret::Analysis::StandardAnalyzer.new().token_stream(:field, "bon\303\264")
    while token = tokenizer.next
      puts token
    end

I suspect this will cause the same problem. If it does, I'll try
writing a simple C program to test your locale library.

Cheers,
Dave
Manoel L. (Guest)
on 2007-04-15 08:40
David,


I did that and it went into an infinite loop. See:

?> require 'ferret'
=> false
>> tokenizer = Ferret::Analysis::StandardAnalyzer.new().token_stream(:field, "bon\303\264")
=> #<Ferret::Analysis::TokenStream:0x98967d0>
>> while token = tokenizer.next
>> puts token
>> end
token["bon":0:3:1]
token["":4:4:1]
token["":4:4:1]
token["":4:4:1]
token["":4:4:1]
token["":4:4:1]
token["":4:4:1]
token["":4:4:1]
token["":4:4:1]
...
...
...
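
For anyone reproducing this, the loop above can be capped so that it does not run until memory is exhausted. What follows is only a sketch: `drain_tokens` and `StuckStream` are hypothetical names, not part of Ferret, and the stand-in stream merely imitates the output shown above. It assumes `next` returns nil when the stream is done, as the `while token = tokenizer.next` loop implies.

```ruby
# Pull tokens from any object that responds to #next, stopping early if the
# stream stops advancing or a hard cap is reached.
# Returns [tokens, status], where status is :finished, :stuck or :limit_reached.
def drain_tokens(stream, limit = 1000)
  tokens = []
  while (token = stream.next)
    # Two identical consecutive tokens (same text and offsets when printed)
    # are treated as a sign that the stream is no longer making progress.
    if !tokens.empty? && tokens.last.to_s == token.to_s
      return [tokens, :stuck]
    end
    tokens << token
    return [tokens, :limit_reached] if tokens.size >= limit
  end
  [tokens, :finished]
end

# Hypothetical stand-in that imitates the session above: one real token,
# then the same empty token forever. It is NOT a Ferret class.
class StuckStream
  def initialize
    @calls = 0
  end

  def next
    @calls += 1
    @calls == 1 ? 'token["bon":0:3:1]' : 'token["":4:4:1]'
  end
end

tokens, status = drain_tokens(StuckStream.new)
puts tokens
puts status
```

With Ferret loaded, the same helper could be handed the `tokenizer` object from the session above instead of the stand-in.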

[]s
David B. (Guest)
on 2007-04-17 08:29
(Received via mailing list)
On 4/15/07, Manoel L. <removed_email_address@domain.invalid> wrote:
> >> while token = tokenizer.next
> token["":4:4:1]
> ...
> ...
> ...

Hi Manoel,

I finally managed to work out a fix for this after working on it for
hours. It appears that OpenSolaris has a bug in its isdigit
implementation, although I can't be sure: isdigit(-76) returns true.
I'm not sure which character encoding this would be true for, however.
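
As a side observation, and only a guess at the mechanism: -76 is what the final byte of the test string, \264 (0xB4), becomes when it is read as a signed 8-bit value, which is what happens with a plain C `char` on platforms where `char` is signed. The C standard only defines isdigit for arguments representable as an unsigned char or equal to EOF, so the result for -76 may not mean much either way. The helper name `signed_bytes` below is made up for illustration.

```ruby
# Reinterpret each byte of a string as a C `signed char` (-128..127).
def signed_bytes(str)
  str.unpack("c*")
end

# "\303\264" is the two-byte UTF-8 encoding of "ô" used in the test above.
p signed_bytes("bon\303\264")  # => [98, 111, 110, -61, -76]
```

The last value matches the argument mentioned above, which would be consistent with a signed byte being passed to isdigit unconverted.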

Anyway, I've made it so that you won't get this infinite loop anymore
but I haven't really fixed your problem. Your main issue seems to be
that you don't have a UTF-8 locale installed on your system. You'll
need to do that before you will be able to analyze UTF-8 data.
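
A quick way to see which locale a process would pick up is to look at the environment in the usual POSIX order (LC_ALL, then LC_CTYPE, then LANG). This is a rough sketch only: the method names are invented for this example, and it checks what is requested in the environment, not whether that locale is actually installed (on most Unix systems `locale -a` lists the installed ones).

```ruby
# Return the locale name that governs character classification, following
# the POSIX precedence LC_ALL > LC_CTYPE > LANG. `env` is any Hash-like
# object, so a plain Hash can be passed in for testing.
def effective_ctype_locale(env = ENV)
  %w[LC_ALL LC_CTYPE LANG].each do |name|
    value = env[name]
    return value unless value.nil? || value.empty?
  end
  "C" # the POSIX default when nothing is set
end

# True when the effective locale name mentions UTF-8 (e.g. "en_US.UTF-8"
# or "pt_BR.utf8"). This is a name check, not proof the locale works.
def utf8_locale?(env = ENV)
  !!(effective_ctype_locale(env) =~ /utf-?8/i)
end

puts effective_ctype_locale
puts utf8_locale?
```

If this prints `C` or a non-UTF-8 name, that would fit the diagnosis above.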

So, having said all that, I don't think there is any point in me
putting out a quick release now (to give you the fix) as you will need
to set up your locale to handle UTF-8 and that will already fix your
problem.

Hope that helps,
Dave
Fernando (Guest)
on 2007-04-17 22:02
Hello.

I just want to add that I'm having exactly the same problem on my
desktop, running a Brazilian edition of Windows XP Pro SP2 with the
latest mswin32 Ferret gem. I could not log anything since I have only
1.5 GB of installed memory, and the system freezes if I don't kill the
ruby.exe task. But testing that snippet in the console, I got the same
infinite loop.

This also always occurs when I search for a variety of accented words.

On my Linux box it runs without problems.
Tina (Guest)
on 2007-05-01 02:03
Same here.
I'm running Windows XP SP2, also with the mswin32 Ferret gem. The
problem occurs when I try to search for a word with a special
character, specifically 'ä'.