Segmentation fault, proc, eval, long string

bobh · November 30, 2006, 4:38pm

Hi,

I’m getting a ‘Segmentation fault’ in ruby 1.8.5 running on debian in
a Xen VPS. The same code running on OS X and a different version of
linux has no problems.

The process to get this is maybe a little strange.

1) read a large file into a string (1.3MB)
2) eval the string (the string is a single ruby proc definition that
when called will build an object structure in memory)
3) call the proc → Segmentation fault very soon after

The file was generated by the same program, but running on a
different machine: in this case, the other linux box I mentioned above.

Knowing full well that there can be all kinds of differences between
the linuxes, I’ll claim that the only interesting difference that I
can find is/was in the architectures reported by ruby --version: the
machine that works reports i686-linux, the machine that doesn’t
reports i386-linux – so I rebuilt a version that was also i686 and,
of course, this made no difference. So all that means is that I can’t
find the truly interesting difference.

If I edit the file from where the string is read, and replace a bunch
of assignments of a particular type of object (the objects are still
created) (about 6000 of them) then the problem disappears. There’s
nothing special about the objects I got rid of, it was just easy to
use regular expressions to identify them and get rid of their
assignment.

If I try running ruby through gdb there is a SIGSEGV signal at
eval.c:2890 – which is the unknown_node method but I can’t get a more
complete stacktrace (until I figure out how to build ruby with the
debug information not stripped out). Manually poking around though,
method_call calls rb_call0 calls unknown_node so I’m betting on this.
And so? Well maybe the eval of the string produced an invalid proc
object? What’s the cause of this? Too long a string? too many objects
in the eval? too big a proc object? But why work on one linux box and
fail on the other?

I’m wondering if anyone has seen anything like this before, or maybe
has any experience debugging this kind of thing? Any suggestions
very much appreciated.

Thanks,
Bob

Bob H. – blogs at <http://www.recursive.ca/hutch/>
Recursive Design Inc. – http://www.recursive.ca/
Raconteur – http://www.raconteur.info/
xampl for Ruby – http://rubyforge.org/projects/xampl/

bobh · November 30, 2006, 6:10pm

On 11/30/06, Bob H. [email protected] wrote:

when called will build an object structure in memory)
3) call the proc → Segmentation fault very soon after

Hrm. This looks similar to the problem reported here:
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/80435

bobh · November 30, 2006, 6:52pm

A little more on this…

On 30-Nov-06, at 10:36 AM, Bob H. wrote:

that when called will build an object structure in memory)
machine that doesn’t reports i386-linux – so I rebuilt a version
If I try running ruby through gdb there is a SIGSEGV signal at
eval.c:2890 – which is the unknown_node method but I can’t get a
more complete stacktrace (until I figure out how to build ruby with
the debug information not stripped out). Manually poking around
though, method_call calls rb_call0 calls unknown_node so I’m
betting on this. And so? Well maybe the eval of the string produced
an invalid proc object? What’s the cause of this? Too long a
string? too many objects in the eval? too big a proc object? But
why work on one linux box and fail on the other?

So I put some printf into the eval.c file and it turns out that
rb_eval is called recursively 5301 times before seg faulting, while
trying to handle a NODE_DASGN_CURR node. There are no other eval node
types being evaluated when this begins, every node is a NODE_DASGN_CURR.

There is nothing that is anywhere that deep in the script that I am
evaluating. So it looks as though the proc object is corrupt??

So maybe this is reproducible?? Well, so it is. If I run this script:

module SomeModule
  def initialize
    @@proc = nil
  end

  def SomeModule.build
    if @@proc then
      result = @@proc.call
      @@proc = nil
      return result
    end
  end
end

N = 5000

the_string = ""

the_string << "module SomeModule\n"
the_string << " @@proc = Proc.new {\n"
the_string << " thing = []\n"

N.times do | i |
  the_string << " v#{i} = [#{i}]\n"
end

N.times do | i |
  the_string << " thing << v#{i}\n"
end

the_string << " thing\n"
the_string << " } #proc\n"
the_string << "end\n"

puts("the_string length: #{the_string.length}")
eval(the_string, nil, "ruby_definition", 0)
SomeModule.build

It will fail on the one linux box, run on the other, and run on OS X.
With a little binary search, the smallest N that causes the segfault
is 3024 (3023 works).
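
For anyone repeating the binary search for the smallest failing N, a minimal sketch follows. It assumes the failure is monotonic (every N above the threshold also crashes). The `crashes` lambda is hypothetical: it assumes crash.rb has been changed to read N from ARGV[0], and it uses the keyword-style `system` options from Ruby 1.9 and later, so the driver itself would not run on 1.8.5.

```ruby
# Smallest n in lo..hi for which the block returns true, assuming the block
# is false below some threshold and true from that threshold upward.
# Returns nil if even hi does not fail.
def smallest_failing(lo, hi)
  return nil unless yield(hi)
  while lo < hi
    mid = (lo + hi) / 2
    if yield(mid)
      hi = mid        # mid fails, so the threshold is mid or lower
    else
      lo = mid + 1    # mid works, so the threshold is above mid
    end
  end
  lo
end

# Hypothetical predicate: runs the script in a child process so a segfault
# kills only the child, then reports whether the child died from a signal.
crashes = lambda do |n|
  system("ruby", "crash.rb", n.to_s, out: File::NULL, err: File::NULL)
  $?.signaled?
end

# Example call (not executed here):
#   smallest_failing(1, 5000) { |n| crashes.call(n) }
```

Running each trial in a separate process matters here, because a segfault cannot be rescued from inside the interpreter that crashed.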

Does this help?

bobh · November 30, 2006, 7:02pm

On 30-Nov-06, at 12:09 PM, Wilson B. wrote:

eval the string (the string is a single ruby proc definition that
when called will build an object structure in memory)

call the proc → Segmentation fault very soon after

Hrm. This looks similar to the problem reported here:
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/80435

Thanks for the link.

Could be, but that thread kind of petered out. There were some others
that I found that didn’t seem to resolve. There was one in Japanese
that I certainly could not follow.

Cheers,
Bob

bobh · November 30, 2006, 7:08pm

On 30-Nov-06, at 12:50 PM, Bob H. wrote:

The process to get this is maybe a little strange.

read a large file into a string (1.3MB)

eval the string (the string is a single ruby proc definition
that when called will build an object structure in memory)

call the proc → Segmentation fault very soon after

[snip]

  result = @@proc.call
the_string << "module SomeModule\n"

X. With a little binary search, the smallest N that causes the
segfault is 3024 (3023 works).

So, to increase the strangeness a bit… if I run this from within
vim (i.e. using “:!ruby crash.rb”) it works for some pretty big Ns.

Sigh.

Cheers,
Bob

bobh · November 30, 2006, 7:22pm

On 11/30/06, Wilson B. [email protected] wrote:

prematurely garbage collected.

Maybe the size of that method hits a Ruby threshold that triggers GC
inappropriately?

Try turning GC off; if that fixes it, that might help narrow it down.

Oh, and what happens when you freeze the string before eval’ing it?

bobh · November 30, 2006, 7:18pm

On 11/30/06, Bob H. [email protected] wrote:

The process to get this is maybe a little strange.
Thanks for the link.

Could be, but that thread kind of petered out. There were some others
that I found that didn’t seem to resolve. There was one in Japanese
that I certainly could not follow.

Can you get a full stack trace from gdb or something?
I found a pile of other links by googling for ‘unknown node type’ that
seem to suggest that maybe some of your objects are getting
prematurely garbage collected.

Maybe the size of that method hits a Ruby threshold that triggers GC
inappropriately?

Try turning GC off; if that fixes it, that might help narrow it down.
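
A minimal sketch of the "turn GC off" experiment, using only `GC.disable` and `GC.enable` from core Ruby. The helper name `without_gc` is made up for illustration; the idea is just to run the suspect call with the collector off and restore the previous state afterwards.

```ruby
# Run the block with garbage collection disabled, then restore the
# collector to whatever state it was in before.
def without_gc
  was_disabled = GC.disable   # returns true if GC was already disabled
  yield
ensure
  GC.enable unless was_disabled
end

# Example call (assumes SomeModule from the script earlier in the thread):
#   without_gc { SomeModule.build }
```

If the crash still happens with the collector off, premature collection becomes a much less likely explanation.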

bobh · November 30, 2006, 7:42pm

On 30-Nov-06, at 1:20 PM, Wilson B. wrote:

Oh, and what happens when you freeze the string before eval’ing it?

same thing. Tried freezing the proc too, no change. Thanks again though.

Cheers,
Bob

bobh · November 30, 2006, 7:40pm

On 30-Nov-06, at 1:18 PM, Wilson B. wrote:

That didn’t make any difference. Nice idea though.

bobh · November 30, 2006, 7:56pm

On 11/30/06, Bob H. [email protected] wrote:

The process to get this is maybe a little strange.
Knowing full well that there can be all kinds of differences
disappears. There’s nothing special about the objects I got rid of,
string? too many objects in the eval? too big a proc object? But
So maybe this is reproducible?? Well, so it is. If I run this script:
return result
the_string << " thing = []\n"
the_string << " } #proc\n"

Does this help?

Segfaults for me on my Debian box with ruby 1.8.4 (2005-12-24)
[i386-linux]

bobh · November 30, 2006, 8:43pm

On 30-Nov-06, at 1:56 PM, Wilson B. wrote:

are still created) (about 6000 of them) then the problem
an invalid proc object? What’s the cause of this? Too long a
evaluating. So it looks as though the proc object is corrupt??
result = @@proc.call
the_string << "module SomeModule\n"

With a little binary search, the smallest N that causes the segfault
is 3024 (3023 works).

Does this help?

Segfaults for me on my Debian box with ruby 1.8.4 (2005-12-24)
[i386-linux]

Oh dear. In some ways I was hoping for something unique to my
machine. Thanks a lot for trying this.

Cheers,
Bob

bobh · December 1, 2006, 1:31pm

Bob H. wrote:

(…)

It will fail on the one linux box, run on the other, and run on OS X.
With a little binary search, the smallest N that causes the segfault is
3024 (3023 works).

Bob, you can use parsetree to dump the AST of the generated proc. I’m
sure you’ll see the deep nesting of the nodes.
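
As a rough modern stand-in for the same idea (this is not ParseTree, which targets Ruby 1.8), the sketch below measures how deeply an AST nests using `RubyVM::AbstractSyntaxTree`, which exists in MRI 2.6 and later. The helper `ast_depth` is made up for illustration, and newer parsers may lay the tree out quite differently from 1.8, so no particular depth is claimed.

```ruby
# Depth of an MRI syntax tree: 1 for a leaf node, plus the deepest child.
# Only children that are themselves AST nodes are followed; symbols,
# integers, nil and variable tables are ignored.
def ast_depth(node)
  children = node.children.grep(RubyVM::AbstractSyntaxTree::Node)
  1 + (children.map { |child| ast_depth(child) }.max || 0)
end

# A tiny version of the generated proc from the thread.
src = "Proc.new {\n  thing = []\n  v0 = [0]\n  v1 = [1]\n  thing << v0\n  thing << v1\n  thing\n}\n"
root = RubyVM::AbstractSyntaxTree.parse(src)
puts "root node type: #{root.type}, depth: #{ast_depth(root)}"
```

Comparing the depth for a few values of N shows whether nesting grows with the number of assignments on a given interpreter.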

Regards,
Pit

bobh · December 1, 2006, 3:22pm

On 1-Dec-06, at 7:31 AM, Pit C. wrote:

It will fail on the one linux box, run on the other, and run on OS
X. With a little binary search, the smallest N that causes the
segfault is 3024 (3023 works).

Bob, you can use parsetree to dump the AST of the generated proc.
I’m sure you’ll see the deep nesting of the nodes.

Okay, I’ll do that. But a Segmentation Fault? Surely there’s a more
polite way to deal with the problem.

Cheers,
Bob

bobh · December 1, 2006, 4:58pm

Wilson B. wrote:

On 12/1/06, Pit C. [email protected] wrote:

Bob, you can use parsetree to dump the AST of the generated proc. I’m
sure you’ll see the deep nesting of the nodes.

I just tried this, and here’s what it gave me (for a smaller N, so the
whole process doesn’t crash. Should show the same structure no matter
what N is, though)

(…)

Wilson, this is not the dump of the generated proc. You have to pass the
contents of @@proc to ParseTree.

Regards,
Pit

bobh · December 1, 2006, 4:36pm

On 12/1/06, Pit C. [email protected] wrote:

(…)

It will fail on the one linux box, run on the other, and run on OS X.
With a little binary search, the smallest N that causes the segfault is
3024 (3023 works).

Bob, you can use parsetree to dump the AST of the generated proc. I’m
sure you’ll see the deep nesting of the nodes.

I just tried this, and here’s what it gave me (for a smaller N, so the
whole process doesn’t crash. Should show the same structure no matter
what N is, though)

[[:module,
  :SomeModule,
  [:defn,
   :initialize,
   [:scope, [:block, [:args], [:cvasgn, :@@proc, [:nil]]]]],
  [:defn,
   :"self.build",
   [:scope,
    [:block,
     [:args],
     [:if,
      [:cvar, :@@proc],
      [:block,
       [:lasgn, :result, [:call, [:cvar, :@@proc], :call]],
       [:cvasgn, :@@proc, [:nil]],
       [:return, [:lvar, :result]]],
      nil]]]]]]

bobh · December 1, 2006, 5:08pm

On 12/1/06, Pit C. [email protected] wrote:

Wilson, this is not the dump of the generated proc. You have to pass the
contents of @@proc to ParseTree.

Haha. Oops. You’re right:

wilson@metaclass:~$ ruby boom.rb
the_string length: 186747
[[:module,
  :SomeModule,
  [:scope,
   [:cvdecl,
    :@@pr,
    [:iter,
     [:call, [:const, :Proc], :new],
     nil,
     [:block,
      [:dasgn_curr,
       :thing,
       [:dasgn_curr,
        :v0,
        [:dasgn_curr,
         :v1,
         [:dasgn_curr,
          :v2,
          [:dasgn_curr,
           :v3,
           [:dasgn_curr,
            :v4,
            [:dasgn_curr,
             :v5,
/usr/lib/ruby/1.8/prettyprint.rb:344:in `deq': stack level too deep (SystemStackError)
        from /usr/lib/ruby/1.8/prettyprint.rb:343:in `deq'
        from /usr/lib/ruby/1.8/prettyprint.rb:171:in `break_outmost_groups'
        from /usr/lib/ruby/1.8/prettyprint.rb:197:in `text'
        from /usr/lib/ruby/1.8/pp.rb:245:in `pretty_print'
        from /usr/lib/ruby/1.8/pp.rb:126:in `pp'
        from /usr/lib/ruby/1.8/prettyprint.rb:224:in `group'
        from /usr/lib/ruby/1.8/prettyprint.rb:247:in `nest'
        from /usr/lib/ruby/1.8/prettyprint.rb:223:in `group'
         ... 294 levels...
        from /usr/lib/ruby/1.8/pp.rb:69:in `pp'
        from /usr/lib/ruby/1.8/pp.rb:52:in `pp'
        from /usr/lib/ruby/1.8/pp.rb:51:in `pp'
        from boom.rb:45

bobh · December 1, 2006, 5:22pm

On 12/1/06, Wilson B. [email protected] wrote:

(…)
:SomeModule,
:v0,
/usr/lib/ruby/1.8/prettyprint.rb:344:in `deq': stack level too deep
from /usr/lib/ruby/1.8/pp.rb:69:in `pp'
from /usr/lib/ruby/1.8/pp.rb:52:in `pp'
from /usr/lib/ruby/1.8/pp.rb:51:in `pp'
from boom.rb:45

OK. Wow.
I moved the code outside of a proc, and into a bare module, and re-ran
parse_tree on it.
[[:module,
:SomeModule,
[:scope,
[:block,
[:lasgn, :thing, [:zarray]],
[:lasgn, :v0, [:array, [:lit, 0]]],
[:lasgn, :v1, [:array, [:lit, 1]]],
[:lasgn, :v2, [:array, [:lit, 2]]],
[:lasgn, :v3, [:array, [:lit, 3]]],
[:lasgn, :v4, [:array, [:lit, 4]]],
[:lasgn, :v5, [:array, [:lit, 5]]],
[:lasgn, :v6, [:array, [:lit, 6]]],
[:lasgn, :v7, [:array, [:lit, 7]]],
[:lasgn, :v8, [:array, [:lit, 8]]],
[:lasgn, :v9, [:array, [:lit, 9]]],
[:lasgn, :v10, [:array, [:lit, 10]]],
[:lasgn, :v11, [:array, [:lit, 11]]],
[:lasgn, :v12, [:array, [:lit, 12]]],
[:lasgn, :v13, [:array, [:lit, 13]]],
[:lasgn, :v14, [:array, [:lit, 14]]],
[:lasgn, :v15, [:array, [:lit, 15]]],
[:lasgn, :v16, [:array, [:lit, 16]]],
[:lasgn, :v17, [:array, [:lit, 17]]],
[:lasgn, :v18, [:array, [:lit, 18]]],
[:lasgn, :v19, [:array, [:lit, 19]]],
[:lasgn, :v20, [:array, [:lit, 20]]],
[:lasgn, :v21, [:array, [:lit, 21]]],
[:lasgn, :v22, [:array, [:lit, 22]]],
[:lasgn, :v23, [:array, [:lit, 23]]],
[:lasgn, :v24, [:array, [:lit, 24]]],
[:lasgn, :v25, [:array, [:lit, 25]]],
[:lasgn, :v26, [:array, [:lit, 26]]],
[:lasgn, :v27, [:array, [:lit, 27]]],
[:lasgn, :v28, [:array, [:lit, 28]]],
[:lasgn, :v29, [:array, [:lit, 29]]],
[:lasgn, :v30, [:array, [:lit, 30]]],
[:lasgn, :v31, [:array, [:lit, 31]]],
[:lasgn, :v32, [:array, [:lit, 32]]],
[:lasgn, :v33, [:array, [:lit, 33]]],
[:lasgn, :v34, [:array, [:lit, 34]]],
[:lasgn, :v35, [:array, [:lit, 35]]],
[:lasgn, :v36, [:array, [:lit, 36]]],
[:lasgn, :v37, [:array, [:lit, 37]]],
[:lasgn, :v38, [:array, [:lit, 38]]],
[:lasgn, :v39, [:array, [:lit, 39]]],
[:lasgn, :v40, [:array, [:lit, 40]]],
[:lasgn, :v41, [:array, [:lit, 41]]],
[:lasgn, :v42, [:array, [:lit, 42]]],
[:lasgn, :v43, [:array, [:lit, 43]]],
[:lasgn, :v44, [:array, [:lit, 44]]],
[:lasgn, :v45, [:array, [:lit, 45]]],
[:lasgn, :v46, [:array, [:lit, 46]]],
[:lasgn, :v47, [:array, [:lit, 47]]],
[:lasgn, :v48, [:array, [:lit, 48]]],
[:lasgn, :v49, [:array, [:lit, 49]]],
[:call, [:lvar, :thing], :<<, [:array, [:lvar, :v0]]],
[:call, [:lvar, :thing], :<<, [:array, [:lvar, :v1]]],
[:call, [:lvar, :thing], :<<, [:array, [:lvar, :v2]]],
[:call, [:lvar, :thing], :<<, [:array, [:lvar, :v3]]],
[:call, [:lvar, :thing], :<<, [:array, [:lvar, :v4]]],
[:call, [:lvar, :thing], :<<, [:array, [:lvar, :v5]]],
[:call, [:lvar, :thing], :<<, [:array, [:lvar, :v6]]],
[:call, [:lvar, :thing], :<<, [:array, [:lvar, :v7]]],
[:call, [:lvar, :thing], :<<, [:array, [:lvar, :v8]]],
[:call, [:lvar, :thing], :<<, [:array, [:lvar, :v9]]],
[:call, [:lvar, :thing], :<<, [:array, [:lvar, :v10]]],
[:call, [:lvar, :thing], :<<, [:array, [:lvar, :v11]]],
[:call, [:lvar, :thing], :<<, [:array, [:lvar, :v12]]],
[:call, [:lvar, :thing], :<<, [:array, [:lvar, :v13]]],
[:call, [:lvar, :thing], :<<, [:array, [:lvar, :v14]]],
[:call, [:lvar, :thing], :<<, [:array, [:lvar, :v15]]],
[:call, [:lvar, :thing], :<<, [:array, [:lvar, :v16]]],
[:call, [:lvar, :thing], :<<, [:array, [:lvar, :v17]]],
[:call, [:lvar, :thing], :<<, [:array, [:lvar, :v18]]],
[:call, [:lvar, :thing], :<<, [:array, [:lvar, :v19]]],
[:call, [:lvar, :thing], :<<, [:array, [:lvar, :v20]]],
[:call, [:lvar, :thing], :<<, [:array, [:lvar, :v21]]],
[:call, [:lvar, :thing], :<<, [:array, [:lvar, :v22]]],
[:call, [:lvar, :thing], :<<, [:array, [:lvar, :v23]]],
[:call, [:lvar, :thing], :<<, [:array, [:lvar, :v24]]],
[:call, [:lvar, :thing], :<<, [:array, [:lvar, :v25]]],
[:call, [:lvar, :thing], :<<, [:array, [:lvar, :v26]]],
[:call, [:lvar, :thing], :<<, [:array, [:lvar, :v27]]],
[:call, [:lvar, :thing], :<<, [:array, [:lvar, :v28]]],
[:call, [:lvar, :thing], :<<, [:array, [:lvar, :v29]]],
[:call, [:lvar, :thing], :<<, [:array, [:lvar, :v30]]],
[:call, [:lvar, :thing], :<<, [:array, [:lvar, :v31]]],
[:call, [:lvar, :thing], :<<, [:array, [:lvar, :v32]]],
[:call, [:lvar, :thing], :<<, [:array, [:lvar, :v33]]],
[:call, [:lvar, :thing], :<<, [:array, [:lvar, :v34]]],
[:call, [:lvar, :thing], :<<, [:array, [:lvar, :v35]]],
[:call, [:lvar, :thing], :<<, [:array, [:lvar, :v36]]],
[:call, [:lvar, :thing], :<<, [:array, [:lvar, :v37]]],
[:call, [:lvar, :thing], :<<, [:array, [:lvar, :v38]]],
[:call, [:lvar, :thing], :<<, [:array, [:lvar, :v39]]],
[:call, [:lvar, :thing], :<<, [:array, [:lvar, :v40]]],
[:call, [:lvar, :thing], :<<, [:array, [:lvar, :v41]]],
[:call, [:lvar, :thing], :<<, [:array, [:lvar, :v42]]],
[:call, [:lvar, :thing], :<<, [:array, [:lvar, :v43]]],
[:call, [:lvar, :thing], :<<, [:array, [:lvar, :v44]]],
[:call, [:lvar, :thing], :<<, [:array, [:lvar, :v45]]],
[:call, [:lvar, :thing], :<<, [:array, [:lvar, :v46]]],
[:call, [:lvar, :thing], :<<, [:array, [:lvar, :v47]]],
[:call, [:lvar, :thing], :<<, [:array, [:lvar, :v48]]],
[:call, [:lvar, :thing], :<<, [:array, [:lvar, :v49]]],
[:lvar, :thing]]]]]

It looks like the parser treats things very differently inside a Proc
definition.
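
To make the difference between the two dumps concrete, here is a small sketch that rebuilds both shapes as plain Ruby arrays and measures their nesting. The helper names are made up, and the arrays only imitate the printed dumps; they are not what the interpreter stores internally.

```ruby
# Chain shaped like the in-proc dump: each :dasgn_curr node holds the next
# one as its last element, so nesting grows by one level per variable.
def nested_chain(n)
  (0...n).reverse_each.inject(nil) do |inner, i|
    node = [:dasgn_curr, :"v#{i}"]
    node << inner if inner
    node
  end
end

# Flat block shaped like the bare-module dump: every assignment is a
# sibling, so nesting stays constant no matter how many there are.
def flat_block(n)
  [:block] + (0...n).map { |i| [:lasgn, :"v#{i}", [:array, [:lit, i]]] }
end

# Nesting depth, computed with an explicit stack so that the measurement
# itself cannot run out of stack on a very deep chain.
def depth(tree)
  max = 0
  stack = [[tree, 1]]
  until stack.empty?
    node, level = stack.pop
    max = level if level > max
    node.each { |child| stack << [child, level + 1] if child.is_a?(Array) }
  end
  max
end

puts depth(nested_chain(5000))  # one level per variable
puts depth(flat_block(5000))    # constant, independent of the count
```

An evaluator that recurses once per nesting level needs stack proportional to the first number but not the second, which fits the crash appearing only for the proc version.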

bobh · December 1, 2006, 5:42pm

On 1-Dec-06, at 7:31 AM, Pit C. wrote:

Bob, you can use parsetree to dump the AST of the generated proc.
I’m sure you’ll see the deep nesting of the nodes.

Oh, very interesting. Yes indeed, it gets kinda deep there. Thanks!

So the ‘solution’ is to set the stack size bigger (ulimit -s 20000
works up to a much larger N). But this ‘solution’ does not make
me very happy. First, the stack size is the same on all the machines
that I’m using, so while this fixes the machine showing the problem I
am not entirely convinced. Second, all this does is increase the
depth at which it will fail (and it still does fail). Seems more of a
workaround than a solution.
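
For comparing stack limits across machines from inside Ruby, a minimal sketch using `Process.getrlimit` on a Unix-like system follows. The helper name `stack_limit_report` is made up. Note that `getrlimit` reports bytes, while `ulimit -s` reports units of 1024 bytes, hence the division.

```ruby
# Report the soft and hard stack limits of the current process in KiB,
# the same unit that `ulimit -s` prints.
def stack_limit_report
  soft, hard = Process.getrlimit(Process::RLIMIT_STACK)
  show = lambda do |value|
    value == Process::RLIM_INFINITY ? "unlimited" : "#{value / 1024} KiB"
  end
  "stack soft=#{show.call(soft)} hard=#{show.call(hard)}"
end

puts stack_limit_report
```

Printing this from the failing script itself, rather than from a separate shell, shows the limit the interpreter actually inherited from whatever launched it.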

Any better ideas?

What is actually going on anyway? Is Ruby creating a closure or
something for each new local variable introduced?

Cheers,
Bob

bobh · December 1, 2006, 5:44pm

On 1-Dec-06, at 11:21 AM, Wilson B. wrote:

[:lasgn, :v2, [:array, [:lit, 2]]],
[:lasgn, :v3, [:array, [:lit, 3]]],
[:lasgn, :v4, [:array, [:lit, 4]]],
[:lasgn, :v5, [:array, [:lit, 5]]],

That’s what I want! What’s so special about a proc?

Cheers,
Bob

bobh · December 1, 2006, 6:21pm

Bob H. wrote:

What is actually going on anyway? Is Ruby creating a closure or
something for each new local variable introduced?

Bob, the whole proc is a closure, so there’s a difference between
executing “a = 1” inside or outside of a proc. But I’m not sure whether
it’s necessary to evaluate NODE_DASGN_CURR in the recursive way it is
done now. Unfortunately I’ve not enough time to check this myself. Maybe
someone else can take a look?
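
As an illustration of the question only (this is not the eval.c code), the sketch below walks a chain shaped like the dumped :dasgn_curr nodes in two ways: recursively, using one stack frame per link, and with a loop, using constant stack. All names are made up for the example.

```ruby
# Chain shaped like [:dasgn_curr, :v0, [:dasgn_curr, :v1, ...]],
# built from the innermost link outward.
def chain(n)
  inner = nil
  (n - 1).downto(0) do |i|
    inner = inner ? [:dasgn_curr, :"v#{i}", inner] : [:dasgn_curr, :"v#{i}"]
  end
  inner
end

# Recursive walk: stack use grows with the length of the chain.
def names_recursive(node, acc = [])
  return acc unless node
  acc << node[1]
  names_recursive(node[2], acc)
end

# Loop walk: visits the links in the same order with constant stack use.
def names_iterative(node)
  acc = []
  while node
    acc << node[1]
    node = node[2]   # nil once the innermost link is reached
  end
  acc
end

puts names_iterative(chain(200_000)).length
```

Both walks return the same list for a short chain; only the loop version is independent of the stack limit for a long one.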

Regards,
Pit