DRb Crashing


#1

If I launch this server:

#!/usr/local/bin/ruby

require "drb"
require "rinda/tuplespace"

tuplespace = Rinda::TupleSpace.new
DRb.start_service("druby://localhost:61676", tuplespace)

loop do
  nums = Array.new(rand(9) + 2) { rand(10) + 1 }
  ops = Array.new(nums.size - 1) { %w{+ - * /}[rand(4)] }
  problem = nums.zip(ops).flatten.compact.join(" ")

  tuplespace.write(["Problem", problem])
  puts tuplespace.take(["Result", String]).last
end

__END__

Then run this client:

#!/usr/local/bin/ruby -w

require "drb"
require "rinda/tuplespace"

DRb.start_service
tuplespace = Rinda::TupleSpaceProxy.new(
  DRbObject.new_with_uri("druby://localhost:61676")
)

while problem = tuplespace.take(["Problem", %r{^\d+(?: [-+*/] \d+)+$}])
  tuplespace.write(["Result", "#{problem.last} = #{eval problem.last}"])
end

__END__

The client crashes, generally within a few seconds. Adding a sleep
inside the server loop seems to resolve the issue.
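
For reference, the workaround might look like the sketch below. It wraps one pass of the server loop in a helper and pauses before returning. The helper names random_problem and serve_once and the 0.1 second delay are assumptions for illustration; the post does not say how long the sleep was.

```ruby
require "rinda/tuplespace"

# Build a random arithmetic problem such as "3 + 7 * 2",
# using the same logic as the server above.
def random_problem
  nums = Array.new(rand(9) + 2) { rand(10) + 1 }
  ops  = Array.new(nums.size - 1) { %w{+ - * /}[rand(4)] }
  nums.zip(ops).flatten.compact.join(" ")
end

# One pass of the server loop with the workaround applied.
# The pause length is an assumed value, not taken from the thread.
def serve_once(tuplespace, pause = 0.1)
  tuplespace.write(["Problem", random_problem])
  result = tuplespace.take(["Result", String]).last
  sleep pause
  result
end
```

In the server, the body of the loop would then reduce to: puts serve_once(tuplespace)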

Anyone know why?

James Edward G. II


#2

On Nov 15, 2005, at 5:12 PM, Eric H. wrote:

The client crashes, generally within a few seconds. Adding a
sleep inside the server loop seems to resolve the issue.

With what error?

$ ruby client.rb
(druby://localhost:61676) /usr/local/lib/ruby/1.8/rinda/tuplespace.rb:332:in `move': undefined method `push' for :EDQUOT=:Symbol (NoMethodError)
        from (druby://localhost:61676) /usr/local/lib/ruby/1.8/monitor.rb:229:in `synchronize'
        from (druby://localhost:61676) /usr/local/lib/ruby/1.8/rinda/tuplespace.rb:329:in `move'
        from (druby://localhost:61676) /usr/local/lib/ruby/1.8/drb/drb.rb:1552:in `perform_without_block'
        from (druby://localhost:61676) /usr/local/lib/ruby/1.8/drb/drb.rb:1512:in `perform'
        from (druby://localhost:61676) /usr/local/lib/ruby/1.8/drb/drb.rb:1586:in `main_loop'
        from (druby://localhost:61676) /usr/local/lib/ruby/1.8/drb/drb.rb:1582:in `main_loop'
        from (druby://localhost:61676) /usr/local/lib/ruby/1.8/drb/drb.rb:1578:in `main_loop'
        from (druby://localhost:61676) /usr/local/lib/ruby/1.8/drb/drb.rb:1427:in `run'
        from (druby://localhost:61676) /usr/local/lib/ruby/1.8/drb/drb.rb:1424:in `run'
        from (druby://localhost:61676) /usr/local/lib/ruby/1.8/drb/drb.rb:1344:in `initialize'
        from (druby://localhost:61676) /usr/local/lib/ruby/1.8/drb/drb.rb:1624:in `start_service'
        from (druby://localhost:61676) server.rb:7
        from /usr/local/lib/ruby/1.8/rinda/rinda.rb:153:in `take'
        from client.rb:11

James Edward G. II


#3

On Nov 15, 2005, at 3:17 PM, James Edward G. II wrote:

tuplespace.rb:332:in `move': undefined method `push'
for :EDQUOT=:Symbol (NoMethodError)

I’m seeing similar, but only on the Mac.

I wrote [ruby-core:06629], so you may want to add your ruby versions
and any additional insight there, as I think this is a problem
somewhere in the guts of Ruby.


#4

On Wed, 16 Nov 2005, James Edward G. II wrote:

If I launch this server:
[…]
tuplespace = Rinda::TupleSpaceProxy.new(
server loop seems to resolve the issue.

Anyone know why?

No, I don’t. What happens if you match the old way? Like:
while problem = tuplespace.take(["Problem", nil])

James Edward G. II

    Hugh

#5

On Nov 15, 2005, at 4:50 PM, Hugh S. wrote:

On Wed, 16 Nov 2005, James Edward G. II wrote:

The client crashes, generally within a few seconds. Adding a
sleep inside the
server loop seems to resolve the issue.

Anyone know why?

No, I don’t. What happens if you match the old way? Like:
while problem = tuplespace.take(["Problem", nil])

It seems to be a problem only when OS X is involved.

I’ve been running the same two scripts using Ruby 1.8.3 on FreeBSD
for the past hour without error. I’ll let them run for at least
another four or five to see if they fail.

See-also: [ruby-core:06629]


#6

On Nov 15, 2005, at 6:50 PM, Hugh S. wrote:

while problem = tuplespace.take(["Problem", %r{^\d+(?: [-+*/] \d+)+

Anyone know why?

No, I don’t. What happens if you match the old way? Like:
while problem = tuplespace.take(["Problem", nil])

The Regexp is pretty critical in this case. It validates the data
before an otherwise dangerous call to eval().
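
To make the validation step concrete, here is a small sketch of the check in isolation. PROBLEM is the pattern from the client; safe_to_eval? and STRICT are names introduced here for illustration. One caveat worth hedging on: in Ruby, ^ and $ anchor at line boundaries, so a multi-line string whose first line looks like arithmetic still matches. STRICT, which uses \A and \z, is an assumed tighter variant and not something the thread proposes.

```ruby
# Pattern used by the client to accept only simple arithmetic.
PROBLEM = %r{^\d+(?: [-+*/] \d+)+$}

# Assumed stricter variant: \A and \z anchor at the start and end of
# the whole string rather than at line boundaries.
STRICT = %r{\A\d+(?: [-+*/] \d+)+\z}

# True when text matches the given pattern and so would be passed to eval.
def safe_to_eval?(text, pattern = PROBLEM)
  !!(pattern =~ text)
end

safe_to_eval?("3 + 4 * 2")             # => true
safe_to_eval?("system('ls')")          # => false
safe_to_eval?("1 + 2\nexit")           # => true, the first line matches
safe_to_eval?("1 + 2\nexit", STRICT)   # => false
```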

James Edward G. II


#7

On Nov 15, 2005, at 12:52 PM, James Edward G. II wrote:

loop do
Then run this client:

while problem = tuplespace.take(["Problem", %r{^\d+(?: [-+*/] \d+)+$}])
  tuplespace.write(["Result", "#{problem.last} = #{eval problem.last}"])
end

__END__

The client crashes, generally within a few seconds. Adding a sleep
inside the server loop seems to resolve the issue.

With what error?

Anyone know why?

I’ve had better luck with these kinds of errors by placing the
TupleSpace in its own process.
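
A minimal sketch of that arrangement, assuming the same URI as the original server: a script that exports nothing but the TupleSpace, so the problem generator and the solver both connect to it as clients. The helper name start_tuplespace_service is made up for this example.

```ruby
require "drb"
require "rinda/tuplespace"

# Start a DRb service whose front object is a fresh TupleSpace.
# Returns the URI the service is actually listening on.
def start_tuplespace_service(uri)
  DRb.start_service(uri, Rinda::TupleSpace.new)
  DRb.uri
end

# A standalone script would typically end with:
#   start_tuplespace_service("druby://localhost:61676")
#   DRb.thread.join   # keep the process alive to serve requests
```

The generator and the solver would each wrap DRbObject.new_with_uri(...) in a Rinda::TupleSpaceProxy, the same way the original client does.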


#8

On Nov 16, 2005, at 9:42 AM, Hugh S. wrote:

If I launch this server:

tuplespace.write(["Result", "#{problem.last} = #{eval
Anyone know why?

while problem = tuplespace.take(["Problem", nil])
  unless problem.last =~ %r{^\d+(?: [-+*/] \d+)+$}
    puts "Bogus input '#{problem.last}', Ted!" # :-)
  else
    tuplespace.write(["Result", "#{problem.last} = #{eval problem.last}"])
  end
end

This is not equivalent. You removed the problem from the TupleSpace
whereas my version leaves it for someone else to solve.
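
The difference can be seen with a local TupleSpace and no DRb involved. In this sketch the helper name take_and_inspect and the sample tuples are made up; take and read_all are the regular Rinda::TupleSpace calls.

```ruby
require "rinda/tuplespace"

PATTERN = %r{^\d+(?: [-+*/] \d+)+$}

# Write one bogus and one valid problem, take with the given template,
# and report what was taken and what is still in the space afterwards.
def take_and_inspect(template)
  space = Rinda::TupleSpace.new
  space.write(["Problem", "not arithmetic"])
  space.write(["Problem", "1 + 2"])

  taken = space.take(template)
  [taken, space.read_all(["Problem", nil])]
end

# With the Regexp template only the matching tuple is removed,
# so the bogus one stays in the space for someone else.
taken, remaining = take_and_inspect(["Problem", PATTERN])
# taken     => ["Problem", "1 + 2"]
# remaining => [["Problem", "not arithmetic"]]
```

With a nil wildcard in place of PATTERN, take removes whichever "Problem" tuple it finds, bogus or not, and the client has to decide what to do with it.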

I realize this isn’t what you were originally asking for and I can
try the change, if you want to see it. To me it’s irrelevant though,
because TupleSpace supports a Regexp search and it should not be
crashing when doing it.

This is just a simplified case I cooked up to show the issue.

James Edward G. II


#9

On Thu, 17 Nov 2005, James Edward G. II wrote:

On Nov 16, 2005, at 9:42 AM, Hugh S. wrote:

On Thu, 17 Nov 2005, James Edward G. II wrote:

On Nov 15, 2005, at 6:50 PM, Hugh S. wrote:

On Wed, 16 Nov 2005, James Edward G. II wrote:
[…]

end
end

This is not equivalent. You removed the problem from the TupleSpace whereas
my version leaves it for someone else to solve.

Yes, that’s true, but it’s not really the point I was making – one
could write it back, or whatever.

I realize this isn’t what you were originally asking for and I can try the
change, if you want to see it. To me it’s irrelevant though, because
TupleSpace supports a Regexp search and it should not be crashing when doing
it.

IIRC it didn’t in the past. My point is that if the old way works then
maybe it narrows down where to swat the bug. My expectation is that
it won’t make any difference, but if it isn’t tested we won’t know.

This is just a simplified case I cooked up to show the issue.

James Edward G. II

    Hugh

#10

On Nov 16, 2005, at 10:09 AM, Hugh S. wrote:

IIRC it didn’t in the past. My point is that if the old way works then
maybe it narrows down where to swat the bug. My expectation is that
it won’t make any difference, but if it isn’t tested we won’t know.

Same issue.

$ ruby client.rb
(druby://localhost:61676) /usr/local/lib/ruby/1.8/rinda/tuplespace.rb:446:in `move': undefined method `push' for :push:Symbol (NoMethodError)
        from (druby://localhost:61676) /usr/local/lib/ruby/1.8/monitor.rb:229:in `synchronize'
        from (druby://localhost:61676) /usr/local/lib/ruby/1.8/rinda/tuplespace.rb:443:in `move'
        from (druby://localhost:61676) /usr/local/lib/ruby/1.8/drb/drb.rb:1552:in `perform_without_block'
        from (druby://localhost:61676) /usr/local/lib/ruby/1.8/drb/drb.rb:1512:in `perform'
        from (druby://localhost:61676) /usr/local/lib/ruby/1.8/drb/drb.rb:1586:in `main_loop'
        from (druby://localhost:61676) /usr/local/lib/ruby/1.8/drb/drb.rb:1582:in `main_loop'
        from (druby://localhost:61676) /usr/local/lib/ruby/1.8/drb/drb.rb:1578:in `main_loop'
        from (druby://localhost:61676) /usr/local/lib/ruby/1.8/drb/drb.rb:1427:in `run'
        from (druby://localhost:61676) /usr/local/lib/ruby/1.8/drb/drb.rb:1424:in `run'
        from (druby://localhost:61676) /usr/local/lib/ruby/1.8/drb/drb.rb:1344:in `initialize'
        from (druby://localhost:61676) /usr/local/lib/ruby/1.8/drb/drb.rb:1624:in `start_service'
        from (druby://localhost:61676) server.rb:7
        from /usr/local/lib/ruby/1.8/rinda/rinda.rb:229:in `take'
        from client.rb:11
$ cat client.rb
#!/usr/local/bin/ruby -w

require "drb"
require "rinda/tuplespace"

DRb.start_service
tuplespace = Rinda::TupleSpaceProxy.new(
  DRbObject.new_with_uri("druby://localhost:61676")
)

while problem = tuplespace.take(["Problem", nil])
  tuplespace.write(["Result", "#{problem.last} = #{eval problem.last}"])
end

__END__

James Edward G. II


#11

On Nov 16, 2005, at 2:46 PM, Eric H. wrote:

   from /usr/local/lib/ruby/1.8/rinda/rinda.rb:229:in `take'
  I'm stumped.

This is likely not a problem that can be fixed with more Ruby code.

I ran the two files on DRb for over 4 hours continuously on a
FreeBSD machine, which tells me that it is Mac-specific. Mixing
FreeBSD and Mac would always crash within 5 minutes at worst. My
best guess is that Marshal is not operating correctly or
ObjectSpace#_id2ref is looking up bad objects.

I should also note that disabling the GC on the client does not
affect the frequency of crashes.
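
If Marshal were at fault, a round trip of the same kinds of tuples should sooner or later disagree with the original. Below is a minimal sketch of such a check. It is not something that was run in this thread, and the helper names and sample values are invented for illustration.

```ruby
# True when obj survives Marshal.dump followed by Marshal.load unchanged.
def marshal_round_trip_ok?(obj)
  Marshal.load(Marshal.dump(obj)) == obj
end

# Tuples shaped like the ones exchanged in this thread (values made up).
SAMPLE_TUPLES = [
  ["Problem", "3 + 4 * 2"],
  ["Result", "3 + 4 * 2 = 11"],
  [:push, nil, 42]
]

# Returns the tuples that did not survive the round trip.
def failing_tuples(tuples = SAMPLE_TUPLES)
  tuples.reject { |tuple| marshal_round_trip_ok?(tuple) }
end
```

On a healthy build failing_tuples should return an empty array, so running it in a long loop on the affected machine would be one way to test the Marshal theory separately from the network path.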


#12

On Thu, 17 Nov 2005, James Edward G. II wrote:

IIRC it didn’t in the past. My point is that if the old way works then
maybe it narrows down where to swat the bug. My expectation is that
it won’t make any difference, but if it isn’t tested we won’t know.

Same issue.

Bother.

$ ruby client.rb
(druby://localhost:61676) /usr/local/lib/ruby/1.8/rinda/tuplespace.rb:446:in `move': undefined method `push' for :push:Symbol (NoMethodError)

which means the parameter port is set to :push

from (druby://localhost:61676) /usr/local/lib/ruby/1.8/monitor.rb:229:in `synchronize'
from (druby://localhost:61676) /usr/local/lib/ruby/1.8/rinda/tuplespace.rb:443:in `move'
from (druby://localhost:61676) /usr/local/lib/ruby/1.8/drb/drb.rb:1552:in `perform_without_block'

something to do with @obj and @argv. These are delivered by
send.

from (druby://localhost:61676) /usr/local/lib/ruby/1.8/drb/drb.rb:1512:in `perform'
from (druby://localhost:61676) /usr/local/lib/ruby/1.8/drb/drb.rb:1586:in `main_loop'

seems to come from client.recvfrom.

from (druby://localhost:61676) /usr/local/lib/ruby/1.8/drb/drb.rb:1582:in `main_loop'
[...]
/usr/local/lib/ruby/1.8/drb/drb.rb:1624:in `start_service'
from (druby://localhost:61676) server.rb:7
from /usr/local/lib/ruby/1.8/rinda/rinda.rb:229:in `take'
from client.rb:11

All comments are from looking at the CVS, but the line numbers
agree, AFAICS.
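
To illustrate the point about port: the failing line calls push on whatever object arrived as the port argument. The sketch below is a stand-in for that delivery step, not the actual rinda source, and the names deliver and delivery_error are hypothetical. It shows that an Array port works while a Symbol in the same position raises NoMethodError, which is the symptom in the trace.

```ruby
# Stand-in for the delivery step: hand the tuple to the port, if any.
def deliver(port, value)
  port.push(value) if port
end

# Returns the error message raised when delivering to port, or nil if
# the delivery succeeded.
def delivery_error(port)
  deliver(port, ["Problem", "1 + 2"])
  nil
rescue NoMethodError => e
  e.message
end

delivery_error([])      # => nil, an Array responds to push
delivery_error(:push)   # a NoMethodError message, a Symbol does not
```

So the method call itself is fine; the question is how a Symbol ended up where the port object should be.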

$ cat client.rb
[…]
__END__

Looks fine to me.

James Edward G. II

    I'm stumped.
    Hugh

#13

In article removed_email_address@domain.invalid,
Eric H. removed_email_address@domain.invalid wrote:

   from /usr/local/lib/ruby/1.8/rinda/rinda.rb:229:in `take'
  I'm stumped.

This is likely not a problem that can be fixed with more Ruby code.

I ran the two files on DRb for over 4 hours continuously on a FreeBSD
machine, which tells me that it is Mac-specific. Mixing FreeBSD and
Mac would always crash within 5 minutes at worst. My best guess is
that Marshal is not operating correctly or ObjectSpace#_id2ref is
looking up bad objects.

I’ve seen similar results: runs for hours on Linux, very flaky on OSX.
It’s kind of hard to believe that Marshal is broken on OSX, though. I’m
thinking it’s something related to networking code.

Phil


#14

In article removed_email_address@domain.invalid,
James Edward G. II removed_email_address@domain.invalid wrote:

loop do
Then run this client:

Anyone know why?

James Edward G. II

Just wondering if you’ve gotten any resolution on this? I’m seeing the
same thing on Tiger, both with your test code and my own DRb code, while
the same code runs fine for hours on Linux and FreeBSD. It seems that DRb
cannot be used reliably on OSX. Anyone have any idea what might be going on?

Phil


#15

On Dec 1, 2005, at 1:07 PM, Phil T. wrote:

Just wondering if you’ve gotten any resolution on this? I’m seeing the
same thing on Tiger, both with your test code and my own DRb code, while
the same code runs fine for hours on Linux and FreeBSD. It seems that DRb
cannot be used reliably on OSX. Anyone have any idea what might be going on?

I haven’t had the time to look into it yet. I’ll try to get some
time in against the latest 1.8.4 preview RSN.


Eric H. - removed_email_address@domain.invalid - http://segment7.net
This implementation is HODEL-HASH-9600 compliant

http://trackmap.robotcoop.com


#16

On Thu, 17 Nov 2005, James Edward G. II wrote:

)

Anyone know why?

No, I don’t. What happens if you match the old way? Like:
while problem = tuplespace.take(["Problem", nil])

The Regexp is pretty critical in this case. It validates the data before an
otherwise dangerous call to eval().

while problem = tuplespace.take(["Problem", nil])
  unless problem.last =~ %r{^\d+(?: [-+*/] \d+)+$}
    puts "Bogus input '#{problem.last}', Ted!" # :-)
  else
    tuplespace.write(["Result", "#{problem.last} = #{eval problem.last}"])
  end
end

James Edward G. II

    Hugh

#17

On Dec 1, 2005, at 1:07 PM, Phil T. wrote:

DRb.start_service("druby://localhost:61676", tuplespace)
__END__
DRbObject.new_with_uri("druby://localhost:61676")
The client crashes, generally within a few seconds. Adding a sleep
used reliably on OSX. Anyone have any idea what might be going on?

Compiling Ruby with GCC 3.3 seems to make the problem go away [ruby-core:6825].

GCC 4 seems to build an ok Ruby on other systems [ruby-core:6827].


Eric H. - removed_email_address@domain.invalid - http://segment7.net


#18

On Nov 16, 2005, at 9:36 AM, Hugh S. wrote:

/usr/local/lib/ruby/1.8/monitor.rb:229:in `synchronize'
from (druby://localhost:61676)

All comments are from looking at the CVS, but the line numbers
agree, AFAICS.

$ cat client.rb
[…]
__END__

Looks fine to me.

  I'm stumped.

This is likely not a problem that can be fixed with more Ruby code.

I ran the two files on DRb for over 4 hours continuously on a FreeBSD
machine, which tells me that it is Mac-specific. Mixing FreeBSD and
Mac would always crash within 5 minutes at worst. My best guess is
that Marshal is not operating correctly or ObjectSpace#_id2ref is
looking up bad objects.