Forum: Ruby DRb Crashing

james (Guest)
on 2005-11-15 22:55
(Received via mailing list)
If I launch this server:

#!/usr/local/bin/ruby

require "drb"
require "rinda/tuplespace"

tuplespace = Rinda::TupleSpace.new
DRb.start_service("druby://localhost:61676", tuplespace)

loop do
   nums    = Array.new(rand(9) + 2) { rand(10) + 1 }
   ops     = Array.new(nums.size - 1) { %w{+ - * /}[rand(4)] }
   problem = nums.zip(ops).flatten.compact.join(" ")

   tuplespace.write(["Problem", problem])
   puts tuplespace.take(["Result", String]).last
end

__END__

Then run this client:

#!/usr/local/bin/ruby -w

require "drb"
require "rinda/tuplespace"

DRb.start_service
tuplespace = Rinda::TupleSpaceProxy.new(
   DRbObject.new_with_uri("druby://localhost:61676")
)

while problem = tuplespace.take(["Problem", %r{^\d+(?: [-+*/] \d+)+$}])
   tuplespace.write(["Result", "#{problem.last} = #{eval problem.last}"])
end

__END__

The client crashes, generally within a few seconds.  Adding a sleep
inside the server loop seems to resolve the issue.

Anyone know why?

James Edward G. II
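
The sleep workaround mentioned above could look roughly like the sketch below. The helper names `random_problem` and `serve`, the `rounds` and `pause` parameters, and the 0.1 second default are assumptions made for illustration; the post does not say how long the sleep was or where in the loop it went.

```ruby
require "rinda/tuplespace"

# Build one random arithmetic problem such as "3 + 7 * 2"
# (same generation logic as the server script above).
def random_problem
  nums = Array.new(rand(9) + 2) { rand(10) + 1 }
  ops  = Array.new(nums.size - 1) { %w{+ - * /}[rand(4)] }
  nums.zip(ops).flatten.compact.join(" ")
end

# Post `rounds` problems, waiting for each answer and then pausing
# `pause` seconds before posting the next one (assumed placement).
def serve(tuplespace, rounds:, pause: 0.1)
  rounds.times.map do
    tuplespace.write(["Problem", random_problem])
    answer = tuplespace.take(["Result", String]).last
    sleep pause
    answer
  end
end
```

The original server loops forever; `rounds` exists only so the sketch can terminate. A pause like this only makes the failure less frequent and does not explain it.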
drbrain (Guest)
on 2005-11-16 01:14
(Received via mailing list)
On Nov 15, 2005, at 12:52 PM, James Edward G. II wrote:

> loop do
> Then run this client:
>
> while problem = tuplespace.take(["Problem", %r{^\d+(?: [-+*/] \d+)+$}])
>   tuplespace.write(["Result", "#{problem.last} = #{eval problem.last}"])
> end
>
> __END__
>
> The client crashes, generally within a few seconds.  Adding a sleep
> inside the server loop seems to resolve the issue.

With what error?

> Anyone know why?

I've had better luck with these kinds of errors by placing the
TupleSpace in its own process.
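
A minimal sketch of what "its own process" might mean here: a script that hosts the TupleSpace and does nothing else, so the problem generator and the solver both become DRb clients. The helper name `start_tuplespace_host` and the file name mentioned in the comment are hypothetical; the URI is the one used in the scripts above.

```ruby
require "drb"
require "rinda/tuplespace"

# Hypothetical helper: expose a fresh TupleSpace over DRb and return
# the DRbServer that hosts it.
def start_tuplespace_host(uri = "druby://localhost:61676")
  DRb.start_service(uri, Rinda::TupleSpace.new)
end

# In a dedicated script (say tuplespace_host.rb), one would then run:
#   start_tuplespace_host
#   DRb.thread.join   # keep the process alive while the service runs
```

Both the server loop and the client would then connect with `Rinda::TupleSpaceProxy.new(DRbObject.new_with_uri(uri))`, as the client above already does.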
james (Guest)
on 2005-11-16 01:20
(Received via mailing list)
On Nov 15, 2005, at 5:12 PM, Eric H. wrote:

>> The client crashes, generally within a few seconds.  Adding a
>> sleep inside the server loop seems to resolve the issue.
>>
>
> With what error?

$ ruby client.rb
(druby://localhost:61676) /usr/local/lib/ruby/1.8/rinda/tuplespace.rb:332:in `move': undefined method `push' for :EDQUOT=:Symbol (NoMethodError)
         from (druby://localhost:61676) /usr/local/lib/ruby/1.8/monitor.rb:229:in `synchronize'
         from (druby://localhost:61676) /usr/local/lib/ruby/1.8/rinda/tuplespace.rb:329:in `move'
         from (druby://localhost:61676) /usr/local/lib/ruby/1.8/drb/drb.rb:1552:in `perform_without_block'
         from (druby://localhost:61676) /usr/local/lib/ruby/1.8/drb/drb.rb:1512:in `perform'
         from (druby://localhost:61676) /usr/local/lib/ruby/1.8/drb/drb.rb:1586:in `main_loop'
         from (druby://localhost:61676) /usr/local/lib/ruby/1.8/drb/drb.rb:1582:in `main_loop'
         from (druby://localhost:61676) /usr/local/lib/ruby/1.8/drb/drb.rb:1578:in `main_loop'
         from (druby://localhost:61676) /usr/local/lib/ruby/1.8/drb/drb.rb:1427:in `run'
         from (druby://localhost:61676) /usr/local/lib/ruby/1.8/drb/drb.rb:1424:in `run'
         from (druby://localhost:61676) /usr/local/lib/ruby/1.8/drb/drb.rb:1344:in `initialize'
         from (druby://localhost:61676) /usr/local/lib/ruby/1.8/drb/drb.rb:1624:in `start_service'
         from (druby://localhost:61676) server.rb:7
         from /usr/local/lib/ruby/1.8/rinda/rinda.rb:153:in `take'
         from client.rb:11

James Edward G. II
drbrain (Guest)
on 2005-11-16 01:53
(Received via mailing list)
On Nov 15, 2005, at 3:17 PM, James Edward G. II wrote:

> tuplespace.rb:332:in `move': undefined method `push'
> for :EDQUOT=:Symbol (NoMethodError)

I'm seeing similar, but only on the Mac.

I wrote [ruby-core:06629], so you may want to add your ruby versions
and any additional insight there, as I think this is a problem
somewhere in the guts of Ruby.
hgs (Guest)
on 2005-11-16 02:50
(Received via mailing list)
On Wed, 16 Nov 2005, James Edward G. II wrote:

> If I launch this server:
        [...]
> tuplespace = Rinda::TupleSpaceProxy.new(
> server loop seems to resolve the issue.
>
> Anyone know why?

No, I don't. What happens if you match the old way? Like:
  while problem = tuplespace.take(["Problem", nil])
>
> James Edward G. II
>
        Hugh
drbrain (Guest)
on 2005-11-16 03:06
(Received via mailing list)
On Nov 15, 2005, at 4:50 PM, Hugh S. wrote:

> On Wed, 16 Nov 2005, James Edward G. II wrote:
>
>> The client crashes, generally within a few seconds.  Adding a
>> sleep inside the
>> server loop seems to resolve the issue.
>>
>> Anyone know why?
>
> No, I don't. What happens if you match the old way? Like:
>   while problem = tuplespace.take(["Problem", nil])

It seems to be a problem only when OS X is involved.

I've been running the same two scripts using Ruby 1.8.3 on FreeBSD
for the past hour without error.  I'll let them run for at least
another four or five to see if they fail.

See-also: [ruby-core:06629]
james (Guest)
on 2005-11-16 17:23
(Received via mailing list)
On Nov 15, 2005, at 6:50 PM, Hugh S. wrote:

>>
>> while problem = tuplespace.take(["Problem", %r{^\d+(?: [-+*/] \d+)+
>>
>> Anyone know why?
>>
>
> No, I don't. What happens if you match the old way? Like:
>   while problem = tuplespace.take(["Problem", nil])

The Regexp is pretty critical in this case.  It validates the data
before an otherwise dangerous call to eval().

James Edward G. II
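
One caveat on the validation point, as a hedged sketch: in Ruby, `^` and `$` anchor at line boundaries, so the pattern from the client accepts a multi-line string whose first line is valid arithmetic. The `\A...\z` variant and both constant names below are suggestions of this note, not something proposed in the thread.

```ruby
LINE_ANCHORED   = %r{^\d+(?: [-+*/] \d+)+$}    # pattern used in the client
STRING_ANCHORED = %r{\A\d+(?: [-+*/] \d+)+\z}  # stricter variant (assumption)

# First line is valid arithmetic, second line is arbitrary Ruby.
sneaky = "1 + 1\nexit"

line_ok   = LINE_ANCHORED.match?(sneaky)    # matches on the first line alone
string_ok = STRING_ANCHORED.match?(sneaky)  # must match the whole string
```

With the whole-string anchors, only a string that consists entirely of the arithmetic expression would reach `eval`. This is unrelated to the crash itself.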
hgs (Guest)
on 2005-11-16 17:44
(Received via mailing list)
On Thu, 17 Nov 2005, James Edward G. II wrote:

> > >
> > > )
> > >
> > > Anyone know why?
> > >
> >
> > No, I don't. What happens if you match the old way? Like:
> >  while problem = tuplespace.take(["Problem", nil])
>
> The Regexp is pretty critical in this case.  It validates the data before an
> otherwise dangerous call to eval().

  while problem = tuplespace.take(["Problem", nil])
    unless problem.last =~ %r{^\d+(?: [-+*/] \d+)+$}
      puts "Bogus input \'#{problem.last}\', Ted!" #:-)
    else
      tuplespace.write(["Result", "#{problem.last} = #{eval problem.last}"])
    end
  end
>
> James Edward G. II
>
        Hugh
james (Guest)
on 2005-11-16 17:50
(Received via mailing list)
On Nov 16, 2005, at 9:42 AM, Hugh S. wrote:

>>>> If I launch this server:
>>>>
>>>> tuplespace.write(["Result", "#{problem.last} = #{eval
>>>> Anyone know why?
>>
>
>   while problem = tuplespace.take(["Problem", nil])
>     unless problem.last =~ %r{^\d+(?: [-+*/] \d+)+$}
>       puts "Bogus input \'#{problem.last}\', Ted!" #:-)
>     else
>       tuplespace.write(["Result", "#{problem.last} = #{eval problem.last}"])
>     end
>   end

This is not equivalent.  You removed the problem from the TupleSpace
whereas my version leaves it for someone else to solve.

I realize this isn't what you were originally asking for and I can
try the change, if you want to see it.  To me it's irrelevant though,
because TupleSpace supports a Regexp search and it should not be
crashing when doing it.

This is just a simplified case I cooked up to show the issue.

James Edward G. II
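
The difference between the two templates can be seen with a local TupleSpace and no DRb at all. This is a small sketch; `try_take` is a hypothetical helper that uses a zero-second timeout so the call returns instead of blocking when nothing matches.

```ruby
require "rinda/tuplespace"

PROBLEM = %r{^\d+(?: [-+*/] \d+)+$}

# Hypothetical helper: non-blocking take, nil when no tuple matches.
def try_take(tuplespace, template)
  tuplespace.take(template, 0)
rescue Rinda::RequestExpiredError
  nil
end

ts = Rinda::TupleSpace.new
ts.write(["Problem", "not arithmetic"])

# Regexp template: the tuple does not match, so it stays in the space.
try_take(ts, ["Problem", PROBLEM])
left_after_regexp = ts.read_all(["Problem", nil]).size

# Wildcard template: the tuple is removed before it can be validated.
try_take(ts, ["Problem", nil])
left_after_wildcard = ts.read_all(["Problem", nil]).size
```

After the Regexp attempt the tuple is still available to other clients; after the wildcard attempt it is gone unless the client writes it back.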
hgs (Guest)
on 2005-11-16 18:11
(Received via mailing list)
On Thu, 17 Nov 2005, James Edward G. II wrote:

> On Nov 16, 2005, at 9:42 AM, Hugh S. wrote:
>
> > On Thu, 17 Nov 2005, James Edward G. II wrote:
> >
> >
> > > On Nov 15, 2005, at 6:50 PM, Hugh S. wrote:
> > >
> > >
> > > > On Wed, 16 Nov 2005, James Edward G. II wrote:
        [...]
> >    end
> >  end
>
> This is not equivalent.  You removed the problem from the TupleSpace whereas
> my version leaves it for someone else to solve.

Yes, that's true, but it's not really the point I was making -- one
could write it back, or whatever.
>
> I realize this isn't what you were originally asking for and I can try the
> change, if you want to see it.  To me it's irrelevant though, because
> TupleSpace supports a Regexp search and it should not be crashing when doing
> it.

IIRC it didn't in the past. My point is that if the old way works then
maybe it narrows down where to swat the bug.  My expectation is that
it won't make any difference, but if it isn't tested we won't know.
>
> This is just a simplified case I cooked up to show the issue.
>
> James Edward G. II

        Hugh
james (Guest)
on 2005-11-16 18:56
(Received via mailing list)
On Nov 16, 2005, at 10:09 AM, Hugh S. wrote:

> IIRC it didn't in the past. My point is that if the old way works then
> maybe it narrows down where to swat the bug.  My expectation is that
> it won't make any difference, but it it isn't tested we won't know.

Same issue.

$ ruby client.rb
(druby://localhost:61676) /usr/local/lib/ruby/1.8/rinda/tuplespace.rb:446:in `move': undefined method `push' for :push:Symbol (NoMethodError)
         from (druby://localhost:61676) /usr/local/lib/ruby/1.8/monitor.rb:229:in `synchronize'
         from (druby://localhost:61676) /usr/local/lib/ruby/1.8/rinda/tuplespace.rb:443:in `move'
         from (druby://localhost:61676) /usr/local/lib/ruby/1.8/drb/drb.rb:1552:in `perform_without_block'
         from (druby://localhost:61676) /usr/local/lib/ruby/1.8/drb/drb.rb:1512:in `perform'
         from (druby://localhost:61676) /usr/local/lib/ruby/1.8/drb/drb.rb:1586:in `main_loop'
         from (druby://localhost:61676) /usr/local/lib/ruby/1.8/drb/drb.rb:1582:in `main_loop'
         from (druby://localhost:61676) /usr/local/lib/ruby/1.8/drb/drb.rb:1578:in `main_loop'
         from (druby://localhost:61676) /usr/local/lib/ruby/1.8/drb/drb.rb:1427:in `run'
         from (druby://localhost:61676) /usr/local/lib/ruby/1.8/drb/drb.rb:1424:in `run'
         from (druby://localhost:61676) /usr/local/lib/ruby/1.8/drb/drb.rb:1344:in `initialize'
         from (druby://localhost:61676) /usr/local/lib/ruby/1.8/drb/drb.rb:1624:in `start_service'
         from (druby://localhost:61676) server.rb:7
         from /usr/local/lib/ruby/1.8/rinda/rinda.rb:229:in `take'
         from client.rb:11
$ cat client.rb
#!/usr/local/bin/ruby -w

require "drb"
require "rinda/tuplespace"

DRb.start_service
tuplespace = Rinda::TupleSpaceProxy.new(
   DRbObject.new_with_uri("druby://localhost:61676")
)

while problem = tuplespace.take(["Problem", nil])
   tuplespace.write(["Result", "#{problem.last} = #{eval problem.last}"])
end

__END__

James Edward G. II
hgs (Guest)
on 2005-11-16 19:39
(Received via mailing list)
On Thu, 17 Nov 2005, James Edward G. II wrote:

> >
> > IIRC it didn't in the past. My point is that if the old way works then
> > maybe it narrows down where to swat the bug.  My expectation is that
> > it won't make any difference, but it it isn't tested we won't know.
>
> Same issue.

Bother.
>
> $ ruby client.rb
> (druby://localhost:61676) /usr/local/lib/ruby/1.8/rinda/tuplespace.rb:446:in
> `move': undefined method `push' for :push:Symbol (NoMethodError)

which means the parameter port is set to :push

>        from (druby://localhost:61676)
> /usr/local/lib/ruby/1.8/monitor.rb:229:in `synchronize'
>        from (druby://localhost:61676)
> /usr/local/lib/ruby/1.8/rinda/tuplespace.rb:443:in `move'
>        from (druby://localhost:61676)
> /usr/local/lib/ruby/1.8/drb/drb.rb:1552:in `perform_without_block'
>        from (druby://localhost:61676)

something to do with @obj and @argv. These are delivered by
__send__.

> /usr/local/lib/ruby/1.8/drb/drb.rb:1512:in `perform'
>        from (druby://localhost:61676)
> /usr/local/lib/ruby/1.8/drb/drb.rb:1586:in `main_loop'
>        from (druby://localhost:61676)

seems to come from client.recvfrom.
> /usr/local/lib/ruby/1.8/drb/drb.rb:1582:in `main_loop'
        [...]
> /usr/local/lib/ruby/1.8/drb/drb.rb:1624:in `start_service'
>        from (druby://localhost:61676) server.rb:7
>        from /usr/local/lib/ruby/1.8/rinda/rinda.rb:229:in `take'
>        from client.rb:11

All comments are from looking at the CVS, but the line numbers
agree, AFAICS.
> $ cat client.rb
        [...]
> __END__

Looks fine to me.
>
> James Edward G. II
>
        I'm stumped.
        Hugh
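
To make the `:push` observation concrete, here is a simplified stand-in for the delivery step that the trace points at. `deliver` is a hypothetical function loosely modeled on the `port.push` call in `move`; it is not the Rinda source.

```ruby
# Simplified stand-in for the step in `move` that hands the matched
# tuple to the caller-supplied port.
def deliver(port, value)
  port.push(value) if port
  port ? nil : value
end

good_port = []
deliver(good_port, ["Problem", "1 + 1"])  # an Array port receives the tuple

begin
  # A Symbol arriving where the port object should be, as in the trace.
  deliver(:push, ["Problem", "1 + 1"])
rescue NoMethodError => e
  error = e
end
```

The call fails the same way as in the report, which fits the idea that the wrong object reached `move` as its `port` argument, rather than the template being at fault.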
drbrain (Guest)
on 2005-11-17 00:47
(Received via mailing list)
On Nov 16, 2005, at 9:36 AM, Hugh S. wrote:

>> /usr/local/lib/ruby/1.8/monitor.rb:229:in `synchronize'
>>        from (druby://localhost:61676)
>
> All comments are from looking at the CVS, but the line numbers
> agree, AFAICS.
>> $ cat client.rb
>         [...]
>> __END__
>
> Looks fine to me.
>
>       I'm stumped.

This is likely not a problem that can be fixed with more Ruby code.

I ran the two files on DRb for over 4 hours continuously on a FreeBSD
machine, which tells me that it is Mac-specific.  Mixing FreeBSD and
Mac would always crash within 5 minutes at worst.  My best guess is
that Marshal is not operating correctly or ObjectSpace#_id2ref is
looking up bad objects.
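
One cheap way to probe the Marshal half of that guess is a round-trip check run on the affected machine. This is only a sketch: `marshal_round_trip_ok?` is a hypothetical helper and the sample tuples are made up to resemble the ones in this thread.

```ruby
# Round-trip a value through Marshal and report whether it survived intact.
def marshal_round_trip_ok?(value)
  Marshal.load(Marshal.dump(value)) == value
end

samples = [
  ["Problem", "3 + 4 * 2"],
  ["Result", "3 + 4 * 2 = 11"],
  [:push, nil, 42],
]

# Collect any sample that comes back different from what went in.
failures = samples.reject { |sample| marshal_round_trip_ok?(sample) }
```

An empty `failures` list would not rule Marshal out, since this check involves neither DRb nor concurrent threads, but a non-empty one would be a strong lead.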
drbrain (Guest)
on 2005-11-17 00:50
(Received via mailing list)
On Nov 16, 2005, at 2:46 PM, Eric H. wrote:

>>
>>
>>>        from /usr/local/lib/ruby/1.8/rinda/rinda.rb:229:in `take'
>>       I'm stumped.
>
> This is likely not a problem that can be fixed with more Ruby code.
>
> I ran the two files on DRb for over 4 hours continuously on a
> FreeBSD machine, which tells me that it is Mac-specific.  Mixing
> FreeBSD and Mac would always crash within 5 minutes at worst.  My
> best guess is that Marshal is not operating correctly or
> ObjectSpace#_id2ref is looking up bad objects.

I should also note that disabling the GC on the client does not
affect the frequency of crashes.
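
For anyone repeating that experiment, a sketch of running a piece of client code with the collector switched off. `without_gc` is a hypothetical helper; the post does not show how the GC was disabled.

```ruby
# Run the block with the garbage collector disabled, then restore the
# previous state.
def without_gc
  already_disabled = GC.disable  # true only if GC was already off
  yield
ensure
  GC.enable unless already_disabled
end

result = without_gc { Array.new(1_000) { |i| i.to_s }.size }
```

In the client, the `while problem = tuplespace.take(...)` loop would go inside the block.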
ptkwt (Guest)
on 2005-12-01 23:11
(Received via mailing list)
In article <removed_email_address@domain.invalid>,
James Edward G. II  <removed_email_address@domain.invalid> wrote:
>loop do
>Then run this client:
>
>Anyone know why?
>
>James Edward G. II
>
>
>

Just wondering if you've gotten any resolution on this?  I'm seeing the
same thing on Tiger both with your test code and my own DRb code, while
the same code runs fine for hours on Linux and FreeBSD.  It seems that
DRb cannot be used reliably on OSX.  Anyone have any idea what might be
going on?

Phil
ptkwt (Guest)
on 2005-12-01 23:47
(Received via mailing list)
In article <removed_email_address@domain.invalid>,
Eric H.  <removed_email_address@domain.invalid> wrote:
>>
>>
>>>        from /usr/local/lib/ruby/1.8/rinda/rinda.rb:229:in `take'
>>       I'm stumped.
>
>This is likely not a problem that can be fixed with more Ruby code.
>
>I ran the two files on DRb for over 4 hours continuously on a FreeBSD
>machine, which tells me that it is Mac-specific.  Mixing FreeBSD and
>Mac would always crash within 5 minutes at worst.  My best guess is
>that Marshal is not operating correctly or ObjectSpace#_id2ref is
>looking up bad objects.

I've seen similar results: runs for hours on Linux, very flaky on OSX.
It's kind of hard to believe that Marshal is broken on OSX, though.  I'm
thinking it's something related to networking code.

Phil
drbrain (Guest)
on 2005-12-02 00:19
(Received via mailing list)
On Dec 1, 2005, at 1:07 PM, Phil T. wrote:

> Just wondering if you've gotten any resolution on this?  I'm seeing
> the same
> thing on Tiger both with your test code and my own DRb code.  While
> the same
> code runs fine for hours on Linux and FreeBSD.  It seems that DRb
> cannot be
> used reliably on OSX.  Anyone have any idea what might be going on?

I haven't had the time to look into it yet.  I'll try to get some
time in against the latest 1.8.4 preview RSN.

--
Eric H. - removed_email_address@domain.invalid - http://segment7.net
This implementation is HODEL-HASH-9600 compliant

http://trackmap.robotcoop.com
drbrain (Guest)
on 2005-12-05 04:18
(Received via mailing list)
On Dec 1, 2005, at 1:07 PM, Phil T. wrote:

>> DRb.start_service("druby://localhost:61676", tuplespace)
>> __END__
>>   DRbObject.new_with_uri("druby://localhost:61676")
>> The client crashes, generally within a few seconds.  Adding a sleep
> used reliably on OSX.  Anyone have any idea what might be going on?
Compiling Ruby with GCC 3.3 seems to make the problem go away [ruby-core:6825].

GCC 4 seems to build an ok Ruby on other systems [ruby-core:6827].
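
Since the compiler turned out to matter, it may help to include build details when comparing machines. A small sketch using `RbConfig` as found in current Ruby versions; `build_info` is a hypothetical helper, and the `CC` entry only names the compiler command recorded at build time, not necessarily its version.

```ruby
require "rbconfig"

# Gather the details most relevant to a compiler-dependent bug report.
def build_info
  {
    ruby:     RUBY_VERSION,
    platform: RUBY_PLATFORM,
    compiler: RbConfig::CONFIG["CC"],
  }
end

info = build_info
info.each { |key, value| puts "#{key}: #{value}" }
```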
