DRb::DRbBadScheme when using drbunix sockets, why?

Hi,

I’ve been slowly hacking on my parallel recursive readdir()
implementation using a limited number of parallel processes, since as
you all know, Ruby threads are not truly parallel.

Anyway, I’m now getting weird errors like the following:

Starting DRb on server
Exception `DRb::DRbBadScheme' at /usr/lib/ruby/1.8/drb/drb.rb:814 - drbunix:///tmp/stoffj-test_17986
Exception `DRb::DRbBadScheme' at /usr/lib/ruby/1.8/drb/drb.rb:814 - drbunix:///tmp/stoffj-test_17986
Exception `DRb::DRbBadScheme' at /usr/lib/ruby/1.8/drb/drb.rb:814 - drbunix:///tmp/stoffj-test_17986
counter.count=1

And it’s not really clear to me why I’m getting these errors, or how I
can trap them so I’m not bothered by them. The basic code uses both DRb
and Slave libraries to make the interprocess communication simpler, and
to have a central counting server to regulate the number of
sub-processes I’ll be using at any one point in time.

So, first off, here’s a full set of output of my code on a test
directory.

Thanks for any hints or suggestions. I think I need to use begin …
rescue … end blocks possibly, but it’s not clear where. Oh yeah, I’m
running this all on a CentOS 5.2 (RHEL5.2) Final system, against a small
350 MB test directory tree.

Thanks,
John

------------------------------- log ----------------------------------

$ ./readdir-drb.rb tmp
Starting DRb on server
Exception `DRb::DRbBadScheme' at /usr/lib/ruby/1.8/drb/drb.rb:814 - drbunix:///tmp/stoffj-test_17986
Exception `DRb::DRbBadScheme' at /usr/lib/ruby/1.8/drb/drb.rb:814 - drbunix:///tmp/stoffj-test_17986
Exception `DRb::DRbBadScheme' at /usr/lib/ruby/1.8/drb/drb.rb:814 - drbunix:///tmp/stoffj-test_17986
counter.count=1
Threaded!
Exception `DRb::DRbBadScheme' at /usr/lib/ruby/1.8/drb/drb.rb:814 - drbunix:///tmp/slave_proc_537506070_17986_17988_0_0.125356081811465
Exception `DRb::DRbBadScheme' at /usr/lib/ruby/1.8/drb/drb.rb:814 - drbunix:///tmp/slave_proc_537506070_17986_17988_0_0.125356081811465
Exception `DRb::DRbBadScheme' at /usr/lib/ruby/1.8/drb/drb.rb:814 - drbunix:///tmp/slave_proc_537506070_17986_17988_0_0.125356081811465
Threaded!
Exception `DRb::DRbBadScheme' at /usr/lib/ruby/1.8/drb/drb.rb:814 - drbunix:///tmp/slave_proc_537512250_17988_17989_0_0.20115187038118
Exception `DRb::DRbBadScheme' at /usr/lib/ruby/1.8/drb/drb.rb:814 - drbunix:///tmp/slave_proc_537512250_17988_17989_0_0.20115187038118
Exception `DRb::DRbBadScheme' at /usr/lib/ruby/1.8/drb/drb.rb:814 - drbunix:///tmp/slave_proc_537512250_17988_17989_0_0.20115187038118
size = 372104960
size = 373477128
Exception `NameError' at /usr/lib/ruby/site_ruby/1.8/slave.rb:409 - uninitialized constant Slave::DBb
Exception `RuntimeError' at /usr/lib/ruby/site_ruby/1.8/slave.rb:519 - already shutdown
exit (SystemExit)
/usr/lib/ruby/site_ruby/1.8/slave.rb:265:in `exit'
/usr/lib/ruby/site_ruby/1.8/slave.rb:265:in `cling'
/usr/lib/ruby/site_ruby/1.8/slave.rb:259:in `call'
/usr/lib/ruby/site_ruby/1.8/slave.rb:259:in `on_cut'
/usr/lib/ruby/site_ruby/1.8/slave.rb:251:in `initialize'
/usr/lib/ruby/site_ruby/1.8/slave.rb:251:in `new'
/usr/lib/ruby/site_ruby/1.8/slave.rb:251:in `on_cut'
/usr/lib/ruby/site_ruby/1.8/slave.rb:265:in `cling'
/usr/lib/ruby/site_ruby/1.8/slave.rb:415:in `initialize'
/usr/lib/ruby/site_ruby/1.8/slave.rb:593:in `new'
/usr/lib/ruby/site_ruby/1.8/slave.rb:593:in `object'
./readdir-drb.rb:114:in `readdir'
./readdir-drb.rb:86:in `foreach'
./readdir-drb.rb:86:in `readdir'
./readdir-drb.rb:207

Total size: 373477128 B
Total size: 364723 KB
Total size: 356 MB
Total size: 0 GB
Exception `Errno::ENOENT' at /usr/lib/ruby/1.8/fileutils.rb:1281 - No such file or directory - /tmp/slave_proc_537506070_17986_17988_0_0.125356081811465
Exception `RuntimeError' at /usr/lib/ruby/site_ruby/1.8/slave.rb:519 - already shutdown
rt3.taec.com:~/src/Tools/philesight-20081120$ Exception `NameError' at /usr/lib/ruby/site_ruby/1.8/slave.rb:409 - uninitialized constant Slave::DBb
Exception `Errno::ENOENT' at /usr/lib/ruby/1.8/fileutils.rb:1281 - No such file or directory - /tmp/slave_proc_537512250_17988_17989_0_0.20115187038118
Exception `IOError' at /usr/lib/ruby/1.8/drb/unix.rb:92 - stream closed
Exception `Errno::EBADF' at /usr/lib/ruby/1.8/drb/unix.rb:92 - Bad file descriptor - ///tmp/slave_proc_537512250_17988_17989_0_0.20115187038118
/usr/lib/ruby/1.8/drb/unix.rb:92:in `close': Bad file descriptor - ///tmp/slave_proc_537512250_17988_17989_0_0.20115187038118 (Errno::EBADF)
from /usr/lib/ruby/1.8/drb/unix.rb:92:in `close'
from /usr/lib/ruby/1.8/drb/drb.rb:1433:in `run'
from /usr/lib/ruby/1.8/drb/drb.rb:1427:in `start'
from /usr/lib/ruby/1.8/drb/drb.rb:1427:in `run'
from /usr/lib/ruby/1.8/drb/drb.rb:1347:in `initialize'
from /usr/lib/ruby/1.8/drb/drb.rb:1627:in `new'
from /usr/lib/ruby/1.8/drb/drb.rb:1627:in `start_service'
from /usr/lib/ruby/site_ruby/1.8/slave.rb:396:in `initialize'
... 28 levels...
from ./readdir-drb.rb:114:in `readdir'
from ./readdir-drb.rb:86:in `foreach'
from ./readdir-drb.rb:86:in `readdir'
from ./readdir-drb.rb:207

--------------------------- source

And here’s my source code. Excuse my lack of Ruby knowledge; I’m a newb
to Ruby, though not to programming. Hopefully that shows. :]

#!/usr/bin/ruby

require 'getoptlong'
require 'thread'
require 'slave'
require 'drb'
require 'drb/unix'

$VERSION = "v1.0"
$max_slaves = 3
$count = 50

# Local Socket - need to worry about multiple runs of this script, so
# add on unique PID.
$URI = "drbunix:///tmp/stoffj-test_" + Process.pid.to_s

slave_cnt_mutex = Mutex.new

opts = GetoptLong.new(
  [ "--help", "-h", GetoptLong::NO_ARGUMENT],
  [ "--kids", "-k", GetoptLong::REQUIRED_ARGUMENT]
)

#---------------------------------------------------------------------
class Counter
  def initialize(max=1)
    @slaves = []
    @max = max
    @count = 1
    @count_mutex = Mutex.new
  end

  def count
    @count
  end

  def max
    @max
  end

  # Increment the count of slaves, returning 1 if incremented, nil if not.
  def increment
    ok = nil
    @count_mutex.synchronize do
      if (@count < @max) then
        @count += 1
        ok = 1
      end
    end
    ok
  end

  # Decrement the count of slaves
  def decrement
    @count_mutex.synchronize do
      if (@count > 1) then
        @count -= 1
      end
    end
    @count
  end
end

#---------------------------------------------------------------------
class ReadDir
  def initialize(server)
    @server = server

    # Slave pool for this level of readdir() recursion.
    @kids = []
  end

  def readdir(dir)
    #puts "readdir(#{dir})"

    size_file = {}
    size_dir = {}
    size_total = 0

    # Traverse the directory and collect the size of all files and
    # directories
    begin
      Dir.foreach(dir) do |f|
        #print " #{f},"
        if(f != "." && f != "..") then
          f_full = addpath(dir, f)
          stat = File.lstat(f_full)

          if(!stat.symlink?) then

            if(stat.file?) then
              #puts "  File: #{f}"
              size = File.size(f_full)
              size_file[f] = size
              size_total += size
            end

            if(stat.directory?) then
              #puts "DIR= #{f}"
              if (@server.max <= 1) then
                puts " no threads."
                size = readdir(f_full)
                if (size > 0) then
                  size_dir[f] = size
                  size_total += size
                end
              else
                ok = @server.increment
                if (ok)
                  puts " Threaded!"
                  @kids << Slave.object(:async => true) {
                    size = readdir(f_full)
                    puts "size = #{size}"
                    # Duh... return the size from the slave properly
                    size
                  }
                else
                  #puts " no free threads, do anyway"
                  size = readdir(f_full)
                  if(size > 0) then
                    size_dir[f] = size
                    size_total += size
                  end
                end
              end
            end
          end
        end
      end
    end

    @kids.each { |kid|
      size_total += kid.value
    }

    #puts "Dir: #{dir} = #{size_total}"
    return size_total
  end
end

#---------------------------------------------------------------------
# Read a directory and add to the database; this function is recursive
# for sub-directories
#---------------------------------------------------------------------
def usage
  puts
  puts "usage: readdir-drb [--kids NUM] "
  puts " defaults to #{$max_slaves} children"
  puts
  puts " version: #{$VERSION}"
  puts
end

#---------------------------------------------------------------------
def addpath(a, b)
  return a + b if(a =~ /\/$/)
  return a + "/" + b
end

#---------------------------------------------------------------------
# Main
#---------------------------------------------------------------------
$DEBUG = true

opts.each do |opt,arg|
  case opt
  when "--kids"
    $max_slaves = arg.to_i
  else
    usage
    exit
  end
end

if ARGV.length != 1
  puts "Missing dir argument (try --help)"
  exit 0
end

dir = ARGV.shift

# Start the DRb service.
puts "Starting DRb on server"
DRb.start_service $URI, Counter.new($max_slaves)

# Child
# Fire up the first slave process which will do the work of readdir()
DRb.start_service
counter = DRbObject.new_with_uri $URI

puts "counter.count=#{counter.count}"

# Fire up a new Kid Class readdir.
kid = ReadDir.new(counter)

# Now let's try to do a recursive readdir() algorithm with threads.
size = kid.readdir(dir)

sizekb = size / 1024
sizemb = sizekb / 1024
sizegb = sizemb / 1024

puts ""
puts "Total size: #{size} B"
puts "Total size: #{sizekb} KB"
puts "Total size: #{sizemb} MB"
puts "Total size: #{sizegb} GB"

John Stoffel wrote:

drbunix:///tmp/stoffj-test_17986
Probably not helpful, but I thought it was more normal to represent that
file uri as:

drbunix:/tmp/stoffj-test_17986
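
To make the comparison concrete, here is a small sketch of how the two spellings differ as socket paths. It assumes that everything after the "drbunix:" prefix is handed to the operating system as the socket path, which the "Bad file descriptor - ///tmp/slave_proc_..." line in the log seems to suggest; the helper name drbunix_path is made up for illustration and is not part of DRb.

```ruby
# Hypothetical helper, not DRb's API. Assumption: the socket path is
# simply whatever follows the "drbunix:" scheme prefix.
def drbunix_path(uri)
  uri.sub(/\Adrbunix:/, '')
end

short = drbunix_path("drbunix:/tmp/stoffj-test_17986")
long  = drbunix_path("drbunix:///tmp/stoffj-test_17986")

puts short   # /tmp/stoffj-test_17986
puts long    # ///tmp/stoffj-test_17986

# On POSIX systems, three or more leading slashes resolve like a single
# one, so both spellings should name the same file system location.
puts File.identical?("/", "///")
```

If that assumption holds, both forms would end up at the same socket file, which would fit the observation that the three-slash URI works in simpler code.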

John Stoffel wrote:

Joel VanderWerf wrote:

John Stoffel wrote:

drbunix:///tmp/stoffj-test_17986
Probably not helpful, but I thought it was more normal to represent that
file uri as:

drbunix:/tmp/stoffj-test_17986

Hmmm, decent idea, but when I use this same URI in simpler,
non-recursive code, it works just fine. I’ll give it a whirl though and
let people know.

I was wondering if maybe I need to add some begin/rescue/end blocks when
I open DRb sockets to catch system errors such as filled /tmp, etc?
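
A minimal sketch of what such a block could look like, assuming that OS-level failures while creating the socket surface as Errno::* exceptions (all of which descend from SystemCallError). The helper name start_counter_service is hypothetical; DRb::DRbServer.new is the standard constructor behind DRb.start_service.

```ruby
require 'drb'
require 'drb/unix'

# Hypothetical helper: try to start a DRb server on a drbunix URI and
# return nil instead of crashing when the OS refuses to create the socket.
def start_counter_service(uri, front)
  DRb::DRbServer.new(uri, front)
rescue SystemCallError => e
  # Errno::ENOENT, Errno::EADDRINUSE, Errno::ENOSPC and friends all
  # inherit from SystemCallError, so one rescue clause covers them.
  warn "could not start DRb on #{uri}: #{e.class}: #{e.message}"
  nil
end

# The directory below is assumed not to exist, so this should take the
# rescue path and leave server set to nil.
server = start_counter_service("drbunix:/no-such-dir-for-drb-test/sock", Object.new)
puts "server started? #{!server.nil?}"
```

This only guards the moment the socket is created; it would not catch exceptions raised later inside the slave processes.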

Oh well… I’m getting back a consistent answer now, just lots of other
warnings and possible errors from DRb and the Slaves.

Cheers,
John

Well, I’ve got more information now. It struck me that the errors were
NOT from my DRb code, but from the DRb calls made by the ‘slave’ module
instead! So I pulled all my own DRb calls and reconfigured my code to
use just the ‘slave’ module I found here:

http://codeforpeople.com/lib/ruby/slave/

I’m using version 1.2.1 on a CentOS 5.2 system, with ruby 1.8.5 (2006-08-25)
[i386-linux], so I suspect that either my Ruby version is too old or
the Slave module isn’t robust enough.

I guess I’ll re-post my code with some new ideas and comments.

Thanks,
John
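
One possible explanation for the "Exception ... at ..." lines, offered as an assumption rather than a confirmed diagnosis: the script sets $DEBUG = true, and in debug mode Ruby reports every exception at the moment it is raised, including ones that a library rescues internally. DRb appears to try each registered protocol against a URI and quietly rescue DRbBadScheme for the ones that do not match. The sketch below imitates that pattern with stand-in names (BadScheme, PARSERS, open_uri and debug_noise are all hypothetical, not DRb's API) and captures what the interpreter writes to $stderr.

```ruby
require 'stringio'

# Stand-in for DRb::DRbBadScheme (hypothetical; no DRb code is used here).
class BadScheme < StandardError; end

# Hypothetical protocol table: each parser accepts its own scheme and
# raises BadScheme for anything else.
PARSERS = [
  lambda { |uri| raise BadScheme, uri unless uri =~ /\Adruby:/;   :tcp },
  lambda { |uri| raise BadScheme, uri unless uri =~ /\Adrbunix:/; :unix },
]

# Try each parser in turn, swallowing BadScheme from the wrong ones.
def open_uri(uri)
  PARSERS.each do |parser|
    begin
      return parser.call(uri)
    rescue BadScheme
      # wrong protocol for this URI; quietly try the next one
    end
  end
  raise ArgumentError, "can't parse uri: #{uri}"
end

# Run open_uri with $DEBUG set as requested and return whatever the
# interpreter wrote to $stderr in the meantime.
def debug_noise(uri, debug)
  old_debug, old_stderr = $DEBUG, $stderr
  $stderr = StringIO.new
  $DEBUG = debug
  open_uri(uri)
  $stderr.string
ensure
  $DEBUG, $stderr = old_debug, old_stderr
end

puts debug_noise("drbunix:///tmp/stoffj-test_17986", true)
puts debug_noise("drbunix:///tmp/stoffj-test_17986", false).empty?
```

If this is what is happening, dropping the $DEBUG = true line should silence the rescued DRbBadScheme messages without adding any rescue blocks. Exceptions that actually end the script, such as the Errno::EBADF backtrace in the log, would still be reported.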
