Threads

I’m rewriting some old Python scripts in Ruby. Just to learn a bit more.
I want to thread this code to check websites concurrently. Could someone
offer advice? What are reasonable limits on threads in Ruby?

require 'net/http'
require 'timeout'

# ips is an array of IP address strings to check.
ips.each do |ip|
    begin
        Timeout::timeout(5) do
            Net::HTTP.start(ip) do |site|
                data = site.get('/').body.downcase
                if data.include?('minolta') and data.include?('pagescope')
                    puts 'printer' + "\t" + ip
                elsif data.include?('xerox') and data.include?('printer')
                    puts 'printer' + "\t" + ip
                else
                    puts "website\t" + ip
                end
            end
        end
    # Don't really care about exceptions just time outs mostly.
    rescue Timeout::Error
        puts "Timeout\t" + ip
    rescue Exception
        puts "General Exception\t" + ip
    end
end

On Wednesday 25 October 2006 11:37, Brad T. wrote:

                data.include?('pagescope')
    rescue Timeout::Error
        puts "Timeout\t" + ip
    rescue Exception
        puts "General Exception\t" + ip
    end
end

i made a small example here:
http://pastie.caboo.se/19495

^ manveru

Michael F. wrote:

i made a small example here:
http://pastie.caboo.se/19495

^ manveru

Thank you for the example!

What are reasonable limits on threads? Say I’m scanning a class B
network (roughly 65K hosts). How would you break up the threads? I seem
to get too many "execution expired" errors if I have more than 500 hosts
in one thread. It seems to work best with 256 groups of 256 hosts each,
or 254 if you exclude the 0’s and 255’s.

What are reasonable limits when working with threads in Ruby? Any tips?
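The grouping described above could be sketched roughly like this. The `class_b_groups` helper and the `10.1` prefix are placeholders made up for illustration, not anything from the original script; the sketch just builds one group per third octet and skips the .0 and .255 host addresses.

```ruby
# Build one group of host addresses per third octet of a /16 ("class B") network.
# `class_b_groups` and the "10.1" prefix are illustrative assumptions.
def class_b_groups(prefix)
  (0..255).map do |subnet|
    # Skip .0 and .255 in the last octet, leaving 254 hosts per group.
    (1..254).map { |host| "#{prefix}.#{subnet}.#{host}" }
  end
end

groups = class_b_groups('10.1')
puts groups.size          # number of groups
puts groups.first.size    # hosts per group
puts groups.flatten.size  # total hosts to scan
```

Each group could then be handed to its own thread, which matches the "256 groups" layout that seemed to work best.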

On Thursday 26 October 2006 10:10, Brad T. wrote:

What are reasonable limits on threads? Say I’m scanning a class B
network (roughly 65K hosts). How would you break up the threads? I seem
to get too many "execution expired" errors if I have more than 500 hosts
in one thread. It seems to work best with 256 groups of 256 hosts each,
or 254 if you exclude the 0’s and 255’s.

What are reasonable limits when working with threads in Ruby? Any tips?

No tips there… it always depends on the task at hand - just use what
works for you :)
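One way to make "what works for you" easy to find is to keep the thread count in a single number you can tune. Below is a minimal sketch of a fixed-size worker pool fed from a Queue. The names `scan_with_pool`, `pool_size` and the block passed in are assumptions for illustration only; the stub block stands in for the real Timeout/Net::HTTP check so the example runs without a network.

```ruby
require 'thread'  # Queue and Mutex (needed on older Rubies, harmless on newer ones)

# Run the given block over every host using a fixed number of worker threads.
# `scan_with_pool` and `pool_size` are hypothetical names, not from this thread.
def scan_with_pool(hosts, pool_size, &check)
  queue = Queue.new
  hosts.each { |host| queue << host }
  results = []
  lock = Mutex.new

  workers = Array.new(pool_size) do
    Thread.new do
      loop do
        host = begin
          queue.pop(true)       # non-blocking pop raises ThreadError when empty
        rescue ThreadError
          break                 # no work left, let this worker finish
        end
        outcome = begin
          check.call(host)
        rescue StandardError => e
          "error: #{e.class}"   # record the failure instead of killing the worker
        end
        lock.synchronize { results << [host, outcome] }
      end
    end
  end

  workers.each(&:join)
  results
end

# Stub check; in a real scan the block would wrap the HTTP request.
hosts = %w[192.0.2.1 192.0.2.2 192.0.2.3]
scan_with_pool(hosts, 2) { |host| "checked #{host}" }.sort.each do |host, outcome|
  puts "#{host}\t#{outcome}"
end
```

With this shape you can try 10, 50 or 250 workers against the same host list and see where the timeouts start, rather than guessing a limit up front.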

On 26/10/06, Michael F. wrote:

for you :)

You might try using a MapReduce/DRb based approach instead of threads.
Have a look at Starfish (by Lucas Carlson) for ideas.

Farrel

On 10/24/06, Brad T. wrote:

                if data.include?('minolta') and
    # Don't really care about exceptions just time outs mostly.

Looks like you’re trying to perform a network client operation
simultaneously across a lot of different servers. For a non-threaded
approach, look at the EventMachine library. Sync to the latest source and
look at EventMachine::Deferrable. That should give you considerably more
scalability and performance than trying to solve this with threads. The
Deferrable pattern works like Python’s Twisted. If it doesn’t make sense
to you, let me know and I can send you some sample code.
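For anyone who hasn't met the Deferrable idea before, here is a toy pure-Ruby sketch of it. To be clear, `TinyDeferrable` is a made-up class and this is not EventMachine's code; it only borrows the `callback` / `errback` / `succeed` / `fail` vocabulary to show how an operation can hand back an object now and deliver its result later.

```ruby
# Toy illustration of the Deferrable idea. NOT EventMachine's implementation;
# `TinyDeferrable` is a hypothetical class written for this example.
class TinyDeferrable
  def initialize
    @callbacks = []
    @errbacks = []
    @status = :pending
    @args = []
  end

  # Register a block to run when the operation succeeds.
  def callback(&block)
    register(block, :succeeded, @callbacks)
  end

  # Register a block to run when the operation fails.
  def errback(&block)
    register(block, :failed, @errbacks)
  end

  # Mark the operation as successful and run the success handlers.
  def succeed(*args)
    settle(:succeeded, args, @callbacks)
  end

  # Mark the operation as failed and run the failure handlers.
  def fail(*args)
    settle(:failed, args, @errbacks)
  end

  private

  def register(block, wanted_status, handlers)
    if @status == wanted_status
      block.call(*@args)        # already settled this way: run immediately
    elsif @status == :pending
      handlers << block         # not settled yet: remember for later
    end
    self
  end

  def settle(status, args, handlers)
    return unless @status == :pending   # only the first outcome counts
    @status = status
    @args = args
    handlers.each { |handler| handler.call(*args) }
    @callbacks.clear
    @errbacks.clear
  end
end

request = TinyDeferrable.new
request.callback { |body| puts "got #{body.length} bytes" }
request.errback  { |reason| puts "failed: #{reason}" }
request.succeed('<html></html>')   # in a real client the I/O loop would call this
```

The point is that the caller never blocks waiting for the response; it just says what should happen on success or failure, and a single event loop can keep many such requests in flight at once.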

On 10/24/06, Brad T. wrote:

I’m rewriting some old Python scripts in Ruby. Just to learn a bit more.
I want to thread this code to check websites concurrently. Could someone
offer advice? What are reasonable limits on threads in Ruby?

Here’s an EventMachine code sample that should do what you need. Notice,
this code is nonthreaded, but it still does all the HTTP GETs
simultaneously. Of course you’ll want to do something more interesting in
the http.callback block.

#-------------------------------------

require 'rubygems'
require 'eventmachine'

$addrs = [
    "www.apple.com",
    "www.cisco.com",
    "www.microsoft.com"
]

# Start one asynchronous HTTP GET; the callback runs when the response arrives.
def scan_addr addr
    http = EventMachine::Protocols::HttpClient.request(
        :host => addr,
        :port => 80,
        :request => "/"
    )

    http.callback {|response|
        puts response[:status]
        puts response[:headers]
        puts response[:content].length
    }
end

# All requests are started inside the reactor loop and run concurrently.
EventMachine.run {
    $addrs.each {|addr| scan_addr addr}
}

#-------------------------------------------------

On 10/26/06, Ara wrote:

what would be the preferred way to share data from http.callback? does it
need protection? how about sharing with ruby green threads?

I’m not exactly sure what you’re asking, Ara. What happens in this code is
that the HTTP requests are fired off simultaneously, and as they complete,
the callback gets called for each completion, always on the same thread.
(There are no additional green or native threads being spun here.) So
there’s no contention and no need for mutex protection. If you wanted for
some reason to run this code simultaneously with unrelated code on other
threads, then of course you’d use the normal thread-safe procedures to sync
this data with your other threads.
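As a rough sketch of what "normal thread-safe procedures" could look like if you did mix the callbacks with other threads: wrap the shared data in a small object guarded by a Mutex. `ResultStore` is a hypothetical helper made up for this example, and the writer thread below only stands in for the thread that would run the callbacks; the status value 200 is a placeholder.

```ruby
require 'thread'  # Mutex (needed on older Rubies, harmless on newer ones)

# Mutex-guarded store for results shared between the callback side and
# other threads. `ResultStore` is a hypothetical helper, not part of EventMachine.
class ResultStore
  def initialize
    @lock = Mutex.new
    @statuses = {}
  end

  # Called from the side that receives responses.
  def record(addr, status)
    @lock.synchronize { @statuses[addr] = status }
  end

  # Called from any other thread; returns a copy so callers never
  # touch the live hash outside the lock.
  def snapshot
    @lock.synchronize { @statuses.dup }
  end
end

store = ResultStore.new

# Stand-in for the thread running the callbacks.
writer = Thread.new do
  %w[www.apple.com www.cisco.com www.microsoft.com].each do |addr|
    store.record(addr, 200)   # placeholder status
  end
end

writer.join
puts store.snapshot.size
```

Inside the http.callback block you would then call something like `store.record(addr, response[:status])`, and your other threads would only ever read through `snapshot`.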

On Fri, 27 Oct 2006, Francis C. wrote:

If you wanted for some reason to run this code simultaneously with unrelated
code on other threads, then of course you’d use the normal thread-safe
procedures to sync this data with your other threads.

k - that’s the answer i was looking for.

cheers.

-a

On Fri, 27 Oct 2006, Francis C. wrote:

simultaneously. Of course you’ll want to do something more interesting in
    "www.microsoft.com"
        puts response[:status]
        puts response[:headers]
        puts response[:content].length
    }
end

EventMachine.run {
    $addrs.each {|addr| scan_addr addr}
}

what would be the preferred way to share data from http.callback? does it
need protection? how about sharing with ruby green threads?

cheers.

-a