Odd result when attempting to use Mechanize in parallel with

I wrote a simple tool to iterate over a network segment and find web
servers running on specific ports. We have a lot of devices & software
with a web UI, and I thought that this would be a handy way to find
them, and even tell what they are.

I thought this would be a handy coding project too, and a good way to
cut my teeth on Ruby threads, and build up some usage with
Mechanize.

BTW I am running this on Windows XP.

However my code is quite obviously executing this serially. Is there
something obviously wrong with my code below? (results after
code snippet). I am aware this could make my machine choke from
thread overkill, but I wanted to get it working in parallel first.
Perhaps Mechanize instances have some shared elements?

============================
require 'mechanize'

threads = Array.new
puts "sweep of 153.200.72.* segment http ports"
(1..254).each do |ran|
  threads << Thread.new(ran) { |r|
    agent = WWW::Mechanize.new
    agent.user_agent_alias = 'Windows Mozilla'
    ports = [80,8080]
    ports.each do |p|
      begin
        page = agent.get("http://153.200.72."+r.to_s+":"+p.to_s)
        puts "153.200.72."+r.to_s+":"+p.to_s+" - "+page.title
      rescue
        puts "153.200.72."+r.to_s+":"+p.to_s+" - NOTHING"
      end
    end
  }
  threads.each { |aThread| aThread.join }
end

153.200.72.10:80 - NOTHING
153.200.72.10:8080 - NOTHING
153.200.72.11:80 - NOTHING
153.200.72.11:8080 - NOTHING
153.200.72.12:80 - NOTHING
153.200.72.12:8080 - NOTHING
153.200.72.13:80 - NOTHING
153.200.72.13:8080 - NOTHING
153.200.72.14:80 - NOTHING
153.200.72.14:8080 - NOTHING
153.200.72.15:80 - NOTHING
153.200.72.15:8080 - NOTHING
153.200.72.16:80 - NOTHING
153.200.72.16:8080 - NOTHING
153.200.72.17:80 - NOTHING
153.200.72.17:8080 - NOTHING

On Thu, 7 Dec 2006, Richard C. wrote:

However my code is quite obviously executing this serially. Is there
something obviously wrong with my code below? (results after
code snippet). I am aware this could make my machine choke from
thread overkill, but I wanted to get it working in parallel first.
Perhaps Mechanize instances have some shared elements?

============================

require 'mechanize'

threads = Array.new

puts "sweep of 153.200.72.* segment http ports"

(1..254).each do |ran|
  threads << Thread.new(ran) { |r|
    agent = WWW::Mechanize.new
    agent.user_agent_alias = 'Windows Mozilla'
    ports = [80,8080]
    ports.each do |p|
      begin
        page = agent.get("http://153.200.72."+r.to_s+":"+p.to_s)
        puts "153.200.72."+r.to_s+":"+p.to_s+" - "+page.title
      rescue
        puts "153.200.72."+r.to_s+":"+p.to_s+" - NOTHING"
      end
    end
  }
end

threads.each { |aThread| aThread.join } # THIS MUST BE OUTSIDE THE LOOP!

fyi. starting a thread, and then immediately joining it is the same as
not using a thread at all!
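
To see the difference without touching the network, here is a minimal
sketch that swaps the HTTP request for a sleep. fake_probe and the other
helper names are made up for illustration (they are not part of
Mechanize), and the timings will vary from machine to machine.

```ruby
# Hypothetical stand-in for agent.get: sleeps instead of opening a socket.
def fake_probe(host, delay)
  sleep delay
  host
end

# Join inside the loop: each thread has to finish before the next starts.
def joined_in_loop(hosts, delay)
  hosts.map do |h|
    t = Thread.new(h) { |host| fake_probe(host, delay) }
    t.join
    t.value
  end
end

# Join after the loop: every thread is started first, then waited on.
def joined_after_loop(hosts, delay)
  threads = hosts.map { |h| Thread.new(h) { |host| fake_probe(host, delay) } }
  threads.map { |t| t.value } # value waits for the thread, then returns the block's result
end

# Wall-clock seconds taken by the given block.
def elapsed
  start = Time.now
  yield
  Time.now - start
end

hosts = (1..5).map { |i| "host#{i}" }
puts "join inside loop: %.2fs" % elapsed { joined_in_loop(hosts, 0.2) }
puts "join after loop:  %.2fs" % elapsed { joined_after_loop(hosts, 0.2) }
```

With the join inside the loop the delays add up; with the join after the
loop the threads sleep at the same time, so the total should be close to
a single delay.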

another fyi - threads and io (even socket io) are a deadly combination on
windows. run this on linux/mac if possible.

regards.

-a

On 12/7/06, [email protected] wrote:

threads.each { |aThread| aThread.join } # THIS MUST BE OUTSIDE THE LOOP!

*d’Oh

fyi. starting a thread, and then immediately joining it is the same as not
using a thread at all!

Ah yes, cutting & pasting a line too high …

another fyi - threads and io (even socket io) are a deadly combination on
windows. run this on linux/mac if possible.

Has to be windows, but this isn’t mission critical code - just a
development tool that may eventually post the results to a wiki or
something. I can break this up a bit later so it doesn’t kill my laptop.

regards.

Thanks. I knew it had to be a WTF.

On 12/12/06, Shiwei Z. [email protected] wrote:

thread exits or until limit seconds have passed”.
“.new” doesn’t only mean “creates”, it means both “creates” and
“runs”. So “.new” can make son threads run in parallel. And “.join”
needs to wait for the exit of the called thread, so it gives you the
illusion that the threads are running serially, but in fact “.join” just
wraps up the threads. It is inappropriate for us to say whether “.join”
is making threads run in parallel or serially. We can say “.join” is
serially waiting for the exits of threads that might be already running
in parallel. :) :)

This is what I noticed. When I join 5 threads at a time, the output
arrives in batches of 5. This does slow down the algorithm, especially
if there are a lot of positive results - most of these threads are
waiting for the http connection to time out.

But I run this thing at night anyway.

As an aside, I have had difficulty getting more than ~5 joined threads
to work at all on Windows.
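
One way to avoid the batch-of-5 stall might be a small fixed pool of
worker threads that pull addresses from a shared queue, so a slow host
only holds up its own worker rather than the whole batch. This is only a
sketch under assumptions: probe_all and its arguments are hypothetical
names, the block passed in is a stand-in for the agent.get call, and I
have not checked whether it behaves any better on Windows.

```ruby
require 'thread' # Queue and Mutex; already loaded on current Rubies

# Runs the given probe block over all targets using a fixed number of
# worker threads. Returns a hash of target => probe result (nil on error).
def probe_all(targets, workers = 5, &probe)
  queue = Queue.new
  targets.each { |t| queue << t }

  results = {}
  lock = Mutex.new

  pool = Array.new(workers) do
    Thread.new do
      loop do
        begin
          target = queue.pop(true) # non-blocking pop; raises ThreadError when empty
        rescue ThreadError
          break                    # nothing left to do, this worker is finished
        end

        outcome = begin
          probe.call(target)
        rescue StandardError
          nil                      # treat a failed probe like "NOTHING"
        end

        lock.synchronize { results[target] = outcome }
      end
    end
  end

  pool.each(&:join) # joined once, after every worker has been started
  results
end

# Simulated probe: sleeps instead of making an HTTP request.
targets = (1..20).map { |i| "153.200.72.#{i}:80" }
found = probe_all(targets, 5) { |t| sleep 0.05; "simulated title for #{t}" }
found.keys.sort.each { |t| puts "#{t} - #{found[t]}" }
```

Results are collected in a hash and printed from the main thread, which
also keeps lines from different threads from interleaving.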

Hi, Richard,

Actually, in Ruby, the “.new” method alone makes threads run in parallel
rather than serially, and I think it can meet your requirement. Please
see the program <multithreads_ProbingHttp.rb> I post at the end of this
mail, plus the running results.

First, please notice the following points:
1) The method “.new” means “Creates and runs a new thread to execute the
instructions given in block”.
2) The method “.join” means “The calling thread will suspend execution
and run the called thread. Does not return until the called thread exits
or until limit seconds have passed”.

“.new” doesn’t only mean “creates”, it means both “creates” and
“runs”. So “.new” can make son threads run in parallel. And “.join”
needs to wait for the exit of the called thread, so it gives you the
illusion that the threads are running serially, but in fact “.join” just
wraps up the threads. It is inappropriate for us to say whether “.join”
is making threads run in parallel or serially. We can say “.join” is
serially waiting for the exits of threads that might be already running
in parallel. :) :)
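
The “limit seconds” part of the “.join” description can be tried out in
isolation. Below is a small sketch, with a sleep standing in for real
work and join_with_limit being a made-up helper name: Thread#join
returns nil when the limit runs out before the thread exits, and returns
the thread itself once it has finished.

```ruby
# Starts a thread that "works" for work_seconds, waits at most
# limit_seconds for it, then waits without a limit.
# Returns [finished_within_limit, thread_result].
def join_with_limit(work_seconds, limit_seconds)
  t = Thread.new { sleep work_seconds; :done } # already running after .new
  finished_within_limit = !t.join(limit_seconds).nil?
  t.join                                       # no limit: blocks until the thread exits
  [finished_within_limit, t.value]
end

puts join_with_limit(0.3, 0.05).inspect # limit is shorter than the work
puts join_with_limit(0.01, 1.0).inspect # limit is longer than the work
```

In both cases the thread keeps running in the background; the limit only
decides how long the calling thread is willing to wait for it.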
////////////////////// Program: multithreads_ProbingHttp.rb //////////////////////
require 'mechanize'
threads = Array.new
ports = [80,8080];
puts "sweep of 192.168.1.* segment http ports.\nwaiting for results:"
(40..51).each do |ip|
  add="http://192.168.1."+ip.to_s;
  ports.each do |p|
    addr=add+":"+p.to_s;
    threads << Thread.new(addr){|addr|
      agent=WWW::Mechanize.new;
      agent.user_agent_alias = "Windows Mozilla";
      begin
        page = agent.get(addr);
        puts addr+" - "+page.title;
      rescue
        puts addr+" - NOTHING";
      end
    }
  end
end
sleep 10;
#If the main thread exits earlier than the newly created threads, we
#might not see the results output by the newly created threads. So we let
#the main thread wait for some seconds (say, 10s), so that the newly
#created threads can finish first.
#I don't use the ".join" method here.
puts "finished."
////////////////////// Running results, which show that the child threads
were running in parallel rather than serially //////////////////////
D:\BasicPjt>ruby multithreads_ProbingHttp.rb
sweep of 192.168.1.* segment http ports.
waiting for results:
http://192.168.1.51:80 - shiwei apache homepage.
http://192.168.1.44:80 - under construction
http://192.168.1.48:8080 - ScrumWorks
http://192.168.1.48:80 - Test Page for Apache Installation
http://192.168.1.43:80 - NOTHING
http://192.168.1.41:80 - NOTHING
http://192.168.1.47:80 - NOTHING
http://192.168.1.40:80 - NOTHING
http://192.168.1.41:8080 - NOTHING
http://192.168.1.40:8080 - NOTHING
http://192.168.1.49:80 - NOTHING
http://192.168.1.47:8080 - NOTHING
http://192.168.1.44:8080 - NOTHING
http://192.168.1.51:8080 - NOTHING
http://192.168.1.50:8080 - NOTHING
http://192.168.1.50:80 - NOTHING
http://192.168.1.49:8080 - NOTHING
http://192.168.1.43:8080 - NOTHING
finished.
D:\BasicPjt>

Shiwei,
The views expressed are my own and not necessarily those of Oracle and
its affiliates.