I have a program that uses threads to quickly check class B networks (65,536 hosts) for public web servers. It works great. I’d like to check for other servers as well. Basically, I’ve got the hosts threaded. Now I’d like to thread the ports too, so that while the hosts are being probed concurrently, the ports on each host are probed concurrently as well. It might look like this:
A Thread
    host
        A Thread
            port 80
            port 443
            port 25
    host
        A Thread
            port 445
            port 139
    host
        A Thread
            …
I can show actual code (with only threaded hosts) if that would be helpful, but I’d rather keep it abstract and discuss how to handle threads within a thread… hope that makes sense.
Since you’re writing this in Ruby, I have to suggest that you just write this single-threaded. Ruby uses green threads, meaning the interpreter handles its own threading rather than the system, which on average makes threaded code slower than single-threaded code.
That said, there should be nothing stopping you from having threads start up other threads. However, the new threads will not be contextually constrained in any way (there is no thread hierarchy, as you seem to be looking for). You’ll just have to make sure that the port threads are given the hostname and port to watch for before letting them go.
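To make that concrete, here is a minimal sketch of host threads that each start their own port threads. The names `port_open?` and `scan` are made up for illustration and are not from the poster's program. The probe is assumed to be a plain TCP connect with a one-second timeout.

```ruby
require "socket"

# Hypothetical helper (an assumption, not the original poster's code):
# true if a TCP connect to host:port succeeds within `timeout` seconds.
def port_open?(host, port, timeout = 1)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  false
end

# One thread per host; each host thread starts one thread per port.
# Host and port are handed to Thread.new as arguments, so every port
# thread gets its own copies before it is let go.
def scan(hosts, ports)
  host_threads = hosts.map do |host|
    Thread.new(host) do |h|
      port_threads = ports.map do |port|
        Thread.new(h, port) { |hh, p| [p, port_open?(hh, p)] }
      end
      # Thread#value waits for the thread and returns its block's result.
      [h, port_threads.map(&:value).to_h]
    end
  end
  host_threads.map(&:value).to_h
end
```

Calling `scan(hosts, [80, 443, 25])` would return a hash of the form `{ host => { port => true_or_false } }`. Note that this starts hosts × ports threads at once, so it is only reasonable for small inputs.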
If you were talking about CPU bound jobs that might be true, but
probing networks has lots of inherent I/O latency. Green threads should
be just fine for this sort of thing as you are basically waiting on
various packets to return from the probed hosts.
Gary W.
I’ve found Ruby to work quite well for this. I can process 65,536 hosts in about 15 minutes in Ruby or Python. They both take about the same amount of time. That’s probing only one port.
What’s the nature of the probe that you’re doing? A TCP connect on a given
port? If so, look at EventMachine (on Rubyforge). You may be able to
significantly improve on your runtime.
That’s the extent of it (TCP connect). I tried EventMachine, but I found plain old threads simpler to use. I could not get my head wrapped around EventMachine. Can’t teach an old dog new tricks.
Basically there is no such thing as a thread within a thread. You can
start threads from a thread - actually that’s the only way since the
main program is run in a thread as well.
In your case I would probably not want one thread per port per host. The overhead of a thread is not insignificant, and you can generate a huge number of threads that way.
I would rather use EventMachine (as mentioned, would be a great
opportunity to learn it) or a fixed number of threads like in a typical
farmer worker scenario. Basically you create N threads and feed them
tasks via a queue. A task would be to check one port on one host. The
advantage is that you can control concurrency and find out the optimal
level. Another advantage is that you save the overhead of multiple
thread creations and destructions.
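A rough sketch of the fixed-pool idea, for illustration only. `port_open?` and `pooled_scan` are hypothetical names, the probe is assumed to be a TCP connect with a one-second timeout, and the default of 50 workers is an arbitrary starting point that you would tune.

```ruby
require "socket"

# Hypothetical helper (an assumption): true if a TCP connect succeeds.
def port_open?(host, port, timeout = 1)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  false
end

# N worker threads pull [host, port] tasks from a queue.
# One task = check one port on one host.
def pooled_scan(hosts, ports, workers: 50)
  tasks = Queue.new
  hosts.each { |h| ports.each { |p| tasks << [h, p] } }
  workers.times { tasks << :done }   # one stop marker per worker

  results = Queue.new
  pool = Array.new(workers) do
    Thread.new do
      # Each worker keeps popping tasks until it sees its stop marker.
      while (task = tasks.pop) != :done
        host, port = task
        results << [host, port, port_open?(host, port)]
      end
    end
  end
  pool.each(&:join)

  # Drain the result queue into a plain array of [host, port, open?].
  Array.new(results.size) { results.pop }
end
```

Changing `workers:` is how you would look for the optimal concurrency level. The threads are created once and reused for every task.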
A quick, naive EM implementation of a part of this problem would be something like the following. (NB: there are no threads being spun up anywhere here.)
On Thu, Feb 22, 2007 at 04:33:15AM +0900, Gary W. wrote:
If you were talking about CPU bound jobs that might be true, but
probing networks has lots of inherent I/O latency. Green threads should
be just fine for this sort of thing as you are basically waiting on
various packets to return from the probed hosts.
Actually (appealing to authority here), I believe Francis has stated in the past that thread overhead kills the advantage even for IO-bound tasks (in Ruby specifically) and that a select loop is better.
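For comparison, a select loop for this kind of probe might look roughly like the sketch below. It is not Francis's code or EventMachine. It is a bare-bones illustration using only the standard `socket` library. `select_scan` is a made-up name, and the sketch assumes IPv4 addresses and a single overall deadline for the whole batch.

```ruby
require "socket"

# Single thread: start every connect without blocking, then let IO.select
# report which sockets finished before the deadline.
# `targets` is an array of [ipv4_address, port] pairs (an assumption).
def select_scan(targets, timeout = 2)
  pending = {}   # socket => [host, port] for connects still in flight
  results = {}   # [host, port] => true/false

  targets.each do |host, port|
    sock = Socket.new(:INET, :STREAM)
    begin
      status = sock.connect_nonblock(Socket.sockaddr_in(port, host), exception: false)
      if status == :wait_writable
        pending[sock] = [host, port]
      else
        results[[host, port]] = true   # connected immediately
        sock.close
      end
    rescue SystemCallError, SocketError
      results[[host, port]] = false    # refused or unreachable right away
      sock.close
    end
  end

  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  until pending.empty?
    remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
    break if remaining <= 0
    ready = IO.select(nil, pending.keys, nil, remaining)
    break unless ready                 # nil means the timeout expired
    ready[1].each do |sock|
      key = pending.delete(sock)
      # A finished connect is reported as writable; SO_ERROR tells us
      # whether it succeeded (0) or failed (for example, refused).
      err = sock.getsockopt(Socket::SOL_SOCKET, Socket::SO_ERROR).int
      results[key] = err.zero?
      sock.close
    end
  end

  # Anything still pending at the deadline counts as closed or filtered.
  pending.each do |sock, key|
    results[key] = false
    sock.close
  end
  results
end
```

For a whole class B you would presumably feed the targets in batches, since every in-flight connect holds an open file descriptor.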
I’d definitely defer to Francis on this but, under the hood, Ruby
uses select to multiplex IO from multiple threads so I’d be
surprised if a single Ruby thread using Kernel#select explicitly
would do better than the builtin multiplexing.
I suppose it depends what kind of IO you are talking about. Disk
I/O is going to be faster than network IO. I would be really
surprised to find that Ruby’s green threads overhead is high enough
to swamp the effects of network I/O.
Gary W.