Big Work Loads & Concurrency

I am just learning Ruby and planning to use it for a network
monitoring system. I would like to design a system that will scale well
and handle large workloads (tens of thousands of monitors per server).
My question is about concurrency. I am looking at threads and have
fiddled with them a bit; Ruby makes threading pretty straightforward.
Now I’m trying to figure out how best to use them. The design goal is to
have a server host tens of thousands of monitors in the most efficient
way. Monitors will need to be executed at specific and varying
intervals (e.g. 1,000 monitors once a minute, 2,000 monitors once every
three minutes, 5,000 monitors once every 5 minutes, 10,000 monitors
once every 10 minutes, etc.).

This is a problem that I’m sure has already been solved. I would
appreciate any suggestions that the group can offer regarding a proven
design to tackle this goal.

Thanks, Don

Hi Don,

With this number of tasks it’s inefficient to have a thread per
monitor. Things like this are typically tackled with a lightweight
processing framework: you have a pool of threads that get fed tasks
via a thread-safe queue (Tomcat does its request processing similarly,
although with a few more bells and whistles). Ruby comes with
everything you need for that, apart from a scheduler maybe (but better
check the RAA). You then “just” have to glue the pieces together. I found
Doug Lea’s book very valuable (although it’s written for Java, the
basic principles and strategies are the same):

http://g.oswego.edu/
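To make that pattern concrete, here is a minimal sketch of a worker pool fed by a thread-safe queue, using only Ruby’s standard Thread and Queue classes; the WorkerPool name and the monitor-check blocks are illustrative, not from any library:

```ruby
require 'thread'   # Queue lives here in older Rubies

# Illustrative worker pool: a fixed set of threads pulls jobs off a
# thread-safe Queue and runs them, so thousands of monitors share a
# handful of threads instead of each getting a thread of its own.
class WorkerPool
  def initialize(size)
    @queue = Queue.new
    @workers = Array.new(size) do
      Thread.new do
        while (job = @queue.pop)   # blocks until a job arrives; nil ends the worker
          job.call
        end
      end
    end
  end

  # Enqueue a unit of work (e.g. "run this monitor once").
  def schedule(&job)
    @queue << job
  end

  # Push one nil per worker so every thread falls out of its loop, then wait.
  def shutdown
    @workers.size.times { @queue << nil }
    @workers.each { |w| w.join }
  end
end

pool = WorkerPool.new(10)
100.times { |i| pool.schedule { puts "checking monitor #{i}" } }
pool.shutdown
```

A scheduler that decides when each monitor gets pushed onto the queue would sit in front of this, which is the missing piece Robert mentions.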

HTH

Kind regards

robert

Depending on what you’re trying to do, a single-threaded approach may
give you more scalability and performance:
http://rubyforge.org/projects/eventmachine

Francis,

I took a look at EventMachine. It looks very interesting, but it seems
that its primary use is to abstract networking. What are your thoughts
on how this could be leveraged for use in the monitoring service I
describe above (i.e. manage a large, dynamic queue of jobs and run them
at specific intervals)?

Thanks! - Don

Thanks guys. I’ll look into both of these suggestions. I really
appreciate the tips!

  • Don

Don, I thought about it a bit more. Your original question was how to
achieve concurrency. So there are evidently latencies in your system
that you’d like to capture, and my question is: how do these arise? If
they come from the network, then a solution like EventMachine is ideal.
If they are compute-bound, then you should also look at EventMachine,
and you should not be looking at a threaded approach unless you’re able
to specify multi-processor hardware. If the latencies are coming from
elsewhere in your local system (such as disk I/O), then you should look
at thread pools.

I hope that’s helpful. Best of luck,
-f

The idea behind using a single-threaded engine (“reactor model”) in your
application would be this: each of your tasks is implemented as nothing
more than an instance of a Ruby class (that you write). Rather than
asking you to maintain a thread pool and schedule each chunk of work
onto the next available thread, EventMachine just calls methods on your
objects whenever it’s time to do some work. It definitely does have the
ability to fire requests into your objects periodically, based on timers
that you set up. The reason to use this approach instead of thread pools
is that it can be far faster and more scalable. The downside is that
your network-protocol handling may be a little more complicated.
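As a rough illustration of that timer-driven style (a sketch only, assuming the stock EventMachine calls EventMachine.run and EventMachine.add_periodic_timer; the ServiceMonitor class is made up for the example):

```ruby
require 'rubygems'
require 'eventmachine'

# Illustrative plain-Ruby object that EventMachine drives via a timer.
class ServiceMonitor
  def initialize(name, interval)
    @name, @interval = name, interval
  end

  def start
    # The reactor calls this block every @interval seconds. It stays
    # single-threaded, so whatever runs here should return quickly.
    EventMachine.add_periodic_timer(@interval) { perform_check }
  end

  def perform_check
    puts "#{Time.now} checking #{@name}"
  end
end

EventMachine.run do
  ServiceMonitor.new("www-frontend", 60).start
  ServiceMonitor.new("mail-relay", 180).start
end
```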

Since you’re talking about monitoring a large number of external
entities (processes? systems? users?), my next question is: what network
protocol will be used? We’ve already implemented EventMachine protocol
handlers for HTTP/S, LDAP, SMTP, SIP, and a few others.

Excellent! It sounds like EventMachine may do exactly what I’m needing.
That’s great news. Looks like I’ll need to dig into it. Thanks!

Yes, I will be monitoring the most common network services (HTTP/S,
SMTP, POP3, DNS, IMAP, etc.) on remote systems. I’ll also want to track
things like connect times and transaction response times. I haven’t
started coding the monitor classes yet, as I’m just now trying to
formulate a design. It sounds like you may have done some of the work
for me! If I understand it correctly, I should be able to build my
monitor classes on top of your existing protocol handlers and create
my own handlers for the protocols that don’t exist yet.

To your latency question: I hope I understand it correctly. I think most
of the latency will come from the network. For example, if there are
four monitors that need to be run in the current 60-second window, then
I need to run them simultaneously. If they run serially, monitors later
in the queue could get stuck behind a long-running monitor. The polling
server (the system running the monitors) can’t get stuck on a single
monitor for an extended period of time, since that would cause all work
to stop. How would EventMachine handle this without running each active
monitor in its own worker thread?

On the issue of scaling: I would like to use a shared queue that would
allow additional polling servers to be added for scaling out. Any
initial thoughts on how EventMachine could fit into this model?

I hope I’m not getting in over my head! It seems pretty complex for
such a simple mind. :wink:

Thanks for all the help! - Don

Ok, let’s see if I understand you. (I certainly hope we’re not going
into inappropriate territory for this list with a discussion of a
particular system design!)

You have a lot of network servers (many different protocols) floating
around your network, and you want to periodically send client requests
to each of them and measure the response times. (And of course send you
alerts when they don’t respond.)

One way to do this would be to tell EventMachine to kick off one request
for each of the monitored servers every minute (for example). That’s a
simple matter of instantiating a bunch of objects. The objects send
their requests, then wait for the responses, and then possibly talk to
some singleton or database connection somewhere. The objects die off by
themselves when the protocol conversation completes or they time out.
EventMachine manages all of this by calling methods in your objects
whenever timers expire, you send data on them, or they receive data from
the network. So the network drivers are working away in the kernel while
you’re processing each request. To start it off, you call EventMachine’s
#run method, and it does everything else. If you want to do other work
on Ruby threads while the event machine runs, that’s ok too. The system
will be live and concurrent as long as it doesn’t take you an inordinate
amount of time to process each response (you’re probably just timing
them, so no problem).
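Here is a sketch of that lifecycle, assuming the standard EventMachine::Connection callbacks (post_init, receive_data, unbind) and EventMachine.connect; the ServiceCheck class, the Results queue, and the host list are illustrative stand-ins, and the “protocol” is just a bare HTTP HEAD used to time a response:

```ruby
require 'rubygems'
require 'eventmachine'
require 'thread'

Results = Queue.new   # stand-in for the "singleton or database connection"

# One object per check: it is created, fires its request, records a
# result, and dies when the conversation completes or times out.
class ServiceCheck < EventMachine::Connection
  def initialize(host)
    super()
    @host = host
    @started = Time.now
    @got_reply = false
  end

  def post_init
    set_comm_inactivity_timeout(10)   # give up if the server goes silent
    send_data "HEAD / HTTP/1.0\r\nHost: #{@host}\r\n\r\n"
  end

  def receive_data(_data)
    @got_reply = true
    Results << [@host, :ok, Time.now - @started]
    close_connection
  end

  def unbind
    # Reached on normal close, connect failure, or the inactivity timeout.
    Results << [@host, :down, nil] unless @got_reply
  end
end

EventMachine.run do
  EventMachine.add_periodic_timer(60) do
    %w[example.com example.org].each do |host|
      EventMachine.connect(host, 80, ServiceCheck, host)
    end
  end
end
```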

I’m happy to help you out if you want to give it a shot. Send me a
private email so we don’t pollute this list :wink: