Synchronize a "mocked" clock in a distributed system

I’ve been banging on a problem for a few days now and don’t feel any
closer to solving it. I’m hoping some of the big brains on the ruby ML
can shed some light. Following are a few paragraphs with a brief system
overview before I state the problem. I apologize in advance for this
question being only tangentially related to Ruby the language. :-)

I have written a distributed message passing system (in Ruby!) for doing
some mathematical simulation work. Each component of the system does a
very specific job. Each component may run on any of 3 distinct machines
on a LAN. Components communicate with each other using the 0mq “socket”
library to pass messages on well-defined ports that all components know
about (hard-coded information instead of a dynamic lookup via a “service
directory” mechanism).

The entire system is akin to a distributed state machine. I poke a
command into it from the outside and it sets off a cascade of events
which in turn generate more events until eventually I have my answer.
Some of the events have timeouts or other time-based characteristics
associated with them. Also, some of the returned data has time-based
characteristics (e.g. a timestamp) which impacts the transitioning of
the state machine. It’s all working quite nicely in real-time.

My problem is mocking out the time source so that I can run simulations
in faster than real-time. For example, I may send a request for a data
record and give it a 5 second timeout. This works fine when the clock
source is the actual operating system, but if I want to run faster than
real-time I need to mock the clock out. That is, I want to take a
simulation that might run in 4 hours real-time (with lots of waiting or
other timer related delays) to run in 20 minutes because 1 second of
simulation time is only a fraction of a second in the real world.

This is simple to do for a single component on a single system because I
can intercept all calls to Time and replace it with my own source.
However, I don’t know how to get all of the distributed components
(across multiple machines or multiple processes on one machine) to use a
mocked clock.
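
For reference, the single-process case can be sketched roughly as follows. It assumes components ask an injected clock object for the time instead of calling Time.now directly. The names `RealClock` and `ScaledClock` and the `speedup:` parameter are hypothetical, not part of the system described above. A speedup of 12 matches the 4 hours to 20 minutes example.

```ruby
# Hypothetical clock abstraction: components call clock.now, never Time.now.
class RealClock
  def now
    Time.now
  end
end

# Simulated clock that runs `speedup` times faster than the wall clock,
# counting from `epoch`. `source` returns elapsed real seconds and can be
# swapped out in tests.
class ScaledClock
  MONOTONIC = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }

  def initialize(speedup:, epoch: Time.now, source: MONOTONIC)
    @speedup = speedup
    @epoch = epoch
    @source = source
    @started = source.call
  end

  def now
    elapsed_real = @source.call - @started
    @epoch + elapsed_real * @speedup
  end
end

clock = ScaledClock.new(speedup: 12)
deadline = clock.now + 5 # a "5 second" timeout in simulated time
```

This only covers one process; each process would start its own `ScaledClock` at a slightly different moment, which is exactly the synchronization problem described here.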

I tried googling around for answers, but all of the papers appear to be
concerned with adjusting clock skew across a network where each device
already has a local time source. I don’t know if those solutions apply
here.

Anyone have any bright ideas? Need more information?

cr

On 01.07.2010 23:10, Chuck R. wrote:

This is simple to do for a single component on a single system
Anyone have any bright ideas? Need more information?

A very simplistic solution would be to use DRb and have a centralized
clock. Depending on the number of clients this may of course turn out
as a bottleneck. In that case you would have to devise a more complex
mechanism.
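
A minimal sketch of the DRb idea, under stated assumptions: the class `CentralClock` and the helpers `serve_clock` and `connect_clock` are made up for illustration, and simulated time is a plain number of seconds that only moves when something calls `advance`.

```ruby
require 'drb'

# Hypothetical shared clock. Simulated time only moves when advanced.
class CentralClock
  def initialize(start = 0.0)
    @now = start
    @lock = Mutex.new
  end

  def now
    @lock.synchronize { @now }
  end

  def advance(seconds)
    @lock.synchronize { @now += seconds }
  end
end

# Clock process: export the clock and return the URI it is served on.
def serve_clock(uri, clock = CentralClock.new)
  DRb.start_service(uri, clock)
  DRb.uri
end

# Component process: obtain a proxy; each call on it is a network round trip.
def connect_clock(uri)
  DRbObject.new_with_uri(uri)
end
```

A component would call `connect_clock(uri).now` wherever it currently reads the system time. Because every read is a remote call, this is where the bottleneck mentioned above would show up.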

Maybe looking at time protocols such as NTP might give you some
inspiration. Basically you want to solve the same problem, just with a
different time source (I don’t think that a mocked NTP server will work
because that needs local clocks with a particular precision).

Another option might be UDP broadcast with the “current time”, if
precision on the order of network latency is good enough. If not, again you need a
more complex mechanism (see time protocols).
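
The UDP variant might look something like this sketch. The wire format (a single 8-byte big-endian double holding simulated seconds) and the helper names are assumptions chosen for illustration.

```ruby
require 'socket'

# Encode simulated time as an 8-byte big-endian double.
def encode_time(sim_seconds)
  [sim_seconds].pack('G')
end

def decode_time(payload)
  payload.unpack1('G')
end

# Sender: push the current simulated time to host:port.
def send_time(socket, host, port, sim_seconds)
  socket.send(encode_time(sim_seconds), 0, host, port)
end

# Receiver: block until the next datagram arrives and return its time.
def receive_time(socket)
  payload, _sender = socket.recvfrom(8)
  decode_time(payload)
end
```

For a real broadcast the sending socket would also need `setsockopt(Socket::SOL_SOCKET, Socket::SO_BROADCAST, true)` and the subnet's broadcast address as `host`. Datagrams can be lost or reordered, so receivers should not assume every tick arrives.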

Kind regards

robert

On Thu, Jul 1, 2010 at 3:10 PM, Chuck R. wrote:

and give it a 5 second timeout. This works fine when the clock source is the
actual operating system, but if I want to run faster than real-time I need
to mock the clock out. That is, I want to take a simulation that might run
in 4 hours real-time (with lots of waiting or other timer related delays) to
run in 20 minutes because 1 second of simulation time is only a fraction of
a second in the real world.

It sounds like the way you’ve written your program is time-dependent, or
as ChucK (the music language) would describe it, “strongly timed”.

Right off the bat my initial advice would be to eliminate the need for a
central clock in your system and make it fully asynchronous. Creating
“strongly timed” synchronized distributed systems is rather non-trivial.

On Jul 1, 2010, at 6:43 PM, Tony A. wrote:

It sounds like the way you’ve written your program is time-dependent, or as
ChucK (the music language) would describe it “strongly timed”

Right off the bat my initial advice would be to eliminate the need for a
central clock in your system and make it fully asynchronous. Creating
“strongly timed” synchronized distributed systems is rather non-trivial.

Yes, I suppose it is strongly timed. I didn’t realize that was going to
be such a problem.

Right now it is completely asynchronous when running across multiple
nodes. Each machine’s clock is NTP synched so it just does the “right
thing” when it runs in real-time. This notion of strongly timed doesn’t
rear its ugly head until I try to replace the clock.

I’m going to try to broadcast a clock pulse or heartbeat to all
components. I can set it up so that each component uses the real clock
when no clock pulse message has been received but switch over to the
mocked clock when it sees the first clock message. Hopefully the
delivery latencies don’t cause too much trouble by skewing the time
between components.
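
The fallback behavior described here could be sketched as below. `PulseClock` and its method names are hypothetical, and dropping pulses that arrive out of order is an added assumption rather than something stated in the thread.

```ruby
# Hypothetical clock that follows the OS clock until the first pulse
# arrives, then reports only the time carried by pulses.
class PulseClock
  def initialize
    @sim_time = nil
    @lock = Mutex.new
  end

  # Called by whatever code receives clock-pulse messages.
  def pulse(sim_time)
    @lock.synchronize do
      # Assumption: drop late or reordered pulses so time never runs backwards.
      @sim_time = sim_time if @sim_time.nil? || sim_time > @sim_time
    end
  end

  def mocked?
    @lock.synchronize { !@sim_time.nil? }
  end

  def now
    @lock.synchronize { @sim_time || Time.now }
  end
end
```

Note that in this sketch simulated time stands still between pulses, so the pulse interval sets the finest timeout resolution a component can observe.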

I’ll try it and see. Thanks to all for the suggestions.

cr

Chuck R. wrote:

Could you set up a mock NTP time source that supplies “fast” time to its
clients then configure each machine to use the mock NTP and update very
frequently? This may not be practical and would certainly not work if
the machines are being used for anything except your tests.

On Mon, Jul 5, 2010 at 7:44 PM, William R. wrote:

Could you set up a mock NTP time source that supplies “fast” time to its
clients then configure each machine to use the mock NTP and update very
frequently? This may not be practical and would certainly not work if the
machines are being used for anything except your tests.

I have heard that being killed by a sysadmin is a terrible fate ;-)

Cheers
R.

On Fri, Jul 2, 2010 at 12:10 AM, Robert K. wrote:

A very simplistic solution would be to use DRb and have a centralized clock.
Depending on the number of clients this may of course turn out as a
bottleneck. In that case you would have to devise a more complex mechanism.

Hmm, would a messaging-based time mocking server be faster? I say that
because that was my idea, but I feel that DRb is easier to integrate.
Cheers
R

On Jul 5, 2010, at 2:12 PM, Robert D. wrote:

On Mon, Jul 5, 2010 at 7:44 PM, William R. wrote:

Could you set up a mock NTP time source that supplies “fast” time to its
clients then configure each machine to use the mock NTP and update very
frequently? This may not be practical and would certainly not work if the
machines are being used for anything except your tests.

I have heard that being killed by a sysadmin is a terrible fate ;-)

The idea of using a hacked NTP daemon to speed up the clocks is not
feasible. Interesting idea though…

cr

On Mon, Jul 5, 2010 at 2:06 PM, Chuck R. wrote:

The idea of using a hacked NTP daemon to speed up the clocks is not
feasible. Interesting idea though…

Why can’t the “central time” be maintained by whatever process is scattering
work to your distributed nodes, and just asynchronously included in the
messages for use whenever your workers get around to processing them?

On Jul 5, 2010, at 3:21 PM, Tony A. wrote:

On Mon, Jul 5, 2010 at 2:06 PM, Chuck R. wrote:

The idea of using a hacked NTP daemon to speed up the clocks is not
feasible. Interesting idea though…

Why can’t the “central time” be maintained by whatever process is scattering
work to your distributed nodes, and just asynchronously included in the
messages for use whenever your workers get around to processing them?

Because there is no centralized server that all messages, data or
control must pass through.

cr

On Mon, Jul 5, 2010 at 3:15 PM, Chuck R. wrote:

Because there is no centralized server that all messages, data or control
must pass through.

If your system is fully asynchronous and there’s no central data source, how
is it possible for nodes to synchronize to a central clock? That makes
absolutely no sense.

Cool, glad I could help

On Jul 6, 2010, at 12:52 PM, Tony A. wrote:

On Mon, Jul 5, 2010 at 3:15 PM, Chuck R. wrote:

Because there is no centralized server that all messages, data or control
must pass through.

If your system is fully asynchronous and there’s no central data source, how
is it possible for nodes to synchronize to a central clock? That makes
absolutely no sense.

I wrote a long email describing why I thought I was right, but I kept
coming back to your earlier question about a centralized data source.
The problem I have with my data source is that the documents within it
have different time granularities for the data. For example, some
documents represent data aggregated over 1m, 1 day or 1 week. Since
documents of each time granularity may be requested by various
processes, I didn’t see how I could use them as a source for the mock
clock.

And then it hit me. I could have a mock clock process that subscribes to
all of those data sources and receives all of those messages. The mock
clock should only pay attention to the document data with the smallest
time granularity for setting the clock and ignore the rest.
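
One way this could look, as a sketch only: `DocumentClock` is a hypothetical name, the smallest granularity is assumed to be known up front (60 seconds for the 1m documents), and each message is assumed to carry its granularity in seconds plus a timestamp.

```ruby
# Hypothetical mock clock driven by document timestamps. Only documents
# of the finest granularity are allowed to move the clock.
class DocumentClock
  attr_reader :now

  def initialize(finest_granularity)
    @finest = finest_granularity # in seconds, e.g. 60 for 1m documents
    @now = nil                   # unknown until the first matching document
  end

  # granularity: seconds the document aggregates over.
  # timestamp: the Time the document refers to.
  def observe(granularity, timestamp)
    return @now unless granularity == @finest
    # Assumption: never step backwards if documents arrive out of order.
    @now = timestamp if @now.nil? || timestamp > @now
    @now
  end
end

clock = DocumentClock.new(60)
clock.observe(86_400, Time.at(86_400)) # daily document, ignored
clock.observe(60, Time.at(60))         # 1m document, sets the clock
```

The subscribing process would call `observe` for every message it receives and then publish `clock.now` to the other components by whatever transport they already use.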

So yes, you are right. I do have a central data source that I can use
to set the clock. I just didn’t see it before.

Thanks for pressing me on this. It forced me to really figure it out.

cr