I’m running a fairly complicated build and test system with DRb over
Ruby 1.8.6. It involves 12 Linux machines running several different
distro versions and one Windows machine.
Lately I’ve been having problems where, once in a while, the machines
involved in this system just stop communicating, and I can’t figure out
why. Occasionally I can work around the problem by changing the order or
the frequency of the operations, but when it occurs seems more or less
random.
The only thing I can think of is that this all started when I added SUSE
9.3 and 9.4 machines to this system.
The other possibility is that now I have 12 Linux machines and a Windows
machine all more or less arbitrarily talking with each other, so there
might be a slowly increasing probability of a deadlock that I’m suddenly
noticing because it’s more likely with more machines.
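To tell a real deadlock from a slow machine, one thing I could try is putting a deadline around each remote call so that a hang gets logged instead of blocking forever. This is only a rough sketch: call_with_deadline is a name I made up, the blocks below stand in for real DRb calls, and I wrote it against a current Ruby rather than 1.8.6.

```ruby
require 'timeout'

# Hypothetical helper: run the block, but give up after `seconds`.
# Returns [:ok, value] on success or [:timed_out, nil] on a hang.
def call_with_deadline(label, seconds)
  result = Timeout.timeout(seconds) { yield }
  [:ok, result]
rescue Timeout::Error
  # Record which call hung so the stuck peer can be identified later.
  warn "#{label}: no reply within #{seconds}s"
  [:timed_out, nil]
end

# Stand-ins for remote calls: one answers quickly, one never answers in time.
fast = call_with_deadline('fast call', 1) { 21 * 2 }
slow = call_with_deadline('slow call', 0.2) { sleep 2 }
```

If the same host keeps showing up in the timeout log, that points at the machine; if the hangs move around, a deadlock between peers seems more likely.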
I’m sitting here thinking of exotic ways TCP could be misconfigured out
of the box on SUSE 9. But deep in my soul I’m sure it’s some stupid code
Anyway, the idea here is that a Windows machine sends messages to
several Linux machines and the Linux machines send back log messages and
occasionally a series of messages that represent the contents of a file.
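Boiled down, the message flow looks roughly like the sketch below. It is not my real code: LogSink, Worker, the host name and the command are invented for illustration, both ends run in one process on 127.0.0.1, and it assumes a current Ruby with the drb library available rather than 1.8.6.

```ruby
require 'drb'
require 'thread'

# Hypothetical stand-in for the Windows side: collects log lines
# that the workers push back.
class LogSink
  attr_reader :lines

  def initialize
    @lines = []
    @lock = Mutex.new
  end

  def log(host, text)
    @lock.synchronize { @lines << "#{host}: #{text}" }
    nil
  end
end

# Hypothetical stand-in for one Linux machine.
class Worker
  def initialize(name)
    @name = name
  end

  # Receives a command plus the URI of the controller's log sink,
  # and calls back into the controller while the command "runs".
  def run(command, sink_uri)
    sink = DRb::DRbObject.new_with_uri(sink_uri)
    sink.log(@name, "starting #{command}")
    sink.log(@name, "finished #{command}")
    :done
  end
end

# Port 0 asks the OS for any free port; the real URI is read back afterwards.
sink = LogSink.new
controller = DRb::DRbServer.new('druby://127.0.0.1:0', sink)
worker_server = DRb::DRbServer.new('druby://127.0.0.1:0', Worker.new('linux-01'))

# The controller calls the worker; the worker calls back with log lines
# before the original call has returned.
worker = DRb::DRbObject.new_with_uri(worker_server.uri)
status = worker.run('make test', controller.uri)
```

The point is that each run call is still waiting for its reply while the worker is calling back into the controller, so every machine is acting as both client and server at the same time.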
If anyone has insight, I’d appreciate it. I’m running out of good ideas