This problem has been bugging me for weeks, and driving me up the wall.
I cannot seem to get to the bottom of it.
The application is a chat room, for chat between a coach and a client.
Both have similar code. The coach has a few extra bits to help him ask
the right sort of questions, and update summary data, that the client
does not have. In both client and coach there are two HTTP message pairs
going on all the time:
Sending - POSTs the new messages to a PHP script, which replies with
200 and a zero length message (so the screen is not updated).
Receiving - uses GET from the subscribe address (push module) to
receive each message, and then immediately requests the next message.
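For reference, the receive loop can be sketched roughly as below. This is a minimal Python sketch, not the actual browser code. It assumes the push module's usual behaviour of returning `Last-Modified` and `Etag` headers with each message and expecting them back as `If-Modified-Since` and `If-None-Match` on the next request. The subscribe URL and the function names are hypothetical.

```python
import time
import urllib.request

# Hypothetical subscribe address; the real one depends on the nginx config.
SUBSCRIBE_URL = "http://localhost/subscribe?id=room1"


def next_poll_headers(last_modified, etag):
    """Build the request headers for the next long-poll GET.

    Sending back the Last-Modified / Etag values from the previous
    response tells the server which message we have already seen.
    """
    headers = {}
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    if etag:
        headers["If-None-Match"] = etag
    return headers


def poll_forever(handle_message, url=SUBSCRIBE_URL):
    """Receive each message, then immediately request the next one."""
    last_modified, etag = None, None
    while True:
        req = urllib.request.Request(
            url, headers=next_poll_headers(last_modified, etag)
        )
        try:
            with urllib.request.urlopen(req, timeout=120) as resp:
                # Remember our position before handing the message on.
                last_modified = resp.headers.get("Last-Modified", last_modified)
                etag = resp.headers.get("Etag", etag)
                handle_message(resp.read().decode("utf-8"))
        except OSError:
            # Timeout, HTTP error or dropped connection: pause briefly,
            # then ask again from the same position.
            time.sleep(1)
```

If the position headers are lost or a stale response is served from a cache along the way, a subscriber could plausibly stop seeing new messages while still being able to send, so this is one place worth checking.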
The PHP script logs each message and uses CURL to write to the publish
address, before it returns the empty result.
Thus you only get to see your own messages once they return to you via
the long polling module. The start-up code sends an “I have entered the
room” message, which is broadcast to both. The on-unload event sends an
“I’ve left” message, that is broadcast to any remaining people.
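The log-then-publish step can be sketched roughly as below. This is a Python stand-in for the PHP script, not the script itself. The publish URL, the `id` query parameter used to select the channel, the message format and the function names are all assumptions made for illustration.

```python
import urllib.parse
import urllib.request

# Hypothetical publish address; the real one depends on the nginx config.
PUBLISH_URL = "http://localhost/publish"


def build_publish_request(channel, sender, text, base_url=PUBLISH_URL):
    """Build the POST that relays one chat message to the publish address."""
    # Assumption: the channel is selected with an "id" query parameter.
    url = base_url + "?" + urllib.parse.urlencode({"id": channel})
    body = f"{sender}: {text}".encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "text/plain; charset=utf-8"},
    )


def relay(channel, sender, text, log):
    """Log the message first, then publish it.

    The caller would then answer the browser with an empty 200 response.
    """
    log.append((channel, sender, text))
    request = build_publish_request(channel, sender, text)
    with urllib.request.urlopen(request, timeout=10) as resp:
        return resp.status
```

Because logging happens before publishing, a message can appear in the log even if a subscriber never receives it, which matches what we see.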
The problem is that sometimes we get strange lop-sided behaviour:
1. Coach logs in using Firefox (on one machine).
2. Client logs in using Chrome on the same machine. (We wish to demo
   the software using screen sharing software.)
3. Coach sends a message, which arrives on both windows.
4. Wait 3 minutes (during which time the “Client has left” message
   is not seen).
5. Client sends a message - which is logged and returns to him, but is
   not seen by the coach.
6. Coach sends a message, which is logged and seen by both.
7. Client sends another message which is logged, and seen by the
   client but not by the coach.
8. Refresh the client page. This shows the logged messages, which
   confirms the logging was correct. It also causes the client to
   “Enter the room” a second time, without leaving it.
   All will now work properly.
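The check in the last step (comparing the log against what one side actually saw) can be done mechanically. The following is only a sketch: the function name and the message strings are made up for illustration, and it assumes each participant's received messages can be captured as a list.

```python
def missed_messages(logged, received):
    """Return the logged messages a participant never received, in log order."""
    remaining = list(received)
    missed = []
    for msg in logged:
        if msg in remaining:
            remaining.remove(msg)  # match each received copy only once
        else:
            missed.append(msg)
    return missed


# Hypothetical data mirroring the steps above.
logged = [
    "coach: hello",
    "client: hi",
    "coach: still there?",
    "client: yes",
]
coach_received = ["coach: hello", "coach: still there?"]
client_received = list(logged)

print(missed_messages(logged, coach_received))
print(missed_messages(logged, client_received))
```

With this made-up data the coach is missing both client messages and the client is missing none, which is the pattern described above.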
We have seen this lop-sided behaviour in reverse - the client is missing
messages sent by the coach, but receives his own back.
So far, we have only had reports of problems on the one machine. It is
behind a NAT firewall, which appears to be working properly.
We have checked for viruses.
Other information: we cannot reproduce these problems on the test
system; it's a “live-only” feature. Nginx was upgraded from 0.7.67 to
1.0.6 recently, and the problem appeared shortly afterwards. The push module is
0.69 in both builds. Cannot reproduce except on the one machine, which
is nearly new, and running Windows 7 Home edition. Server is running
Ubuntu 11.04 LTS (under XEN on live and VBox on test).
All ideas gratefully received.