Brett Trotter wrote:
I switched to NFS for my testing and I can transfer about 600 KiB before it
stalls for a while, but NFS seems to be good enough at re-trying over cruddy
You should be aware that you are using quite high-level protocols when
using SSH or NFS. You are working at OSI layer 5 (session).
Layer 1 would be PHY, the radio part.
Layer 2 would be MAC, physical addressing, media access (timing,
Layer 3 would be IP, “logical” addressing, first error detection.
Layer 4 would be TCP/UDP, with subaddressing for both, retransmission
and congestion control with TCP.
Layer 5 finally is your SSH/NFS.
You see an error in layer 5, which leaves lots of possibilities for
underlying errors or misconfigurations. In general TCP is quite capable
of recovering from lots of errors, but doing so keeps it busy with
retransmissions, which leaves less capacity for its payload. IP does not
correct errors, it only detects them (header checksum). This leaves possible
error correction to layers 2 and 1. In layer 2 we have a mixture of Ethernet
and a backoff algorithm, much like Ethernet's, but in Python code: the framing and
thus addressing is done by the operating system, media access is handled
by Python, using an exponential backoff when a carrier is sensed. Note:
AFAICS no check for successful transmission is done. On Ethernet you have
Collision Detection, which leads to backoffs and retransmits. The
current tunnel implementation seems to just listen for traffic that
could jam the transmission before sending. As there is no reliable way
to detect concurrent transmissions while sending (in contrast to wired
Ethernet), some kind of reservation like in WLAN, or at least an
acknowledgement, would be desirable. Layer 1 finally uses whatever modulation
you have available, without channel coding such as forward error
correction (FEC) with convolutional codes or Reed-Solomon codes.
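
To make the listen-before-send idea concrete, here is a minimal sketch of carrier sense with exponential backoff. It is not the tunnel's actual code: carrier_busy, transmit and wait are hypothetical callables you would have to supply, and slot_time and max_attempts are made-up defaults.

```python
import random

def backoff_slots(attempt, max_exponent=10):
    """Pick a random wait, in slots, from a window that doubles per attempt."""
    exponent = min(attempt, max_exponent)
    return random.randint(0, 2 ** exponent - 1)

def send_with_backoff(carrier_busy, transmit, wait,
                      max_attempts=16, slot_time=0.05):
    """Listen before sending; back off while the channel is busy.

    carrier_busy, transmit and wait are hypothetical callables.
    Returns how many times the channel was found busy before the
    frame went out, or None if we gave up.
    """
    for attempt in range(max_attempts):
        if not carrier_busy():
            transmit()
            return attempt
        # Channel busy: wait a random number of slots, window grows each time.
        wait(backoff_slots(attempt + 1) * slot_time)
    return None
```

Note that this only avoids talking over traffic we can hear. It still cannot tell whether the frame arrived, which is why an acknowledgement would be needed on top.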
This is quite a stack of systems with lots of places for errors. I’d
suggest going a few levels lower and analysing at the MAC (L2) and network
layer (L3, IP). IP has checksums, so errors are very likely to be
reported by the IP stack in the kernel. Take a look at ‘iptraf’. As
‘ifconfig’ and ‘netstat -i’ presumably show interface statistics,
that is, those of the MAC, you probably want to take a look at the packets
directly. As pointed out before, tcpdump gets you the packets. I like
wireshark, formerly ethereal, for analysing data, but it needs X,
something that is not always accessible at test machines (especially
remote machines). Nevertheless, tcpdump saves data in a format that
Wireshark reads fine. Take a look at the IP checksum under heavy
load. You may want to check some timing statistics to see whether a
collision has likely happened. TCP retransmits are a sign of connections that
have gone haywire.
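
If you want to check the IP checksum yourself on raw packet bytes (for example, bytes pulled out of a capture), a sketch like the following should do. It assumes plain IPv4 and only covers the header, since that is all the IPv4 checksum protects. The function names are my own.

```python
import struct

def ipv4_header_checksum(header: bytes) -> int:
    """One's-complement checksum over the header's 16-bit words."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    # Fold the carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def header_is_valid(packet: bytes) -> bool:
    """True if the IPv4 header checksum of this packet verifies."""
    if not packet:
        return False
    ihl = (packet[0] & 0x0F) * 4  # header length in bytes
    if ihl < 20 or len(packet) < ihl:
        return False
    # Summing a header that includes a correct checksum yields zero.
    return ipv4_header_checksum(packet[:ihl]) == 0
```

Counting how many captured packets fail header_is_valid under light and heavy load would give a rough idea of whether corruption goes up with traffic.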
Anyone have any thoughts on improving the stability?
FEC would be necessary, media reservation or at least acknowledging
would be fine.
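
Just to illustrate what FEC buys you, here is the simplest possible scheme, a triple repetition code with a bitwise majority vote. This is not a convolutional or Reed-Solomon code and it wastes two thirds of the bandwidth, so take it as a toy under that assumption, not as a proposal.

```python
def fec_encode(data: bytes) -> bytes:
    """Send the payload three times in a row."""
    return data * 3

def fec_decode(coded: bytes) -> bytes:
    """Recover the payload by majority vote over the three copies."""
    if len(coded) % 3:
        raise ValueError("coded length must be a multiple of 3")
    n = len(coded) // 3
    a, b, c = coded[:n], coded[n:2 * n], coded[2 * n:]
    # A bit is 1 in the output if it is 1 in at least two copies.
    return bytes((x & y) | (x & z) | (y & z) for x, y, z in zip(a, b, c))
```

Any bit that is damaged in only one of the three copies gets repaired without a retransmit.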
Perhaps CRC at data link layer (L2) would bring some light in.
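
A rough sketch of what I mean, assuming you can wrap each frame before it goes to the radio and unwrap it on reception. It uses the CRC-32 from Python's zlib module. add_crc and check_crc are names I made up.

```python
import struct
import zlib

def add_crc(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 trailer (network byte order) to the payload."""
    return payload + struct.pack("!I", zlib.crc32(payload) & 0xFFFFFFFF)

def check_crc(frame: bytes):
    """Return the payload if the trailer matches, otherwise None."""
    if len(frame) < 4:
        return None
    payload, trailer = frame[:-4], frame[-4:]
    (expected,) = struct.unpack("!I", trailer)
    if zlib.crc32(payload) & 0xFFFFFFFF != expected:
        return None
    return payload
```

Counting the frames for which check_crc returns None would show how many are damaged on the air, before IP or TCP ever see them.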
Emulab should have the right setup to test these things.
Engineer's motto: cheap, good, fast: choose any two
Student of Telematik, Techn. University Graz, Austria