More GMSK issues

I switched to NFS for my testing and I can transfer about 600 KiB before it stalls for a while. NFS seems to be good enough at retrying over cruddy networks that I was able to transfer a couple-MB file and have the hash come out right, but that took a lot of thumb twiddling during the times when nothing was getting through.

I’ve tweaked the MTU on both sides of the tunnel down to 200, which seems to help some.
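
For reference, the same MTU change can be scripted on Linux through the SIOCSIFMTU ioctl. A minimal sketch, assuming a Linux host and a tunnel interface named gr0 (the interface name is illustrative; root privileges are required):

    import fcntl
    import socket
    import struct

    SIOCSIFMTU = 0x8922  # Linux ioctl: set interface MTU

    def set_mtu(ifname, mtu):
        """Set the MTU of a network interface (needs root)."""
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        # struct ifreq: 16-byte interface name followed by the MTU value
        ifr = struct.pack("16sI", ifname.encode(), mtu)
        fcntl.ioctl(s.fileno(), SIOCSIFMTU, ifr)
        s.close()

    set_mtu("gr0", 200)  # "gr0" is a hypothetical tunnel interface name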

I’ve poked into the kernel settings for networking, but I’m afraid that changing them will mess up my primary Ethernet interface.

Anyone have any thoughts on improving the stability?


On Fri, Jan 19, 2007 at 03:30:02PM -0800, Brett Trotter wrote:

I’ve poked into the kernel settings for networking, but I’m afraid that changing them will mess up my primary Ethernet interface.

Anyone have any thoughts on improving the stability?

What data rate are you using in the transmitter and receiver?
That is, what value are you specifying for -r ?

Eric

Brett Trotter wrote:

I switched to NFS for my testing and I can transfer about 600 KiB before it stalls for a while. NFS seems to be good enough at retrying over cruddy networks […]

Hello Brett!

You should be aware that you are using quite high-level protocols when using SSH or NFS. You are moving in OSI Layer 5 (Session).
Layer 1 would be PHY, the radio part.
Layer 2 would be MAC: physical addressing and media access (timing, concurrent access).
Layer 3 would be IP: “logical” addressing and a first level of error detection.
Layer 4 would be TCP/UDP, with sub-addressing for both, plus retransmission and congestion control with TCP.
Layer 5, finally, is your SSH/NFS.

You see an error in layer 5, which leaves lots of possibilities for underlying errors or misconfigurations. In general TCP is quite capable of recovering from lots of errors, but that recovery keeps it busy, which leaves less capacity for its payload. IP does not correct errors; it only detects them (checksums). This leaves possible error correction to layers 2 and 1. In layer 2 we have a mixture: the framing, and thus the addressing, is done by the operating system, while media access is handled by Python, using an exponential backoff (much like Ethernet) when a carrier is sensed. Note: AFAICS check for successful transmission is done. On Ethernet you have Collision Detection, which leads to backoffs and retransmits. The current tunnel implementation seems to just listen, before sending, for traffic that could jam the transmission. As there is no reliable way to detect concurrent transmissions while sending (in contrast to wire-bound Ethernet)[1], some kind of reservation like in WLAN, or at least acknowledgements, would be desirable. Layer 1, finally, uses whatever modulation you have available, without channel coding such as forward error correction (FEC) with convolutional codes or Reed-Solomon codes.
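
To make that media-access behaviour concrete, here is a minimal sketch of a carrier-sense/exponential-backoff send loop of the kind described above. It is illustrative only, not the actual tunnel code; channel_busy() and send_frame() are hypothetical stand-ins for the flow graph’s carrier-sense probe and transmit call:

    import random
    import time

    MAX_ATTEMPTS = 8
    SLOT = 0.01  # seconds; an arbitrary slot time for illustration

    def csma_send(frame, channel_busy, send_frame):
        """Listen-before-talk with exponential backoff.

        There is no collision detection and no acknowledgement, so a
        "successful" return only means the frame was transmitted, not
        that it was received.
        """
        for attempt in range(MAX_ATTEMPTS):
            if not channel_busy():
                send_frame(frame)
                return True
            # Carrier sensed: double the backoff window on each attempt.
            window = 2 ** attempt
            time.sleep(random.randint(1, window) * SLOT)
        return False  # channel never became free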

This is quite a stack of systems, with lots of places for errors. I’d suggest going some levels lower and analysing at the MAC (L2) and network layer (L3, IP). IP has checksums, so errors are very likely to be reported by the IP stack in the kernel. Take a look at ‘iptraf’. Since ‘ifconfig’ and ‘netstat -i’ presumably show interface statistics, that is, of the MAC, you probably want to look at the packets directly. As pointed out before, tcpdump gets you the packets. I like wireshark, formerly ethereal, for analysing data, but it needs X, which is not always available on test machines (especially remote ones). Nevertheless, tcpdump saves data in a format that wireshark reads fine. Take a look at the IP checksums under heavy load. You may also want to check some timing statistics to see whether collisions have likely happened. TCP retransmits are a sign of connections that have gone haywire.
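
As a concrete starting point, a short script can count IP-checksum mismatches in a saved capture. A sketch assuming scapy is installed and a hypothetical capture file name; deleting the checksum field makes scapy recompute it when the header is rebuilt:

    from scapy.all import rdpcap
    from scapy.layers.inet import IP

    packets = rdpcap("capture.pcap")  # e.g. saved with: tcpdump -w capture.pcap
    bad = 0
    for pkt in packets:
        if IP not in pkt:
            continue
        original = pkt[IP].chksum
        # Clear the checksum so scapy recomputes it on rebuild, then
        # compare against what was actually on the wire.
        hdr = pkt[IP].copy()
        del hdr.chksum
        recomputed = IP(bytes(hdr)).chksum
        if original != recomputed:
            bad += 1
    print(f"{bad} of {len(packets)} packets have a bad IP checksum")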

Anyone have any thoughts on improving the stability?

FEC would be necessary; media reservation, or at least acknowledgements, would be fine. Perhaps a CRC at the data link layer (L2) would shed some light. Emulab should have the right setup to test these things.
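
For the CRC idea, a frame check at L2 can be as simple as appending a CRC-32 to each payload before modulation and verifying it on receive. A minimal sketch using Python’s zlib (not taken from the tunnel code):

    import struct
    import zlib

    def frame_with_crc(payload: bytes) -> bytes:
        """Append a CRC-32 so the receiver can detect corrupted frames."""
        return payload + struct.pack("!I", zlib.crc32(payload) & 0xFFFFFFFF)

    def check_and_strip_crc(frame: bytes):
        """Return the payload if the CRC matches, else None (drop the frame)."""
        if len(frame) < 4:
            return None
        payload, received = frame[:-4], struct.unpack("!I", frame[-4:])[0]
        if zlib.crc32(payload) & 0xFFFFFFFF != received:
            return None  # corrupted in flight; a real MAC might trigger a resend
        return payload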

Patrick

[1] ALOHA, as a predecessor of Ethernet and itself a radio system, used acknowledge messages and retransmits. See the code at
http://typo3.cs.uni-paderborn.de/en/research-group/research-group-computer-networks/projects/gsr.html

Engineer’s motto: cheap, good, fast: choose any two
Patrick S.
Student of Telematik, Techn. University Graz, Austria

Some more notes… It was very late when I wrote this.

Patrick S. wrote:

In general TCP is quite capable of recovering from lots of errors, but that recovery keeps it busy, which leaves less capacity for its payload. IP does not correct errors; it only detects them (checksums). This leaves possible error correction to layers 2 and 1.

Of course the application has to take care of errors. If some errors survive, the application has to cope. Some do not care at all, like video and audio streams: if data is lost, no retransmission is attempted. The human brain has to interpolate the missing information and thus do the error correction.

In layer 2 we have a mixture[…] Note:
AFAICS check for successful transmission is done.
^^^^
Of course I meant “No check for successful transmission is done.”

Patrick

Engineer’s motto: cheap, good, fast: choose any two
Patrick S.
Student of Telematik, Techn. University Graz, Austria

So the bottom-line question is: why in the heck would a file transfer over a stateless protocol freeze after a certain number of bytes, when it appears I can ping with large payloads indefinitely?

For the record, anything terribly deep in Python or networking code is beyond me, so aside from simple settings tweaks and protocol/application selection to garner stability, I’m probably out of my league.

No one can tell without looking at the data.

You really need to look at both sides with tcpdump. Packets that are sent should, at least mostly, arrive. It may be that there is some bit pattern that results in them never arriving, and you can debug that at the MAC/PHY layer without understanding why the upper layer is sending that pattern.
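
One way to automate that comparison is to index both captures by the IP identification field and list the datagrams that left the sender but never show up at the receiver. A hedged sketch, again assuming scapy and with hypothetical file names (note that IP ids wrap at 65536, so this is only a rough heuristic):

    from scapy.all import rdpcap
    from scapy.layers.inet import IP

    def ip_ids(path):
        """Collect the IP identification values seen in a capture."""
        return {pkt[IP].id for pkt in rdpcap(path) if IP in pkt}

    sent = ip_ids("sender.pcap")        # capture taken on the sending host
    received = ip_ids("receiver.pcap")  # capture taken on the receiving host

    missing = sorted(sent - received)
    print(f"{len(missing)} id(s) sent but never received: {missing[:20]}")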


Greg T. [email protected]

Patrick S. wrote:

Of course the application has to take care of errors. […]
Of course I meant “No check for successful transmission is done.”

Patrick

I appreciate all of your insightful comments, but here’s where I’m confused. NFS, FSP, and TFTP all use UDP and should have acknowledgement, verification, and resend capability, yet all of them seem to get tied up after X bytes of transfer, depending on MTU, bitrate, etc. It seems that for a given setting, I can only transfer a very specific number of bytes of data. Before, during, and after the transfer, I can ping with 1k payloads until the end of time; and despite the fact that FSP and TFTP don’t create a connection of any kind, they get bogged down just like SCP did when I was trying TCP transports.

So the bottom-line question is: why in the heck would a file transfer over a stateless protocol freeze after a certain number of bytes, when it appears I can ping with large payloads indefinitely?

For the record, anything terribly deep in Python or networking code is beyond me, so aside from simple settings tweaks and protocol/application selection to garner stability, I’m probably out of my league.


Greg T. wrote:

No one can tell without looking at the data.

You really need to look at both sides with tcpdump. Packets that are sent should, at least mostly, arrive. It may be that there is some bit pattern that results in them never arriving, and you can debug that at the MAC/PHY layer without understanding why the upper layer is sending that pattern.

You’re welcome to have a look; here’s the tcpdump of both sides. I was able to transfer 325000 bytes of a 2 MB file with FSP (I’ve gotten higher with various settings):

http://webtrotter.com/10_0_0_1.txt and
http://webtrotter.com/10_0_0_2.txt

Note, I didn’t do a raw dump with the data contained, just -vv, but the files are still ~200 KB.

I didn’t see anything out of the ordinary; it just stalls. Again, this is with FSP set up with an infinite timeout. I didn’t ping during the transfer so it wouldn’t confuse the dumps, but I can ping before, during, and after, as stated previously.





You are experiencing IP fragmentation. This could explain why ping still works.

Starting at id 32559, I can see both fragments in the .1 trace, but only the first in the .2 trace. Look at 13:44:47.258668 in the second trace, and note that offset 0 is present but offset 480 is not. In the first trace, at 13:44:46.815874, both fragments are there.
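
The arithmetic behind those numbers: the IP fragment offset counts payload bytes (in units of 8 on the wire), so an offset of 480 means the first fragment carried 480 bytes of IP payload, i.e. a 500-byte frame once the 20-byte IP header is added. A small sketch of how a datagram splits for a given MTU:

    def ip_fragments(payload_len, mtu, ip_header=20):
        """Return (byte offset, length) pairs for one datagram's fragments.

        Every fragment except the last must carry a multiple of 8 payload
        bytes, so the per-fragment capacity is rounded down accordingly.
        """
        per_frag = (mtu - ip_header) // 8 * 8
        frags, offset = [], 0
        while offset < payload_len:
            length = min(per_frag, payload_len - offset)
            frags.append((offset, length))
            offset += length
        return frags

    # A 960-byte payload over a 500-byte MTU splits exactly at offset 480,
    # matching the second fragment that goes missing in the traces above.
    print(ip_fragments(960, 500))  # [(0, 480), (480, 480)]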

Is your transport protocol congestion-friendly, meaning will it back off when it doesn’t get acks? It seems to go down to 1 packet/second.

Could you set the MTU to avoid fragmentation, perhaps to 576 (IP bytes)? Or try scp’ing a file, which will/should adapt the MSS to the MTU.

The last second fragment (meaning offset 480) is id 32558, so it seems that your system is getting into a state where the second fragment is always dropped. Is there some queue which is always full, or nearly full, so that when two back-to-back frames get sent the second is always dropped? Using TCP may avoid this problem, since it will back off more aggressively.

It may be that you are finding bugs in your system’s fragment
reassembly code.

Greg T. <[email protected]>