8b/10b for GMSK

After coming up with some scripts to generate thousands of 200-byte random
files and attempt to transfer them over a GMSK link, I isolated 5 of 10,000
‘files’ that cause the packet to be dropped (when the MTU is big enough to
hold the entire 200 bytes- which should be all the time)- most likely due to
repeated zeros or ones that cause a loss of sync. Upon debugging the output
after the whitener, it does appear that there are runs of around 12 zeros in
the packets that don’t work.

It seems that a reasonably simple solution would be to replace or augment
the whitener with an 8b/10b encoding (per Widmer/Franaszek’s 5b/6b + 3b/4b
scheme), which would also allow for beginning- and end-of-packet comma
symbols for synchronization. The 8b/10b could replace the whitener
completely, but I read that passing the data through the whitener and then
the 8b/10b yields better spectral properties/DC balance.
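
To make the structure concrete, below is a minimal Python sketch of the
5b/6b + 3b/4b split and the running-disparity selection. Only a couple of
table entries are filled in for illustration (the full Widmer/Franaszek
tables cover all 32 five-bit and 8 three-bit values, the K control
characters, and special cases like the alternate D.x.7 encoding), and the
names are mine; nothing like this is in the tree yet:

# Sketch of an 8b/10b encoder skeleton (not a complete implementation).
# Codewords are bit strings in the usual abcdei / fghj notation.

# (RD- codeword, RD+ codeword) keyed by the 5 LSBs of the data byte.
FIVE_SIX = {
    0b00000: ('100111', '011000'),   # D.00
    0b10101: ('101010', '101010'),   # D.21 (disparity neutral)
    # ... remaining 30 entries ...
}

# (RD- codeword, RD+ codeword) keyed by the 3 MSBs of the data byte.
THREE_FOUR = {
    0b000: ('1011', '0100'),         # D.x.0
    0b101: ('1010', '1010'),         # D.x.5 (disparity neutral)
    # ... remaining entries, plus the D.x.7 alternate ...
}

def disparity(code):
    """Ones minus zeros in a codeword string."""
    return 2 * code.count('1') - len(code)

def encode_byte(byte, rd=-1):
    """Encode one data byte given the running disparity (-1 or +1).
    Returns the 10-bit string and the new running disparity."""
    low5, high3 = byte & 0x1f, (byte >> 5) & 0x07
    out = ''
    for table, key in ((FIVE_SIX, low5), (THREE_FOUR, high3)):
        neg, pos = table[key]
        code = neg if rd < 0 else pos
        if disparity(code) != 0:
            rd = -rd     # an unbalanced sub-block flips the running disparity
        out += code
    return out, rd

The decoder is the same tables inverted, and a K.28.5 entry (001111 1010
with negative running disparity, 110000 0101 with positive) would give the
start/end-of-packet comma.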

I found dozens of Verilog/VHDL implementations by googling, but none in
software. I computed some tables and was part way into writing some code,
but am at a couple of question marks that need to be answered.

  1. If I do somehow finish this (I’m not that great at Python just yet,
    though am very familiar with other languages- so I can definitely finish
    with the help of Google), would this be something that would be committed
    to the tree (e.g., do Matt/Eric/Johnathan, etc. think this is the
    appropriate solution)?
  2. If (1), what is the bit and byte order over the wire for GMSK, so that I
    can make sure I formulate the characters/packets in the right order?
  3. If (1), someone more familiar could probably implement this in minutes
    or perhaps a couple of hours, whereas it’ll take me the better part of a
    day or two. Is anyone interested in helping me reduce the number of
    question marks and think through doing this correctly?

Another viable option might be applying the TMDS method (XOR/XNORing)
creatively to this medium. The way I read it, the clock signal is only a
frequency reference, so I don’t see a reason why we can’t use TMDS on a
clock-less medium if we don’t care about blanking periods, etc.
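
For reference, the transition-minimizing stage of TMDS is only a few lines;
here is a toy sketch based on my reading of the spec (the function name is
mine, and the second stage, which inverts words for DC balance and adds the
tenth bit, is left out):

# Toy sketch of the first (transition-minimizing) stage of TMDS:
# 8 data bits in, 9 bits out, where the ninth bit records whether XOR or
# XNOR chaining was used.  The DC-balancing stage is omitted.

def tmds_stage1(byte):
    d = [(byte >> i) & 1 for i in range(8)]        # d[0] is the LSB
    ones = sum(d)
    use_xnor = ones > 4 or (ones == 4 and d[0] == 0)
    q = [d[0]]
    for i in range(1, 8):
        prev = q[i - 1]
        q.append(1 - (prev ^ d[i]) if use_xnor else prev ^ d[i])
    q.append(0 if use_xnor else 1)                 # 1 = XOR used, 0 = XNOR
    return q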

Please let me know if I’m off my rocker or if I’m pointing in the right
direction.

I look forward to your thoughts/comments.


On Sun, Feb 04, 2007 at 02:32:30AM -0800, Brett Trotter wrote:

After coming up with some scripts to generate thousands of 200-byte random
files and attempt to transfer them over a GMSK link, I isolated 5 of 10,000
‘files’ that cause the packet to be dropped (when the MTU is big enough to
hold the entire 200 bytes- which should be all the time)- most likely due to
repeated zeros or ones that cause a loss of sync. Upon debugging the output
after the whitener, it does appear that there are instances of around
12-zeros in the packets that don’t work.

What’s the maximum run length that you see in packets that do work?
My understanding from talking to Johnathan is that he has some
packets that work that have 17-bit runs in them (post-whitener).
(Not sure if they’re ones or zeros.)

This makes me think that the 8b/10b “solution” is premature, given
that the problematic symptom is not yet nailed down.

In summary, create the smallest test case you can that reproduces the
problem. When you’ve got that under control, figure out the next
step.

I suspect that some adjusting of control loop constants may be in order.

Eric

Eric B. wrote:

What’s the maximum run length that you see in packets that do work?
My understanding from talking to Johnathan is that he has some
packets that work that have 17-bit runs in them (post-whitener).
(Not sure if they’re ones or zeros.)

I’ve checked into the developers/jcorgan/digital branch a program,
run_length.py, that will read a binary file and output the statistics
for runs of similar bits (either all zeros or all ones):

$ dd if=/dev/urandom of=rand.dat bs=1500 count=1
1+0 records in
1+0 records out
1500 bytes (1.5 kB) copied, 0.000451 seconds, 3.3 MB/s

$ ./run_length.py -f rand.dat
Using rand.dat for data.
Bytes read: 1500
Bits read: 12000

Runs of length 1 : 2994
Runs of length 2 : 1494
Runs of length 3 : 703
Runs of length 4 : 384
Runs of length 5 : 214
Runs of length 6 : 104
Runs of length 7 : 42
Runs of length 8 : 31
Runs of length 9 : 7
Runs of length 10 : 5
Runs of length 11 : 0
Runs of length 12 : 2

Sum of runs: 12000 bits

Maximum run length is 12 bits
$
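
For anyone who doesn’t want to check out the branch, the core of the tool
is just a tally over the bit stream. Here is a rough sketch of the idea
(this is not the actual run_length.py, and it walks the bits MSB-first
within each byte):

# Rough sketch of the run-length idea: read a binary file and tally runs
# of identical bits.

import sys

def run_stats(filename):
    with open(filename, 'rb') as f:
        data = bytearray(f.read())
    counts = {}
    current, length = None, 0
    for byte in data:
        for i in range(7, -1, -1):
            bit = (byte >> i) & 1
            if bit == current:
                length += 1
            else:
                if current is not None:
                    counts[length] = counts.get(length, 0) + 1
                current, length = bit, 1
    if current is not None:
        counts[length] = counts.get(length, 0) + 1
    return counts

if __name__ == '__main__':
    counts = run_stats(sys.argv[1])
    for n in range(1, max(counts) + 1):
        print("Runs of length %d : %d" % (n, counts.get(n, 0)))
    print("Maximum run length is %d bits" % max(counts))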

So if you capture the input to the modulator as a binary file, you
can use this to probe the statistics of packets that fail and ones that
succeed in your test cases. Of course, you need to capture the packet
after it has been whitened, not the input payload from a file being
sent.

In the same developer branch, I’ve added the --from-file option to
benchmark_tx.py. This allows you to transmit a continuous stream of
packets in one direction, the contents of which are read from the given
file. This eliminates the entire network stack and any bi-directional
traffic issues from the picture. You use benchmark_rx.py on the
receiving end to determine when a receive CRC error occurs. Now
normally you’ll get some of these due to noise, but if you run the
benchmark_tx.py test over and over again and the same packets fail on
receive, then you’ve identified the specific packet numbers whose
contents seem to trigger the issue people have been seeing.

I’ll be adding a command-line parameter to the benchmark_tx.py to cause
the whitened packet data to be logged to a file suitable for use with
run_length.py, so this whole process can be automated. (Not done yet.)

In general this issue “smells” like a pattern-specific failure in
receiver synchronization, which is usually run-length related, yet
right now I have a failure case with a 12-bit run and a success case
with a 17-bit run, so it’s not absolutely clear that this is the issue,
or at least not the entire issue.

The suggestion to use 8b/10b line coding is an interesting experiment
that could be run in parallel to other testing, but it’s not clear yet
that it would be attacking the right problem. The 25% performance
penalty up front from such a line coding technique is a high price to
pay without conclusive evidence it’s not just shifting the problem
elsewhere in the stream.

There is an alternative that Eric and I have conceived that would be a
temporary workaround. It would not solve the original problem but would
at least allow upper level protocols that do re-transmission to recover
from the failure. I’ll talk about that once I’ve got it coded, tested,
and into my developer branch.


Johnathan C.
Corgan Enterprises LLC
http://corganenterprises.com

Johnathan C. wrote:

We haven’t decided whether this will make it into the trunk. We’d much
rather make a real fix.

Johnathan C.
Corgan Enterprises LLC
http://corganenterprises.com

Yay- congratulations- transferred 100MB successfully and all of my
‘stumbler’ packets- even with UDP.

I’m going to continue work on the 8b/10b, because I still think that’s
the closest ‘real’ solution to the issue- and the overhead isn’t as
severe as repeating 1500-byte packets- especially when you consider that
Ethernet and many other media use this specific implementation of
8b/10b. Worst case, it can be an option in the Python code that most
people can leave turned off.

FYI: Am achieving nearly 100kb/s with -r 800k on a BasicTX

Johnathan C. wrote:

There is an alternative that Eric and I have conceived that would be
a temporary workaround. It would not solve the original problem but
would at least allow upper level protocols that do re-transmission to
recover from the failure. I’ll talk about that once I’ve got it
coded, tested, and into my developer branch.

This has been implemented, checked into a developer branch, and
successfully tested.

A new command-line parameter has been added to both tunnel.py and
benchmark_tx.py, --use-whitener-offset. It defaults to false so existing
behavior is unchanged.

A new 4-bit field has been added to the existing packet header, above
the 12 bits already used for the length field. This field holds an
integer (range 0-15) that represents an offset value.

When transmitting, this value (which defaults to zero) determines the
offset into the whitening array that is XORed with the payload data to
form the whitened data for submission to the modulator.

When the --use-whitener-offset option is set, this offset is
incremented for each transmitted packet and stored in the new 4-bit
field. Thus, even identical packets, when transmitted successively,
result in completely different “on the air” data.

The receiver extracts the offset value from the header and uses it to
recover the original data. In the case where the offset option is not
used, the offset is always set to zero, so the receiver behavior is then
unchanged.

If a received packet fails CRC because of some pattern-specific
synchronization problem, and if the upper protocol layers cause a
retransmission, then the re-transmitted packet will have a different
whitened bit pattern, allowing it to go through.
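
In outline, the mechanism looks something like the sketch below. The names
and the exact layout here are illustrative only, not the code that is in
the branch:

# Illustrative sketch of the whitener-offset mechanism.

WHITENER = bytearray(range(256))   # stand-in for the real whitening array

def make_header(length, offset):
    # 4-bit offset packed above the 12-bit length field.
    assert 0 <= length <= 0x0fff and 0 <= offset <= 0x0f
    return (offset << 12) | length

def parse_header(header):
    return header & 0x0fff, (header >> 12) & 0x0f   # (length, offset)

def whiten(payload, offset):
    # XOR against the whitening array starting 'offset' entries in.
    # Applying the same call again recovers the original payload.
    n = len(WHITENER)
    return bytearray(b ^ WHITENER[(offset + i) % n]
                     for i, b in enumerate(bytearray(payload)))

# Transmit (with --use-whitener-offset): bump the offset for each packet,
# e.g. offset = packet_count % 16, build the header with make_header(),
# and send whiten(payload, offset).  Receive: parse_header(), then
# whiten(received_payload, offset) to de-whiten.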

While this doesn’t solve the underlying problem, whatever that may be,
it does give a workaround for users who have to get past this issue
while we are debugging it. Unfortunately this workaround is only for
people who are using higher level protocols which use retransmission to
recover from link errors. This of course includes TCP, which is the
underlying protocol for the majority of users.

I’ve verified that my prior failure test cases now “work”, in the sense
that the upper layers just continue as if the bad packets were dropped
due to noise. This includes ssh, scp, ftp, and https, each of which I
could reliably make fail with the old code and which now succeed with
the new workaround.

The code is in the developer branch:

http://gnuradio.org/svn/gnuradio/branches/developers/jcorgan/digital

…which is based on a snapshot of the trunk as of a couple days ago.

We haven’t decided whether this will make it into the trunk. We’d much
rather make a real fix.

Johnathan C.
Corgan Enterprises LLC
http://corganenterprises.com

Brett L. Trotter wrote:

Yay- congratulations- transferred 100MB successfully and all of my
‘stumbler’ packets- even with UDP.

You’re using FSP (which goes over UDP), correct? FSP has a retry
mechanism, so good. Glad it’s working now!

I’m going to continue work on the 8b/10b, because I still think
that’s the closest ‘real’ solution to the issue- and the overhead
isn’t as severe as repeating 1500 byte packets-

Well, only the “stumbler” packets get repeated, and as you found, there
were only five in 10,000 that you tried. So your overhead isn’t 100%;
it’s no different than if your channel dropped one in 2000 packets. (For
those following, the problem before resulted in a complete link failure,
as the retransmitted packets on the connection would always fail CRC.)

Worst case, it can be an option in the python that most people can
leave turned off.

An 8b/10b line coding scheme would probably be best implemented as a
standalone hierarchical block that a developer could choose to use or
not as part of a flow graph implementing a transmit and receive path.

FYI: Am achieving nearly 100kb/s with -r 800k on a BasicTX

Is that bytes or bits per second?


Johnathan C.
Corgan Enterprises LLC
http://corganenterprises.com

Brett L. Trotter wrote:

FYI: Am achieving nearly 100kb/s with -r 800k on a BasicTX

Is that bytes or bits per second?

89-94 kibibits per second, nifty! (And that’s not accounting for SCP
overhead)

You should be getting much higher rates.

At the default GMSK bit rate of 500 kilobits per second, I can
transfer, using scp, a 10 megabyte file in 3:32, or 212 seconds, for an
effective throughput of about 49 kilobytes per second, or about 386
kilobits per second.

That shows about 23% overhead in terms of modulation, packet overhead,
protocol overhead, collisions, and, probably most significantly, the
overhead from relinquishing the channel while the other end is sending
ACKs.
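
For reference, here is the rough arithmetic behind those figures (treating
the 10 megabyte file as 10 MiB):

# Rough check of the throughput figures quoted above.
file_bytes = 10 * 1024 * 1024        # the 10 megabyte test file
seconds    = 3 * 60 + 32             # 3:32
byte_rate  = file_bytes / seconds    # ~49,000 bytes/sec ("about 49 kilobytes")
kbit_rate  = byte_rate * 8 / 1024.0  # ~386 kilobits/sec
efficiency = kbit_rate / 500         # ~0.77, i.e. ~23% overhead at -r 500k
projected  = 800 * efficiency        # ~620 kbit/sec if -r 800k scaled the same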

If you had similar efficiency to the above, you’d be getting around
600-700 kilobits per second. Are you sure your figures aren’t kilobytes
instead?

-Johnathan

Johnathan C. wrote:

Brett L. Trotter wrote:

Yay- congratulations- transferred 100MB successfully and all of my
‘stumbler’ packets- even with UDP.

You’re using FSP (which goes over UDP), correct? FSP has a retry
mechanism, so good. Glad it’s working now!

I used FSP for the ‘stumblers’ but did the 100MB with scp (FSP did
work for the ‘stumblers’; I’ll test FSP on the big file here
momentarily).

as the retransmitted packets on the connection would always fail CRC.)

True enough, but that was also with random data- ordered data could
yield either a significantly higher hangup rate or perhaps none at all.
8b/10b seems like a good generic robustness idea- as I said, I’ll
implement it and test it, and if anyone wants to turn it on, cool beans.

Worst case, it can be an option in the python that most people can
leave turned off.

An 8b/10b line coding scheme would probably be best implemented as a
standalone hierarchical block that a developer could choose to use or
not as part of a flow graph implementing a transmit and receive path.

Sounds like a good plan and more or less what I had in mind.

FYI: Am achieving nearly 100kb/s with -r 800k on a BasicTX

Is that bytes or bits per second?

89-94 kibibits per second, nifty! (And that’s not accounting for SCP
overhead)

Johnathan C. wrote:


We haven't decided whether this will make it into the trunk. We'd much
rather make a real fix.

I found out what is (most likely) going wrong here. If a packet payload
(post-whitening) matches the complement of the access code, its bits get
flipped. A packet that contains the access code is handled correctly. I
haven’t tried it, but it would be interesting to see what happens if you
transmit a packet that (post-whitening) contains the entirety of a valid
packet… I’m not sure what the right fix is, so no patch…
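
Here is a toy model of the mechanism- this is not the GNU Radio correlator
code, just an illustration of why a framer that accepts the complemented
access code (and treats the stream as inverted from that point on) flips
everything that follows it, so the CRC at the end fails:

# Toy illustration only (not the GNU Radio framer).
ACCESS     = [1, 0, 1, 1, 0, 0, 1, 0]       # toy 8-bit access code
COMPLEMENT = [1 - b for b in ACCESS]

def toy_framer(bits):
    out, invert, i = [], False, 0
    while i < len(bits):
        if bits[i:i + len(ACCESS)] == COMPLEMENT:
            invert = not invert             # "re-sync" on the inverted code
            i += len(ACCESS)
            continue
        out.append(bits[i] ^ invert)
        i += 1
    return out

payload = [0, 1, 1, 0] + COMPLEMENT + [1, 1, 0, 0]
print(toy_framer(payload))   # the trailing bits come out inverted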

-Dan

Dan H. wrote:

I found out what is (most likely) going wrong here. If a packet
payload (post-whitening) matches the complement of the access code,
its bits get flipped. A packet that contains the access code is
handled correctly. I haven’t tried it, but it would be interesting to
see what happens if you transmit a packet that (post-whitening)
contains the entirety of a valid packet… I’m not sure what the
right fix is, so no patch…

This has already been fixed on the trunk and in release 3.0.3, and it is
exactly the issue you describe.

We did decide to leave in the “whitener offset” work-around in the trunk
as it is a good thing to do in any case. The default is off but you can
pass --use-whitener-offset to the tunnel.py and related examples to
enable it. (The stable branch has the bug fix but not the workaround;
it was too invasive.)


Johnathan C.
Corgan Enterprises LLC
http://corganenterprises.com