Better way to read data from IO into packets?


#1

Hi,

I have a small program that reads data from a serial port and chops
it into packets. A packet has the following format:

[0x65, 0xEB, type:8, counter:8, length:8, data:<length>, crc:16]

I have a thread that reads the data from an IO object:

@receiver_thread = Thread.new do
  Thread.abort_on_exception = true
  loop do
    begin
      char = @io.readchar
      add_char_to_packet(char) if char
    rescue EOFError
      Thread.pass # there is currently nothing to read
    end
  end
end

and a state machine that decodes the format:

def add_char_to_packet(char)
  @state = :first_checksum if (@state == 0)
  case @state
  when :first_startbyte
    @data = ""
    @state = (char == STARTBYTES[0]) ? :second_startbyte : :first_startbyte
  when :second_startbyte
    # special case: first startbyte is repeated
    @state = (char == STARTBYTES[1]) ? :type :
      (char == STARTBYTES[0] ? :second_startbyte : :first_startbyte)
  when :type
    @type = TYPE.invert[char]
    @state = :counter
  when :counter
    @counter = char
    @state = :length
  when :length
    @length = char
    @state = @length
  when Integer
    @data << char
    @state -= 1
  when :first_checksum
    @checksum = (char << 8)
    @state = :second_checksum
[…]

This works, but the code is ugly and also a little slow because I have
to process each byte separately. Is there a better way?

Thank you,
Levin


#2

Levin A. wrote:

Hi,

I have a small program, that reads data from a serial port and chops
it into packets. A Packet has the following format:

[0x65, 0xEB, type:8, counter:8, length:8, data:<length>, crc:16]

Why not read enough bytes to make sure you get the length byte:

s = io.sysread(5)
len = s[4]
s << io.sysread(len+2)
ary = s.unpack "C5a#{len}n"

Or is the problem that there may be a variable number of “start bytes”?
In that case, maybe you could tell from the first 5 bytes how many start
bytes there are, and then read enough to capture the length byte.
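
A minimal sketch of this suggestion on a current Ruby, with a StringIO standing in for the serial port. The helper name `read_packet` is hypothetical. It uses `read` rather than `sysread` and `String#getbyte` rather than `s[4]`, because on Ruby 1.9 and later `s[4]` returns a one-character String instead of an Integer. It assumes the stream is already aligned on a packet boundary and does not verify the CRC, since the thread does not say which CRC is used.

```ruby
require "stringio"

# Hypothetical helper: reads one packet from an aligned stream.
# Returns [start1, start2, type, counter, length, data, crc] or nil
# when the stream ends before the packet is complete.
def read_packet(io)
  header = io.read(5)                  # start bytes, type, counter, length
  return nil if header.nil? || header.bytesize < 5

  len  = header.getbyte(4)             # length byte as an Integer
  rest = io.read(len + 2)              # payload plus 16-bit CRC
  return nil if rest.nil? || rest.bytesize < len + 2

  (header + rest).unpack("C5a#{len}n")
end

raw = [0x65, 0xEB, 1, 7, 4].pack("C*") + "ABCD" + [0x1234].pack("n")
fields = read_packet(StringIO.new(raw))
# fields => [101, 235, 1, 7, 4, "ABCD", 4660]
```

The `n` directive reads the CRC as big-endian, which matches the state machine in post #1 shifting the first checksum byte left by 8.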


#3

Robert K. wrote:

Why do you use sysread? I’d prefer to use #read in this case - I don’t
see any advantage of resorting to sysread here - it may even prevent read
buffering => things get slower than necessary.

You’re right. I was thinking about readchar, which the op used and I
assumed to be unbuffered, but I don’t even know if it is!

Or is the problem that there may be a variable number of “start
bytes”? In that case, maybe you could tell from the first 5 bytes how
many start bytes there are, and then read enough to capture the
length byte.

Hm… doesn’t seem to be the case.

But there is some logic in the op’s code that looks for repeated “start
byte”. I’m just not sure what the limit is.


#4

On 12/18/05, Joel VanderWerf wrote:

I have a small program, that reads data from a serial port and chops
it into packets. A Packet has the following format:

[0x65, 0xEB, type:8, counter:8, length:8, data:<length>, crc:16]

Why not read enough bytes to make sure you get the length byte:

Because the application may be started in the middle of a packet or
the stream may be corrupted due to transmission errors.

But you are right, I should optimize that and only read single bytes
if I need to resynchronize.
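
One way this could look is a buffer that accepts data in bulk and only scans for the start bytes when it has to resynchronize. This is a sketch, not the code from the program above: the class name `PacketSplitter` and its `feed` method are hypothetical, it targets a current Ruby (`String#b`, `getbyte`), and it skips CRC verification because the CRC variant is not given. Without a CRC check, a stray 0x65 0xEB inside garbage would be parsed as a packet.

```ruby
# Hypothetical buffer-based splitter: bulk data goes in, complete packets
# come out, and garbage before the start bytes is discarded.
class PacketSplitter
  START = "\x65\xEB".b   # start bytes from the packet format above
  HEADER_SIZE = 5        # 2 start bytes + type + counter + length
  CRC_SIZE = 2

  def initialize
    @buffer = "".b
  end

  # Appends a chunk of bytes and returns every complete packet found so far
  # as [type, counter, data, crc].
  def feed(chunk)
    @buffer << chunk.b
    packets = []
    loop do
      start = @buffer.index(START)
      if start.nil?
        # Keep a trailing 0x65: it may be the first half of a start marker.
        @buffer = @buffer.end_with?(START[0]) ? START[0].dup : "".b
        break
      end
      @buffer.slice!(0, start)                 # drop garbage before the marker
      break if @buffer.bytesize < HEADER_SIZE  # header not complete yet

      length = @buffer.getbyte(4)
      total = HEADER_SIZE + length + CRC_SIZE
      break if @buffer.bytesize < total        # payload or CRC still missing

      _s1, _s2, type, counter, _len, data, crc =
        @buffer.slice!(0, total).unpack("C5a#{length}n")
      packets << [type, counter, data, crc]
    end
    packets
  end
end

splitter = PacketSplitter.new
packet = [0x65, 0xEB, 0, 1, 4].pack("C*") + "ABCD" + [0].pack("n")
first  = splitter.feed("bad data" + packet[0, 6])  # => [] (packet incomplete)
second = splitter.feed(packet[6..-1])              # => [[0, 1, "ABCD", 0]]
```

The receiver thread could then pass whatever a bulk read returns to `feed` instead of calling the state machine once per byte.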

s = io.sysread(5)
len = s[4]
s << io.sysread(len+2)
ary = s.unpack "C5a#{len}n"

Or is the problem that there may be a variable number of “start bytes”?
In that case, maybe you could tell from the first 5 bytes how many start
bytes there are, and then read enough to capture the length byte.

Hmm, maybe I can use regular expressions to check for the correct
format:

buffer = "bad data" << [0x65,0xEB,0,4,65,66,67,68,0,0].pack("C*")
buffer.scan( /
  \x65\xEB # startbytes
  (.)      # type
  (.)      # length-byte
  (.*)     # data
  (..)     # checksum (16 bits)
/xmn )

I would need a way to discard old or bad data from the buffer,
probably need to think about it more

(btw: is there a way to use the length-byte in the regular expression
itself? Something like /(.)(.{\1})/)

Thank you,
Levin


#5

On 12/19/05, Joel VanderWerf wrote:

But there is some logic in the op’s code that looks for repeated “start
byte”. I’m just not sure what the limit is.

A packet always starts with the two bytes “\x65\xEB”, everything else
resets the state machine.

The special case in the code was needed to correctly handle
“\x65\x65\xEB” (one bad character before a valid start of packet):
the state machine always needs to look for “\xEB” after “\x65”.
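
The synchronization rule can be isolated into a small function to make the special case visible. This is an illustrative sketch: `next_sync_state` and `final_sync_state` are hypothetical names, and only the two start-byte states from the state machine in post #1 are modeled.

```ruby
STARTBYTES = [0x65, 0xEB]  # same constant name as in the code from post #1

# Returns the next synchronization state for one incoming byte.
def next_sync_state(state, byte)
  case state
  when :first_startbyte
    byte == STARTBYTES[0] ? :second_startbyte : :first_startbyte
  when :second_startbyte
    if byte == STARTBYTES[1]
      :type                # both start bytes seen
    elsif byte == STARTBYTES[0]
      :second_startbyte    # repeated 0x65: keep waiting for 0xEB
    else
      :first_startbyte     # anything else resets the machine
    end
  else
    state                  # later states are not modeled here
  end
end

# Runs a sequence of bytes through the function, starting unsynchronized.
def final_sync_state(bytes)
  bytes.reduce(:first_startbyte) { |state, byte| next_sync_state(state, byte) }
end

final_sync_state([0x65, 0x65, 0xEB])  # => :type
final_sync_state([0x65, 0x00, 0xEB])  # => :first_startbyte
```

Without the `elsif` branch, the first sequence would reset on the second 0x65 and miss the packet that starts there.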

-Levin


#6

Levin A. wrote:

Because the application may be started in the middle of a packet or

Or is the problem that there may be a variable number of "start
(.) # type
(.) # length-byte
(.*) # data
(..) # checksum
/x )

I would need a way to discard old or bad data from the buffer,
probably need to think about it more

Why not just use a regexp to verify the initial sequence and use the
match offsets? Or do something like

buffer.gsub!(/\A.*?(\x65\xEB)/mn, '\1')

(btw: is there a way to use the length-byte in the regular expression
itself? Something like /(.)(.{\1})/)

No.
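
Since a captured byte cannot serve as a repeat count, one alternative is to split the work: let the regexp match only the fixed-size header and then slice the payload by the decoded length. The sketch below assumes the packet layout from post #1 (with the counter byte) and a current Ruby; `HEADER` and `extract_packet` are hypothetical names, and the CRC is returned unchecked.

```ruby
# Matches the fixed-size header only. /m lets "." match the byte 0x0A and
# /n treats the pattern as raw bytes.
HEADER = /\x65\xEB(.)(.)(.)/mn   # captures: type, counter, length

# Hypothetical helper: returns [type, counter, data, crc] for the first
# complete packet in the buffer, or nil if there is none yet.
def extract_packet(buffer)
  buffer = buffer.b
  m = HEADER.match(buffer)
  return nil unless m

  type, counter, length = m.captures.map { |c| c.getbyte(0) }
  body = buffer.byteslice(m.end(0), length + 2)   # payload plus 16-bit CRC
  return nil if body.nil? || body.bytesize < length + 2

  data, crc = body.unpack("a#{length}n")
  [type, counter, data, crc]
end

sample = "bad data" + [0x65, 0xEB, 0, 1, 4].pack("C*") + "ABCD" + [0].pack("n")
extract_packet(sample)  # => [0, 1, "ABCD", 0]
```

`m.end(0)` is a character offset, which equals the byte offset here because the buffer is converted to a binary string first.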

Kind regards

robert

#7

Joel VanderWerf wrote:

s = io.sysread(5)
len = s[4]
s << io.sysread(len+2)
ary = s.unpack "C5a#{len}n"

Why do you use sysread? I’d prefer to use #read in this case - I don’t
see any advantage of resorting to sysread here - it may even prevent
read buffering => things get slower than necessary.
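
Apart from buffering, there is a second difference that may matter for the snippet above: `sysread(n)` returns as soon as some data is available and may hand back fewer than n bytes, while `read(n)` blocks until n bytes have arrived or the stream ends. The sketch below illustrates this with `IO.pipe` standing in for the serial port; the two method names are hypothetical and the timing is only there to force a partial delivery.

```ruby
# sysread returns whatever is available, up to the requested size.
def short_read_demo
  r, w = IO.pipe
  w.write("abc")              # only 3 of the 5 requested bytes exist
  chunk = r.sysread(5)
  w.close
  r.close
  chunk
end

# read waits until the requested number of bytes (or EOF) has arrived.
def full_read_demo
  r, w = IO.pipe
  writer = Thread.new do
    w.write("abc")
    sleep 0.05                # the remaining bytes arrive a little later
    w.write("de")
    w.close
  end
  data = r.read(5)
  writer.join
  r.close
  data
end

short_read_demo  # => "abc"
full_read_demo   # => "abcde"
```

With `sysread(5)`, the length byte at index 4 might therefore not have been read yet when the header is inspected.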

Or is the problem that there may be a variable number of “start
bytes”? In that case, maybe you could tell from the first 5 bytes how
many start bytes there are, and then read enough to capture the
length byte.

Hm… doesn’t seem to be the case.

robert