Forum: Ruby Better way to read data from IO into packets?

Levin Alexander (Guest)
on 2005-12-18 13:54
(Received via mailing list)
Hi,

I have a small program that reads data from a serial port and chops
it into packets.  A packet has the following format:

  [0x65, 0xEB, <type:8>, <counter:8>, <length:8>, <data:length>, <crc:16>]

I have a thread that reads the data from an IO object:

  @receiver_thread = Thread.new do
    Thread.abort_on_exception = true
    loop do
      begin
        char = @io.readchar
        add_char_to_packet(char) if char
      rescue EOFError
        Thread.pass # there is currently nothing to read
      end
    end
  end

and a state machine that decodes the format:

    def add_char_to_packet(char)
      @state = :first_checksum if (@state == 0)
      case @state
      when :first_startbyte
        @data = ""
        @state = (char == STARTBYTES[0]) ? :second_startbyte : :first_startbyte
      when :second_startbyte
        @state = (char == STARTBYTES[1]) ? :type :
          # special case: first startbyte is repeated
          (char == STARTBYTES[0] ? :second_startbyte : :first_startbyte)
      when :type
        @type = TYPE.invert[char]
        @state = :counter
      when :counter
        @counter = char
        @state = :length
      when :length
        @length = char
        @state = @length
      when Integer
        @data << char
        @state -= 1
      when :first_checksum
        @checksum = (char << 8)
        @state = :second_checksum
     [...]

This works, but the code is ugly and also a little slow because I have
to process each byte separately.  Is there a better way?

Thank you,
Levin
Joel VanderWerf (Guest)
on 2005-12-18 23:20
(Received via mailing list)
Levin Alexander wrote:
> Hi,
>
> I have a small program, that reads data from a serial port and chops
> it into packets.  A Packet has the following format:
>
>   [0x65, 0xEB, <type:8>, <counter:8>, <length:8>, <data:length>, <crc:16>]

Why not read enough bytes to make sure you get the length byte:

s = io.sysread(5)
len = s[4]
s << io.sysread(len+2)
ary = s.unpack "C5a#{len}n"

Or is the problem that there may be a variable number of "start bytes"?
In that case, maybe you could tell from the first 5 bytes how many start
bytes there are, and then read enough to capture the length byte.
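
A minimal sketch of this header-then-body read for current Ruby, assuming the stream is already aligned on a packet start. The method name read_aligned_packet is hypothetical, and StringIO stands in for the serial port. It uses #read and String#getbyte because String#[] with an integer index returns a one-character String (not a number) in Ruby 1.9 and later.

```ruby
require "stringio"

# Read one packet, assuming the stream is aligned on a packet start.
# Layout assumed from the post: 2 start bytes, type, counter, length,
# <length> data bytes, then a 16-bit CRC (big-endian, as "n" implies).
# Returns the unpacked fields, or nil if the stream ends early.
def read_aligned_packet(io)
  header = io.read(5)
  return nil if header.nil? || header.bytesize < 5

  len  = header.getbyte(4)          # numeric value of the length byte
  rest = io.read(len + 2)           # data plus CRC
  return nil if rest.nil? || rest.bytesize < len + 2

  (header + rest).unpack("C5a#{len}n")
end

# Usage with an in-memory stream instead of a serial port.
io = StringIO.new([0x65, 0xEB, 1, 7, 3].pack("C*") + "abc" + [0x1234].pack("n"))
p read_aligned_packet(io)
```

The CRC is only extracted here, not checked, since the checksum algorithm is not given in the thread.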
Robert Klemme (Guest)
on 2005-12-19 09:49
(Received via mailing list)
Joel VanderWerf wrote:
>
> s = io.sysread(5)
> len = s[4]
> s << io.sysread(len+2)
> ary = s.unpack "C5a#{len}n"

Why do you use sysread?  I'd prefer to use #read in this case - I don't
see any advantage of resorting to sysread here - it may even prevent
read buffering => things get slower than necessary.
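
Besides buffering, there is a second difference that matters for fixed-size headers: sysread(n) may return fewer than n bytes, while read(n) keeps waiting until n bytes have arrived or the stream ends. A small sketch of that, using a pipe as a stand-in for the serial port (the method name short_read_demo is made up for this example):

```ruby
# Shows a short read from sysread versus a full read from read.
def short_read_demo
  r, w = IO.pipe
  w.write("abc")
  w.flush
  partial = r.sysread(5)   # returns only what is available right now
  w.write("defgh")
  w.close
  full = r.read(5)         # waits for 5 bytes (or end of stream)
  r.close
  [partial, full]
end

p short_read_demo
```

So code built on sysread would need its own loop to collect a complete header, which #read already does.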

> Or is the problem that there may be a variable number of "start
> bytes"? In that case, maybe you could tell from the first 5 bytes how
> many start bytes there are, and then read enough to capture the
> length byte.

Hm...  doesn't seem to be the case.

    robert
Joel VanderWerf (Guest)
on 2005-12-19 20:12
(Received via mailing list)
Robert Klemme wrote:
> Why do you use sysread?  I'd prefer to use #read in this case - I don't
> see any advantage of resorting to sysread here - it may even prevent read
> buffering => things get slower than necessary.

You're right. I was thinking about readchar, which the op used and I
assumed to be unbuffered, but I don't even know if it is!

>>Or is the problem that there may be a variable number of "start
>>bytes"? In that case, maybe you could tell from the first 5 bytes how
>>many start bytes there are, and then read enough to capture the
>>length byte.
>
>
> Hm...  doesn't seem to be the case.

But there is some logic in the op's code that looks for repeated "start
byte". I'm just not sure what the limit is.
Levin Alexander (Guest)
on 2005-12-19 23:46
(Received via mailing list)
On 12/18/05, Joel VanderWerf <vjoel@path.berkeley.edu> wrote:

> > I have a small program, that reads data from a serial port and chops
> > it into packets.  A Packet has the following format:
> >
> >   [0x65, 0xEB, <type:8>, <counter:8>, <length:8>, <data:length>, <crc:16>]
>
> Why not read enough bytes to make sure you get the length byte:

Because the application may be started in the middle of a packet or
the stream may be corrupted due to transmission errors.

But you are right, I should optimize that and only read single bytes
if I need to resynchronize.
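
A rough sketch of that idea, assuming an IO-like object that supports getbyte and read: single bytes are consumed only while hunting for the start sequence, and the rest of the packet is read in two block reads. The names START_BYTES and read_packet are hypothetical, and the CRC is returned unchecked because the checksum algorithm is not part of this thread.

```ruby
require "stringio"

START_BYTES = [0x65, 0xEB].freeze  # start bytes from the packet format

# Returns [type, counter, data, crc], or nil at end of stream.
def read_packet(io)
  prev = nil
  # Resynchronise byte by byte until both start bytes appear in a row.
  loop do
    byte = io.getbyte
    return nil if byte.nil?
    break if prev == START_BYTES[0] && byte == START_BYTES[1]
    prev = byte
  end

  header = io.read(3)                      # type, counter, length
  return nil if header.nil? || header.bytesize < 3
  type, counter, len = header.unpack("C3")

  body = io.read(len + 2)                  # data plus 16-bit CRC
  return nil if body.nil? || body.bytesize < len + 2
  data, crc = body.unpack("a#{len}n")
  [type, counter, data, crc]
end

# Usage: garbage and a stray 0x65 precede a valid packet.
stream = "junk".b + [0x65, 0x65, 0xEB, 2, 9, 2].pack("C*") + "hi" + [0xBEEF].pack("n")
p read_packet(StringIO.new(stream))
```

One limitation of this sketch: if the length byte itself is corrupted, the block read swallows bytes that belong to the next packet, so a real version would have to verify the CRC and rescan the consumed bytes on failure.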

> s = io.sysread(5)
> len = s[4]
> s << io.sysread(len+2)
> ary = s.unpack "C5a#{len}n"
>
> Or is the problem that there may be a variable number of "start bytes"?
> In that case, maybe you could tell from the first 5 bytes how many start
> bytes there are, and then read enough to capture the length byte.

Hmm, maybe I can use regular expressions to check for the correct
format:

  buffer = "bad data" << [0x65,0xEB,0,1,4,65,66,67,68,00,00].pack("C*")
  buffer.scan( /
    \x65\xEB     # startbytes
    (.)          # type
    (.)          # counter
    (.)          # length-byte
    (.*)         # data
    (..)         # checksum
  /xmn )         # m: "." also matches newline bytes; n: binary pattern

I would need a way to discard old or bad data from the buffer,
probably need to think about it more

(btw: is there a way to use the length-byte in the regular expression
itself?  Something like /(.)(.{\1})/)
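
One possible way to handle the discarding without a regular expression, sketched under the assumption that the buffer is a binary String (for example built with String#b) that the reader thread appends to. The names START_SEQ and extract_packets! are made up for this sketch, and the CRC is extracted but not verified.

```ruby
START_SEQ = "\x65\xEB".b   # start bytes as a binary string

# Removes every complete packet from +buffer+ and returns them as
# [type, counter, data, crc] arrays. Garbage in front of a start
# sequence is deleted; an incomplete packet stays in the buffer.
def extract_packets!(buffer)
  packets = []
  loop do
    start = buffer.index(START_SEQ)
    if start.nil?
      # No start sequence: keep only a trailing 0x65 that may be half of one.
      keep = buffer.end_with?(START_SEQ[0]) ? 1 : 0
      buffer.slice!(0, buffer.bytesize - keep)
      break
    end
    buffer.slice!(0, start)                  # discard leading garbage
    break if buffer.bytesize < 5             # header incomplete, wait for more
    len   = buffer.getbyte(4)
    total = 5 + len + 2
    break if buffer.bytesize < total         # body incomplete, wait for more
    _, _, type, counter, _, data, crc = buffer.slice!(0, total).unpack("C5a#{len}n")
    packets << [type, counter, data, crc]
  end
  packets
end

# Usage: one complete packet followed by the beginning of a second one.
buffer = "bad data".b
buffer << [0x65, 0xEB, 0, 1, 4, 65, 66, 67, 68, 0, 0].pack("C*")
buffer << [0x65, 0xEB, 0, 2].pack("C*")
p extract_packets!(buffer)
p buffer.bytesize
```

The reader would call this after every append, so bad data never accumulates beyond one partial packet.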

Thank you,
Levin
Levin Alexander (Guest)
on 2005-12-19 23:55
(Received via mailing list)
On 12/19/05, Joel VanderWerf <vjoel@path.berkeley.edu> wrote:

> But there is some logic in the op's code that looks for repeated "start
> byte". I'm just not sure what the limit is.

A packet always starts with the two bytes "\x65\xEB", everything else
resets the state machine.

The special case in the code was needed to correctly handle
"\x65\x65\xEB" (one bad character before a valid start of packet)  --
the state machine needs to always look for "\xEB" after "\x65"
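
Isolated from the rest of the state machine, the start-byte search looks roughly like this. It is only a sketch: next_sync_state is a hypothetical helper, and bytes are assumed to arrive as Integers.

```ruby
STARTBYTES = [0x65, 0xEB].freeze

# Next state of the two-step start-byte search for one input byte.
def next_sync_state(state, byte)
  case state
  when :first_startbyte
    byte == STARTBYTES[0] ? :second_startbyte : :first_startbyte
  when :second_startbyte
    if byte == STARTBYTES[1]
      :type
    elsif byte == STARTBYTES[0]
      :second_startbyte      # repeated 0x65: keep waiting for 0xEB
    else
      :first_startbyte
    end
  end
end

# One bad 0x65 before the real start sequence still ends in :type.
state = :first_startbyte
[0x65, 0x65, 0xEB].each { |byte| state = next_sync_state(state, byte) }
p state  # → :type
```

Without the middle branch, the second 0x65 would reset the search and the following 0xEB would be missed.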

-Levin
Robert Klemme (Guest)
on 2005-12-20 10:23
(Received via mailing list)
Levin Alexander wrote:
> Hmm, maybe I can use regular expressions to check for the correct
> format:
>
>   buffer.scan( /
>     \x65\xEB     # startbytes
>     (.)          # type
>     (.)          # length-byte
>     (.*)         # data
>     (..)         # checksum
>   /x )
>
> I would need a way to discard old or bad data from the buffer,
> probably need to think about it more

Why not just use a regexp to verify the initial sequence and use the
match offsets?  Or do something like

buffer.gsub!(/\A.*?(\x65\xEB)/mn, '\\1')  # /m so "." also matches newline bytes

> (btw: is there a way to use the length-byte in the regular expression
> itself?  Something like /(.)(.{\1})/)

No.
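
A back-reference cannot serve as a repeat count, but the same effect can be had in two steps: match the fixed header first, then interpolate the captured length into a second pattern. A sketch, assuming a binary buffer and the full header of type, counter and length (match_packet is a made-up name, and the checksum is not verified):

```ruby
# Two-step match: the first pattern captures the length byte, the
# second is built with that length interpolated as the repeat count.
# Returns [type, counter, data, crc] or nil if no complete packet matches.
def match_packet(buffer)
  header = buffer.match(/\x65\xEB(.)(.)(.)/mn) or return nil
  type, counter, len = header.captures.map { |c| c.getbyte(0) }
  body = header.post_match.match(/\A(.{#{len}})(..)/mn) or return nil
  [type, counter, body[1], body[2].unpack1("n")]
end

buffer = "bad data".b << [0x65, 0xEB, 0, 1, 4, 65, 66, 67, 68, 0, 0].pack("C*")
p match_packet(buffer)
```

The /n flag keeps the pattern binary so that \xEB is accepted, and /m lets "." match a newline byte inside the data.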

Kind regards

    robert