Splitting binary data

dubstep · April 20, 2011, 10:26am

Hello

First post (I am new to Ruby :-)). Can you help?

I am using eventmachine to read in TCP segments off the network. I read
in a TCP segment that contains 4 messages. The TCP segment binary data
is shown below, where
\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\ is the
marker for each message. I would like to split the data into the 4
messages, but am having trouble doing so. When I split the data, the
whole message gets inserted into the first array element. I understand I
may need to escape the \, but how would I do that for the following
message? I can split it by unpacking to hex and then splitting, but that
is inefficient for my needs, as I use bindata to inspect the packet. Any
help is appreciated.

Thanks

\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\x00R\x02\x00\x00\x00'@\x01\x01\x00@\x02\x00@\x03\x04\n\x10\x8E\xC8\x80\x04\x04\x00\x00\x002@\x05\x04\x00\x00\x00d\xC0\b\b\x00\x01\x00\x01\x00\x01\x00\x02\x18\n\x10\x8E\b\x04\x18\x02\x02\x02 \n\x13\x00\x01 \x01\x01\x01\x01\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\x00A\x02\x00\x00\x00'@\x01\x01\x00@\x02\x00@\x03\x04\n\x10\x8E\xC8\x80\x04\x04\x00\x00\x002@\x05\x04\x00\x00\x00d\xC0\b\b\x00\x01\x00\x01\x00\x01\x00\x02\x10\x03\x03\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\x00d\x02\x00\x00\x00I@\x01\x01\x00@\x02\x1E\x02\x0E=\xD6R\x132H2H2H2H2H2H2H2H\x8A\xEA\x8A\xEA\x8A\xEA\x8A\xEA@\x03\x04\n\x10\x8E\n\x80\x04\x04\x00\x00\x00\x00@\x05\x04\x00\x00\x00d\xC0\b\f=\xD6\x01,=\xD6\x011=\xD6\v\xEB\x16.\xAE\xF0\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\x00N\x02\x00\x00\x003@\x01\x01\x00@\x02\b\x02\x03=\xD6R\xE3\xC0\x1F@\x03\x04\n\x10\x8E\n\x80\x04\x04\x00\x00\x00\x00@\x05\x04\x00\x00\x00d\xC0\b\f=\xD6\x01,=\xD6\x011=\xD6\v\xEB\x16.\xAD\xA8

hroyd · April 20, 2011, 7:52pm

hroyd hroyd wrote in post #993957:

Hello

First post (I am new to Ruby :-)). Can you help?

I am using eventmachine to read in TCP segments off the network. I read
in a TCP segment that contains 4 messages. The TCP segment binary data
is shown below, where
\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\ is the
marker for each message. I would like to split the data into the 4
messages, but am having trouble doing so. When I split the data, the
whole message gets inserted into the first array element.

That means your pattern isn’t matching anywhere in the string:

str = 'abc'
p str.split('e')

--output:--
["abc"]

Here’s what happens when the pattern matches:

puts RUBY_VERSION

str = "\xFF\xFF" +
      "0xE2 0x82 0xAC" +
      "\xFF\xFF" +
      "0xE2 0x82 0xAC" +
      "\xFF\xFF" +
      "0xE2 0x82 0xAC" +
      "\xFF\xFF" +
      "0xE2 0x82 0xAC"

pattern = "\xFF\xFF"
p str.split(pattern)

--output:--
1.9.2
["", "0xE2 0x82 0xAC", "0xE2 0x82 0xAC", "0xE2 0x82 0xAC", "0xE2 0x82 0xAC"]

Because your string starts with the delimiter, split returns an empty
string for the (empty) text to the left of that first delimiter.
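If you don't want that leading empty string, you can drop it after splitting. Here is a minimal sketch, assuming Ruby 2.0 or later (for `String#b`) and assuming the marker never occurs inside a message body; `MARKER` and `split_messages` are hypothetical names chosen for this example:

```ruby
# Sketch for Ruby 2.0+; MARKER and split_messages are illustrative names, not a library API.
# Sixteen 0xFF bytes, tagged ASCII-8BIT (binary) via String#b.
MARKER = ("\xFF" * 16).b

# Split a binary segment into the messages that follow each marker.
def split_messages(segment)
  # Work on a binary copy so split never trips over bytes that are invalid UTF-8.
  parts = segment.b.split(MARKER)
  # A segment that starts with the marker yields an empty first element; drop it.
  parts.shift if parts.first && parts.first.empty?
  parts
end

segment = MARKER + "abc" + MARKER + "de"
p split_messages(segment) # => ["abc", "de"]
```

Only an empty first element is removed, so any bytes that happen to precede the first marker are kept rather than silently discarded.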

hroyd · April 21, 2011, 12:01pm

Thanks for the reply; that works.

I was trying to split on

"\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\"

but dropping the last \ was what I was missing

"\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF"

["",
"\x00R\x02\x00\x00\x00'@\x01\x01\x00@\x02\x00@\x03\x04\n\x10\x8E\xC8\x80\x04\x04\x00\x00\x002@\x05\x04\x00\x00\x00d\xC0\b\b\x00\x01\x00\x01\x00\x01\x00\x02\x18\n\x10\x8E\b\x04\x18\x02\x02\x02 \n\x13\x00\x01 \x01\x01\x01\x01",
"\x00A\x02\x00\x00\x00'@\x01\x01\x00@\x02\x00@\x03\x04\n\x10\x8E\xC8\x80\x04\x04\x00\x00\x002@\x05\x04\x00\x00\x00d\xC0\b\b\x00\x01\x00\x01\x00\x01\x00\x02\x10\x03\x03",
"\x00d\x02\x00\x00\x00I@\x01\x01\x00@\x02\x1E\x02\x0E=\xD6R\x132H2H2H2H2H2H2H2H\x8A\xEA\x8A\xEA\x8A\xEA\x8A\xEA@\x03\x04\n\x10\x8E\n\x80\x04\x04\x00\x00\x00\x00@\x05\x04\x00\x00\x00d\xC0\b\f=\xD6\x01,=\xD6\x011=\xD6\v\xEB\x16.\xAE\xF0",
"\x00N\x02\x00\x00\x003@\x01\x01\x00@\x02\b\x02\x03=\xD6R\xE3\xC0\x1F@\x03\x04\n\x10\x8E\n\x80\x04\x04\x00\x00\x00\x00@\x05\x04\x00\x00\x00d\xC0\b\f=\xD6\x01,=\xD6\x011=\xD6\v\xEB\x16.\xAD\xA8"]

Thanks for your help

hroyd · April 21, 2011, 7:13pm

"Iñaki Baz C." wrote in post #994264:

2011/4/20 7stud --:

p str.split(pattern)

--output:--
["", "a", "b", "c", "d"]

Note that this fails under Ruby 1.9:

p str.split(pattern)
ArgumentError: invalid byte sequence in UTF-8
from (irb):10:in `split'

I guess you missed this part:

puts RUBY_VERSION

…
…
…

--output:--
1.9.2

hroyd · April 21, 2011, 7:18pm

hroyd hroyd wrote in post #994257:

Thanks for your help

Sure. Also, note that Ruby lets you do this:

pattern = "\xFF" * 16
p pattern

--output:--
"\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF"

…so that you don’t have to write that out by hand, and suffer the
inevitable typo.
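On Ruby 2.0 and later you can also make the pattern binary up front, so its encoding no longer depends on the source file or on irb. A small sketch showing two equivalent ways, `String#b` and `Array#pack`; the variable names are just for illustration:

```ruby
# Sketch for Ruby 2.0+: two ways to build the 16-byte marker as a binary string.
# String#b tags the one-byte string as ASCII-8BIT before repeating it.
marker_a = "\xFF".b * 16

# Array#pack with "C*" turns each integer into one unsigned byte; the result is ASCII-8BIT.
marker_b = ([0xFF] * 16).pack("C*")

p marker_a.bytesize                          # => 16
p marker_a.encoding == Encoding::ASCII_8BIT  # => true
p marker_a == marker_b                       # => true
```

The `pack` form is handy when the marker bytes come from numbers rather than from a string literal.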

hroyd · April 21, 2011, 7:30pm

2011/4/21 7stud --:

…

--output:--
1.9.2

Interesting. I also use 1.9.2, and I've realized that it fails under
irb but not when I run the above code from a separate file.

hroyd · April 21, 2011, 12:59pm

2011/4/20 7stud --:

p str.split(pattern)

--output:--
["", "a", "b", "c", "d"]

Note that this fails under Ruby 1.9:

p str.split(pattern)
ArgumentError: invalid byte sequence in UTF-8
from (irb):10:in `split'

hroyd · April 21, 2011, 7:56pm

"Iñaki Baz C." wrote in post #994342:

2011/4/21 7stud --:

…

--output:--
1.9.2

Interesting. I also use 1.9.2, and I've realized that it fails under
irb but not when I run the above code from a separate file.

I never use irb-like interfaces in any language anymore; they are
unreliable.

hroyd · April 26, 2011, 11:48am

On Ruby 1.9, a String object knows its own encoding, and if the String
contains byte sequences that are invalid in that encoding, the
String#split method raises an error.
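To see the rule in isolation, here is a sketch (assuming Ruby 2.0 or later, where `String#b` exists) that tags the same bytes once as UTF-8 and once as ASCII-8BIT. `force_encoding` and `b` change only the encoding tag, never the bytes themselves:

```ruby
# Sketch for Ruby 2.0+: the same bytes, tagged two different ways.
bytes = "\xFF\xFF\x61\xFF\xFF\x62"

# Tagged UTF-8: a lone 0xFF byte is never valid UTF-8, so the string is "broken".
utf8 = bytes.dup.force_encoding(Encoding::UTF_8)
p utf8.valid_encoding? # => false

split_raised =
  begin
    utf8.split("\xFF\xFF".dup.force_encoding(Encoding::UTF_8))
    false
  rescue ArgumentError => e
    puts "split raised ArgumentError: #{e.message}"
    true
  end
p split_raised # => true

# Tagged ASCII-8BIT (binary): every byte sequence is valid, so split works.
binary = bytes.b
p binary.valid_encoding?      # => true
p binary.split("\xFF\xFF".b)  # => ["", "a", "b"]
```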

Without the magic comment, it does not matter that a string literal
contains non-ASCII bytes: the literal is tagged ASCII-8BIT, which
accepts any byte sequence.

Example: OK (works)

#-------------------------------------------------
#! ruby-1.9.2

str = "\xFF\xFF\x61\xFF\xFF\x62\xFF\xFF\x63\xFF\xFF\x64"
p str.encoding #=> #<Encoding:ASCII-8BIT>
p str.valid_encoding? #=> true

pattern = "\xFF\xFF"
p str.split( pattern ) #=> ["", "a", "b", "c", "d"]
#-------------------------------------------------

However, if you use the magic comment to declare that the file is UTF-8,
a string literal containing non-ASCII bytes does become a problem: the
literal is tagged UTF-8, and a lone \xFF byte is not valid UTF-8.

Example: NG (fails)

#-------------------------------------------------
#! ruby-1.9.2
# coding: UTF-8

str = "\xFF\xFF\x61\xFF\xFF\x62\xFF\xFF\x63\xFF\xFF\x64"
p str.encoding #=> #<Encoding:UTF-8>
p str.valid_encoding? #=> false

pattern = "\xFF\xFF"
p pattern.valid_encoding? #=> false
p str.split( pattern ) # raises ArgumentError: invalid byte sequence in UTF-8
#-------------------------------------------------

To avoid this problem, change the encoding of every string that contains
non-ASCII bytes, both the data and the pattern, to ASCII-8BIT.

Example: avoiding the problem

#-------------------------------------------------
#! ruby-1.9.2
# coding: UTF-8

str = "\xFF\xFF\x61\xFF\xFF\x62\xFF\xFF\x63\xFF\xFF\x64"

# change the encoding of the string
str.force_encoding Encoding::ASCII_8BIT
p str.encoding #=> #<Encoding:ASCII-8BIT>
p str.valid_encoding? #=> true

pattern = "\xFF\xFF".force_encoding Encoding::ASCII_8BIT
p pattern.valid_encoding? #=> true
p str.split( pattern ) #=> ["", "a", "b", "c", "d"]
#-------------------------------------------------
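On Ruby 2.0 and later there is also `String#b`, which returns an ASCII-8BIT copy instead of retagging the string in place the way `force_encoding` does. A minimal sketch contrasting the two (variable names are illustrative):

```ruby
# Sketch for Ruby 2.0+: String#b versus force_encoding.
original = "\xFF\xFF\x61".dup.force_encoding(Encoding::UTF_8)

# String#b returns a new ASCII-8BIT copy and leaves the receiver untouched.
copy = original.b
p copy.encoding == Encoding::ASCII_8BIT      # => true
p original.encoding == Encoding::UTF_8       # => true

# force_encoding retags the receiver itself and returns it.
same = original.force_encoding(Encoding::ASCII_8BIT)
p same.equal?(original)                      # => true
p original.encoding == Encoding::ASCII_8BIT  # => true
```

Because `force_encoding` mutates its receiver, it raises an error on a frozen string, whereas `b` works on frozen strings too.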

Kind regards,