How to scan a hex string?

Detlef_R · January 19, 2014, 8:51am

" sss\1\2abc\xAA sss ".scan(/\1\2(.*?)\xAA/)[1]

[11] pry(main)> s= " sss\1\2abc\xAAsss "
=> "\u0001\u0002abc\xAA"
[12] pry(main)> s.scan(/\001\002(.*?)\xAA/)
SyntaxError: (eval):2: invalid multibyte escape: /\001\002(.*?)\xAA/

How can I scan out that abc string?

It is a hex string from a device's serial communication.

sevk · January 19, 2014, 12:39pm

If it is a hex string, then it is not really a string; you are just
looking at it as a string. You ‘might’ handle it as a string, but in this
case you mix Unicode and non-Unicode characters in it (if we try to treat
each byte or multi-byte sequence as a character).

So even if I force the encoding to UTF-8, it will fail (a lone \xAA byte
is not a valid UTF-8 sequence):

" sss\1\2abc\xAA sss ".force_encoding(Encoding::UTF_8).valid_encoding?
=> false
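
To make the difference concrete, here is a minimal sketch (the variable names are mine, not from the post above) that tags the same bytes once as UTF-8 and once as binary. It assumes a Ruby with String#b, which returns a copy tagged as ASCII-8BIT:

```ruby
# The same 15 bytes, tagged with two different encodings.
raw = " sss\x01\x02abc\xAA sss "

as_utf8   = raw.dup.force_encoding(Encoding::UTF_8)
as_binary = raw.b   # copy tagged ASCII-8BIT (a.k.a. BINARY)

p as_utf8.valid_encoding?    # false: a lone 0xAA byte is not valid UTF-8
p as_binary.valid_encoding?  # true: any byte sequence is valid binary
p as_binary.bytesize         # 15
```

Treating the data as binary from the start avoids the encoding question entirely.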

If you don’t need to recognize multi-byte characters then dive into
the byte representation and search there:

" sss\1\2abc\xAA sss ".unpack("H*")
=> ["207373730102616263aa2073737320"]

" sss\1\2abc\xAA sss ".unpack("H*")[0].scan(/0102(.*)aa/)
=> [["616263"]]
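
One caveat with searching the hex text: a pattern such as 0102 can also match across a byte boundary. The sketch below is only an illustration with made-up data and hypothetical variable names. It shows such a false match, and one possible way to keep the search aligned to whole bytes:

```ruby
# Bytes 0x10 0x10 0x20 contain no 0x01 0x02 pair...
data = "\x10\x10\x20".b
hex  = data.unpack("H*")[0]          # "101020"

# ...yet a plain hex search finds "0102" straddling two bytes.
unaligned = hex =~ /0102/            # 1, an odd offset inside a byte
# Consuming whole byte pairs from the start keeps the search aligned.
aligned   = hex =~ /\A(?:..)*?0102/  # nil, no false match

# The same aligned pattern still extracts the payload from the sample.
sample  = " sss\x01\x02abc\xAA sss ".b.unpack("H*")[0]
payload = sample[/\A(?:..)*?0102((?:..)*?)aa/, 1]

p unaligned, aligned, payload        # 1, nil, "616263"
```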

You can turn the result easily to character string:

"616263".scan(/../).map{|x| x.to_i(16)}.pack("c*")
=> "abc"
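
As a side note, Array#pack with the "H*" directive is the inverse of String#unpack("H*"), so the conversion can probably be done in one step. A small sketch comparing both routes (variable names are illustrative only):

```ruby
hex = "616263"

# Route from the post: split into byte pairs, parse each as base 16, repack.
via_map = hex.scan(/../).map { |pair| pair.to_i(16) }.pack("C*")

# Shorter route: "H*" packs a hex string straight back into bytes.
via_pack = [hex].pack("H*")

p via_map   # "abc"
p via_pack  # "abc"
```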

sevk · January 20, 2014, 6:08am

Yes, it is a char*, not a string.

Thank you.

I’ll try unpack("H*") and map.pack("c*").

sevk · February 17, 2014, 12:57pm

Földes László wrote in post #1133627:

" sss\1\2abc\xAA sss ".force_encoding(Encoding::UTF_8).valid_encoding?
=> false

If you don’t need to recognize multi-byte characters then dive into
the byte representation and search there:

" sss\1\2abc\xAA sss ".unpack("H*")
=> ["207373730102616263aa2073737320"]

One question, out of curiosity: how did you know, just by looking
at the final Array or string, that “616263” is actually “abc”?

sevk · February 17, 2014, 1:03pm

Arup R. wrote in post #1136919:

One question, out of curiosity: how did you know, just by looking
at the final Array or string, that “616263” is actually “abc”?

Practice. After a few years of hexdumping and byte scanning, values like
31,32,33 and 41,42,43 and 61,62,63 start to leap out at you. You also
get used to scanning for 20 and 0A (or 0D0A).

sevk · January 21, 2014, 3:10am

Thanks. I found a better way:

s = " sss\1\2abcd\xAA sss "
s.force_encoding('ASCII-8BIT')

# 'n' as third argument: binary (ASCII-8BIT) regexp, so \xAA is accepted
p s.scan(Regexp.new('\001\002(.*?)\xAA', nil, 'n'))[0]
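
As far as I know, newer Ruby releases no longer accept a third argument to Regexp.new, so the same idea may be easier to express with a regexp literal and the /n flag. The following is a sketch under that assumption, not the poster's original code; `frame` is a name chosen for the example:

```ruby
# Work on a binary copy so no encoding checks get in the way.
s = " sss\x01\x02abcd\xAA sss ".b

# The /n flag makes the regexp itself binary (ASCII-8BIT), which allows \xAA.
frame = /\x01\x02(.*?)\xAA/n

p s.scan(frame)   # [["abcd"]]
p s[frame, 1]     # "abcd"
```

String#[] with a capture index returns only the payload, which saves indexing into the nested array that scan produces.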

sevk · March 26, 2014, 8:28am

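p s.scan(Regexp.new('\001\002(.*?)\xAA', nil, 'n'))[0]
p s.scan(Regexp.new('\001\002(.*?)\xAA', Regexp::MULTILINE, 'n'))[0]

Multiline mode is better if the payload can contain 0D 0A, because
without it "." does not match a line feed (0A). The option has to be
Regexp::MULTILINE: older Ruby versions treat any other non-nil value
there, such as 'm', as a request for case-insensitive matching.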
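
A small sketch of the effect, using regexp literals with the /n and /m flags instead of Regexp.new (the payload with an embedded CR LF is made up for the demonstration):

```ruby
# Payload "ab\r\ncd" sits between the 01 02 header and the AA trailer.
s = " sss\x01\x02ab\r\ncd\xAA sss ".b

single = /\x01\x02(.*?)\xAA/n    # "." stops at "\n"
multi  = /\x01\x02(.*?)\xAA/mn   # "." matches "\n" as well

p s.scan(single)   # []
p s.scan(multi)    # [["ab\r\ncd"]]
```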

sevk · March 26, 2014, 10:38am

Matthew K. wrote in post #1136920:

Arup R. wrote in post #1136919:

One question, out of curiosity: how did you know, just by looking
at the final Array or string, that “616263” is actually “abc”?

Practice. After a few years of hexdumping and byte scanning, values like
31,32,33 and 41,42,43 and 61,62,63 start to leap out at you. You also
get used to scanning for 20 and 0A (or 0D0A).

Exactly