How to scan a hex string?

" sss\1\2abc\xAA sss ".scan(/\1\2(.*?)\xAA/)[1]

[11] pry(main)> s= " sss\1\2abc\xAAsss "
=> "\u0001\u0002abc\xAA"
[12] pry(main)> s.scan(/\001\002(.*?)\xAA/)
SyntaxError: (eval):2: invalid multibyte escape: /\001\002(.*?)\xAA/

How can I scan out that abc string?

It is a hex string from a device's serial communication.

If it is a hex string, then it is not really a string; you are just
looking at it as one :) You 'might' handle it as a string, but in this
case it mixes bytes that are valid UTF-8 characters with bytes that are
not (if we try to treat each byte or multi-byte sequence as a character).

So even if I force the encoding to UTF-8, it will fail (\xAA on its own
is not a valid UTF-8 byte sequence):

" sss\1\2abc\xAA sss ".force_encoding(Encoding::UTF_8).valid_encoding?
=> false
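
To make the check above a bit more concrete, here is a minimal sketch comparing the same bytes tagged as UTF-8 and as binary (ASCII-8BIT). The variable name `raw` is only illustrative and stands in for whatever was read from the serial port.

```ruby
# Illustrative only: `raw` stands in for bytes read from the serial device.
raw = " sss\x01\x02abc\xAA sss "

# Tagged as UTF-8, the lone 0xAA byte makes the string invalid.
utf8_ok = raw.dup.force_encoding(Encoding::UTF_8).valid_encoding?

# String#b returns a copy tagged ASCII-8BIT, where any byte is acceptable.
binary    = raw.b
binary_ok = binary.valid_encoding?

puts utf8_ok     # false
puts binary_ok   # true

# Dump the bytes as two-digit hex values for inspection.
puts binary.bytes.map { |b| format("%02x", b) }.join(" ")
```

Treating the data as binary from the start avoids the encoding question entirely, which is usually what you want for device protocols.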

If you don't need to recognize multi-byte characters, then drop down to
the byte representation and search there:

" sss\1\2abc\xAA sss ".unpack("H*")
=> ["207373730102616263aa2073737320"]

" sss\1\2abc\xAA sss ".unpack("H*")[0].scan(/0102(.*)aa/)
=> [["616263"]]
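
One thing to watch with searching the hex dump: a plain regex can match across a byte boundary (for example the bytes 10 10 2A contain the digits "0102" starting at an odd position), and the greedy `(.*)` runs to the last "aa" in the data. A possible sketch that only matches on whole bytes is below. The helper name `hex_payload` and its keyword arguments are made up for illustration, and it assumes the markers are given as lowercase hex.

```ruby
# Hypothetical helper: returns the hex digits between two byte markers,
# matching only on byte (two hex digit) boundaries, or nil if not found.
def hex_payload(raw, start_hex: "0102", stop_hex: "aa")
  hex = raw.unpack1("H*")
  # (?:..)*? consumes whole bytes lazily, so both markers and the payload
  # always start at an even offset in the hex string.
  m = hex.match(/\A(?:..)*?#{start_hex}((?:..)*?)#{stop_hex}/)
  m && m[1]
end

puts hex_payload(" sss\x01\x02abc\xAA sss ")   # 616263

# Misaligned case: no real 01 02 marker in these bytes.
tricky = "\x10\x10\x2A\xAA"
p tricky.unpack1("H*").scan(/0102(.*)aa/)      # naive search finds a bogus match
p hex_payload(tricky)                          # nil
```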

You can easily turn the result back into a character string:

"616263".scan(/../).map{|x| x.to_i(16)}.pack("c*")
=> "abc"
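
As a side note, the "H*" directive works in both directions, so the conversion back can also be done with Array#pack directly. A small sketch comparing the two routes (the variable names are just for illustration, and "C*" is my choice of unsigned bytes here):

```ruby
hex = "616263"

# Route 1: split into two-digit pairs, convert each to an integer, pack as bytes.
via_map = hex.scan(/../).map { |pair| pair.to_i(16) }.pack("C*")

# Route 2: let pack decode the hex string in one step (inverse of unpack("H*")).
via_pack = [hex].pack("H*")

puts via_map    # abc
puts via_pack   # abc
```

Both results are binary (ASCII-8BIT) strings, which fits data that came off a serial line.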

Yes, it is a char*, not a string.

Thank you.

I'll try unpack("H*") and map.pack("c*").

Földes László wrote in post #1133627:

" sss\1\2abc\xAA sss ".force_encoding(Encoding::UTF_8).valid_encoding?
=> false

If you don't need to recognize multi-byte characters, then drop down to
the byte representation and search there:

" sss\1\2abc\xAA sss ".unpack("H*")
=> ["207373730102616263aa2073737320"]

One question out of curiosity: how did you know, just by looking at the
final array or string, that "616263" is actually "abc"?

Arup R. wrote in post #1136919:

One question out of curiosity: how did you know, just by looking at the
final array or string, that "616263" is actually "abc"?

Practice. After a few years of hexdumping and byte scanning, values like
31,32,33 and 41,42,43 and 61,62,63 start to leap out at you. You also
get used to scanning for 20 and 0A (or 0D0A).
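
For anyone who has not built up that habit yet, the values mentioned fall into a few ASCII ranges: 30 to 39 are the digits, 41 to 5A uppercase letters, 61 to 7A lowercase letters, 20 is a space, 0A is LF and 0D is CR. A rough sketch that labels bytes this way (the method name `describe_byte` is hypothetical):

```ruby
# Hypothetical helper: gives a rough label for a byte based on its ASCII range.
def describe_byte(byte)
  case byte
  when 0x0A       then "LF"
  when 0x0D       then "CR"
  when 0x20       then "space"
  when 0x30..0x39 then "digit #{byte.chr}"
  when 0x41..0x5A then "upper #{byte.chr}"
  when 0x61..0x7A then "lower #{byte.chr}"
  else                 "other"
  end
end

# Print each byte of a sample string as hex with its label.
"1Aa \r\n".each_byte { |b| puts format("%02X  %s", b, describe_byte(b)) }
```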

Thanks. I found a better way:

s = " sss\1\2abcd\xAA sss "
s.force_encoding('ASCII-8BIT')

p s.scan(Regexp.new '\001\002\004\xAA', nil, 'n')[0]
p s.scan(Regexp.new '\001\002\004\xAA', 'm', 'n')[0]

Multiline mode is better if the data contains 0D 0A.
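
The three-argument form of Regexp.new comes from older Ruby versions and, as far as I know, recent releases no longer accept the third argument. A sketch of the same idea with a regex literal and the /n (binary) flag follows. It assumes the frame is 01 02, payload, AA as in the original question, and the variable names are only illustrative.

```ruby
# Binary copy of the data, so the binary regex and the string are compatible.
raw = " sss\x01\x02abc\xAA sss ".b

# /n marks the regex as ASCII-8BIT, which allows a bare \xAA escape.
payload = raw[/\x01\x02(.*?)\xAA/n, 1]
p payload                                   # "abc"

# A payload containing CR LF: "." does not match LF unless /m is given.
multi     = "\x01\x02ab\r\ncd\xAA".b
without_m = multi[/\x01\x02(.*?)\xAA/n, 1]
with_m    = multi[/\x01\x02(.*?)\xAA/mn, 1]
p without_m                                 # nil
p with_m                                    # "ab\r\ncd"
```

This keeps everything at the byte level without going through a hex dump, and /m covers the case where the payload itself contains 0D 0A.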

Matthew K. wrote in post #1136920:

Arup R. wrote in post #1136919:

One question out of curiosity: how did you know, just by looking at the
final array or string, that "616263" is actually "abc"?

Practice. After a few years of hexdumping and byte scanning, values like
31,32,33 and 41,42,43 and 61,62,63 start to leap out at you. You also
get used to scanning for 20 and 0A (or 0D0A).

Exactly :)