Problem with unpack

rafael · July 1, 2008, 10:37pm

Hi,

I’m trying to read a binary file and would want to find within the
file a particular substring. I have done the following

file = File.open("testFile","rb")
str = String.new
file.read(512, str)

puts (str.length).to_s # how many bytes have I read?

strhex = str.unpack("H*")
puts strhex # just show me the string to check

if (strhex.include?("020")) == true then
  puts "The tag was found"
else
  puts "The tag was not found"
end

The tag is not found although I can see (and I know) that the
substring 020 is in there.

I have tried to find the problem and I realized that the length of
strhex is one
strhex.length # -> 1

This is certainly the problem. But why is the length only one? Can
anyone help?
I am also aware that this code is probably not the best way of solving
my problem. I’m still learning

Thanks

Cheers

Rafael

rafael · July 1, 2008, 11:47pm

On Jul 1, 2008, at 13:32 PM, Rafael wrote:

I’m trying to read a binary file and would want to find within the
file a particular substring. I have done the following

[…]

unpack is useful when you need to turn binary data into usable
structure. Just use a regular expression:

open 'testfile', 'rb' do |io|
  if io.read(512) =~ /\020/ then
    puts "The tag was found"
  else
    puts "The tag was not found"
  end
end
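
As an aside on the first sentence of this reply, here is a minimal sketch of what "turning binary data into a usable structure" can look like with unpack. The eight bytes are made-up sample data (two 16-bit little-endian numbers, a two-character code, and a 16-bit length), and the variable names are illustrative only. It assumes a Ruby recent enough to have String#b.

```ruby
# Made-up sample bytes: two 16-bit little-endian numbers, a 2-character code,
# and a 16-bit little-endian length.
record = "\x02\x00\x10\x00UI\x14\x00".b

# "v" = 16-bit unsigned little-endian integer, "a2" = 2 raw characters.
group, element, code, length = record.unpack("vva2v")

p group    # 2
p element  # 16
p code     # "UI"
p length   # 20
```

For a plain "is this byte sequence present" check, no such decoding is needed, which is the point the reply makes.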

rafael · July 2, 2008, 12:06pm

On 1 Jul., 23:43, Eric H. wrote:

open 'testfile', 'rb' do |io|
  if io.read(512) =~ /\020/ then
    puts "The tag was found"
  else
    puts "The tag was not found"
  end
end

Hi,

thanks for your response, but it didn’t solve my problem, so I guess I
didn’t explain it correctly.
I’ll give it another go.

I have a binary file.
If I look at this file using hexdump -Cn 512 testfile

it looks like this:
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |…|
*
00000080 44 49 43 4d 02 00 00 00 55 4c 04 00 c2 00 00 00 |DICM…UL…|
00000090 02 00 01 00 4f 42 00 00 02 00 00 00 00 01 02 00 |…OB…|
000000a0 02 00 55 49 1a 00 31 2e 32 2e 38 34 30 2e 31 30 |…UI…1.2.840.10|
000000b0 30 30 38 2e 35 2e 31 2e 34 2e 31 2e 31 2e 32 00 |008.5.1.4.1.1.2.|
000000c0 02 00 03 00 55 49 3c 00 32 2e 31 36 2e 38 34 30 |…UI<.2.16.840|
000000d0 2e 31 2e 31 31 33 36 36 32 2e 32 2e 31 2e 34 35 |.1.113662.2.1.45|
000000e0 31 39 2e 34 31 35 38 32 2e 34 31 30 35 31 35 32 |19.41582.4105152|
000000f0 2e 34 31 39 39 39 30 35 30 35 2e 34 31 30 35 32 |.419990505.41052|
00000100 33 32 35 31 02 00 10 00 55 49 14 00 31 2e 32 2e |3251…UI…1.2.|
00000110 38 34 30 2e 31 30 30 30 38 2e 31 2e 32 2e 31 00 |840.10008.1.2.1.|
00000120 02 00 12 00 55 49 18 00 32 2e 31 36 2e 38 34 30 |…UI…2.16.840|
00000130 2e 31 2e 31 31 33 36 36 32 2e 32 2e 31 2e 31 00 |.1.113662.2.1.1.|
00000140 02 00 16 00 41 45 0a 00 50 48 4f 45 4e 49 58 53 |…AE…PHOENIXS|
00000150 43 50 08 00 00 00 55 4c 04 00 54 02 00 00 08 00 |CP…UL…T…|
00000160 05 00 43 53 0a 00 49 53 4f 5f 49 52 20 31 30 30 |…CS…ISO_IR 100|
00000170 08 00 08 00 43 53 16 00 4f 52 49 47 49 4e 41 4c |…CS…ORIGINAL|
00000180 5c 50 52 49 4d 41 52 59 5c 41 58 49 41 4c 08 00 |\PRIMARY\AXIAL…|
00000190 12 00 44 41 0a 00 31 39 39 39 2e 30 35 2e 30 35 |…DA…1999.05.05|
000001a0 08 00 13 00 54 4d 10 00 31 30 3a 35 32 3a 33 34 |…TM…10:52:34|
000001b0 2e 35 33 30 30 30 30 20 08 00 16 00 55 49 1a 00 |.530000 …UI…|
000001c0 31 2e 32 2e 38 34 30 2e 31 30 30 30 38 2e 35 2e |1.2.840.10008.5.|
000001d0 31 2e 34 2e 31 2e 31 2e 32 00 08 00 18 00 55 49 |1.4.1.1.2…UI|
000001e0 3c 00 32 2e 31 36 2e 38 34 30 2e 31 2e 31 31 33 |<.2.16.840.1.113|
000001f0 36 36 32 2e 32 2e 31 2e 34 35 31 39 2e 34 31 35 |662.2.1.4519.415|
00000200

What I want to do is to find the group 00201000 in the hexadecimal
representation. It’s in the 11th line of output
00000100 33 32 35 31 02 00 10 00 55 49 14 00 31 2e 32 2e |3251…UI…1.2.|

So I thought,

I open the file and read a bit into str

file = File.open("testFile","rb")
str = String.new
file.read(512, str)

then I unpack the str into another string interpreting the bytes as
hexadecimal representations

strhex = str.unpack("H*")

and look for the desired group within this “transformed” string
if (strhex.include?("02001000")) == true then
  puts "The tag was found"
else
  puts "The tag was not found"
end

As I said, this doesn’t work (it reports the tag was not found!) and I
wonder whether it has something to do with the fact that

strhex.length # → 1

By the way, if after unpacking I do
puts strhex

then I get the correct string, that is exactly the same as hexdump
shows me on the left-hand side.
So strhex seems to contain what I need, but still there is some kind
of problem.

Any hints?

Thank you very much

Cheers

Rafael

rafael · July 2, 2008, 10:50pm

On 2 Jul., 12:35, Peña, Botp wrote:

 format string, returning an array of each value extracted. The
c/c++-ism: lose the “== true” portion,
or something like that

hth.

kind regards -botp

Thank you, this hint did it!
I had read the documentation about unpack and also the bit about the
return value being an array. But then the table with the different
unpack directives had confused me. The column “Returns” says “String”
at the “H” directive. Now I understand, the return type of “unpack” is
array and if one uses the directive “H”, then the elements of that
array are of type string. Got it!
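
The point above can be checked in a few lines. This is a minimal sketch with four made-up bytes standing in for the data read from the file; it assumes a Ruby recent enough to have String#b, and the variable names are illustrative only.

```ruby
# Four bytes standing in for the tag from the hexdump (made-up sample data).
data = "\x02\x00\x10\x00".b

unpacked = data.unpack("H*")   # Array holding a single hex String
hex      = unpacked.first      # the String itself

p unpacked.class               # Array
p unpacked.length              # 1
p hex                          # "02001000"

# Array#include? compares whole elements, String#include? looks for a substring.
p unpacked.include?("0200")    # false
p hex.include?("0200")         # true
```

So the original test failed because it asked whether the array had an element equal to the tag, not whether the hex text contained it.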

And yes, I must recognize I am C++ biased. But I’m learning.

Thanks

Cheers

Rafael

rafael · July 2, 2008, 12:41pm

From: Rafael

strhex = str.unpack("H*")

you assume the return value is a string

botp@botp-desktop:~$ qri string.unpack
---------------------------------------------------------- String#unpack
str.unpack(format) => anArray

 Decodes str (which may contain binary data) according to the
 format string, returning an array of each value extracted. The
 format string consists of a sequence ....

verify it by using #class method
eg,

p strhex.class

and look for the desired group within this “transformed” string

if (strhex.include?("02001000")) == true then

c/c++-ism: lose the "== true" portion,

if strhex[0].include?("02001000")

or much better,

strhex = str.unpack("H*")[0] # <-- note the index for 1st elem
if strhex.include?("02001000")
  puts "The tag was found"
…

or something like that

As I said, this doesn’t work (it reports the tag was not found!) and I

wonder whether it has something to do with the fact that

strhex.length # → 1

you got a hint there. you should have checked it further

By the way, if after unpacking I do

puts strhex

try,

p strhex

then I get the correct string, that is exactly the same as hexdump

shows me on the left-hand side.

So strhex seems to contain what I need, but still there is some kind

of problem.

that implies that you are very close to your objective …

hth.

kind regards -botp
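
A small sketch of why "p strhex" is the more telling check here: puts prints each element of an array on its own line, so a one-element array looks just like a plain string, while p prints the inspected value with its brackets. The sample bytes are made up and String#b assumes a reasonably modern Ruby.

```ruby
strhex = "\x02\x00\x10\x00".b.unpack("H*")

puts strhex          # prints 02001000 (the array is flattened, brackets hidden)
p strhex             # prints ["02001000"] (the array is visible)
puts strhex.inspect  # same text as the line printed by p
```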

rafael · July 3, 2008, 2:55am

On Jul 2, 2008, at 03:02 AM, Rafael wrote:

open 'testfile', 'rb' do |io|
didn’t explain it correctly.
I’ll give it another go.

I have a binary file.

What I want to do is to find the group 00201000 in the hexadecimal
representation. It’s in the 11th line of output
00000100 33 32 35 31 02 00 10 00 55 49 14 00 31 2e 32 2e |3251…UI…1.2.|

you want to find the sequence of bytes "\0\020\010\0"?

if (strhex.include?("02001000")) == true then

yes!

the regex you want is: /\0\020\010\0/

The big piece of code you wrote can be reduced to file.read(512) =~ /\0\020\010\0/. There’s no need to convert all the text.

Note that there is a bug in your code: it will also match on a half-byte:

00000100 33 32 35 31 02 00 10 00 55 49 14 00 31 2e 32 2e
XXXXXXXX 33 32 35 30 20 01 00 01 55 49 14 00 31 2e 32 2e
                   ^^^^^^^^^^^^

The regex solution does not have this bug.
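
The half-byte problem can be reproduced with a short sketch. The two strings are made-up sample rows modelled on the two hexdump lines above, and the lambda and variable names are illustrative only. The byte-level search here uses String#include? on the raw bytes rather than a regex, which avoids the false match in the same way. String#b assumes a reasonably modern Ruby.

```ruby
# Made-up sample rows modelled on the two hexdump lines above.
real_tag  = "\x33\x32\x35\x31\x02\x00\x10\x00\x55\x49".b
half_byte = "\x33\x32\x35\x30\x20\x01\x00\x01\x55\x49".b

tag_hex   = "02001000"
tag_bytes = "\x02\x00\x10\x00".b

# Search the hex text, as in the original code.
hex_match  = ->(data) { data.unpack("H*").first.include?(tag_hex) }
# Search the raw bytes directly.
byte_match = ->(data) { data.include?(tag_bytes) }

p hex_match.call(real_tag)    # true
p hex_match.call(half_byte)   # true  (false positive, starts mid-byte)
p byte_match.call(real_tag)   # true
p byte_match.call(half_byte)  # false
```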

rafael · July 4, 2008, 5:40pm

On 3 Jul., 02:52, Eric H. wrote:

00000100 33 32 35 31 02 00 10 00 55 49 14 00 31 2e 32 2e |
The big piece of code you wrote can be reduced to file.read(512) =~ /\0\020\010\0/. There’s no need to convert all the text.

Note that there is a bug in your code: it will also match on a half-byte:

00000100 33 32 35 31 02 00 10 00 55 49 14 00 31 2e 32 2e
XXXXXXXX 33 32 35 30 20 01 00 01 55 49 14 00 31 2e 32 2e
                   ^^^^^^^^^^^^

The regex solution does not have this bug.

Hi,

thanks again for your message. You’re absolutely right about the
bug … how stupid of me!

But regarding the regular expression, shouldn’t it be something like /\x02\x00\x10\x00/ ?

Cheers

Rafael

rafael · July 5, 2008, 9:54pm

On Jul 4, 2008, at 08:36 AM, Rafael wrote:

XXXXXXXX 33 32 35 30 20 01 00 01 55 49 14 00 31 2e 32 2e
/\x02\x00\x10\x00/ ?

Yes! sorry!
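
For the record, a short sketch of the difference between the two spellings: in Ruby string and regex literals a backslash followed by digits is an octal escape, while \x introduces a hex escape. The header string is made-up sample data modelled on the hexdump line at offset 0x100, and the names are illustrative only. String#b assumes a reasonably modern Ruby.

```ruby
# \020 is octal (decimal 16), which is the same byte as hex \x10.
p "\020" == "\x10"                       # true

# The bytes the earlier octal pattern actually described:
p "\0\020\010\0".unpack("H*").first      # "00100800"

# Made-up sample bytes modelled on the hexdump line at offset 0x100.
header = "3251\x02\x00\x10\x00UI".b
tag_re = /\x02\x00\x10\x00/n

p header =~ tag_re                       # 4 (byte offset of the tag)
```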
