Newbie: working with binary files/extract png from a binary file

Hello,

I am reverse engineering a binary file that has an embedded PNG image
in it. I opened the file in a hex editor and found the PNG header
information, but how can I use Ruby to extract the PNG to its own file?

Jim wrote:

Hello,

I am reverse engineering a binary file that has an embedded PNG image
in it. I opened the file in a hex editor and found the PNG header
information, but how can I use Ruby to extract the PNG to its own file?

Do you know the absolute starting position of the embedded PNG?

You can get the data as a string like this:

start_pos = 5 # or whatever
end_pos = -1 # assume end of file
png_data = File.open('file_with_png', "rb") { |f|
  f.read[start_pos..end_pos] # two dots: the range includes end_pos, so -1 keeps the last byte
}

Note that "rb" means open for read ("r") and treat as binary data ("b")
(avoids munging "\r\n" on Windows).
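If the container file is large, reading all of it into one string may be wasteful. A minimal sketch of a streaming alternative follows, assuming the start offset is already known and the embedded data runs to the end of the file; copy_from_offset is a name invented for this example, not something from the thread.

```ruby
# Hypothetical helper: copy src_path from byte offset start_pos
# through end of file into dst_path, without loading it all into memory.
def copy_from_offset(src_path, dst_path, start_pos)
  File.open(src_path, "rb") do |src|
    src.seek(start_pos)            # skip the bytes before the embedded data
    File.open(dst_path, "wb") do |dst|
      IO.copy_stream(src, dst)     # copies from the current position to EOF
    end
  end
end
```

Both files are opened in binary mode, for the same reason as above. The method returns the number of bytes copied.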

If the end_pos has to be determined by reading a field, then this will
take a little tinkering. I assume PNG has a length field somewhere? So
you’ll have to extract that (using String#unpack, perhaps) and chop off
that many bytes from the start of the png_data string.

On 1/30/08, Joel VanderWerf wrote:

Do you know the absolute starting position of the embedded PNG?

You can find it:

fp = File.new("has_embedded_png.dat","rb")
# the n flag marks the pattern as binary; Ruby 1.9+ needs it for the raw \211 byte
m = /\211PNG/n.match(fp.read)
raise "no PNG" if !m
fp.seek(m.begin(0))
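On Ruby 1.9 and later, strings and regexps carry an encoding, so matching raw bytes takes a little care. One regexp-free way to find the signature is sketched below, assuming the whole file fits in memory. PNG_SIGNATURE and find_png_offset are names made up for this example.

```ruby
# The full 8-byte PNG signature as a binary (ASCII-8BIT) string.
PNG_SIGNATURE = "\x89PNG\r\n\x1A\n".b

# Hypothetical helper: byte offset of the first PNG signature in data,
# or nil when there is none.
def find_png_offset(data)
  data.b.index(PNG_SIGNATURE)   # .b makes a binary copy, so index counts bytes
end
```

With a file, this could be used as `offset = find_png_offset(File.binread(path))`, followed by a seek to that offset.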

If the end_pos has to be determined by reading a field, then this will
take a little tinkering. I assume PNG has a length field somewhere? So
you’ll have to extract that (using String#unpack, perhaps) and chop off
that many bytes from the start of the png_data string.

There’s no total length field, just a series of ‘chunks’ each with
its own length.
Luckily, the ‘IEND’ chunk is always last, so you can just extract
chunks until you get to that one:

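def extract_chunk(input, output)
  lenword = input.read(4)
  return nil if lenword.nil? || lenword.size < 4 # ran out of input
  length = lenword.unpack('N')[0] # chunk length: 32-bit big-endian unsigned
  type = input.read(4)
  data = length>0 ? input.read(length) : ""
  crc = input.read(4)
  # a chunk type starts with an ASCII letter; anything else is not a chunk
  return nil if type.nil? || !(('A'..'z')===type[0,1])
  #return nil if validate_crc(type+data, crc)
  output.write lenword
  output.write type
  output.write data
  output.write crc
  return type
end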

def extract_png(input, output)
  hdr = input.read(8)
  # .b makes the literals binary so they compare equal to bytes read in "rb" mode
  raise "Not a PNG File" if hdr.nil? || hdr[0,4] != "\211PNG".b
  raise "file not in binary mode" if hdr[4,4] != "\r\n\032\n".b
  output.write(hdr)
  loop do
    chunk_type = extract_chunk(input,output)
    p chunk_type
    break if chunk_type.nil? || chunk_type == 'IEND'
  end
end

ofp = File.new("out.png","wb")
extract_png(fp,ofp)
ofp.close # flush the extracted PNG to disk
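The CRC check is left commented out in the code above. For anyone who wants it, here is one possible sketch. It relies on the fact that a PNG chunk's CRC is a CRC-32 over the type and data bytes, which Ruby's standard zlib binding can compute. crc_ok? is a hypothetical name, not the validate_crc referred to in the thread.

```ruby
require "zlib"

# Hypothetical check: true when the stored 4-byte CRC matches
# the CRC-32 of the chunk's type and data bytes.
def crc_ok?(type, data, crc)
  Zlib.crc32(type + data) == crc.unpack("N")[0]
end

# An IEND chunk carries no data, so its CRC covers only the type bytes.
iend_crc = [Zlib.crc32("IEND")].pack("N")
puts crc_ok?("IEND", "", iend_crc)
```

Inside extract_chunk, a check like this would go after the crc field is read and before anything is written to the output.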

-Adam

On 1/30/08, Jim wrote:

Thank you both for your information and effort.

Adam, your code worked out of the box. Tell me you didn’t write that
off the top of your head?

I patched it together based on Wikipedia’s PNG entry and parts of a
WAV file reader I wrote earlier. They are both chunk-based file
formats. It took a few failed tests before I got it right.

m= /\211PNG/.match(fp.read)

fp.read is a String of binary data. Using a regular expression to locate
the PNG header.

fp.seek(m.begin(0))

I understand seek, but I’m not sure what m.begin(0) is actually doing.

Docs say: Returns the offset of the start of the nth element of the
match array in the string.

I still don’t get it.

From the same docs:
“MatchData acts as an array. … mtch[0] is equivalent to … the
entire matched string.”

so m.begin(0) returns the offset of the start of the matched string
(in this case, the start of the png header). Since the string
contains the contents of the whole file, the string offset is the same
as the file offset we need.
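A small illustration of MatchData#begin may help here. The sample strings are made up for the example. The first match uses a binary string and pattern, so the offset counts bytes.

```ruby
# Binary string standing in for the contents of the container file.
data = "junk\x89PNG\r\n\x1A\n".b
m = /\x89PNG/n.match(data)
m[0]         # the entire matched text, i.e. the bytes \x89 P N G
m.begin(0)   # => 4, the offset where that match starts

# With capture groups, begin(n) gives the start of the nth group.
m2 = /(\d+)-(\d+)/.match("id 12-345")
m2.begin(0)  # => 3, start of the whole match "12-345"
m2.begin(2)  # => 6, start of the second group "345"
```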
-Adam

On Jan 30, 7:29 pm, Adam S. wrote:

-Adam
OK, it’s really not that hard, is it?

input.read(N) reads N bytes from the stream. From the spec we know a PNG
has a header (which the spec calls a signature) that is 8 bytes long.
Check it. If it’s a PNG, read the chunks until IEND.
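That summary can be tried without any file on disk. The sketch below builds a tiny fake PNG in memory and walks its chunks. make_chunk and chunk_types are hypothetical helpers, and the chunk contents are dummy bytes rather than real image data.

```ruby
require "stringio"
require "zlib"

# Hypothetical helper: build one chunk as length + type + data + CRC.
def make_chunk(type, data = "".b)
  [data.bytesize].pack("N") + type.b + data +
    [Zlib.crc32(type.b + data)].pack("N")
end

# Hypothetical helper: list chunk types in io, stopping after IEND.
def chunk_types(io)
  sig = io.read(8)
  raise "Not a PNG" unless sig == "\x89PNG\r\n\x1A\n".b
  types = []
  loop do
    header = io.read(8)                # 4-byte length + 4-byte type
    break if header.nil? || header.bytesize < 8
    length, type = header.unpack("Na4")
    io.seek(length + 4, IO::SEEK_CUR)  # skip the data and the CRC
    types << type
    break if type == "IEND"
  end
  types
end

png = "\x89PNG\r\n\x1A\n".b +
      make_chunk("IHDR", "\0".b * 13) +
      make_chunk("IDAT", "x".b) +
      make_chunk("IEND")
container = png + "more container data".b  # bytes after IEND are ignored
p chunk_types(StringIO.new(container))
```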

The only other thing is unpack("N"). From the docs, it reads a string
of 4 bytes as an unsigned 32-bit big-endian integer and returns it
inside an array, which is why the code takes [0].
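A quick way to see this is to unpack a known length field. The four bytes below are the length field of a 13-byte chunk, chosen only as sample data.

```ruby
bytes = "\x00\x00\x00\x0D".b        # big-endian encoding of 13
as_array = bytes.unpack("N")        # => [13]
length = as_array[0]                # => 13
round_trip = [length].pack("N")     # pack is the inverse: back to 4 bytes
```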

Thanks again for the code. The file I’m extracting from is a SketchUp 3D
model, btw.

Thank you both for your information and effort.

Adam, your code worked out of the box. Tell me you didn’t write that
off the top of your head?

fp = File.new("has_embedded_png.dat","rb")
  fp is a file pointer.

m= /\211PNG/.match(fp.read)
  fp.read is a String of binary data. Using a regular expression to
  locate the PNG header.

raise "no PNG" if !m
  Exception if there is not a header match.

fp.seek(m.begin(0))
  I understand seek, but I’m not sure what m.begin(0) is actually doing.

  Docs say: Returns the offset of the start of the nth element of the
  match array in the string.

  I still don’t get it.