Forum: Ruby Problem with Base64 decoding

alexander (Guest)
on 2007-01-29 11:28
(Received via mailing list)
hi there,
last time i accidentally posted this question as a reply to another one..
i'm really sorry for that. i will not make that mistake again.

so here's my question again in a fresh new thread :)

i'm having a small problem with base64 decoding a string.
i'm porting a php script over to ruby and the decoding gives me
different results in ruby and in php. the problem is that the php
result works for the processing i do afterwards while the ruby version
doesn't.
here are the scripts in question:

php:
<?

        $bytes = file_get_contents("test.rgb");
        $bitmap = base64_decode($bytes);

        $header = "";
        $header .= "\xFF\xFE";
        $header .= pack("n2",120,97);
        $header .= "\x01";
        $header .= "\xFF\xFF\xFF\xFF";

        $header .= $bitmap;

        file_put_contents("test_php.gd",$header);
?>

ruby:
require 'rubygems'
require 'fileutils'
require 'base64'

all_bytes = Base64.decode64(IO.read("test.rgb"))

bitmap = "\xFF\xFE"
bitmap << [120,97].pack("n2")
bitmap << "\x01"
bitmap << "\xFF\xFF\xFF\xFF"
bitmap << all_bytes

File.new("test_ruby.gd","w").puts(bitmap)

the ruby version is one byte shorter.

i'm probably missing something rather obvious here, but any pointers to
how i can make the ruby output be like the php output would be greatly
appreciated  :)

i've uploaded the test.rgb file i'm using to here:

http://rss.fork.de/test.rgb if that's even needed  :)

thanks a lot,

alexander
Jan Svitok (Guest)
on 2007-01-29 20:32
(Received via mailing list)
On 1/29/07, alexander <alexander@fork.de> wrote:
> doesn´t.
>         $header .= pack("n2",120,97);
> require 'fileutils'
> File.new("test_ruby.gd","w").puts(bitmap)
>
> the ruby version is one byte shorter.
>
> i´m probably missing something rather obvious here, but any pointers to
> how i can make the ruby output be like the php output would be greatly
> appreciated  :)
>
> i´ve uploaded the test.rgb file i´m using to here:
>
> http://rss.fork.de/test.rgb if that´s even needed  :)

Hi,

1. have a look at the differences between those two files. That should
tell you where the problem is: either in the decoding part or in the
assembling.

2. you are using puts, which appends a newline, so it seems to me that
the ruby version is one byte LONGER. if that's the problem, replace puts
with write.

3. File.open("test_ruby.gd","w") {|f| f.write(bitmap) } should be
safer, as it doesn't rely on the garbage collector for closing the file;
it is closed immediately after the block finishes. This will be
helpful when you work with a large number of files (where you could
otherwise run out of free descriptors).

4. I guess you need neither rubygems nor fileutils for this to work
(that's ok if you use them for some other code not posted).
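
A rough illustration of points 2 and 3 (a sketch, not code from the thread): the helper name written_size, the file name out.gd and the "wb" mode are assumptions made for this example. It writes the same bytes once with write and once with puts inside a File.open block and reports the resulting file sizes.

```ruby
require "tmpdir"

# Writes data to a temporary file using the given IO method
# (:write or :puts) and returns the size of the file in bytes.
# The block form of File.open closes the file as soon as the block ends.
def written_size(io_method, data)
  Dir.mktmpdir do |dir|
    path = File.join(dir, "out.gd") # hypothetical file name
    File.open(path, "wb") { |f| f.public_send(io_method, data) }
    File.size(path)
  end
end

payload = "\xFF\xFE\x00\x78".b # four arbitrary bytes, no trailing newline

puts written_size(:write, payload) # prints 4
puts written_size(:puts, payload)  # prints 5 (puts appended a newline)
```

Because the payload does not end in a newline, puts adds one and the file comes out one byte longer than the data.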
alexander (Guest)
on 2007-02-09 10:22
(Received via mailing list)
hi there,
thank you for your tips!

and indeed, using write instead of puts i at least got the filesize right.
sadly everything else is still wrong.

i think the problem is definitely in the decoding, but i don't even know
where to start there since the resulting files vary to a great degree.
inspecting a hexdump of both decoded files shows that they are not even
remotely the same.

right now i use a php script that i call with system. it's kind of an
ugly solution, but at least it works ;)

i will keep trying though to get a 100% ruby solution to this problem.

kind regards and thanks again,

alexander
Jan Svitok (Guest)
on 2007-02-09 10:39
(Received via mailing list)
On 2/9/07, alexander <alexander@fork.de> wrote:
> hi there,
> thank you for your tips!
>
> and indeed using write instead of puts i atleast got the filesize right.
> sadly everything else is still wrong.
>
> i think the problem is definitely in the decoding, but i don´t even know
> where to start there since the resulting files vary to a great degree.
> inspecting a hexdump of both decoded files show that they are not even
> remotely the same.

If you post your code along with expected and actual output (e.g.
those hexdumps), perhaps somebody will have a look... just post as
short a data file as possible (one that still decodes wrongly).
That reminds me: did you try decoding an empty file?
Brian Candler (Guest)
on 2007-02-09 10:48
(Received via mailing list)
On Fri, Feb 09, 2007 at 06:20:28PM +0900, alexander wrote:
> thank you for your tips!
>
> and indeed using write instead of puts i atleast got the filesize right.
> sadly everything else is still wrong.
>
> i think the problem is definitely in the decoding, but i don´t even know
> where to start there since the resulting files vary to a great degree.

Firstly, use hexdump -C on both the output files.

If they both start with FF FE 00 78 00 61 01 FF FF FF FF
then you know that the headers are right and it's the base64-decoded bit
which is wrong.
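
The header bytes can also be checked from Ruby itself. This is a minimal sketch; the helper names build_header and hex_bytes are invented for the example, and the literals are forced to binary with String#b (available in newer Rubies) so that appending raw bytes later does not run into encoding errors.

```ruby
# Builds the same 11-byte header as the scripts in the first post.
# The two arguments are the values passed to pack("n2") there.
def build_header(first = 120, second = 97)
  header = "\xFF\xFE".b
  header << [first, second].pack("n2") # two 16-bit big-endian integers
  header << "\x01".b
  header << "\xFF\xFF\xFF\xFF".b
  header
end

# Formats a binary string as space-separated uppercase hex bytes.
def hex_bytes(data)
  data.unpack("C*").map { |byte| format("%02X", byte) }.join(" ")
end

puts hex_bytes(build_header)
# prints: FF FE 00 78 00 61 01 FF FF FF FF
```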

> >> all_bytes = Base64.decode64(IO.read("test.rgb"))

BTW there's a built-in alternative:

   all_bytes = IO.read("test.rgb").unpack("m")[0]

But on your test file they give the same results.

> >> File.new("test_ruby.gd","w").puts(bitmap)

If this is a Windows platform, use "wb" instead of "w". However, you say
that now that you're using write instead of puts, the files are the same
size anyway.

> >> i´ve uploaded the test.rgb file i´m using to here:
> >>
> >> http://rss.fork.de/test.rgb if that´s even needed  :)

I can see two issues with that file:

(1) It has no line breaks, but I don't think that matters.

(2) It starts with the three-byte sequence ef bb bf, which is the UTF-8
encoding of the Unicode byte order mark (U+FEFF); my editor shows it as a
<FEFF> character.

Stripping this off gives a completely different answer to the base64
decoding:

irb(main):027:0> a=IO.read("test.rgb"); nil
=> nil
irb(main):028:0> b=a.unpack("m")[0]; b.size
=> 46560
irb(main):029:0> c=a[3..-1].unpack("m")[0]; c.size
=> 46560
irb(main):030:0> b[0..5]
=> "\304\000\000={u"
irb(main):031:0> c[0..5]
=> "\000\365\355\326\000\342"

and perhaps this second one is the answer you're looking for.

If so, I would say that unpack("m") is badly broken. Either it should give
an exception when presented with characters outside of the base64 set, or it
should ignore them. According to RFC 2045 section 6.8,

   The encoded output stream must be represented in lines of no more
   than 76 characters each.  All line breaks or other characters not
   found in Table 1 must be ignored by decoding software.  In base64
   data, characters other than those in Table 1, line breaks, and other
   white space probably indicate a transmission error, about which a
   warning message or even a message rejection might be appropriate
   under some circumstances.

I would consider the unicode BOM as "white space", but in any case it must
either be ignored or cause a warning or error; it must not cause the data to
be decoded wrongly!
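
For the "cause an error" option, later Ruby versions (1.9 onwards, not the 1.8.4 tested here) accept the "m0" directive in String#unpack, which decodes strictly and raises ArgumentError on input that is not clean base64. A small sketch, with the helper name strict_decode_or_nil made up for illustration:

```ruby
# Strictly decodes base64 with the "m0" directive.
# Returns the decoded bytes, or nil if the input contains
# anything that is not valid base64 (such as a leading BOM).
def strict_decode_or_nil(data)
  data.unpack("m0").first
rescue ArgumentError
  nil
end

p strict_decode_or_nil("b2s=")             # clean input decodes normally
p strict_decode_or_nil("\xEF\xBB\xBFb2s=") # BOM-prefixed input is rejected
```

Rejecting the input outright makes a stray BOM visible immediately instead of letting it shift the decoded data.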

BTW, I did the above test under ruby 1.8.4 (2005-12-24) [i486-linux] from
Ubuntu 6.06. It's possible that it has been fixed in a later version.

HTH,

Brian.
Brian Candler (Guest)
on 2007-02-09 10:55
(Received via mailing list)
Here's a more concise summary of the bug.

irb(main):001:0> RUBY_VERSION
=> "1.8.4"
irb(main):002:0> a = "b2s="
=> "b2s="
irb(main):003:0> b = "\xef\xbb\xbf" + a
=> "\357\273\277b2s="
irb(main):004:0> a.unpack("m")
=> ["ok"]
irb(main):005:0> b.unpack("m")
=> ["\304\000\e\332"]
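
A workaround that does not depend on how a given Ruby version treats stray bytes is to drop the BOM explicitly before decoding. This is only a sketch: UTF8_BOM, strip_utf8_bom and decode_base64_without_bom are names chosen for the example, and String#b assumes a newer Ruby than the 1.8.4 shown above.

```ruby
# The three bytes that make up a UTF-8 byte order mark.
UTF8_BOM = "\xEF\xBB\xBF".b

# Returns a binary copy of data with a leading UTF-8 BOM removed, if present.
def strip_utf8_bom(data)
  bytes = data.b
  bytes.start_with?(UTF8_BOM) ? bytes[3..-1] : bytes
end

# Decodes base64 after removing a possible leading BOM.
def decode_base64_without_bom(data)
  strip_utf8_bom(data).unpack("m").first
end

p decode_base64_without_bom("b2s=")             # => "ok"
p decode_base64_without_bom("\xEF\xBB\xBFb2s=") # => "ok"
```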
Jan Svitok (Guest)
on 2007-02-09 11:34
(Received via mailing list)
On 2/9/07, Jan Svitok <jan.svitok@gmail.com> wrote:
> If you post your code along with expected and actual output (e.g.
> those hexdumps), perhaps somebody will have a look... just post as

Sorry, I didn't read your first post properly... I guess I'm doing too
many things at once...
alexander (Guest)
on 2007-02-09 13:02
(Received via mailing list)
whee!
thank you!
the three-byte sequence you pointed out at the start of the file was the
culprit.

i just needed to [3..-1] that out of the way and everything works
perfectly now... (crossing my fingers now that the app that's producing
those files doesn't put illegal characters somewhere in the middle of
the files, but that hasn't happened yet.)

according to the rfc this still seems like a bug to me.
is there anywhere i should report that bug (if it is one)?

thank you guys again for looking into this!
really made my day that it's solved now.

kind regards,
alexander
Brian Candler (Guest)
on 2007-02-09 13:10
(Received via mailing list)
On Fri, Feb 09, 2007 at 09:01:20PM +0900, alexander wrote:
> i just needed to [3..-1] that out of the way and everything works
> perfectly now... (crossing my fingers now that the app that´s producing
> those files doesn´t put illegal characters somewhere in the middle of
> the files, but that hasn´t happened yet.)

Maybe just gsub! everything that isn't a base64 character out of the
string before decoding. Untested:

   gsub!(/[^A-Za-z0-9+\/=]/, '')
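
Spelled out a little further, the same idea might look like the sketch below. The method name lenient_decode64 is hypothetical, and the non-destructive gsub on a binary copy (String#b, newer Rubies only) is a choice made for this example rather than part of the suggestion above.

```ruby
# Removes every byte outside the base64 alphabet (plus "=" padding)
# and then decodes what is left.
def lenient_decode64(data)
  cleaned = data.b.gsub(/[^A-Za-z0-9+\/=]/, "")
  cleaned.unpack("m").first
end

# A leading BOM and an embedded line break are both dropped before decoding.
p lenient_decode64("\xEF\xBB\xBFb2\ns=") # => "ok"
```

This also covers stray characters in the middle of the data, not just the three bytes at the start.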