Hi,

I'm quite new to Ruby, and I'm facing a problem I can't seem to solve by myself.

I'm comparing OpenSSL SHA1 hash results from the Linux command line with the ones Ruby produces:
---
cmd line :
openssl dgst -sha1 my_file

ruby :
require 'digest/sha1'
puts Digest::SHA1.hexdigest(File.read("my_file"))
---
I increase the file size and run both again, and again. The hashes are identical until the file size reaches 512 MB; from then on they differ. I tried several SHA versions (sha256..sha512) and the problem is the same. With MD5, however, I have no problem.

Does anyone have an idea whether I'm doing something wrong here?

Thanks a lot!
ChoBolT
on 2009-03-04 20:07
on 2009-03-04 22:13
Philippe Chotard wrote:
> I'm comparing openssl sha1 hash results from a linux command line, to
> ruby ones :
> ---
> cmd line :
> openssl dgst -sha1 my_file
>
> ruby :
> require 'digest/sha1'
> puts Digest::SHA1.hexdigest(File.read("my_file"))
> ---
> I increase the file and run it again, and again.
> The hashes are similars until the file size reaches 512Mo, from then
> they differs.

Strange. First, try doing it in two stages:

    str = File.read("my_file")
    puts str.size
    puts Digest::SHA1.hexdigest(str)

This may give you a clue if File.read is misbehaving. However this is unlikely if Digest::MD5 is fine.

But in any case, reading 512MB of data into RAM just to calculate SHA1 is very wasteful. I suggest you recode it:

    puts Digest::SHA1.file("my_file").hexdigest

or read the file in blocks:

    d = Digest::SHA1.new
    File.open("my_file") do |f|
      while chunk = f.read(65536)
        d << chunk
      end
    end
    puts d.hexdigest

If you *still* get the same answer, then perhaps the command-line tool you are comparing against is at fault! Most Linux systems have at least two:

    sha1sum <file>
    openssl sha1 <file>

so you can see if those agree or disagree, too.
On my box (Ubuntu Hardy, ruby-1.8.6p114 compiled from source):

    $ ls -l ubuntu-8.04-desktop-i386.iso
    -rw-r--r-- 1 brian brian 733079552 Apr 24 2008 ubuntu-8.04-desktop-i386.iso
    $ sha1sum ubuntu-8.04-desktop-i386.iso
    53a07a006d791f7fddc6d53879e826934f73bc0f ubuntu-8.04-desktop-i386.iso
    $ openssl dgst -sha1 ubuntu-8.04-desktop-i386.iso
    SHA1(ubuntu-8.04-desktop-i386.iso)= 53a07a006d791f7fddc6d53879e826934f73bc0f
    $ irb
    irb(main):001:0> require 'digest/sha1'
    => true
    irb(main):002:0> Digest::SHA1.file("ubuntu-8.04-desktop-i386.iso").hexdigest
    => "53a07a006d791f7fddc6d53879e826934f73bc0f"
    irb(main):003:0> d = Digest::SHA1.new
    => #<Digest::SHA1: da39a3ee5e6b4b0d3255bfef95601890afd80709>
    irb(main):004:0> File.open("ubuntu-8.04-desktop-i386.iso") do |f|
    irb(main):005:1* while chunk = f.read(65536)
    irb(main):006:2> d << chunk
    irb(main):007:2> end
    irb(main):008:1> end
    => nil
    irb(main):009:0> d.hexdigest
    => "53a07a006d791f7fddc6d53879e826934f73bc0f"
    irb(main):010:0>

So I can't see any problem. However I don't really have enough RAM to read the file all in at once without swapping badly. It's possible that Digest::SHA1 barfs when given a string > 512MB.

Regards,

Brian.
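
The three ways of hashing a file discussed in this post can be put side by side in one small script. This is only a sketch: the helper name sha1_three_ways and the 200,000-byte temporary file are made up for illustration, and a file this small sits far below the 512 MB size where the mismatch was reported, so it shows the technique rather than reproducing the problem.

```ruby
require 'digest/sha1'
require 'tempfile'

# Hypothetical helper: hash one file three ways and return the hex digests.
def sha1_three_ways(path)
  # 1. Read the entire file into one string, then hash it (the original approach).
  whole = Digest::SHA1.hexdigest(File.binread(path))

  # 2. Let the Digest library read the file itself.
  by_file = Digest::SHA1.file(path).hexdigest

  # 3. Feed the digest object 64 KiB at a time.
  chunked = Digest::SHA1.new
  File.open(path, 'rb') do |f|
    while (chunk = f.read(65536))
      chunked << chunk
    end
  end

  [whole, by_file, chunked.hexdigest]
end

# Illustrative sample data only; swap in a real path to test a large file.
Tempfile.create('sha1_demo') do |f|
  f.binmode
  f.write('a' * 200_000)
  f.flush
  results = sha1_three_ways(f.path)
  puts results
  puts(results.uniq.size == 1 ? 'all three agree' : 'MISMATCH')
end
```

To chase the actual problem, call sha1_three_ways on a file larger than 512 MB and check whether the first value differs from the other two.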
on 2009-03-05 13:23
Brian Candler wrote:
> But in any case, reading 512MB of data into RAM just to calculate SHA1
> is very wasteful. I suggest you recode it:
>
> puts Digest::SHA1.file("my_file").hexdigest

Thanks for your response, Brian.

Indeed, using this method I got the right hash. So it looks like, as you said, the problem appears when SHA is handling strings of 512 MB or more.

I'll do some further testing on other systems and versions (I'm using Ruby 1.9.0 on Debian lenny).

Thanks!
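
For that kind of further testing, one possible approach is a size sweep: write files of increasing size and report the first size at which hashing the whole string disagrees with hashing in chunks, for each digest class. The sketch below is an assumption about how such a test could look, not something taken from the thread; first_mismatch is a hypothetical helper and the sizes listed are deliberately tiny. Extending the list up to and beyond 512 MB needs enough RAM to hold the whole file in one string.

```ruby
require 'digest'
require 'tempfile'

# Hypothetical helper: return the first size (in bytes) at which the
# whole-string digest and the chunked digest differ, or nil if they
# agree for every size in the list.
def first_mismatch(digest_class, sizes)
  sizes.find do |size|
    Tempfile.create('digest_sweep') do |f|
      f.binmode
      f.write("\0" * size)   # filler bytes; the content itself is arbitrary
      f.flush

      # Whole file as one string.
      whole = digest_class.hexdigest(File.binread(f.path))

      # Same file, fed to the digest 64 KiB at a time.
      streamed = digest_class.new
      File.open(f.path, 'rb') do |io|
        while (chunk = io.read(65536))
          streamed << chunk
        end
      end

      whole != streamed.hexdigest
    end
  end
end

# Small illustrative sizes; add larger ones to approach the 512 MB mark.
sizes = [0, 1, 65_536, 1_000_000]
[Digest::MD5, Digest::SHA1, Digest::SHA256, Digest::SHA512].each do |klass|
  size = first_mismatch(klass, sizes)
  puts "#{klass}: #{size ? "first mismatch at #{size} bytes" : 'no mismatch'}"
end
```

Running the same script under different Ruby versions would show whether the behaviour depends on the interpreter or on the platform.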