Openssl and SHA*

Hi,

I’m quite new to Ruby, and I’m facing a problem I can’t seem to
solve by myself.
I’m comparing OpenSSL SHA-1 hash results from the Linux command line with
the ones Ruby produces:

cmd line :
openssl dgst -sha1 my_file

ruby :
require 'digest/sha1'
puts Digest::SHA1.hexdigest(File.read("my_file"))

I increase the file size and run it again, and again.
The hashes are identical until the file size reaches 512MB; from then
on they differ.
I tried several SHA versions (SHA-256 through SHA-512) and the problem is
the same.
However, with MD5 I have no problem.

Does anyone have an idea whether I’m doing something wrong here?
Thanks a lot !

ChoBolT

Philippe Chotard wrote:

I’m comparing OpenSSL SHA-1 hash results from the Linux command line with
the ones Ruby produces:

cmd line :
openssl dgst -sha1 my_file

ruby :
require 'digest/sha1'
puts Digest::SHA1.hexdigest(File.read("my_file"))

I increase the file size and run it again, and again.
The hashes are identical until the file size reaches 512MB; from then
on they differ.

Strange. First, try doing it in two stages:

str = File.read("my_file")
puts str.size
puts Digest::SHA1.hexdigest(str)

This may give you a clue if File.read is misbehaving. However this is
unlikely if Digest::MD5 is fine.
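
To make the two-stage check a little more explicit, here is a minimal sketch that compares the number of bytes File.read returned with the size reported by the file system. The helper name read_size_check is made up for illustration, and the sketch assumes Ruby 1.9 or later (String#bytesize is not available in 1.8.6):

```ruby
# Hypothetical helper (not from the thread): report the size on disk and
# the number of bytes actually read into memory, so they can be compared.
def read_size_check(path)
  # Open in binary mode so no newline or encoding conversion is applied.
  data = File.open(path, 'rb') { |f| f.read }
  { :on_disk => File.size(path), :in_memory => data.bytesize }
end

# Example use:
#   p read_size_check("my_file")
```

If the two numbers match, File.read is probably not the culprit and the digest call itself deserves a closer look.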

But in any case, reading 512MB of data into RAM just to calculate SHA1
is very wasteful. I suggest you recode it:

puts Digest::SHA1.file("my_file").hexdigest

or read the file in blocks:

d = Digest::SHA1.new
File.open("my_file") do |f|
  while chunk = f.read(65536)
    d << chunk
  end
end
puts d.hexdigest

If you still get the same answer, then perhaps the command-line tool
you are comparing against is at fault! Most Linux systems have at least
two:

sha1sum
openssl sha1

so you can see if those agree or disagree, too.
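
The same kind of cross-check can be done from inside Ruby. The following is only a sketch: it feeds identical chunks to Digest::SHA1 and to the OpenSSL binding from Ruby's standard library, so the two results can be compared without ever holding the whole file in memory. The helper name sha1_cross_check and the 64KB chunk size are assumptions for illustration:

```ruby
require 'digest/sha1'
require 'openssl'

# Hypothetical helper: hash one file with two SHA-1 implementations,
# reading it in fixed-size chunks, and return both hex digests.
def sha1_cross_check(path, chunk_size = 65536)
  ruby_digest    = Digest::SHA1.new
  openssl_digest = OpenSSL::Digest.new('SHA1')
  File.open(path, 'rb') do |f|
    while (chunk = f.read(chunk_size))
      ruby_digest << chunk
      openssl_digest << chunk
    end
  end
  [ruby_digest.hexdigest, openssl_digest.hexdigest]
end

# Example use:
#   a, b = sha1_cross_check("my_file")
#   puts(a == b ? "agree: #{a}" : "disagree: #{a} vs #{b}")
```

If the two digests agree with each other but not with the one-shot hexdigest(File.read(...)) result, that points at the handling of very large strings rather than at the command-line tool.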

On my box (Ubuntu Hardy, ruby-1.8.6p114 compiled from source):

$ ls -l ubuntu-8.04-desktop-i386.iso
-rw-r--r-- 1 brian brian 733079552 Apr 24 2008 ubuntu-8.04-desktop-i386.iso
$ sha1sum ubuntu-8.04-desktop-i386.iso
53a07a006d791f7fddc6d53879e826934f73bc0f  ubuntu-8.04-desktop-i386.iso
$ openssl dgst -sha1 ubuntu-8.04-desktop-i386.iso
SHA1(ubuntu-8.04-desktop-i386.iso)= 53a07a006d791f7fddc6d53879e826934f73bc0f
$ irb
irb(main):001:0> require 'digest/sha1'
=> true
irb(main):002:0> Digest::SHA1.file("ubuntu-8.04-desktop-i386.iso").hexdigest
=> "53a07a006d791f7fddc6d53879e826934f73bc0f"
irb(main):003:0> d = Digest::SHA1.new
=> #<Digest::SHA1: da39a3ee5e6b4b0d3255bfef95601890afd80709>
irb(main):004:0> File.open("ubuntu-8.04-desktop-i386.iso") do |f|
irb(main):005:1*   while chunk = f.read(65536)
irb(main):006:2>     d << chunk
irb(main):007:2>   end
irb(main):008:1> end
=> nil
irb(main):009:0> d.hexdigest
=> "53a07a006d791f7fddc6d53879e826934f73bc0f"
irb(main):010:0>

So I can’t see any problem. However I don’t really have enough RAM to
read the file all in at once without swapping badly. It’s possible that
Digest::SHA1 barfs when given a string > 512MB.
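
One observation about that threshold, offered as a guess rather than a diagnosis: SHA-1 and SHA-2 record the message length in bits, and 512MB is exactly the size at which a bit count stops fitting in 32 bits. Whether the digest code in a given Ruby build really keeps such a 32-bit count is an assumption that has not been checked here. The arithmetic itself is easy to verify:

```ruby
# 512MB expressed in bytes, then in bits.
bytes = 512 * 1024 * 1024   # 2**29 bytes
bits  = bytes * 8           # 2**32 bits

puts bits == 2**32          # the bit count is exactly 2**32
puts bits & 0xFFFFFFFF      # what a 32-bit counter would hold after wrapping
```

A length that wraps to a small number would leave the padding step with the wrong message length, which would change the final hash while the data itself is still read correctly.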

Regards,

Brian.

Brian C. wrote:

But in any case, reading 512MB of data into RAM just to calculate SHA1
is very wasteful. I suggest you recode it:

puts Digest::SHA1.file("my_file").hexdigest

Thanks for your response, Brian.
Indeed, using this method I got the right hash. So it looks like, as you
said, the problem appears when SHA is handling strings of 512MB or more.

I’ll do some further testing on other systems and versions (I’m using Ruby
1.9.0 on Debian lenny).

Thanks !