I’m calculating md5 checksums on very large files (2 GB). This is a safe
way to do so, right? Also… is the file closed when the block exits?
I’m using ‘rb’ as this is used on Windows and Linux computers.
md5 = Digest::MD5.new()
File.open(file, 'rb').each {|line| md5.update(line)}
Close… try this…
require 'md5'
File.open(filename,'rb') { |f| MD5.hexdigest(f.read) }
And yes, the file is closed with the block form of open.
–Steve
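To make the point about closing concrete, here is a minimal sketch comparing the block form with the chained `File.open(...).each` form from the original post. The temp file and the variable names are made up for illustration. In the chained form nothing closes the handle explicitly, so it stays open until it is garbage collected or the process exits.

```ruby
require 'tempfile'

# Hypothetical scratch file standing in for the large input.
tmp = Tempfile.new('md5-demo')
tmp.write("some data\n" * 10)
tmp.close

# Block form: File.open closes the handle when the block returns.
block_handle = nil
File.open(tmp.path, 'rb') { |f| block_handle = f }
block_closed = block_handle.closed?

# Chained form, as in the original post: open returns the File,
# each iterates over it, and nothing closes it afterwards.
chained_handle = File.open(tmp.path, 'rb')
chained_handle.each { |line| line }
chained_closed = chained_handle.closed?
chained_handle.close # clean up by hand

puts "block form closed?   #{block_closed}"
puts "chained form closed? #{chained_closed}"
```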
On Sun, 19 Mar 2006, Stephen W. wrote:
require 'md5'
File.open(filename,'rb') { |f| MD5.hexdigest(f.read) }
And yes, the file is closed with the block form of open.
–Steve
i think the OP has the right approach - note that an ‘f.read’ will consume
2GB. but the OP’s code
harp:~ > cat a.rb
require 'digest/md5'
md5 = Digest::MD5.new() and open(ARGV.shift, 'rb').each{|line| md5 << line}
p md5.hexdigest
will not.
regards.
-a
In my reading of the OP, both the block-open and iteration are actually
desired:
md5 = Digest::MD5.new
File.open(file,'rb') do |ios|
  ios.each {|line| md5 << line }
end
cheers,
andrew
From: “rtilley” [email protected]
I’m calculating md5 checksums on very large files (2 GB). This is a safe
way to do so, right? Also… is the file closed when the block exits?
I’m using ‘rb’ as this is used on Windows and Linux computers.
md5 = Digest::MD5.new()
File.open(file, 'rb').each {|line| md5.update(line)}
Hi - does the file really contain text lines? Or is it a file
full of binary data? If it’s a binary file, there may be no
guarantee the whole thing isn’t one very long “line”. In that
case I’d recommend reading it in chunks.
Untested:
md5 = Digest::MD5.new()
File.open(file, 'rb') do |io|
  while (buf = io.read(4096)) && buf.length > 0
    md5.update(buf)
  end
end
Regards,
Bill
IMHO it’s a bad idea to use line-oriented reading on a binary file
because “lines” can be arbitrarily long (i.e. the whole file in the
worst case). Using IO#read is much better.
Kind regards
robert
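A small sketch of why this matters, assuming a made-up 64 KiB binary payload that happens to contain no newline byte. Line-oriented iteration then returns the entire file as a single "line", while `read` with a size never returns more than the requested number of bytes. The temp file and variable names are illustrative only.

```ruby
require 'tempfile'

# Hypothetical binary payload: 64 KiB with no newline byte anywhere.
payload = "\x00\x01\x02\x03".b * 16_384
tmp = Tempfile.new('binary-demo')
tmp.binmode
tmp.write(payload)
tmp.close

# Line-oriented reading: the whole file comes back as one "line".
line_sizes = File.open(tmp.path, 'rb') { |f| f.each_line.map(&:bytesize) }

# Chunked reading: no piece is ever larger than the requested size.
chunk_sizes = []
File.open(tmp.path, 'rb') do |f|
  while (buf = f.read(4096))
    chunk_sizes << buf.bytesize
  end
end

puts "lines:  #{line_sizes.size}, largest #{line_sizes.max} bytes"
puts "chunks: #{chunk_sizes.size}, largest #{chunk_sizes.max} bytes"
```

With a 2 GB file of this kind, the line-based loop would need the full 2 GB in memory at once, which is exactly what the chunked loop avoids.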
io.read(4096) returns nil at EOF, so the test for positive length in
Bill’s loop is redundant. Also, for the sake of error handling I’d
create the digest inside the block, because then the digest is never
created if the file cannot be opened:
md5 = File.open(file, 'rb') do |io|
  dig = Digest::MD5.new
  while (buf = io.read(4096))
    dig.update(buf)
  end
  dig
end
If you want to increase efficiency, you can do this, which avoids
creating a new buffer string on every read:
md5 = File.open(file, 'rb') do |io|
  dig = Digest::MD5.new
  buf = ""
  while io.read(4096, buf)
    dig.update(buf)
  end
  dig
end
Here’s another nice variant:
md5 = File.open(file, 'rb') do |io|
  dig = Digest::MD5.new
  buf = ""
  dig.update(buf) while io.read(4096, buf)
  dig
end
Kind regards
robert
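As a quick sanity check of the buffer-reusing variant, the sketch below (temp file, sizes and variable names are made up) records the identity of the buffer on every pass and compares the streamed digest with a one-shot digest of the same bytes. It uses `String.new` for the buffer rather than a `""` literal, on the assumption that the script might run with frozen string literals enabled.

```ruby
require 'digest/md5'
require 'tempfile'

# Hypothetical 10,000-byte input, small enough to also hash in one go.
tmp = Tempfile.new('buffer-demo')
tmp.binmode
tmp.write('x' * 10_000)
tmp.close

buf = String.new   # mutable buffer that read overwrites on each call
buffer_ids = []
streamed = File.open(tmp.path, 'rb') do |io|
  dig = Digest::MD5.new
  while io.read(4096, buf)       # returns nil at EOF
    buffer_ids << buf.object_id  # same object on every pass
    dig.update(buf)
  end
  dig.hexdigest
end

# Reference value: read everything at once (fine for a tiny file only).
one_shot = Digest::MD5.hexdigest(File.binread(tmp.path))

puts "reads: #{buffer_ids.size}, distinct buffers: #{buffer_ids.uniq.size}"
puts "digests match: #{streamed == one_shot}"
```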
Thank you Robert, Billy and others! Your suggestions have helped me to
solve the problem.
In article [email protected],
“Robert K.” [email protected] writes:
md5 = File.open(file, 'rb') do |io|
  dig = Digest::MD5.new
  buf = ""
  while io.read(4096, buf)
    dig.update(buf)
  end
  dig
end
Why do we have no such method in the digest library?
I think it is useful enough to have in the library.
On Mon, 20 Mar 2006, Tanaka A. wrote:
Why do we have no such method in the digest library?
I think it is useful enough to have in the library.
indeed. in fact this seems a good candidate to add a method to a base
class:
harp:~ > cat a.rb
require 'digest/md5'
require 'digest/rmd160'
require 'digest/sha1'
require 'digest/sha2'
#
# this in digest.rb or something equiv
#
digests = %w( MD5 RMD160 SHA1 SHA256 SHA384 SHA512 )
digests.each do |d|
  digest_method = d.downcase
  IO.module_eval do
    define_method(digest_method) do |*argv|
      bufsize = argv.shift || 8192
      digest = ::Digest.const_get(d).new
      buf = ''
      off = pos rescue nil
      begin
        digest.update buf while read bufsize, buf
      ensure
        seek off rescue nil
      end
      digest
    end
  end
  File.module_eval do
    singleton_class = class << self; self; end
    singleton_class.module_eval do
      define_method(digest_method) do |path, *argv|
        mode = argv.shift || 'r'
        open(path, mode){|f| f.send digest_method}
      end
    end
  end
end
#
# demo
#
report = {}
digests.each do |d|
  digest_method = d.downcase
  report.update "File##{ digest_method}" => open(__FILE__){|f| f.send(digest_method).hexdigest}
  report.update "File.#{ digest_method}" => File.send(digest_method, __FILE__).hexdigest
end
# Kernel#y is not available outside irb in later Rubies, so print the YAML explicitly
require 'yaml'
puts report.to_yaml
harp:~ > ruby a.rb
---
File.md5: 2e6c1e1c3d81a871f2c6b5099ba208f3
File#md5: 2e6c1e1c3d81a871f2c6b5099ba208f3
File.rmd160: 22ad54cb48f6d00ef325f1c7ff2150cf46fd250f
File#rmd160: 22ad54cb48f6d00ef325f1c7ff2150cf46fd250f
File.sha1: 1600889b027ced6bf95dedc9803cb7c65f5aa396
File#sha1: 1600889b027ced6bf95dedc9803cb7c65f5aa396
File.sha256: 38ac0f761f16a13d2f4f51a8a8c9668656d84c29b383840579a7517b69d219a9
File#sha256: 38ac0f761f16a13d2f4f51a8a8c9668656d84c29b383840579a7517b69d219a9
File.sha384: 5882c884ea618539da50a36bfbbd0fa0cd41bfa2ee18bce5acf45965e5582e33a1a3edd269f0e3551a9c9e5cd6e77cd1
File#sha384: 5882c884ea618539da50a36bfbbd0fa0cd41bfa2ee18bce5acf45965e5582e33a1a3edd269f0e3551a9c9e5cd6e77cd1
File.sha512: 3fba99ff4d98feaf760b814e9a8f245e05881da9aa19378510172d4e7cb0a10aa98b6c1d9b22d4331f3552a5899bb5545c604dfc4620665a5b6fb0d4dc2b0b78
File#sha512: 3fba99ff4d98feaf760b814e9a8f245e05881da9aa19378510172d4e7cb0a10aa98b6c1d9b22d4331f3552a5899bb5545c604dfc4620665a5b6fb0d4dc2b0b78
comments?
-a
Why do we have no such method in the digest library?
I extended the MD5 class with a class method to build an MD5
object directly from the contents of a given file.
Use it like this:
md5 = MD5.file("foo.bar")
gegroet,
Erik V. - http://www.erikveen.dds.nl/
require "md5"
class MD5
  def self.file(file)
    File.open(file, "rb") do |f|
      res = self.new
      while (data = f.read(4096))
        res << data
      end
      res
    end
  end
end
Erik V. wrote:
Why do we have no such method in the digest library?
I extended the MD5 class with a class method to build an MD5
object directly from the contents of a given file.
Should this be done to sha1, sha2, etc?
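For readers on a current Ruby: the standard digest library now provides a `file` class method on every digest class (`Digest::MD5.file`, `Digest::SHA1.file`, `Digest::SHA256.file` and so on), so the extension discussed in this thread is no longer needed there. The sketch below, using a made-up temp file, checks that the built-in method agrees with the hand-written chunked loop from earlier in the thread.

```ruby
require 'digest'
require 'digest/md5'
require 'digest/sha1'
require 'digest/sha2'
require 'tempfile'

# Hypothetical input file for the comparison.
tmp = Tempfile.new('digest-file-demo')
tmp.binmode
tmp.write('hello world' * 1000)
tmp.close

# Hand-written chunked loop, as posted earlier in the thread.
manual = File.open(tmp.path, 'rb') do |io|
  dig = Digest::MD5.new
  while (buf = io.read(4096))
    dig << buf
  end
  dig.hexdigest
end

# Built-in equivalent; the same call works for the other digest classes.
builtin = Digest::MD5.file(tmp.path).hexdigest
sha1    = Digest::SHA1.file(tmp.path).hexdigest
sha256  = Digest::SHA256.file(tmp.path).hexdigest

puts "md5 (manual):  #{manual}"
puts "md5 (builtin): #{builtin}"
puts "sha1:          #{sha1}"
puts "sha256:        #{sha256}"
```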