Forum: Ruby best/better way of md5suming of really large file in ruby?

Kyle S. (Guest)
on 2009-04-22 18:02
(Received via mailing list)
I've got a script that is going through data, and in some cases,
generating md5s of the files.  Normally this isn't a problem, but I've
got a few largish (~2G) files in there, and my script is dying on it.
I ran it in a screen so I'm not sure the exact error it threw, but I'm
re-running just that part now to find out.  In the meanwhile, any
suggestions?

This is how I'm generating the md5sum right now....
Digest::MD5.hexdigest(File.read(fn))

--Kyle
Yun Huang Y. (Guest)
on 2009-04-22 18:20
(Received via mailing list)
Kyle S. wrote:
> I've got a script that is going through data, and in some cases,
> generating md5s of the files.  Normally this isn't a problem, but I've
> got a few largish (~2G) files in there, and my script is dying on it.
> I ran it in a screen so I'm not sure the exact error it threw, but I'm
> re-running just that part now to find out.  In the meanwhile, any
> suggestions?

I googled for 'md5 large files' and ended up here:
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/...

yun
Reid T. (Guest)
on 2009-04-22 18:36
(Received via mailing list)
rthompso@raker /cpartition/hold $ ls -rlt dummyfile
-rw-r--r-- 1 rthompso staff 2147483648 2009-04-22 10:27 dummyfile
rthompso@raker /cpartition/hold $ irb
irb(main):001:0> result = %x[md5sum dummyfile]
=> "a981130cf2b7e09f4686dc273cf7187e  dummyfile\n"
irb(main):002:0> p result
"a981130cf2b7e09f4686dc273cf7187e  dummyfile\n"
=> nil
irb(main):003:0> def timeit
irb(main):004:1> tstart = Time.now
irb(main):005:1> result = %x[md5sum dummyfile]
irb(main):006:1> tend = Time.now
irb(main):007:1> elapsed = tend - tstart
irb(main):008:1> puts elapsed.to_s
irb(main):009:1> end
=> nil
irb(main):011:0> timeit
10.633416
=> nil
Reid T. (Guest)
on 2009-04-22 18:52
(Received via mailing list)
more realistic...
rthompso@raker /cpartition/hold $ dd if=/dev/urandom of=dummyfile count=4M
4194304+0 records in
4194304+0 records out
2147483648 bytes (2.1 GB) copied, 529.518 s, 4.1 MB/s
rthompso@raker /cpartition/hold $ irb
irb(main):001:0>  def timeit
irb(main):002:1> tstart = Time.now
irb(main):003:1>  result = %x[md5sum dummyfile]
irb(main):004:1> tend = Time.now
irb(main):005:1> elapsed = tend - tstart
irb(main):006:1>  puts elapsed.to_s
irb(main):007:1> end
=> nil
irb(main):008:0> timeit
49.366641
=> nil
irb(main):009:0> timeit
48.416673
=> nil
irb(main):010:0>
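The timeit method in the sessions above hard-codes the command it measures. A minimal sketch of a block-taking variant is below, assuming a Ruby recent enough to provide Process.clock_gettime; the [result, elapsed] return shape is an illustrative choice, not something from the thread.

```ruby
# Hypothetical block-taking variant of the timeit method shown above.
# Returns the block's result together with the elapsed wall-clock seconds.
def timeit
  # A monotonic clock is not affected by system clock adjustments.
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  [result, elapsed]
end

# Usage, assuming md5sum is on the PATH and dummyfile exists:
#   output, seconds = timeit { %x[md5sum dummyfile] }
#   puts seconds
```

Because the measured work is passed in as a block, the same helper can time the shell-out and a pure-Ruby digest side by side.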
Kyle S. (Guest)
on 2009-04-22 19:21
(Received via mailing list)
Thanks, both of you.  I'd rather not shell out using %x[, but I may end
up doing that.  I tried the modified MD5, and it actually ran in close
to the same time on my work machine.  I'll have to see how it does on
my home one.

--Kyle
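
For reference, the original one-liner reads the whole file into memory with File.read before hashing, which is a likely reason for it dying on ~2G files. The linked "modified MD5" code is not reproduced in this thread, so the following is only a minimal sketch of the general chunked-read approach using the standard digest library; the method name md5_of_file and the 1 MiB chunk size are assumptions chosen for illustration.

```ruby
require 'digest'

# Hypothetical helper: hash a file in fixed-size chunks so memory use
# stays bounded no matter how large the file is.
def md5_of_file(path, chunk_size = 1024 * 1024)
  md5 = Digest::MD5.new
  File.open(path, 'rb') do |io|
    buffer = String.new
    # IO#read(length, outbuf) fills buffer and returns nil at end of file.
    while io.read(chunk_size, buffer)
      md5.update(buffer)
    end
  end
  md5.hexdigest
end

# Usage, with fn as in the original post:
#   puts md5_of_file(fn)
```

The standard library also offers Digest::MD5.file(fn).hexdigest, which reads the file incrementally in the same spirit and avoids shelling out to md5sum.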