Reading x bytes at a time

Hi all,

I’m trying to write a little script to read files in a directory (x
bytes at a time), do an md5 checksum of the bytes and print them to a
text file.

Everything is working fine at the moment except the reading. I have
managed to get it to read the first x bytes from the file, but I’m not
sure how to get it to keep reading until EOF has been reached.

This is what I want to achieve:

  • Specify a blocksize to use (not a problem)
  • Read a file in chunks (using the blocksize)
  • md5 checksum the bytes (I’ve worked this part out)
  • write the md5sum to a file (I’ve got this also)

How can I retrieve the chunks until the EOF, maybe returning a smaller
chunk at the end if there isn’t enough data left?

I hope this post isn’t too badly written; it’s very late at night and
I’ve been googling this for ages :P

Any help much appreciated.

Matt

I’ve just played around and found this seems to work:

File.open(path, "r") do |fh|
  while (chunk = fh.read(blocksize))
    outFH.puts Digest::MD5.hexdigest(chunk) + "\n"
  end
end
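
For what it’s worth, the loop ends on its own because read with a length argument returns nil at end of file, and returns a shorter string when fewer bytes than requested are left. A minimal sketch of that behaviour, using a StringIO as a stand-in for the file (the 10-byte string and the block size of 4 are made up for illustration):

```ruby
require "stringio"

io = StringIO.new("abcdefghij")   # 10 bytes standing in for a file
sizes = []

# read(4) returns up to 4 bytes per call and nil once nothing is left,
# so the assignment doubles as the loop condition.
while (chunk = io.read(4))
  sizes << chunk.bytesize
end

sizes   # [4, 4, 2]: the last chunk is simply shorter
```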

Is this a good way to do it?

Thanks

Matt

On 19 Aug., 02:21, Matt H. wrote:

> I’ve just played around and found this seems to work:
>
> File.open(path, "r") do |fh|
>   while (chunk = fh.read(blocksize))
>     outFH.puts Digest::MD5.hexdigest(chunk) + "\n"
>   end
> end
>
> Is this a good way to do it?

Somehow my posting from this morning made it neither to Google news
nor to the mailing list. Strange…

To sum it up: yes, that’s a good way to do it. A few remarks:

You do not need the + "\n" because #puts already appends a newline.
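
A quick way to check this, sketched with a StringIO in place of the real output file (the variable name out is only for illustration): #puts adds a newline when the string lacks one and leaves the string alone when it already ends in one, so the output is the same either way.

```ruby
require "stringio"

out = StringIO.new
out.puts "abc"       # no trailing newline, so puts appends one
out.puts "def\n"     # already ends in a newline, so nothing extra is added

out.string           # "abc\ndef\n"
```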

I prefer to open with "rb" instead of "r" in these cases. It makes
scripts more portable and helps document that this is really a
binary stream.
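
A small sketch of what "rb" buys you, assuming a throwaway temp file (the name rb_demo is arbitrary): the bytes come back untouched, including CRLF pairs that text mode on Windows would translate, and the string is tagged as binary rather than as text in the default encoding.

```ruby
require "tempfile"

data = Tempfile.create("rb_demo") do |tmp|
  tmp.binmode                          # write the bytes exactly as given
  tmp.write("line1\r\nline2\r\n")
  tmp.flush
  File.open(tmp.path, "rb") { |fh| fh.read }
end

data.bytesize   # 14, both CRLF pairs are still there
data.encoding   # Encoding::ASCII_8BIT, i.e. raw bytes
```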

You can preallocate the buffer and reuse it for every read; this saves
a bit of GC work:

File.open(path, "rb") do |fh|
  chunk = ""   # buffer reused by every read
  while fh.read(blocksize, chunk)
    outFH.puts Digest::MD5.hexdigest(chunk)
  end
end
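
Putting the remarks together, here is a self-contained sketch you can run to check the behaviour. The helper name each_chunk_digest, the temp file, and the block size of 4 are all made up for the example; in the real script you would pass your own path and blocksize and write each digest to outFH.

```ruby
require "digest/md5"
require "tempfile"

# Hypothetical helper: yields the MD5 hex digest of each chunk of the file.
def each_chunk_digest(path, blocksize)
  File.open(path, "rb") do |fh|
    buffer = String.new                  # one buffer, refilled by every read
    while fh.read(blocksize, buffer)     # nil at EOF ends the loop
      yield Digest::MD5.hexdigest(buffer)
    end
  end
end

digests = []
Tempfile.create("chunks") do |tmp|
  tmp.binmode
  tmp.write("abcdefghij")                # 10 bytes: chunks of 4, 4 and 2
  tmp.flush
  each_chunk_digest(tmp.path, 4) { |hex| digests << hex }
end

digests.size   # 3, one digest per chunk including the short last one
```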

Kind regards

robert