Comparing files

Hello folks,

I’m writing some tests for file upload code. The files are binary,
images mostly. I’m futzing around a bit, trying to figure out how to
assert that the uploaded file is the same as some golden master. If I do
this:

File.read(uploaded_file_path).should == File.read(path_to_expected_file)

Then when it fails, I get an ugly diff of the difference between the
binary files. So I’m about to invent something of my own. Has anyone got
a good pattern for doing this already?

cheers,
Matt

[email protected]
07974 430184

On 10 Dec 2010, at 15:56, Matt W. wrote:

Too slow people. Jeez!

I invented something: https://gist.github.com/736421

cheers,
Matt


I would just compare the sizes. Or you can compute an MD5 for each file and
compare those to be practically certain, and avoid the ugly diff too.
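A rough sketch of how the size check and the MD5 check could be combined. The helper name `same_file?` is made up for illustration and is not from anyone's code in this thread. Equal sizes alone don't prove the contents match, so the digest is the real check; the size comparison only saves hashing when the files obviously differ.

```ruby
require 'digest/md5'

# Hypothetical helper (not from the thread): true when two files appear to
# hold the same bytes, judged by size first and then by MD5 digest.
def same_file?(actual_path, expected_path)
  # Files of different sizes can never be identical, so skip the hashing.
  return false unless File.size(actual_path) == File.size(expected_path)

  # Digest::MD5.file reads the file in chunks rather than loading it whole.
  Digest::MD5.file(actual_path).hexdigest ==
    Digest::MD5.file(expected_path).hexdigest
end
```

In a spec this would read something like `same_file?(uploaded_file_path, path_to_expected_file).should be_true`, and a failure prints a one-line message instead of a binary diff.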

On 10/12/2010 15:56, Matt W. wrote:

Then when it fails, I get an ugly diff of the difference between the binary
files. So I’m about to invent something of my own. Has anyone got a good pattern
for doing this already?

In the past I’ve generated a checksum for each file and compared that.

– Joseph W. http://blog.josephwilk.net http://www.songkick.com +44
(0)7812 816431


Use a cyclic redundancy check (CRC).
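For what it's worth, a minimal sketch of the CRC idea using `Zlib.crc32` from Ruby's standard library. The helper name `file_crc32` is invented here for illustration. A CRC-32 is only 32 bits wide, so it catches accidental corruption well but is a weaker identity check than a cryptographic digest.

```ruby
require 'zlib'

# Hypothetical helper (not from the thread): CRC-32 of a file's raw bytes.
# File.binread reads in binary mode so no newline translation happens.
def file_crc32(path)
  Zlib.crc32(File.binread(path))
end
```

The assertion would then be along the lines of `file_crc32(uploaded_file_path).should == file_crc32(path_to_expected_file)`, which fails with two integers rather than a binary diff.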

Matt,

On 12/10/10 9:56 AM, Matt W. wrote:

Then when it fails, I get an ugly diff of the difference between the
binary files. So I’m about to invent something of my own. Has anyone
got a good pattern for doing this already?

I don’t already have it, but I’ve long wanted to implement a visual
image diff based on an exclusive-or of the two images. This would give
a quick visual demonstration of the differences.
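To make the exclusive-or idea concrete, here is a small sketch under a big assumption: both images have already been decoded into raw pixel bytes of the same dimensions (the decoding step is not shown, and XOR-ing compressed PNG or JPEG bytes would not line up with pixels). The helper name `xor_diff` is hypothetical.

```ruby
# Hypothetical helper: XOR two equally sized runs of raw pixel bytes.
# A zero in the result means the bytes match; non-zero marks a difference.
def xor_diff(actual_pixels, expected_pixels)
  unless actual_pixels.bytesize == expected_pixels.bytesize
    raise ArgumentError, 'pixel buffers differ in size'
  end

  actual_pixels.bytes.zip(expected_pixels.bytes).map { |a, b| a ^ b }
end

# Three made-up pixel bytes; only the middle one differs (0x20 vs 0x21).
diff = xor_diff("\x10\x20\x30".b, "\x10\x21\x30".b)
# diff => [0, 1, 0]

# Offsets of the bytes that changed.
changed_offsets = diff.each_index.select { |i| diff[i] != 0 }
# changed_offsets => [1]
```

Writing the XOR result back out as an image would show matching regions as black and differences as bright pixels, which is the visual demonstration described above.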

I once saw (but now cannot find) an image compare library that did a
“fuzzy compare” that wasn’t fooled by pixel differences. I’ve looked
for this several times, but haven’t been able to turn it up again. It’s
out there, though, in the scientific community (IIRC) rather than the
software testing community.

- George


Dec. 14 - Agile Richmond in Glen Allen, VA
http://georgedinwiddie.eventbrite.com/


On 12/10/10 8:56 AM, Matt W. wrote:

Then when it fails, I get an ugly diff of the difference between the binary
files. So I’m about to invent something of my own. Has anyone got a good pattern
for doing this already?

I would compare the file’s MD5 (or other) hash. It won’t tell you what
is different… just that they aren’t identical which is what I think you
want. So… something like:

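require 'digest/md5'
# binread avoids newline translation when reading binary files on Windows
Digest::MD5.hexdigest(File.binread(uploaded_file_path)).should ==
  Digest::MD5.hexdigest(File.binread(path_to_expected_file))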

-Ben

On 12/10/10 12:07 PM, Josh C. wrote:

On Fri, Dec 10, 2010 at 4:30 PM, George Dinwiddie
[email protected] wrote:

I once saw (but now cannot find) an image compare library that did a “fuzzy
compare” that wasn’t fooled by pixel differences. I’ve looked for this
several times, but haven’t been able to turn it up again. It’s out there,
though, in the scientific community (IIRC) rather than the software testing
community.

I think some teams at the BBC have used this tool for that kind of
fuzzy image comparison: http://pdiff.sourceforge.net/

Josh, thank you for that!

- George



On 10 Dec 2010, at 16:21, Ben M. wrote:

Matt

I would compare the file’s MD5 (or other) hash. It won’t tell you what is
different… just that they aren’t identical which is what I think you want.

Great minds, Ben :)

I ended up with this: https://gist.github.com/736421

cheers,
Matt


On 12/11/10 2:03 AM, Matt W. wrote:

https://gist.github.com/736421

cheers,
Matt

Yeah, that is a keeper… I don’t know why my last email took several
hours to reach the mailing list. Very odd…

-Ben