Compare 2 text files - check for difference - Please help


#1

Hi. I want to take two files that are supposed to be identical, then look
for any difference between the two.

Text1.txt    Text2.txt
aaaaaa       aaaaab

For example, comparing the above text files would report that ‘b’ was
found to be dissimilar. I have tried to upload new gems
such as Diff::LCS however this did not work. I am not sure what I am
doing wrong. Can you please point me in the right direction? Thanks.
MC


#2

Mmcolli00 Mom wrote:

I have tried to upload new gems
such as Diff::LCS however this did not work. I am not sure what I am
doing wrong.

“This did not work” is not a useful problem description.

Post your code (preferably a minimal test program), and post exactly
what you see when you run it.

An alternative approach might be to write the data to two temporary
files (if not already in files) and run diff -u file1 file2 (e.g. using
system() or IO.popen)
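
The temporary-file approach above could look roughly like this. It is only a sketch: unified_diff is a made-up helper name, and it assumes a diff command is available on the PATH (usually true on Unix-like systems, not necessarily on Windows).

```ruby
require "tempfile"

# Hypothetical helper: writes two strings to temporary files and returns
# whatever the external `diff -u` command prints for them.
def unified_diff(text_a, text_b)
  Tempfile.create("a") do |file_a|
    Tempfile.create("b") do |file_b|
      file_a.write(text_a)
      file_a.flush
      file_b.write(text_b)
      file_b.flush
      # The array form bypasses the shell, so the paths need no quoting.
      IO.popen(["diff", "-u", file_a.path, file_b.path], &:read)
    end
  end
end

report = unified_diff("aaaaaa\n", "aaaaab\n")
puts report
```

An empty string back from the helper means diff found no differences.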


#3

require 'diff/lcs/Array'

file = File.open("1.txt",'r')
@row = {}
file.each_line do |@line|
  key, val = @line.chomp.split(",",0)
  @row[key] = val
end

file2 = File.open("2.txt",'r')
@row2 = {}
file2.each_line do |@line2|
  key2, val2 = @line2.chomp.split(",",0)
  @row2[key2] = val2
end

puts diffs = Diff::LCS.diff(@line,@line2)

Test file contents:
1.txt    2.txt
1,2      1,2,3

Output when the files differ as above:
#<Diff::LCS::Change:0x2e5d95c>
#<Diff::LCS::Change:0x2e5da60>
#<Diff::LCS::Change:0x2e5d90c>

Output when the files are identical:
#<Diff::LCS::Change:0x2e5da88>

I want to be able to say that 2.txt contained ‘3’ therefore the file was
not the same on that line. Thanks. MC


#4

I want to get the difference. I need the difference to prove what is
changing in some files and not in others. I’ll run it routinely to check
file content.


#5

Are you wanting to get the actual differences, or just know if they are
different?

– Josh
http://iammrjoshua.com



#6

I see.

I’ve never used Diff::LCS personally, but I can tell that to get more
info about what those diff objects contain you could try something like
this:

diffs = Diff::LCS.diff(@line,@line2)

puts diffs.map(&:inspect)

That will convert the contents of each object to a string and print
it, so you can see what’s there.

– Josh
http://iammrjoshua.com



#7

Mmcolli00 Mom wrote:

require 'diff/lcs/Array'

file = File.open("1.txt",'r')
@row = {}
file.each_line do |@line|
  key, val = @line.chomp.split(",",0)
  @row[key] = val
end

file2 = File.open("2.txt",'r')
@row2 = {}
file2.each_line do |@line2|
  key2, val2 = @line2.chomp.split(",",0)
  @row2[key2] = val2
end

puts diffs = Diff::LCS.diff(@line,@line2)

You are using a very odd way of iterating, which is explicitly forbidden
in ruby 1.9. If you write

foo.each do |@bar| …

then for each element of foo, the instance variable @bar is set to that
element. So the net result here is that after the loops have finished,
@line remains set to the last line of 1.txt, and @line2 remains set to
the last line of 2.txt. You can demonstrate this by adding

p @line, @line2

just before calling Diff::LCS.

(In ruby 1.9, block parameters must be local variables, and are always
local to the block - they always drop out of scope afterwards)
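
To illustrate the scoping rule just described, here is a small sketch for Ruby 1.9 or later. The variable names are made up for the example; the point is only that a block parameter vanishes after the block, so anything you need later has to be collected explicitly.

```ruby
lines = ["1,2\n", "1,2,3\n"]

# `line` is a block parameter, so it exists only inside the block.
lines.each { |line| line.chomp }

# Outside the block the name is unknown again; defined? returns nil.
still_defined = defined?(line)
p still_defined

# To keep the values for later use, build a new array from the block results.
stripped = lines.map { |line| line.chomp }
p stripped
```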

So, your test program simplifies to the following:

require 'rubygems'
require 'diff/lcs/array'

@line = "1,2\n"
@line2 = "1,2,3\n"
puts diffs = Diff::LCS.diff(@line,@line2)

The question is, what do you expect this to do?

If you replace ‘puts’ with ‘p’ in the last line, you get the following
more detailed output:

[[#<Diff::LCS::Change:0xb7be613c @element=",", @action="+",
@position=3>, #<Diff::LCS::Change:0xb7be60d8 @element="3", @action="+",
@position=4>], [#<Diff::LCS::Change:0xb7be5fac @element="", @action="-",
@position=4>]]

My guess is that Diff::LCS is treating the string as a sequence of
bytes. The first change is [add “,” at pos 3, add “3” at pos 4], which
is correct. The second change is strange, as it seems to say [“remove
nothing from pos 4”]

However, since all the examples in the README show two arrays being
passed, and here you’re passing in two strings, I’m not sure this is
even a supported way of working with this library.

Your code also builds two hashes, @row and @row2, but doesn’t seem to
use them at all. Were you trying to do something with them?

Finally, your use of split may not behave the way you expect:

irb(main):002:0> key2, val2 = "1,2,3\n".chomp.split(",",0)
=> ["1", "2", "3"]
irb(main):003:0> key2
=> "1"
irb(main):004:0> val2
=> "2"

That is, you’re ignoring everything after the second field.
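
If the intent was to keep the rest of the line rather than drop it, two common alternatives are a split limit of 2 or a splat on the left-hand side. This is just a sketch with made-up variable names; which one fits depends on what the hashes were meant to hold.

```ruby
record = "1,2,3\n".chomp

# Plain destructuring: the third field is silently discarded.
key, val = record.split(",")

# A limit of 2 stops after the first comma, so the remainder stays together.
key_limited, remainder = record.split(",", 2)

# A splat collects every field after the first into an array.
key_splat, *values = record.split(",")

p [key, val]
p [key_limited, remainder]
p [key_splat, values]
```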

I can strongly recommend playing around with expressions in irb, and
adding snippets of “p …expression…” within your code, to get a feel
for what’s happening.

HTH,

Brian.


#8

I want to be able to say that 2.txt contained ‘3’ therefore the file was
not the same on that line. Thanks. MC

Then perhaps you want to feed in the lines as arrays:

require 'rubygems'
require 'diff/lcs/array'

lines1 = lines2 = nil
File.open("1.txt") { |f| lines1 = f.readlines }
File.open("2.txt") { |f| lines2 = f.readlines }

p diffs = Diff::LCS.diff(lines1, lines2)

This gives me the following output:

[[#<Diff::LCS::Change:0xb7c17fc0 @element="1,2\n", @action="-",
@position=0>, #<Diff::LCS::Change:0xb7c180d8 @element="1,2,3\n",
@action="+", @position=0>]]

That is:

  • there was a single change (first element of the array)
  • this change had two parts (two elements to inner array)
    • remove “1,2\n” at pos 0, i.e. the first line
    • add “1,2,3\n” at pos 0

It gets more interesting if you do other changes. For example, if 1.txt
contains

1,2
3,4
5,6
7,8
9,10
11,12
13,14
15,16

and 2.txt contains

1,2
3,4,5
9,9,9
5,6
7,8
9,10
11,12
13,14
15,16
17,18

Then the output becomes:(*)

[[#<Diff::LCS::Change:0xb7c19a3c @action="-", @element="3,4\n",
@position=1>,
#<Diff::LCS::Change:0xb7c199d8 @action="+", @element="3,4,5\n",
@position=1>,
#<Diff::LCS::Change:0xb7c19988 @action="+", @element="9,9,9\n",
@position=2>],
[#<Diff::LCS::Change:0xb7c19578 @action="+", @element="17,18\n",
@position=9>]]

That is: the first change bundle is to remove 3,4\n at the second line
(#1, counting from zero), add 3,4,5\n, and add 9,9,9\n. The second
change bundle is to add 17,18\n at the tenth line.

HTH,

Brian.

(*) You can also change p to pp, and add ‘require “pp”’ to the top of
the file, to get alternative pretty-print formatting.


#9

Mmcolli00 Mom wrote:

I just used this on my txt files; however, I can’t get the script to
output only when a line was dissimilar. Right now it shows everything.

p sdiff = lines1.sdiff(lines2)

I want to output the following if sdiff returns the @action=!
[#<Diff::LCS::ContextChange:24309500 @action=! positions=0,0
elements="1,2\n","1,2,3\n">

I don’t understand.

If the two files are identical, Diff::LCS.diff should give you an empty
array. Is that not the case? You can test for an empty array using
diffs.empty? (Note that sdiff is different: it returns one entry for
every line, including unchanged lines, which have @action "=".)

If the two files are not identical, diff will give you a series of
elements telling you which chunks are different, and for each chunk what
is different. This is similar to the “diff” command at the shell.


#10

I have a huge XML file that I am reading in after a submit routine. The
routine fails on specific XML elements. I want to be able to find every
element that was changed by the routine after the second submit.

I can already check whether the files are different. I just want to
pinpoint the value that is different. I have noticed that if the context
has changed then the output will show @action=! and then the values,
such as

[#<Diff::LCS::ContextChange:24309500 @action=! positions=0,0
elements="1,2\n","1,2,3\n">

My problem is that one line can be so long that I can’t see exactly
which element is dissimilar. So I wanted to break each line up and then
search for anything returning this ‘@action=!’


#11

I just used this on my txt files; however, I can’t get the script to
output only when a line was dissimilar. Right now it shows everything.

p sdiff = lines1.sdiff(lines2)

I want to output the following if sdiff returns the @action=!
[#<Diff::LCS::ContextChange:24309500 @action=! positions=0,0
elements="1,2\n","1,2,3\n">

Do you know if there is a way to output only the dissimilarities when
@action=! is output by sdiff? Right now, it shows every line, similar
or dissimilar, beginning with #<Diff:

#************************************************************
require 'rubygems'
require 'diff/lcs/array'

lines1 = lines2 = nil
File.open("1.txt") { |f| lines1 = f.readlines }
File.open("2.txt") { |f| lines2 = f.readlines }

p sdiff = lines1.sdiff(lines2)

if sdiff =~ /@action!/
then puts sdiff
end


#12

If anything, do you know how to get this into a new text file?

require 'rubygems'
require 'diff/lcs/array'

lines1 = lines2 = nil
File.open("xml1.txt") { |f| lines1 = f.readlines }
File.open("xml2.txt") { |f| lines2 = f.readlines }

diffs = Diff::LCS.diff(lines1, lines2)
sdiff = Diff::LCS.sdiff(lines1, lines2)

p sdiff = Diff::LCS.sdiff(lines1, lines2)

File.open('log.txt', 'w') do |f1|
  f1.puts Diff::LCS.sdiff(lines1, lines2)
  f1.close
end

(This is what it outputs. It doesn’t show @action and the elements like
p sdiff does.)

#<Diff::LCS::ContextChange:0x2e2c424>
#<Diff::LCS::ContextChange:0x2e2c30c>
#<Diff::LCS::ContextChange:0x2e2c1f4>
#<Diff::LCS::ContextChange:0x2e2c0b4>