I have two CSV files and I want to find the duplicate records.
For example:
Sheet1    Sheet2
Vipul     Anthony
John      Wayne
Mac       Bill
smith     randy
Nick      thalia
          Trishi
          ricky
          sachin
          Nick
So I want to check whether each record of Sheet1 appears in Sheet2; for
example, I want to check whether Vipul exists in Sheet2 or not.
Is there an automated tool for this, or do I have to do it manually?
Thanks in advance.
Vipul jindal
on 21.08.2008 22:05
on 21.08.2008 22:33
On 21.08.2008 22:02, viupljindal@gmail.com wrote:
> Trishi
> ricky
> sachin
> Nick
>
> So i want to check the appearance of each record of Sheet1 in the
> sheet2, like i want to check if vipul exist in sheet2 or not.
>
> Is there any automated tool for it or do i have to do it manually.

Not sure what you want (or rather, why you are asking here if you just want a quick solution), but there is of course

  diff -u <(sort Sheet1) <(sort Sheet2)

Note: you need a decent shell and OS for this (bash on Linux will do). Otherwise you need temporary files.

Cheers

robert
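The shell pipeline relies on sorting both lists first. Once both are sorted, the lines they share can be collected in a single linear pass, which is what the POSIX `comm -12` utility does on sorted input (a swapped-in tool, not the `diff` from the post above). Here is a minimal Ruby sketch of that sorted-merge idea, assuming the names are already loaded into arrays. `common_sorted` is a hypothetical helper name, not a library method:

```ruby
# Hypothetical helper: walk two sorted arrays in step (like `comm -12`)
# and collect the elements that appear in both.
def common_sorted(left, right)
  left = left.sort
  right = right.sort
  i = j = 0
  common = []
  while i < left.size && j < right.size
    case left[i] <=> right[j]
    when -1 then i += 1   # left[i] is smaller, so it only occurs in left
    when 1  then j += 1   # right[j] is smaller, so it only occurs in right
    else                  # equal: present in both lists
      common << left[i]
      i += 1
      j += 1
    end
  end
  common
end

sheet1 = %w[Vipul John Mac smith Nick]
sheet2 = %w[Anthony Wayne Bill randy thalia Trishi ricky sachin Nick]
common = common_sorted(sheet1, sheet2)
p common # prints ["Nick"]
```

Sorting costs O(n log n) and the merge afterwards is linear. Note that `sort` compares strings byte by byte, so uppercase names sort before lowercase ones, and `Nick` and `nick` would not match.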
on 22.08.2008 02:52
On Thu, Aug 21, 2008 at 1:01 PM, <viupljindal@gmail.com> wrote:
> Trishi
> ricky
> sachin
> Nick

If it's just one word per line and the files aren't huge:

  a = IO.readlines('sheet1')
  b = IO.readlines('sheet2')
  puts a & b

martin
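One caveat with `IO.readlines`: each element keeps its line terminator. A `"Nick\n"` from one file and a final `"Nick"` with no trailing newline (or a `"Nick\r\n"` from a Windows-edited CSV) will therefore not compare equal under `&`. The sketch below strips terminators and uses a `Set` for lookups, so it also copes with larger files. It assumes the name is the first comma-separated field on each line and that no field contains a quoted comma. `read_names` and `duplicates` are hypothetical helpers, and the file contents are the example from the question:

```ruby
require 'set'
require 'tmpdir'

# Hypothetical helper: assumes one record per line with the name in the first
# comma-separated field (no quoted commas). chomp removes "\n" and "\r\n".
def read_names(path)
  File.foreach(path).map { |line| line.chomp.split(',').first.to_s.strip }.reject(&:empty?)
end

# Hypothetical helper: names from sheet1 that also occur in sheet2, in sheet1
# order, without repeats. The Set gives hash-based lookups instead of a nested scan.
def duplicates(sheet1_path, sheet2_path)
  in_sheet2 = read_names(sheet2_path).to_set
  read_names(sheet1_path).select { |name| in_sheet2.include?(name) }.uniq
end

common = Dir.mktmpdir do |dir|
  sheet1 = File.join(dir, 'sheet1.csv')
  sheet2 = File.join(dir, 'sheet2.csv')
  File.write(sheet1, "Vipul\nJohn\nMac\nsmith\nNick\n")
  # CRLF line endings and no trailing newline, to exercise the chomp
  File.write(sheet2, "Anthony\r\nWayne\r\nBill\r\nrandy\r\nthalia\r\nTrishi\r\nricky\r\nsachin\r\nNick")
  duplicates(sheet1, sheet2)
end
p common # prints ["Nick"]
```

With the plain `IO.readlines` version, these two files would give an empty result, because `"Nick\n"` and `"Nick"` are different strings.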
on 22.08.2008 06:27
I'm not a master of the language yet, but based on what I've learned, can't you create two arrays from those files, say sheet1 and sheet2, and then simply do "sheet2 - sheet1"?

~Mayuresh
on 22.08.2008 09:22
To identify duplicates, the 'array1 & array2' solution given above is perfect. See the examples below:

# setting up two arrays with your names
irb(main):004:0> sheet1 = %w[vipul john mac smith nick]
=> ["vipul", "john", "mac", "smith", "nick"]
irb(main):005:0> sheet2 = %w[anthony wayne bill randy thalia trishi ricky sachin nick]
=> ["anthony", "wayne", "bill", "randy", "thalia", "trishi", "ricky", "sachin", "nick"]

# finding common elements; the operand order does not change which elements are found
irb(main):006:0> sheet2 & sheet1
=> ["nick"]
irb(main):007:0> sheet1 & sheet2
=> ["nick"]

# Determining which items are unique to the first array (which items are in the first list but not in the second?). Note that order matters here.
irb(main):008:0> sheet2 - sheet1
=> ["anthony", "wayne", "bill", "randy", "thalia", "trishi", "ricky", "sachin"]
irb(main):009:0> sheet1 - sheet2
=> ["vipul", "john", "mac", "smith"]

-Tim
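The session above types every name in lowercase, while the sheets in the question mix spellings such as `Vipul` and `smith`. `&` and `-` compare strings exactly, so the question's Sheet1 intersected with the lowercase list from the session is empty. Here is a sketch that ignores case and surrounding whitespace for the comparison but keeps Sheet1's original spelling in the result. The helper names are hypothetical:

```ruby
require 'set'

# Hypothetical helper: lowercase, whitespace-trimmed lookup set built from a list.
def normalized_set(names)
  names.map { |name| name.strip.downcase }.to_set
end

# Hypothetical helper: names from sheet1 with a case-insensitive match in sheet2
# (the spelling used in sheet1 is kept).
def common_ignoring_case(sheet1, sheet2)
  wanted = normalized_set(sheet2)
  sheet1.select { |name| wanted.include?(name.strip.downcase) }
end

# Hypothetical helper: names from sheet1 with no case-insensitive match in sheet2.
def only_in_first_ignoring_case(sheet1, sheet2)
  wanted = normalized_set(sheet2)
  sheet1.reject { |name| wanted.include?(name.strip.downcase) }
end

sheet1 = %w[Vipul John Mac smith Nick]                                # spelling from the question
sheet2 = %w[anthony wayne bill randy thalia trishi ricky sachin nick] # as typed in the session

common = common_ignoring_case(sheet1, sheet2)
only_first = only_in_first_ignoring_case(sheet1, sheet2)
p sheet1 & sheet2 # prints [] because exact comparison finds nothing
p common          # prints ["Nick"]
p only_first      # prints ["Vipul", "John", "Mac", "smith"]
```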