Finding Dublicates in CSV File

Hi I have csv file that contain partno, price, quantity like below with
duplicates. but I need to find duplicates and added quantity and price.
Output like below. How can I do this?

“PARTNO”, “PRICE”,“QUANTITY”
600157,402.20,1
600187,516.69,2
600163,424.16,1
600181,492.01,3
600169,445.00,2
600175,468.96,1
600181,492.01,3
600187,516.69,2

I WANT OUT PUT LIKE BELOW

“PARTNO”, “PRICE”,“QUANTITY”
600157,402.20,1
600163,424.16,1
600169,445.00,2
600175,468.96,1
600181,984.02,6
600187,1033.38,4

====================================
txt= <<EEND
“PARTNO”,“PRICE”,“QUANTITY”
600157,402.20,1
600187,516.69,2
600163,424.16,1
600181,492.01,3
600169,445.00,2
600175,468.96,1
600181,492.01,3
600187,516.69,2
EEND

def convert(txt)
txt.split(/\r?\n/)[1…-1].
map {|line| line.split(",").map {|f| f.to_f }}.
each_with_object({}) { |line,h|
h[line.first] = (h[line.first]||[]) << line[1…2]
}.
map { |k,v| [k]+v.each_with_object([0,0]) { |(i1,i2),a|
a[0],a[1]=a[0]+i1,a[1]+i2 }
}.
each_with_object([]) {|(k,i1,i2),a|
a << [k.to_i,i1,i2.to_i].join(",")
}.join("\n")
end

puts convert(txt)

600157,402.2,1
600187,1033.38,4
600163,424.16,1
600181,984.02,6
600169,445.0,2
600175,468.96,1

Thanks for reply. Could you explain your code. what is a purpose of
following line. I have report.csv file. I have to open then I have sort
it out and print it.

txt= <<EEND

EEND

Nithi anand wrote in post #1167354:

Thanks for reply. Could you explain your code.
txt.split(/\r?\n/)[1…-1].
explode the string content in a array of lines, excluding
the first line (titles)

 map {|line| line.split(",").map {|f| f.to_f }}.
    transform each line in a array of float

 each_with_object({}) { |line,h|
    h[line.first] = (h[line.first]||[]) << line[1..2]
 }.
    create a hash, create an entry for each product id,
    put in it a list of all occurences of record of same id

 map { |k,v| [k]+v.each_with_object([0,0]) { |(i1,i2),a|
   a[0],a[1]=a[0]+i1,a[1]+i2 }
 }.
    for each entry in the hash, make the sum of datas, column by
    column

 each_with_object([]) {|(k,i1,i2),a|
   a << [k.to_i, i1, i2.to_i].join(",")
 }.join("\n")
     create a list of string containing hash data (hash key id,
     price and count) join field with ',' , then join lines with
     EOL char

if you whant get a traces of what append at each step, you can insert
this code :
tap {|o| p o}.

I have report.csv file. I have to open then I have sort
it out and print it.

txt= <<EEND

EEND

for make a response with tested code, i prefer put data and code
in same source.

so yo can do, in place of <<EEND…EEND :

txt=File.read(filename)

Regis d’Aubarede wrote in post #1167356:

Nithi anand wrote in post #1167354:

Thanks for reply. Could you explain your code.
txt.split(/\r?\n/)[1…-1].
explode the string content in a array of lines, excluding
the first line (titles)

 map {|line| line.split(",").map {|f| f.to_f }}.
    transform each line in a array of float

 each_with_object({}) { |line,h|
    h[line.first] = (h[line.first]||[]) << line[1..2]
 }.
    create a hash, create an entry for each product id,
    put in it a list of all occurences of record of same id

 map { |k,v| [k]+v.each_with_object([0,0]) { |(i1,i2),a|
   a[0],a[1]=a[0]+i1,a[1]+i2 }
 }.
    for each entry in the hash, make the sum of datas, column by
    column

 each_with_object([]) {|(k,i1,i2),a|
   a << [k.to_i, i1, i2.to_i].join(",")
 }.join("\n")
     create a list of string containing hash data (hash key id,
     price and count) join field with ',' , then join lines with
     EOL char

if you whant get a traces of what append at each step, you can insert
this code :
tap {|o| p o}.

I have report.csv file. I have to open then I have sort
it out and print it.

txt= <<EEND

EEND

for make a response with tested code, i prefer put data and code
in same source.

so yo can do, in place of <<EEND…EEND :

txt=File.read(filename)

What if I have csv file like following

“LOTNO”,“GROUPNO”,“PARTNO”, “PRICE”,“QUANTITY”

00,11,600157,402.20,1
01,22,600187,516.69,2
02,33,600163,424.16,1
03,44,600181,492.01,3
04,55,600169,445.00,2
05,66,600175,468.96,1
06,77,600181,492.01,3
07,88,600187,516.69,2

OUT PUT SHOULD BE LIKE THIS

“LOTNO”,“GROUPNO”,“PARTNO”, “PRICE”,“QUANTITY”

00,11,600157,402.20,1
01,22,600187,516.69,4
02,33,600163,424.16,1
03,44,600181,492.01,6
04,55,600169,445.00,2
05,66,600175,468.96,1

Also, MY Partnumber has letters mix with number. Please see my attached
csv file.

Hi Regis,
Thanks a lot for explain step by step.

First off all: Your CSV isn’t well formed. There are duplicate tabs,
more tabs than headers, …

If you can get the CSV well formed (see attachment), then it’s quite
simple:

require “csv”
report = CSV.read(“report1_well-formed.csv”)
puts report.uniq

You can print only the duplicate like this:

puts report.group_by{ |e| e }.select { |k, v| v.size > 1 }.map(&:first)

(Source:

)

F. E. wrote in post #1167453:

First off all: Your CSV isn’t well formed. There are duplicate tabs,
more tabs than headers, …

If you can get the CSV well formed (see attachment), then it’s quite
simple:

require “csv”
report = CSV.read(“report1_well-formed.csv”)
puts report.uniq

You can print only the duplicate like this:

puts report.group_by{ |e| e }.select { |k, v| v.size > 1 }.map(&:first)

(Source:

)

I need to find duplicates then add quantity and price. Also, My part
number contain alpha numeric.

I see. Not duplicate lines, but lines with the same part numbers.
If you are interested in exactly identical part numbers, it doesn’t
matter if letters and digits are mixed.

You could go from here:

require “csv”
report = CSV.read(“report1_well-formed.csv”, headers: true, col_sep:
“\t”, converters: :numeric)
report_sums = CSV.generate( headers: %w(PARTNUMBER QUANTITY PRICE),
write_headers: true,
col_sep: “\t”
) do |csv|
report.group_by{ |row| row[“PARTNUMBER”] }.each do |partnumber, rows|
quantity = rows.inject(0){ |sum, row| sum + row[“QUANTITY”] }
price = rows.inject(0){ |sum, row| sum + row[“PRICE”] }
csv << [partnumber, quantity, price]
end
end
puts report_sums

Beware, there are a few problems:
First of all you should add tests, clean up the code, remove duplication
and define proper classes and functions for your use case.

Then there are problems with adding Floats. You can see it in the sum
for 602712SD.
You can find more details about this quirk over here:

To calculate with currency you should use BigDecimal.
http://qugstart.com/blog/ruby-and-rails/ruby-floats-bigdecimals-and-money-currency/

Hi Regis,

Your code works great,but how do I do if my part number mixed with alpha
numeric. I tried F.E.'s code doesn’t make it work. I am newbie to csv.
please help.

Nithi anand wrote in post #1167571:

Hi Regis,

Your code works great,but how do I do if my part number mixed with alpha
numeric. I tried F.E.'s code doesn’t make it work. I am newbie to csv.
please help.

def convert(txt)
txt.split(/\r?\n/)[1…-1].
map {|line| id,k,s=line.split(","); [id,k.to_f,s.to_i]}.
group_by {|a| a.first}.
map { |k,v| [k]+v.each_with_object([0,0]) { |(k,i1,i2),a|
a[0],a[1]=a[0]+i1,a[1]+i2 }
}.
each_with_object([]) {|(k,i1,i2),a|
a << [k,i1,i2.to_i].join(",")
}.join("\n")
end

One day, you should learn ruby, and resolve problem by yourself :slight_smile:

Thanks. I will.