Comparing elements in same array

luislavena · October 6, 2011, 3:06am

Hello,

I have an (name-sorted) Excel spreadsheet that looks like the following

Name Location AA /opt/files/1 AA /opt/files/2 AB /opt/files/3 AB
/opt/files/4 BB /opt/files/5 BB /opt/files/6 BB /opt/files/7 CC
/opt/files/8 CC /opt/files/9
I want to open the spreadsheet and compare the contents of the files (in
column Location) that have the same name.

I am using the gem ‘roo’ to push all file names into an array. My plan
is to
traverse the array to find the same names and open their respective
files (
I’ve created a Name => Location hash for this purpose.)

I am stuck at finding files with same names from the array that I’ve
created. any help/pointer would be appreciated.

TIA.

Thomas_Le · October 6, 2011, 3:19am

On Thu, Oct 6, 2011 at 9:06 AM, Thomas Le [email protected] wrote:

I am stuck at finding files with same names from the array that I’ve
created. any help/pointer would be appreciated.

post sample code pls
thanks -botp

Thomas_Le · October 6, 2011, 4:26am

On Thu, Oct 6, 2011 at 10:02 AM, Thomas Le [email protected] wrote:

require ‘roo’
ORGFILE = ‘organize.xlsx’
xcel = Excelx.new(“#{ORGFILE}”)

lose the inlining hash, just do
xcel = Excelx.new(ORGFILE)

same w others below…

xcel.default_sheet = xcel.sheets.first
startingRow = 2
nameArray = []
nameProfile = Hash.new
startingRow.upto(xcel.last_row) do |row|
name = xcel.cell(row,‘A’)
location = xcel.cell(row,‘B’)
nameProfile[“#{name}”] = “#{location}”

at this point, since you are associating the name to the location, you
are assumming that the name is unique, ie each name has one and only
one location. now what if you’re wrong?

nameArray.push(“#{name}”)
startingRow += 1
end

hmm, finding duplicates names in nameArray is so obvious. why do you
want it? to remove duplicate rows in your excel file? … or maybe,
you may also want to know duplicate names but different location, no?

kind regards -botp

Thomas_Le · October 6, 2011, 4:03am

Sorry … this is my sample code

require ‘roo’

ORGFILE = ‘organize.xlsx’

xcel = Excelx.new("#{ORGFILE}")
xcel.default_sheet = xcel.sheets.first

startingRow = 2
nameArray = []
nameProfile = Hash.new

startingRow.upto(xcel.last_row) do |row|

name = xcel.cell(row,'A')
location = xcel.cell(row,'B')

nameProfile["#{name}"] = "#{location}"

nameArray.push("#{name}")



startingRow += 1

end

Thanks

Thomas_Le · October 6, 2011, 6:05am

I want to find (thousands of) duplicate names in the spreadsheet so I
can
compare the corrresponding files to see if they produce the same results
…

my logic is like this “if name in column A row 2 is same as that in
colum A
row 3, open files in column B row 2 and column B row 3,…do something
else”
…

Thanks,
TL

Thomas_Le · October 6, 2011, 7:56am

On Thu, Oct 6, 2011 at 1:23 PM, botp [email protected] wrote:

nameProfile.delete_if{|k,v| v.unique.size>1}

sorry, that should be the opposite, ie,

nameProfile.delete_if{|k,v| v.unique.size<2}

thanks -botp

Thomas_Le · October 6, 2011, 4:28pm

hi Thomas,

well, if i understand correctly what you want to do, i think this…

nameProfile["#{name}"] = "#{location}"

…is going to be a problem. if the names repeat, the location will
be overwritten each time the name is found, and you’ll wind up with only
the last one found as the value for your ‘name’ key.

i might instead first set up an array of arrays with your names and
locations, so that you can account for the same name being in multiple
locations. then i would step through the array and create a hash in
which each name was a key, and the value was an array of the locations
associated with that name. it would then be simple to do whatever you
wanted with the location values stored in the array of each name key.

a dumb example:

########

nl_array = [
[“Hector”, “Barcelona”],
[“Bill”, “New York”],
[“Ivan”, “Moscow”],
[“Hector”, “San Francisco”],
[“Bill”, “Boston”]
]

nl_hash = Hash.new

nl_array.each{|entry|
unless nl_hash.has_key?(entry[0])
nl_hash[entry[0]] = [entry[1]]
else
nl_hash[entry[0]] << entry[1]
end
}

p nl_hash

########

=> {“Ivan”=>[“Moscow”], “Bill”=>[“New York”, “Boston”],
“Hector”=>[“Barcelona”, “San Francisco”]}

hth,

j

Thomas_Le · October 6, 2011, 9:59pm

This works for me beautifully … my code looks like this … still more
to
go to complete my task but I’ve passed the first hurdle.

#—start code

require ‘roo’

ORGFILE = ‘organize.xlsx’
xcel = Excelx.new(ORGFILE)
xcel.default_sheet = xcel.sheets.first

startingRow = 2
nameArray = []

startingRow.upto(xcel.last_row) do |row|
name = xcel.cell(row,‘A’)
location = xcel.cell(row,‘B’)
nameArray << [name,location]

startingRow += 1
end

nameHash = Hash.new

nameArray.each { |entry|
unless nameHash.has_key?(entry[0])
nameHash[entry[0]] = [entry[1]]
else
nameHash[entry[0]] << entry[1]
end
}

–end code

Thanks to Jake & Botp. I’ve learned a lot from your suggestions.

TL

Thomas_Le · October 7, 2011, 12:48am

glad to hear it… happy ruby-ing!

j

Thomas_Le · October 6, 2011, 7:24am

On Thu, Oct 6, 2011 at 12:04 PM, Thomas Le [email protected] wrote:

I want to find (thousands of) duplicate names in the spreadsheet so I can
compare the corrresponding files to see if they produce the same results …

ok

my logic is like this “if name in column A row 2 is same as that in colum A
row 3, open files in column B row 2 and column B row 3,…do something else”

column A is already name column. files of the same pathname are the
same. compare only if they have different pathnames.

you can try something like,

#—start code
require ‘roo’

ORGFILE = ‘organize.xlsx’
xcel = Excelx.new(ORGFILE)
xcel.default_sheet = xcel.sheets.first

startingRow = 2
nameProfile = Hash.new { |h, k| h[k] = [] }

startingRow.upto(xcel.last_row) do |row|

name = xcel.cell(row,‘A’)
location = xcel.cell(row,‘B’)

nameProfile[name] << location

startingRow += 1
end

nameProfile.delete_if{|k,v| v.unique.size>1}

#—end code

nameProfile should only contain hash of names with multiple pathnames

note, code above untested since i do not have your actual file.
try it in irb.

kind regards -botp