Trouble with Pushing Arrays to Arrays

Hello everyone! I’m having some trouble with pushing an array to another
array. Maybe this is an easy fix that I’m just not seeing? Here’s my
code:

require 'nokogiri'
require 'open-uri'

url = 'C:\users\derek\desktop\Schedule2.html'
doc = Nokogiri::HTML(open(url))

raw_course_list = Array.new
temp = Array.new

doc.css("tr").each { |row| # Search through every row
  row.css("td").each { |column|
    temp.push(column.text.strip)
  }
  # puts temp.class #=> Array
  # puts temp.size #=> 20
  # puts temp[7] # E.g. => "Intro to Financial Accounting"
  raw_course_list.push(temp)
  # puts "raw_course_list[0]: " + raw_course_list[0].to_s
  #   **** Always returns the last temp pushed, not what should be at [0]
  #   (the first array ever pushed) ****
  # print "raw_course_list.size: ", raw_course_list.size, "\n"
  #   Correctly increases +1 each push
  temp.clear
}

puts raw_course_list.each { |i| puts i.to_s }
# **** Prints out line after line of empty arrays ****

Oh and a brief explanation:

The code goes through HTML containing courses listed in tables, where
each table has 20 columns of data. This segment of code ultimately is
supposed to create an array of arrays – i.e. each element in
raw_course_list is an array containing each of the 20 columns in one row
of the HTML file.

And one last thing: As you can see from the commented out code, the temp
array is working as intended – every time the row.css(“td”).each
completes, it’s filled with data from all the columns in the HTML… The
problem occurs (I’m guessing) when I push temp to raw_course_list.

The problem is that you’re reusing temp, and clearing it.

Doh! I knew it was something easy. The raw_course_list is pushing a
pointer to temp, not creating a copy of temp to push. Is this correct?

raw_course_list = doc.css("tr").map { |row|
  row.css("td").map { |column| column.text.strip }
}

Also, this coding worked perfectly. It’s very elegant; thanks! I had to
look up map in the API to fully understand what was going on there :-)

Hi –

On Sat, 17 Apr 2010, Derek C. wrote:

The problem is that you’re reusing temp, and clearing it.

Doh! I knew it was something easy. The raw_course_list is pushing a
pointer to temp, not creating a copy of temp to push. Is this correct?

Yes, essentially, though the more standard word in Ruby is
“reference”.
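
To make the reference-versus-copy distinction concrete, here is a minimal sketch. The helper name and the sample data are hypothetical and do not come from the original script. It pushes the same array twice, once directly and once through dup, and then clears the original:

```ruby
# Hypothetical helper; the name and sample data are made up for illustration.
def reference_vs_copy
  temp = ["ACCT 101", "Intro to Financial Accounting"]
  list = []

  list.push(temp)      # stores a reference to the same Array object
  list.push(temp.dup)  # stores a separate (shallow) copy

  temp.clear           # empties the one object shared by temp and list[0]

  { same_object: list[0].equal?(temp),
    copy_is_same: list[1].equal?(temp),
    list: list }
end

result = reference_vs_copy
p result[:same_object]   # => true
p result[:copy_is_same]  # => false
p result[:list]          # => [[], ["ACCT 101", "Intro to Financial Accounting"]]
```

Note that dup is a shallow copy: the new array is a distinct object, but the strings inside it are still shared with the original.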

David


David A. Black, Senior Developer, Cyrus Innovation Inc.


Hi –

On Sat, 17 Apr 2010, Derek C. wrote:

raw_course_list = Array.new
  # puts "raw_course_list[0]: " + raw_course_list[0].to_s
  #   **** Always returns the last temp pushed, not what should be at [0]
  #   (the first array ever pushed) ****
  # print "raw_course_list.size: ", raw_course_list.size, "\n"
  #   Correctly increases +1 each push
  temp.clear
}

puts raw_course_list.each { |i| puts i.to_s }
# **** Prints out line after line of empty arrays ****

The problem is that you’re reusing temp, and clearing it.

temp = []
result = []

%w{ one two three }.each do |word|
  temp.push(word)
  result.push(temp)
  temp.clear
end

p result # => [[], [], []]
p result.map {|obj| obj.object_id } # => [606420, 606420, 606420]

I end up with three references to the same temp array inside result (note
the identical object ids), and since temp has been cleared, all three
show up as empty.
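
For comparison, here is a sketch of the same toy loop with one change: it pushes temp.dup instead of temp. The method name collect_snapshots is hypothetical. Each element of the result is then its own array, so clearing temp afterwards no longer affects what was stored:

```ruby
# Hypothetical variation on the loop above: push a snapshot, not temp itself.
def collect_snapshots(words)
  temp = []
  result = []
  words.each do |word|
    temp.push(word)
    result.push(temp.dup)  # dup builds a new Array with the current contents
    temp.clear             # only empties temp; the stored copy is untouched
  end
  result
end

snapshots = collect_snapshots(%w{ one two three })
p snapshots                                          # => [["one"], ["two"], ["three"]]
p snapshots.map { |obj| obj.object_id }.uniq.size    # => 3
```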

Try creating the temp array inside the loop, and not clearing it. Or
let Ruby do more of the work:

raw_course_list = doc.css("tr").map { |row|
  row.css("td").map { |column| column.text.strip }
}

(if I’ve got the logic right – if not, tweak as needed :-) )
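
The first suggestion (create temp inside the loop and never clear it) might look roughly like the sketch below. To keep it runnable without an HTML file, it assumes plain nested arrays of strings standing in for the tr and td nodes. The names sample_rows and collect_rows are hypothetical.

```ruby
# Hypothetical stand-in for the parsed HTML: each inner array plays the role
# of one <tr>, and each string the text of one <td>.
def sample_rows
  [[" ACCT 101 ", "Intro to Financial Accounting "],
   ["MATH 120", " Calculus I"]]
end

def collect_rows(rows)
  raw_course_list = []
  rows.each do |row|
    temp = []                                  # a brand-new Array for every row
    row.each { |cell| temp.push(cell.strip) }
    raw_course_list.push(temp)                 # no clear needed; temp is never reused
  end
  raw_course_list
end

p collect_rows(sample_rows)
# => [["ACCT 101", "Intro to Financial Accounting"], ["MATH 120", "Calculus I"]]

# The nested-map form produces the same structure on this sample data.
p collect_rows(sample_rows) == sample_rows.map { |row| row.map { |cell| cell.strip } }
# => true
```

Because each iteration binds temp to a fresh array, every element of raw_course_list is a distinct object, which is also what the nested map builds implicitly.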

David