Export to CSV has array items under a single row

aris · September 10, 2012, 9:35pm

I posted this issue over on StackOverflow but haven’t received a reply,
so I figured I’d try here too.

I have a number of HTML files, and from each one I’m trying to pull the
content of a table containing about 20-30 rows. I can pull the data I
need, but the output to CSV has all items in the array under a single
cell/row in the CSV. I can’t figure out how to get each item in its own
row. Could someone give me a little guidance on what I’m missing? I’m
still too new to Ruby to figure it out, and I’ve searched all over Google
and can’t find out what I’m doing wrong.

allan_a · September 11, 2012, 1:55am

On Mon, Sep 10, 2012 at 12:35 PM, Allan A. wrote:

output to CSV has all items in the array under a single cell/row in the
CSV. I can’t figure out how to get each item in its own row.

What method are you calling and what arguments does it expect?
Does the object or objects you’re supplying meet those expectations?

I’m guessing not
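
To make the question concrete, here is a minimal sketch, assuming the rows are written with the standard library’s CSV#<<. That method takes one flat array per row, and each element becomes one cell. An element that is itself an array is converted to its string form and lands in a single cell. The sample values are made up for illustration.

```ruby
require 'csv'

# Hypothetical sample data standing in for the scraped links.
links = ["http://LINK1", "http://LINK2"]

# Passing the whole array as a single field: CSV converts it to a string,
# so the entire array ends up inside one cell of one row.
nested = CSV.generate { |csv| csv << [links, "thumb"] }

# Passing one flat array per row: each element becomes its own cell,
# and each link gets its own row.
flat = CSV.generate do |csv|
  links.each { |link| csv << [link, "thumb"] }
end

puts nested
puts flat
```

The first form reproduces the kind of quoted, bracketed cell described in the question; the second gives one link per row.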

allan_a · September 11, 2012, 3:56am

Hassan S. wrote in post #1075385:

What method are you calling and what arguments does it expect?
Does the object or objects you’re supplying meet those expectations?

I’m guessing not

I suppose you’re correct, since I don’t really know what I’m doing. =)
To be honest, I’ve just tried following some sites & tutorials on site
scraping as I’m just trying to grab some info from a vBulletin forum.
I’ve been able to pull single items from multiple pages fine, but this
time I’m pulling multiple items from each page.

You can see the sample of what I’m pulling on the above link but here is
my code:

require 'nokogiri'
require 'open-uri'
require 'csv'

@thread = Array.new
@thumb = Array.new

files = CSV.read("urls.csv")
(0...files.length - 1).each do |index|
  puts files[index][0]

  #load HTML to Nokogiri
  doc = Nokogiri::HTML(open(files[index][0]))

  #find the hyperlink
  threadtd = doc.css('tbody#threadbits_forum_406 tr td.alt2 a').map { |link| link['href'] }

  #find the div with the background img
  thumbtd = doc.css('tbody#threadbits_forum_406 tr td.alt2 a div')

  @thread << threadtd
  @thumb << thumbtd

end

CSV.open("thumbs.csv", "wb:UTF-8") do |row|
  row << ["Thread", "Thumbnail"]
  (0...files.length - 1).each do |index|
    row << [
      @thread[index],
      @thumb[index]]
  end
end

The output that I see is fine (though I’d like to pull the
‘background:url()’ value rather than the whole empty div), but in the
CSV it’s all grouped in one row and the divs aren’t comma separated:

Thread,Thumbnail
"[""http://LINK1"", ""http://LINK2"", ""http://LINK3"", ""http://LINK4""]","<div 1><div 2><div 3><div 4>"

Again, I’m sure I’m screwing up or leaving out something important and I
apologize if it’s something that should be obvious!

allan_a · September 11, 2012, 4:42am

On Mon, Sep 10, 2012 at 6:56 PM, Allan A. wrote:

TL;DR

The output that I see is fine (though I’d like to pull the
‘background:url()’ value rather than the whole empty div), but in the
CSV it’s all grouped in one row and the divs aren’t comma separated:

Thread,Thumbnail
"[""http://LINK1"", ""http://LINK2"", ""http://LINK3"", ""http://LINK4""]","<div 1><div 2><div 3><div 4>"

If your issue is that

"<div 1><div 2><div 3><div 4>"

is a string rather than an array, figure out why you’re not delivering
it as one. That should be fairly straightforward. Break the processing
down into discrete steps and see where it’s failing.
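
One way to break it down, as a rough sketch rather than a drop-in fix: keep the links and the thumbnail values as plain strings, pair them up with zip, and write one CSV row per pair. The helper names (background_url, rows_for) and the sample style strings are hypothetical. In the real script the style text would presumably come from each div’s style attribute, which is an assumption about the page markup.

```ruby
require 'csv'

# Pull the URL out of a style string such as "background:url(thumb.jpg)".
# Returns nil when the string contains no url(...) part.
def background_url(style)
  match = style.to_s.match(/url\(\s*['"]?(.*?)['"]?\s*\)/)
  match && match[1]
end

# Pair each link with its thumbnail URL so every pair becomes one row.
def rows_for(links, styles)
  links.zip(styles.map { |style| background_url(style) })
end

# Hypothetical sample data standing in for one scraped page.
links  = ["http://LINK1", "http://LINK2"]
styles = ["background:url(http://IMG1)", "background:url('http://IMG2')"]

csv_text = CSV.generate do |csv|
  csv << ["Thread", "Thumbnail"]
  rows_for(links, styles).each { |row| csv << row }
end

puts csv_text
```

In the posted script, @thread << threadtd appends a whole array per page, so each CSV cell receives an array. Collecting with @thread.concat(threadtd), or writing the rows inside the per-page loop, would keep one entry per link.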