Entering data into Excel, in specific iterated rows/columns

I have a script that spiders a page looking for elements. In this
specific case, I’m looking for links (innerText and href).

I have this script working so far to output data to a text file, but
I’d like to output it to an Excel file for organization. I’m getting
stuck however trying to think of the correct method to put all of the
link.innerText values into A1, A2, A3, etc and the corresponding
link.href into B1, B2, etc.

Here is the code I have so far (with the normal text file code in place
of the excel code I have yet to think of):

ef grab_links(url)
$ie.goto(url)
excel = WIN32OLE.new(“excel.application”)
excel.visible = true
workbook = excel.workbooks.add
workbook.saveas("#{Project}_links.xls")
workbook.close

    linkfile.puts "\n#{url}"
$ie.links.each{|link|
         linkfile.puts("#{link.innerText} - #{link.href}") unless
    link.innerText.empty?
    }

    linkfile.puts "\n\n"

end

One big file with it formatted like this would be fine, or even a
different worksheet for each URL would be great. I’m not really
concered with that. Can you guys give me hand?

Thanks!

Shoot, pardon the “ef” (def) and the excel.close in there, those are
mistakes.

I’d probably recommend FasterCSV instead. You can use Excel through
WIN32OLE (google both terms for tutorials on how), but it’s much Much
MUCH simpler to just use FasterCSV.

$ gem install fastercsv
$ irb

items = []
items << [ ‘text1’, ‘link1’ ]
items << [ ‘text2’, ‘link2’ ]
require ‘faster_csv’
FasterCSV::open(‘links.csv’, ‘wb+’) do |csv|
items.each { |item| csv << item }
end
exit
$ type links.csv
text1, link1
text2, link2

(It will handle quoting, escaping, etc for you).

You can open a CSV file with Excel just by double-clicking it usually
(Excel should be the default extension handler for CSV files unless
you’ve registered something else to the extension), and you can save
the file in Excel with “Save As…” if you need a true Excel file with
all the formatting options (CSV files can’t remember column widths for
example if you want the widths auto-expanded to the size of the
content).

Even if you still wanted to go with a real Excel file, I’d probably
still use FasterCSV to generate an intermediary file since it’s much
easier to do than messing with Excel.Application, Excel.Workbook,
Excel.Worksheet, Excel.Range, etc. Then, you can just instantiate an
Excel.Application, open the CSV as a workbook, and do a “Save As…”
programmatically. With a lot less code, and a lot less debugging effort
than trying to use Excel directly for everything. For example, if
anything goes wrong in your process while working with an
Excel.Application, the Excel process gets orphaned. So you have to wrap
the entire process in a rescue basically so you can make sure and
“ensure” “excel.Quit()” gets called.

If you want to do it the hard way though, then there’s an excellent
starter at RubyGarden:
http://wiki.rubygarden.org/Ruby/page/show/ScriptingExcel

Good luck!

[email protected] wrote:

I have a script that spiders a page looking for elements. In this
specific case, I’m looking for links (innerText and href).

I have this script working so far to output data to a text file, but
I’d like to output it to an Excel file for organization. I’m getting
stuck however trying to think of the correct method to put all of the
link.innerText values into A1, A2, A3, etc and the corresponding
link.href into B1, B2, etc.

Here is the code I have so far (with the normal text file code in place
of the excel code I have yet to think of):

ef grab_links(url)
$ie.goto(url)
excel = WIN32OLE.new(“excel.application”)
excel.visible = true
workbook = excel.workbooks.add
workbook.saveas("#{Project}_links.xls")
workbook.close

    linkfile.puts "\n#{url}"

$ie.links.each{|link|
linkfile.puts("#{link.innerText} - #{link.href}") unless
link.innerText.empty?
}

    linkfile.puts "\n\n"

end

One big file with it formatted like this would be fine, or even a
different worksheet for each URL would be great. I’m not really
concered with that. Can you guys give me hand?

Thanks!

xl = WIN32OLE.new(“Excel.Application”)
xl.Visible = 1
wb = xl.Workbooks.Add
ws = wb.Worksheets(1)
row = 0
$ie.links.each do |link|
next if link.innerText.empty?
row = row + 1
ws.Cells(row, 1).Value = link.innerText
ws.Cells(row, 2).Value = link.href
end

[ Not tested, and posted before I’ve had my morning tea… ]

Mully

[email protected] wrote:

Here is the code I have so far (with the normal text file code in place
linkfile.puts “\n#{url}”
concered with that. Can you guys give me hand?

Thanks!

Try with Spreadsheet::Excel

It let you create worksheets, write in a defined column - row, …

Regards!

Wow David, that’s dedication – coding before tea? Were you at least
wearing a helmet or other protection?

I really do appreciate the reply, that’s exactly what I was looking
for. Even if it happens to not fully work, that’s the logic I was
looking for so the commands should quickly follow.

Thanks again –

Thanks Sam, I appreciate the reply. That tutorial is what I have been
using, but nobody really talks about handling an unknown amount of
variable pairs.

Where exactly in the FasterCSV code are you stating, “first part”,
“second part”. Would it be the same as just saying “#{firstitem},
#{seconditem}\n”?

That would give me:

“first link name, first link url
second link name, second link url”

Would that type of comma-separated file work for Excel output?

Thanks again.

Where exactly in the FasterCSV code are you stating, “first part”,
“second part”. Would it be the same as just saying “#{firstitem},
#{seconditem}\n”?

The “csv << [ ‘first part’, ‘second part’ ]”

The csv object passed into the block by FasterCSV::open just expects an
array of values. It handles the escaping of those values internally.
Yes, it’s fully compatible with Excel, produces a lot smaller files, is
faster, simpler, and works with pretty much any text editor if you
don’t have Excel handy.

I used to try to do everything with Excel Automation, but then it
dawned on me… unless you want colored headers, custom fonts, multiple
sheets, etc, there just isn’t good reason I should be targetting Excel
and excluding everyone that doesn’t have it, when I can just go with
the simpler CSV and it’s available to everyone then (including Excel
users of course).

That’s just my rationale tho’. Sometimes people do want the colored
headers, multiple sheets, etc. :wink:

can you not use the worksheet.Range method?

ie
require ‘win32ole’
excel = WIN32OLE::new(‘excel.Application’)
excel.visible = true
workbook = excel.workbooks.add
worksheet = workbook.Worksheets(1)
i = 1
“a”.upto(“z”) {|s|
worksheet.Range(“A#{i.to_s}:B#{i.to_s}”).value = [s, s.next]
i+=1
}

x1 wrote:

can you not use the worksheet.Range method?

That works too. I just appreciate a good 1 to ~3 liner. :slight_smile: FasterCSV
does that. Excel Automation doesn’t.