Forum: Ruby Nokogiri - Data output not as expected

Posted by Barry Kavanagh (ninja2k)
on 2013-02-04 16:39
Attachment: asus_nokogiri.rb (926 Bytes)
Sorry for bothering the forum about this, but it is extremely
important.

My Nokogiri output is being clobbered when it's written to
Excel: the output should go down row by row, but instead it's
getting overwritten.

Anybody know why? :(
Posted by "Jesús Gabriel y Galán" <jgabrielygalan@gmail.com> (Guest)
on 2013-02-04 16:47
(Received via mailing list)
On Mon, Feb 4, 2013 at 4:39 PM, Barry Kavanagh <lists@ruby-forum.com> 
wrote:
> http://www.ruby-forum.com/attachment/8106/asus_nokogiri.rb

require 'nokogiri'
require 'open-uri'
require 'spreadsheet'

# Create the spreadsheet
Spreadsheet.client_encoding = 'UTF-8'
book = Spreadsheet::Workbook.new

sheet1 = book.create_worksheet
sheet1.name = 'My First Worksheet'

# Specify our URIs
%w[
  http://www.asus.com/Notebooks_Ultrabooks/S56CA/#sp...
  http://www.asus.com/Notebooks_Ultrabooks/ASUS_TAIC...
].each do |url|
  # URI.open: on Ruby 3.0+ a plain open(url) no longer fetches URLs
  doc = Nokogiri::HTML(URI.open(url))

  # Grab our product specifications
  data = doc.css('div#specifications div#spec-area ul.product-spec li')

  # Modify our data
  lines = data.map(&:text)

  # Output our data to the spreadsheet
  lines.each.with_index do |line, i|
    a = 0
    a += 1
    sheet1[i, a] = line
  end
end

book.write 'C:/Users/Barry/Desktop/output.xls'

I don't know the spreadsheet API, but when you call this:

sheet1[i, a] = line

a is always 1, because the two previous lines:
a = 0
a += 1

always set a to 1. Maybe this means the first worksheet or whatever
and it's fine, but in that case you should just write a 1:

sheet1[i, 1] = line

or at least have a more descriptive variable (or constant).
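A quick stand-alone check of why `a` never changes: `a = 0` runs at the start of every block call, so `a += 1` can only ever take it to 1. This sketch just records `a` on each pass over some placeholder strings (hypothetical data, no spreadsheet involved):

```ruby
# Record the value of `a` on each iteration of the original loop body.
seen = []
%w[first second third].each do |_line|
  a = 0   # reset at the start of every iteration...
  a += 1  # ...so this always produces 1
  seen << a
end

p seen  # => [1, 1, 1]
```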

Also, for every URL the lines.each loop runs again with i starting
at 0, so each URL's lines start at row 0 again and overwrite the
previous URL's lines.
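The overwrite can be reproduced without Nokogiri or the spreadsheet gem. In this sketch, `pages` is hypothetical data standing in for each URL's scraped spec lines, and a Hash keyed by `[row, column]` stands in for `sheet1`, on the assumption that `sheet1[row, column] = value` simply replaces whatever the cell held:

```ruby
# Hypothetical scraped data: one array of spec lines per URL.
pages = [
  %w[cpu-A ram-A disk-A],
  %w[cpu-B ram-B]
]

# A Hash keyed by [row, column] stands in for the worksheet.
sheet = {}
pages.each do |lines|
  lines.each.with_index do |line, i|
    sheet[[i, 1]] = line  # i restarts at 0 for every page
  end
end

p sheet.values  # => ["cpu-B", "ram-B", "disk-A"]
```

The second page's two lines land on rows 0 and 1, so only the first page's third line survives.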

Jesus.
Posted by Love U Ruby (my-ruby)
on 2013-02-04 16:49
Nice explanations!
Posted by Barry Kavanagh (ninja2k)
on 2013-02-04 17:00
Thank you for your explanation, Jesus, but I'm just not sure where to
put the code to fix this. Sorry, I am new to Ruby.
Posted by "Jesús Gabriel y Galán" <jgabrielygalan@gmail.com> (Guest)
on 2013-02-04 17:16
(Received via mailing list)
On Mon, Feb 4, 2013 at 5:01 PM, Barry Kavanagh <lists@ruby-forum.com> 
wrote:
> Thank you for your explanation Jesus but I am just not sure where to
> reposition the code to fix this, sorry I am new to ruby.

I don't know the spreadsheet API, but if you need to pass the row
number, you will need to count how many lines you filled in during the
previous steps. So create a variable added_lines = 0 outside of all
loops, and after adding each URL's lines to the spreadsheet, increment
it by the number of lines.
Also, change sheet1[i, a] to sheet1[added_lines + i, a]
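A minimal sketch of that running counter. To keep it runnable without the gem or network access, `pages` is hypothetical data in place of each URL's spec lines, and a Hash keyed by `[row, column]` stands in for `sheet1`:

```ruby
# Hypothetical scraped data: one array of spec lines per URL.
pages = [
  %w[cpu-A ram-A disk-A],
  %w[cpu-B ram-B]
]

COLUMN = 1      # fixed column, in place of the always-1 variable `a`
sheet = {}      # stand-in for sheet1, keyed by [row, column]
added_lines = 0 # rows already written; lives outside all loops

pages.each do |lines|
  lines.each.with_index do |line, i|
    sheet[[added_lines + i, COLUMN]] = line
  end
  added_lines += lines.size  # the next page starts below this one
end

p sheet.keys.map(&:first)  # => [0, 1, 2, 3, 4]
p added_lines              # => 5
```

Because `added_lines` is updated only after a page's lines are written, each page begins on the first row below the previous one.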

Jesus.
Posted by Barry Kavanagh (ninja2k)
on 2013-02-04 17:27
Attachment: asus_nokogiri.rb (886 Bytes)
OK, I made those changes, but I'm getting no output in Excel: just a
blank sheet and no error message. :(
Posted by "Jesús Gabriel y Galán" <jgabrielygalan@gmail.com> (Guest)
on 2013-02-04 17:45
(Received via mailing list)
On Mon, Feb 4, 2013 at 5:27 PM, Barry Kavanagh <lists@ruby-forum.com> 
wrote:
> Ok I made those changes but getting no output to excel, only blank with
> no error code. :(

require 'nokogiri'
require 'open-uri'
require 'spreadsheet'

# Create the spreadsheet
Spreadsheet.client_encoding = 'UTF-8'
book = Spreadsheet::Workbook.new

sheet1 = book.create_worksheet
sheet1.name = 'My First Worksheet'

COLUMN = 1       # every value goes into the same column
added_lines = 0  # rows written so far, across all URLs

# Specify our URIs
%w[
  http://www.asus.com/Notebooks_Ultrabooks/S56CA/#sp...
].each do |url|
  # URI.open: on Ruby 3.0+ a plain open(url) no longer fetches URLs
  doc = Nokogiri::HTML(URI.open(url))

  # Grab our product specifications
  data = doc.css('div#specifications div#spec-area ul.product-spec li')

  # Modify our data
  lines = data.map(&:text)

  # Output our data to the spreadsheet, below the previous URL's rows
  lines.each.with_index do |line, i|
    sheet1[added_lines + i, COLUMN] = line
  end
  added_lines += lines.size
end

book.write 'C:/Users/Barry/Desktop/output.xls'
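A variation on the same idea, not from the thread: Ruby's `Enumerator#with_index` takes an optional starting offset, so the index it yields can already be the target row instead of `added_lines + i`. Sketched here with a Hash standing in for `sheet1` and hypothetical `pages` data; with the real gem the assignment would presumably be `sheet1[row, COLUMN] = line`.

```ruby
# Hypothetical scraped data: one array of spec lines per URL.
pages = [
  %w[cpu-A ram-A disk-A],
  %w[cpu-B ram-B]
]

column = 1    # fixed output column
sheet = {}    # stand-in for sheet1, keyed by [row, column]
next_row = 0  # first free row

pages.each do |lines|
  # with_index(next_row) starts counting at next_row instead of 0
  lines.each.with_index(next_row) do |line, row|
    sheet[[row, column]] = line
  end
  next_row += lines.size
end

p sheet.keys  # => [[0, 1], [1, 1], [2, 1], [3, 1], [4, 1]]
```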

Hope this helps,

Jesus.
Posted by Barry Kavanagh (ninja2k)
on 2013-02-04 17:53
You're a saviour, Jesus! I've been working on this all week with limited
success.

Thank you!
Posted by Joel Pearson (virtuoso)
on 2013-02-04 18:32
Alternative:

require 'nokogiri'
require 'open-uri'
require 'spreadsheet'

# Create the spreadsheet
Spreadsheet.client_encoding = 'UTF-8'
book = Spreadsheet::Workbook.new

sheet1 = book.create_worksheet
sheet1.name = 'My First Worksheet'

lines = []

# Specify our URIs
%w[
  http://www.asus.com/Notebooks_Ultrabooks/S56CA/#sp...
].each do |url|
  # URI.open: on Ruby 3.0+ a plain open(url) no longer fetches URLs
  doc = Nokogiri::HTML(URI.open(url))

  # Collect every page's spec lines first
  lines << doc.css('div#specifications div#spec-area ul.product-spec li').map(&:text)
end

lines.flatten!

# Then write them out, one line per row
lines.each_with_index do |line, idx|
  sheet1.row(idx).push line
end

book.write('C:/Users/Barry/Desktop/output.xls')
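Joel's version gathers everything first and writes afterwards. A sketch of the same two-phase flow in plain Ruby, with `flat_map` doing the collect-and-flatten in one step; `scraped` is hypothetical data in place of the Nokogiri results, and an Array of row arrays stands in for the worksheet (my understanding is that the gem's `Spreadsheet::Row` is an Array subclass, so `push` on an empty row fills column 0):

```ruby
# Hypothetical per-URL results, standing in for the .map(&:text) calls.
scraped = {
  "page-1" => %w[cpu-A ram-A disk-A],
  "page-2" => %w[cpu-B ram-B]
}

# flat_map collects and flattens in one step (replaces `lines << ...` + `flatten!`).
lines = scraped.flat_map { |_url, texts| texts }

# An Array of empty row arrays stands in for the worksheet.
rows = Array.new(lines.size) { [] }
lines.each_with_index do |line, idx|
  rows[idx].push(line)  # push into an empty row lands in column 0
end

p rows[3]  # => ["cpu-B"]
```

Note that this writes every value to the first column, while Jesus's version writes to column index 1 (the second column, since the gem's indices start at 0).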