Reading and using URLs from a text file

Hi all,

I’ve been at this for a few hours now, and not getting much further:

I want to read a bunch of URLs, each listed on a separate line in a txt
file.
I then want to loop over these URLs and run the same method (a
Nokogiri parse) on each of them.

What I'm trying now is:

f = File.open("file.txt", "r")
f.each_line do |lijn|
  searchableurl = Nokogiri::HTML (lijn)
end

It is not giving an error, nor is it working.

Another approach I've been trying:

f = File.open("file.txt", "r")
while !f.eof?
  line = f.readline
  searchableurl = Nokogiri::HTML(line)
end

Is this the wrong way of getting and then using each url?
Does it have to do with linebreaks?

Thanks.

On Tue, Oct 9, 2012 at 4:51 PM, Sybren K. wrote:

f = File.open("file.txt", "r")
f.each_line do |lijn|

Better to use the block forms:

File.foreach("file.txt") do |line|

end

This way the file is properly closed.
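To illustrate the difference, here is a minimal sketch. The temp file and its two example.com URLs are made up for the illustration and stand in for file.txt:

```ruby
require 'tempfile'

# Hypothetical sample data standing in for file.txt.
sample = Tempfile.new(['urls', '.txt'])
sample.write("http://example.com/a\nhttp://example.com/b\n")
sample.close

# File.open without a block leaves closing the handle to you.
f = File.open(sample.path, 'r')
first_line = f.readline
f.close

# File.foreach opens the file, yields one line at a time, and closes it
# when the block finishes (even if the block raises).
lines = []
File.foreach(sample.path) do |line|
  lines << line.chomp # each line still carries its trailing newline
end
```

Note that every line yielded by File.foreach ends with "\n", which is worth stripping before treating the line as a URL.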

searchableurl = Nokogiri::HTML (lijn)

That method expects the HTML itself as a string, not the URL. You need to
fetch the page's contents first:

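require 'open-uri'
require 'nokogiri'

# chomp drops the trailing newline; on Ruby 3.0+ use URI.open,
# since plain open no longer accepts URLs
Nokogiri::HTML(URI.open(line.chomp))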
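Putting the two pieces together, the whole loop might look roughly like the sketch below. The helper name `each_page` and its `fetch:` parameter are hypothetical, introduced only so the loop can be tried without network access; they are not part of open-uri or Nokogiri.

```ruby
require 'open-uri'

# Hypothetical helper: reads one URL per line from `path`, fetches each
# page, and yields the URL together with the response body.
# `fetch` is injectable so the loop can be exercised without a network.
def each_page(path, fetch: ->(url) { URI.open(url, &:read) })
  File.foreach(path) do |line|
    url = line.strip    # drop the trailing newline and stray spaces
    next if url.empty?  # skip blank lines
    yield url, fetch.call(url)
  end
end

# Usage, assuming the nokogiri gem is installed:
#   require 'nokogiri'
#   each_page('file.txt') do |url, html|
#     doc = Nokogiri::HTML(html)
#     puts "#{url}: #{doc.title}"
#   end
```

Stripping the line first matters because the newline at the end of each line is otherwise passed along as part of the URL.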

Jesus.

Jesus, thanks a million.

It works.

One more question: since I'm going to use this method on a txt file with
hundreds of thousands of URLs, is this the best approach, or does it hold
too much in memory?
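A hedged note on the memory question: File.foreach reads one line at a time, so the URL list itself is never held in memory as a whole. What tends to grow is whatever the loop keeps around, for example an array of parsed documents. The sketch below writes each result out immediately and keeps only two counters; `process_urls`, `handler`, and the output file are hypothetical names for the illustration.

```ruby
# Hypothetical streaming loop: `handler` receives one URL and returns a
# one-line result string; nothing is accumulated between iterations.
def process_urls(input_path, output_path, handler)
  counts = { ok: 0, failed: 0 }
  File.open(output_path, 'w') do |out|
    File.foreach(input_path) do |line|
      url = line.strip
      next if url.empty?
      begin
        out.puts(handler.call(url)) # write the result and let it go
        counts[:ok] += 1
      rescue StandardError => e
        # One bad URL should not abort a run over a very long list.
        warn "#{url}: #{e.class}: #{e.message}"
        counts[:failed] += 1
      end
    end
  end
  counts
end
```

Here `handler` could be something like `->(url) { Nokogiri::HTML(URI.open(url)).title.to_s }`, assuming open-uri and nokogiri are loaded. With this many URLs the run time is likely dominated by the network fetches rather than by reading the file.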
