Reading and using URLs from a txt file

Hi all,

I’ve been at this for a few hours now and am not getting much further:

I want to read a bunch of URLs, each listed on a separate line in a txt
file.
I then want to take each of these URLs and run the same step (a
Nokogiri parse) on every one of them.

What I’m trying now is:

f = File.open("file.txt", "r")
f.each_line do |lijn|
searchableurl = Nokogiri::HTML (lijn)

It is not giving an error, nor is it working.

Another approach I’ve been trying:

f = File.open("file.txt", "r")
while !f.eof?
line = f.readline
searchableurl = Nokogiri::HTML(line)

Is this the wrong way of getting and then using each url?
Does it have to do with linebreaks?

Thanks.

On Tue, Oct 9, 2012 at 4:51 PM, Sybren K. wrote:

f = File.open("file.txt", "r")
f.each_line do |lijn|

Better to use the block form:

File.foreach("file.txt") do |line|
  # work with line here
end

This way the file is closed automatically once the block finishes.
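
To make the "properly closed" point concrete, here is a minimal sketch. The helper name count_lines is made up for illustration and is not from the original post; it uses the block form of File.open, keeps a reference to the handle, and checks afterwards that Ruby has closed it.

```ruby
# Hypothetical helper (not from the thread): counts the lines in a file
# using the block form of File.open, then reports whether the handle
# was closed once the block returned.
def count_lines(path)
  handle = nil
  count = 0
  File.open(path, "r") do |f|
    handle = f                  # keep a reference so we can inspect it later
    f.each_line { count += 1 }
  end
  [count, handle.closed?]       # closed? is true: the block form closed it
end
```

With the non-block form (f = File.open(...)), the handle stays open until f.close is called or the object is garbage collected.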

searchableurl = Nokogiri::HTML (lijn)

That method expects the HTML itself as a string (or an IO), not a URL.
You need to fetch the page contents first:

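require 'open-uri'

# On Ruby 3.0+ use URI.open; plain open(url) only works on older Rubies.
# strip removes the trailing newline that each line read from the file carries.
Nokogiri::HTML(URI.open(line.strip))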
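
Putting the two pieces together, a rough sketch of the file-reading half could look like the following. The helper name each_url is an assumption made for illustration, not something from the original code. It only cleans up the lines and hands each URL to a block, so the fetching and parsing stay in the caller.

```ruby
# Hypothetical helper (not from the thread): yields each URL in the file,
# one line at a time, with the newline and surrounding whitespace removed.
def each_url(path)
  File.foreach(path) do |line|
    url = line.strip
    next if url.empty?   # skip blank lines
    yield url
  end
end

# Usage, assuming nokogiri and open-uri have been required:
#   each_url("file.txt") do |url|
#     doc = Nokogiri::HTML(URI.open(url))
#     # ... search doc here ...
#   end
```

Because File.foreach reads the file line by line, only the current URL is held in memory at any time, not the whole list.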

Jesus.

Jesus, thanks a million.

It works.

One more question: since I’m going to use this method on a txt file with
hundreds of thousands of URLs, is this the best method, or does it hold
too much in memory?