Forum: Ruby reading and using urls from txtfile

Posted by Sybren Kooistra (sybrenkooistra)
on 2012-10-09 16:51
Hi all,

I've been at this for a few hours now and am not getting much further:

I want to read a bunch of URLs, each listed on a separate line in a txt
file.
I then want to take each of these URLs and run the same step (a
Nokogiri parse) on every one of them.

What I'm trying now is:

f = File.open("file.txt", "r")
f.each_line do |lijn|
searchableurl = Nokogiri::HTML (lijn)

It is not giving an error, nor is it working.

Another approach I've been trying:

f = File.open("file.txt", "r")
while !f.eof?
line = f.readline
searchableurl = Nokogiri::HTML(line)


Is this the wrong way of getting and then using each url?
Does it have to do with linebreaks?


thanks.
Posted by "Jesús Gabriel y Galán" <jgabrielygalan@gmail.com> (Guest)
on 2012-10-09 17:07
(Received via mailing list)
On Tue, Oct 9, 2012 at 4:51 PM, Sybren Kooistra <lists@ruby-forum.com> 
wrote:
>
> f = File.open("file.txt", "r")
> f.each_line do |lijn|

Better to use the block forms:

File.foreach("file.txt") do |line|

end

This way the file is properly closed.
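
To illustrate the point about closing, here is a minimal sketch. The helper names read_urls and closed_after_block? are made up for this example. File.open without a block leaves the handle open until you call close yourself, while File.foreach and the block form of File.open close it for you.

```ruby
# Hypothetical helper: collects every line of the file, minus its newline.
# File.foreach opens the file, yields one line at a time, and closes it
# when the iteration finishes.
def read_urls(path)
  urls = []
  File.foreach(path) { |line| urls << line.chomp }
  urls
end

# Hypothetical helper: shows that the block form of File.open closes the
# handle as soon as the block returns (also when the block raises).
def closed_after_block?(path)
  handle = nil
  File.open(path, "r") { |io| handle = io }
  handle.closed?
end
```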

> searchableurl = Nokogiri::HTML (lijn)

That method expects the HTML itself (as a string or an IO), not the URL.
You need to fetch the page's contents first:

require 'open-uri'

Nokogiri::HTML(open(line))
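
Putting the two pieces together, a minimal sketch might look like the one below. The helper names each_url and fetch_document are made up for illustration, and it assumes one URL per line and an installed nokogiri gem. Every line read from the file still carries its trailing newline, so the sketch strips it before use. On Ruby 3.0 and later a plain open no longer accepts URLs, so the sketch calls URI.open from open-uri instead.

```ruby
require 'open-uri'

# Hypothetical helper: yields each non-blank line of the file as a cleaned URL.
def each_url(path)
  return enum_for(:each_url, path) unless block_given?

  File.foreach(path) do |line|
    url = line.strip          # drop the trailing newline and stray spaces
    next if url.empty?        # skip blank lines
    yield url
  end
end

# Hypothetical helper: downloads the page and hands the HTML to Nokogiri.
def fetch_document(url)
  require 'nokogiri'          # assumes the nokogiri gem is installed
  Nokogiri::HTML(URI.open(url))
end

# Possible usage (needs network access, so it is left commented out):
#   each_url("file.txt") do |url|
#     doc = fetch_document(url)
#     puts doc.title
#   end
```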

Jesus.
Posted by Sybren Kooistra (sybrenkooistra)
on 2012-10-09 17:22
Jesus, thanks a million.

It works.


One more question: since I'm going to use this method on a txt file with
hundreds of thousands of URLs, is this the best approach, or does it keep
too much in memory?
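
On the memory side, File.foreach hands over one line at a time, so the URL list itself is never loaded whole (File.readlines, by contrast, would build an array of every line). What can pile up is whatever the block keeps between iterations. A rough sketch, assuming each page boils down to a small result that can be written out immediately: process_urls is a hypothetical helper, and the per-URL work is passed in as a block so the sketch runs without network access.

```ruby
# Hypothetical helper: streams URLs from in_path and writes one result line
# per URL to out_path, so neither the URL list nor the results accumulate
# in memory. Returns the number of URLs processed.
def process_urls(in_path, out_path)
  count = 0
  File.open(out_path, "w") do |out|
    File.foreach(in_path) do |line|
      url = line.strip
      next if url.empty?
      result = yield(url)     # e.g. fetch and parse the page, return its title
      out.puts("#{url}\t#{result}")
      count += 1
    end
  end
  count
end
```

Inside the block you would do the fetch and the Nokogiri parse and return only the piece you need. The parsed document goes out of scope at the end of each iteration, so it can be garbage collected.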