Hi all,
I've been at this for a few hours now, and not getting much further:
I want to read a bunch of URLs, each listed on a separate line in a txt
file.
I then want to loop over these URLs and run the same operation (a
Nokogiri parse) on each one.
What I'm trying now is:
f = File.open("file.txt", "r")
f.each_line do |lijn|
  searchableurl = Nokogiri::HTML (lijn)
end
It is not giving an error, nor is it working.
Another thing I've been trying:
f = File.open("file.txt", "r")
while !f.eof?
  line = f.readline
  searchableurl = Nokogiri::HTML(line)
end
Is this the wrong way of getting and then using each url?
Does it have to do with linebreaks?
thanks.
on 2012-10-09 16:51
on 2012-10-09 17:07
On Tue, Oct 9, 2012 at 4:51 PM, Sybren Kooistra <lists@ruby-forum.com> wrote:
> f = File.open("file.txt", "r")
> f.each_line do |lijn|

Better to use the block forms:

File.foreach("file.txt") do |line|
end

This way the file is properly closed.

> searchableurl = Nokogiri::HTML (lijn)

That method receives the HTML string, not the URL. You need to read its contents first:

require 'open-uri'
Nokogiri::HTML(open(line))

Jesus.
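
On the linebreak question: every line that File.foreach hands to the block still ends in "\n", and a file can also contain blank lines. A minimal sketch of cleaning the lines before fetching them is below. The name each_url is a hypothetical helper for illustration, not part of Ruby or Nokogiri.

```ruby
# each_url is a hypothetical helper (not part of Ruby or Nokogiri).
# It reads the file at +path+ one line at a time and yields each
# non-blank line with surrounding whitespace, including the
# trailing "\n", removed.
def each_url(path)
  File.foreach(path) do |line|
    url = line.strip
    next if url.empty? # skip blank lines
    yield url
  end
end
```

With nokogiri and open-uri loaded, the block body would then be something like Nokogiri::HTML(URI.open(url)). URI.open exists since Ruby 2.5, and from Ruby 3.0 on a plain open(url) no longer fetches URLs, so URI.open is the safer spelling on current Rubies.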
on 2012-10-09 17:22
Jesus, thanks a million. It works. One more question: since I'm going to use this method on a txt file with hundreds of thousands of URLs, is this the best method, or does it hold too much in memory?
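
As far as the file itself goes, File.foreach reads one line at a time, so the size of the txt file should not be the problem. What can grow is whatever the loop keeps around, so it helps to hold on to only the small pieces extracted from each page and let the parsed document go. Below is a rough sketch under those assumptions. process_urls and its fetch: parameter are hypothetical names for illustration, and the error handling is just one possible choice so that a single bad URL does not stop a long run.

```ruby
require 'open-uri'

# process_urls is a hypothetical helper, not a library method.
# It streams the file line by line, fetches each URL, and yields
# (url, html) to the caller's block. Nothing is accumulated here,
# so memory use depends on what the block chooses to keep.
# fetch: is an assumed injection point; by default it downloads
# the page body with open-uri (URI.open needs Ruby 2.5 or later).
def process_urls(path, fetch: ->(url) { URI.open(url, &:read) })
  failures = 0
  File.foreach(path) do |line|
    url = line.strip
    next if url.empty?
    begin
      yield url, fetch.call(url)
    rescue StandardError => e
      # Errors from the fetch or from the caller's block land here.
      failures += 1
      warn "skipping #{url}: #{e.message}"
    end
  end
  failures # number of URLs that raised an error
end
```

Inside the block you would call Nokogiri::HTML(html), pull out the strings you need, and write them to a file or database right away instead of collecting hundreds of thousands of documents in an array.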