Hi all, I've been at this for a few hours now and not getting much further: I want to read a bunch of URLs, listed one per line in a txt file, and then run a "do each" method (a Nokogiri parse) on each of those URLs.

What I'm trying now is:

  f = File.open("file.txt", "r")
  f.each_line do |lijn|
    searchableurl = Nokogiri::HTML(lijn)
  end

It gives no error, but it isn't working either. Another thing I've tried:

  f = File.open("file.txt", "r")
  while !f.eof?
    line = f.readline
    searchableurl = Nokogiri::HTML(line)
  end

Is this the wrong way of getting and then using each URL? Does it have to do with linebreaks? Thanks.
on 2012-10-09 16:51
on 2012-10-09 17:07
On Tue, Oct 9, 2012 at 4:51 PM, Sybren Kooistra <email@example.com> wrote:
> f = File.open("file.txt", "r")
> f.each_line do |lijn|

Better to use the block form:

  File.foreach("file.txt") do |line|
  end

This way the file is properly closed.

> searchableurl = Nokogiri::HTML(lijn)

That method receives the HTML string, not the URL. You need to read its contents first:

  require 'open-uri'
  Nokogiri::HTML(open(line))

Jesus.
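The linebreak hunch in the original question is also relevant: `each_line` and `File.foreach` yield each line *including* its trailing newline, so the string handed to `open` would end in "\n" unless it is chomped first. A minimal stdlib-only sketch of the loop (the Tempfile and example URLs are stand-ins for the real "file.txt"; the actual Nokogiri fetch is left as a comment since it needs network access):

```ruby
require 'tempfile'

# Stand-in for file.txt: one URL per line, newline-terminated.
file = Tempfile.new('urls')
file.write("http://example.com/a\nhttp://example.com/b\n")
file.rewind

raw_lines = []
urls = []
File.foreach(file.path) do |line|
  raw_lines << line      # still carries the trailing "\n"
  urls << line.chomp     # clean URL string, safe to fetch
end
file.close

# With open-uri, each cleaned URL could then be fetched and parsed:
#   require 'open-uri'
#   doc = Nokogiri::HTML(open(url))
```

Because `File.foreach` streams one line at a time rather than slurping the whole file, this shape also scales to very large URL lists.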
on 2012-10-09 17:22
Jesus, thanks a million. It works. One more question: since I'm going to use this method on a txt file with hundreds of thousands of URLs, is this the best method, or does it hold too much in memory?