Using another program (Lynx) from within Ruby

I’m trying to write a script to read a list of URLs, get the HTTP response headers from each URL (if there is a page there), and output to a CSV file in this format:
URL, header,

I’ve started with something like this, using Lynx to get the headers. The part that doesn’t seem to work is this:
lynx -dump -head "#{line}" – it doesn’t want to put the URL into the #{line} within the backticks.

How do you insert a variable from Ruby into the shell command? I’m ordering 4 Ruby books by mail today… I haven’t seen anything like this in the ones that I’ve browsed, though.

Here is a larger section of the script:

print "Enter the location of the input file: "
infile = gets.chomp

# open file
File.open(infile, "r") do |f|
  # get HTTP headers with Lynx
  output = f.each_line { |line| `lynx -dump -head "#{line}" |

grep “HTTP”` }

puts output to CVS file

TODO

On 3 Nov 2006, at 15:20, z wrote:

lynx -dump -head "#{line}" – it doesn’t want to put the URL
Here is a larger section of the script:

TODO

If you really want to use an external program, you could use something like open("|program") in order to get an IO object connected to its output. Anyway, I think the best way to do it is with Net::HTTP (see the “Network and Web Libraries” section of the standard library documentation); give it a look, you could find it useful :)
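For instance, something like this connects an IO object to the command’s output (a minimal sketch; IO.popen is one way to open that pipe, and the URL is just a placeholder):

# Read lynx's header dump line by line through a pipe
IO.popen('lynx -dump -head "http://example.com/"') do |io|
  io.each_line { |l| puts l if l =~ /HTTP/ }
end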

z wrote:

I’m trying to write a script to read a list of URLs, get the HTTP response headers from each URL (if there is a page there), and output to a CSV file in this format:
URL, header,

snip

Have you tried Mechanize yet?
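For example (a minimal sketch, assuming the mechanize gem is installed; the URL is a placeholder, and Mechanize follows redirects by default):

require 'mechanize'

agent = Mechanize.new
page = agent.get('http://example.com/')
puts page.code   # HTTP status as a String, e.g. "200"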

Mike

On 11/3/06, z [email protected] wrote:

How do you insert a variable from Ruby into the shell command? I’m ordering

File.open(infile, "r") do |f|
  # get HTTP headers with Lynx
  output = f.each_line { |line| `lynx -dump -head "#{line}" | grep "HTTP"` }

puts output to CSV file

TODO

Hi,

the #{} should work. Try replacing the command with echo to see what exactly it expands to.
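Something along these lines, for instance (a sketch; echo just prints its arguments back, so stray characters such as a trailing newline in line show up as an extra blank line):

f.each_line { |line| puts `echo lynx -dump -head "#{line}"` }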

On a side note, you can use Net::HTTP for this task without calling an external program (from the rdoc):

require 'net/http'

response = nil
Net::HTTP.start('some.www.server', 80) {|http|
  response = http.get('/index.html')
}
p response['content-type']
p response.body

Jan S. wrote:

}
p response['content-type']
p response.body

I tried using Net::HTTP but I’m not sure how to get the HTTP response code. I tried the following and don’t see the response code (200, 301, 302, etc.) – sorry, I might not have mentioned that I only need the response code. I’m going to try to use Net::HTTP because I saw that it can follow redirects. That would be useful.

Not enough info here:
response.each { |key, value| puts "#{key} is #{value}\n\n" }
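For what it’s worth, the status line isn’t among the headers that each yields; Net::HTTP exposes it separately. A minimal sketch, with a placeholder URL:

require 'net/http'

response = Net::HTTP.get_response(URI('http://example.com/'))
puts response.code      # the status code as a String, e.g. "200"
puts response.message   # the reason phrase, e.g. "OK"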

On Fri, 3 Nov 2006, z wrote:

I’m trying to write a script to read a list of URLs, get the HTTP response headers from each URL (if there is a page there), and output to a CSV file in this format:
URL, header,

You could do all that within Ruby, of course, but it’s almost certainly quicker the way you’re doing it…

I’ve started with something like this, using Lynx to get the headers. The part that doesn’t seem to work is this:
lynx -dump -head "#{line}" – it doesn’t want to put the URL into the #{line} within the backticks.

I don’t think that’s the problem, judging from the code.

# open file
File.open(infile, "r") do |f|
  # get HTTP headers with Lynx
  output = f.each_line { |line| `lynx -dump -head "#{line}" | grep "HTTP"` }

# each_line won’t return anything from the block
=begin ri output
----------------------------------------------------------- IO#each_line
ios.each(sep_string=$/) {|line| block } => ios
ios.each_line(sep_string=$/) {|line| block } => ios


 Executes the block for every line in _ios_, where lines are
 separated by _sep_string_. _ios_ must be opened for reading or an
 +IOError+ will be raised.

    f = File.new("testfile")
    f.each {|line| puts "#{f.lineno}: #{line}" }

 _produces:_

    1: This is line one
    2: This is line two
    3: This is line three
    4: And so on...

=end

    output = []
    f.each_line {|line| output << `lynx -dump -head "#{line}" | grep "HTTP"` }

puts output to CSV file

    # See if you have anything that makes sense...
    $stdout.puts output.join("\n")
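One more wrinkle worth checking: each_line yields each line with its trailing newline still attached, so the URL inside the backticks ends in "\n". A sketch that strips it and fills in the CSV step (the filename is just a placeholder):

    require 'csv'

    output = []
    f.each_line do |line|
      url = line.chomp   # drop the trailing newline before interpolating
      output << [url, `lynx -dump -head "#{url}" | grep "HTTP"`.strip]
    end
    # write URL, header rows to a CSV file
    CSV.open("headers.csv", "w") { |csv| output.each { |row| csv << row } }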

TODO

    Hugh

Gabriele M. wrote:

If you really want to use an external program, you could use something like open("|program") in order to get an IO object connected to its output. Anyway, I think the best way to do it is with Net::HTTP (see the “Network and Web Libraries” section of the standard library documentation); give it a look, you could find it useful :)

Thanks, that example looks like it has exactly what I need… going to try it now.

barjunk wrote:

z wrote:

I’m trying to write a script to read a list of URLs, get the HTTP response headers from each URL (if there is a page there), and output to a CSV file in this format:
URL, header,

snip

Have you tried Mechanize yet?

No, but I will look into it. Thanks.