Open-uri error


#1

I am writing a script to download warcraft 3 replays for me. It got a
few (which work) then had an error:

URI::InvalidURIError: bad URI(is not URI?): http://ftp.replays.net/w3g/060607/060606_mYm]Lucifer(UD)_vs_mTw-LasH(Hum)_TwistedMeadows_RN.w3g

The URL works in Safari, so I’m not sure what’s going on. My wild
guess is that Safari accepts technically invalid URLs. Hopefully
someone knowledgeable can tell me what the issue is. Here’s my code:

require "open-uri"
path = "/Applications/Warcraft\ III/Replay/auto/"
# Dir.glob returns full paths here, so include?(filename) below never matches
files = Dir.glob "#{path}*"
count = 0

# lynx -dump ends with a numbered list of every link on the page
urls = `lynx -dump http://war3.replays.net/`.split "\n"
urls = urls.select { |url| url =~ %r-\d{1,3}. http://ftp.replays.net/w3g- }
# Strip the "  12. " reference numbers, leaving bare URLs
urls = urls.collect { |url| url.sub(%r-\s*\d{1,3}.\s*-, "") }

urls.each do |url|
  # Local filename = URL minus the "http://ftp.replays.net/w3g/<date>/" prefix
  filename = url.sub(%r-http://ftp.replays.net/w3g/\d*/-, "")
  if not files.include?(filename)
    open(url) do |remote_file|
      File.open(path + filename, "w") do |local_file|
        local_file.write remote_file.read
        count += 1
      end
    end
  end
end

puts "I got #{count} files!"

– Elliot T.
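The select/collect steps in the script pull replay links out of the numbered reference list that `lynx -dump` prints at the end of a page. A minimal sketch of the same filtering, assuming lynx lists links as lines like `  12. http://ftp.replays.net/w3g/...`; the sample dump text is made up, `File.basename` stands in for the `sub` call, and `replay_urls`/`replay_filename` are illustrative names:

```ruby
# Assumption: lynx prints links as "<spaces><number>. <url>" lines.
REPLAY_LINK = %r!\A\s*\d{1,3}\.\s+(http://ftp\.replays\.net/w3g/\S+)\s*\z!

# Returns the unique replay URLs found in the dump text.
def replay_urls(dump)
  dump.each_line.filter_map { |line| line[REPLAY_LINK, 1] }.uniq
end

# The local filename is simply the last path segment of the URL.
def replay_filename(url)
  File.basename(url)
end

# Hypothetical sample; a real run would use `lynx -dump http://war3.replays.net/`.
sample = <<~DUMP
  References

     1. http://war3.replays.net/
     2. http://ftp.replays.net/w3g/060607/a_vs_b.w3g
     3. http://ftp.replays.net/w3g/060607/a_vs_b.w3g
DUMP

replay_urls(sample).each { |url| puts replay_filename(url) }
```

Anchoring the pattern with `\A` and escaping the dots keeps it from matching URLs that merely appear somewhere inside a line of page text.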


#2

On Jun 6, 2006, at 11:47 PM, Elliot T. wrote:

I am writing a script to download warcraft 3 replays for me. It got
a few (which work) then had an error:

URI::InvalidURIError: bad URI(is not URI?): http://ftp.replays.net/w3g/060607/060606_mYm]Lucifer(UD)_vs_mTw-LasH(Hum)_TwistedMeadows_RN.w3g
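The error comes from Ruby's URI parser, not from the server: `[` and `]` are reserved in URIs (for IPv6 host literals) and may not appear unescaped in a path, while browsers are more lenient and send such URLs anyway. A small diagnostic sketch that asks `URI.parse` whether a string is acceptable and lists the characters outside the set allowed in an http URL path; `parseable?`, `offending_chars` and `URL_CHAR` are illustrative names, not library API:

```ruby
require "uri"

# Characters allowed literally in an http URL outside the host:
# RFC 3986 unreserved, sub-delims, ":", "@", "/", "?", "#", and "%" for escapes.
# "[" and "]" are deliberately absent.
URL_CHAR = %r{[A-Za-z0-9\-._~!$&'()*+,;=:@/?#%]}

# True if Ruby's URI parser accepts the string.
def parseable?(url)
  URI.parse(url)
  true
rescue URI::InvalidURIError
  false
end

# Distinct characters that would need percent-encoding.
def offending_chars(url)
  url.chars.reject { |c| URL_CHAR.match?(c) }.uniq
end

url = "http://ftp.replays.net/w3g/060607/" \
      "060606_mYm]Lucifer(UD)_vs_mTw-LasH(Hum)_TwistedMeadows_RN.w3g"
puts parseable?(url)     # prints false
p offending_chars(url)   # prints ["]"]
```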

I fixed my problem. The key change is:

url = URI.escape(url)
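`URI.escape` was removed in Ruby 3.0, so on a current Ruby the same fix needs another route. A sketch of one option, assuming the URLs carry no query string or fragment: split off the scheme and host, then percent-encode each path segment separately so the `/` separators survive. Everything outside the RFC 3986 path-character set is encoded, including `[` and `]`; note that `%` is encoded too, so an already-escaped URL would be double-encoded. `escape_segment` and `escape_path` are hypothetical helpers:

```ruby
require "uri"

# Keep RFC 3986 path characters (unreserved, sub-delims, ":" and "@");
# percent-encode every other byte, e.g. "]" -> "%5D", " " -> "%20".
def escape_segment(segment)
  segment.b.gsub(/[^A-Za-z0-9\-._~!$&'()*+,;=:@]/n) { |byte| format("%%%02X", byte.ord) }
end

# Escape each path segment; leave "scheme://host" untouched.
# Assumption: no query string or fragment in the URL.
def escape_path(url)
  m = url.match(%r{\A([a-z][a-z0-9+\-.]*://[^/]*)(.*)\z}i)
  raise ArgumentError, "not an absolute URL: #{url}" unless m
  prefix, path = m.captures
  prefix + path.split("/", -1).map { |seg| escape_segment(seg) }.join("/")
end

url = "http://ftp.replays.net/w3g/060607/060606_mYm]Lucifer(UD)_vs_mTw-LasH(Hum)_TwistedMeadows_RN.w3g"
puts escape_path(url)
# http://ftp.replays.net/w3g/060607/060606_mYm%5DLucifer(UD)_vs_mTw-LasH(Hum)_TwistedMeadows_RN.w3g
```

Splitting with a limit of `-1` keeps empty segments, so leading and trailing slashes come back unchanged after the join.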

Here’s the current version of the code:

require "open-uri"
path = "/Applications/Warcraft\ III/Replay/auto/"
Dir.chdir path
files = Dir.glob "*"   # bare filenames now, so include? below can match
count = 0

urls = `lynx -dump http://war3.replays.net/`.split "\n"
urls = urls.select { |url| url =~ %r-\d{1,3}. http://ftp.replays.net/w3g- }
urls = urls.collect { |url| url.sub(%r-\s*\d{1,3}.\s*-, "") }.uniq

puts "I found #{urls.length} replays!"

urls.each do |url|
  filename = url.sub(%r-http://ftp.replays.net/w3g/\d*/-, "")
  url = URI.escape(url)   # percent-encode unsafe characters before open-uri parses it
  if not files.include?(filename)
    puts "Count is #{count}. Getting #{url}"
    open(url) do |remote_file|
      File.open(path + filename, "w") do |local_file|
        local_file.write remote_file.read
        count += 1
      end
    end
  end
end

puts "I got #{count} files!"

– Elliot T.


#3

On Jun 7, 2006, at 11:23 AM, Elliot T. wrote:

I fixed my problem. The key change is:

url = URI.escape(url)

Oops, that didn't work for URLs with [] in them. Now I've added this
code:

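# Inside the urls.each loop; needs require "cgi".
# get_replay (not shown) does the actual download.
begin
  get_replay url, filename
rescue URI::InvalidURIError
  # Keep the directory part of the URL, CGI-escape only the filename
  url = url.scan(%r-http://ftp.replays.net/w3g/\d*/-)[0] +
        CGI.escape(filename)
  begin
    get_replay url, filename
  rescue URI::InvalidURIError
    STDERR.puts $!
  end
end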
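The fallback above tries the URL as-is first and only rebuilds it when the parser rejects it. The same control flow in isolation, without any network access: `checked_uri` is a hypothetical stand-in for the URL-parsing part of `get_replay`, and it percent-encodes the filename by hand rather than with `CGI.escape`, because `CGI.escape` turns spaces into `+` (form encoding), which is not the same as `%20` in a path.

```ruby
require "uri"

# Percent-encode bytes outside the RFC 3986 path-character set (e.g. "[" and "]").
def encode_filename(name)
  name.b.gsub(%r{[^A-Za-z0-9\-._~!$&'()*+,;=:@]}n) { |byte| format("%%%02X", byte.ord) }
end

# Try the URL unchanged; if URI rejects it, keep everything up to the last "/"
# and encode only the filename. A second failure still raises, so callers
# should rescue it as the inner begin/rescue in the post does.
def checked_uri(url)
  URI.parse(url)
rescue URI::InvalidURIError
  dir, _slash, filename = url.rpartition("/")
  URI.parse("#{dir}/#{encode_filename(filename)}")
end

p checked_uri("http://ftp.replays.net/w3g/060607/a]b.w3g").path
```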

CGI.escape does encode [ and ], but it isn't safe to apply to the entire URL,
because it also encodes the colon and the slashes. Observe:

irb(main):013:0> x = CGI.escape "http://www.google.com"
=> "http%3A%2F%2Fwww.google.com"
irb(main):014:0> open(x)
Errno::ENOENT: No such file or directory - http%3A%2F%2Fwww.google.com
        from /usr/local/lib/ruby/1.8/open-uri.rb:88:in `initialize'
        from /usr/local/lib/ruby/1.8/open-uri.rb:88:in `open'
        from (irb):14
irb(main):015:0> open "http://www.google.com"
=> #<StringIO:0x585b74>
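The `Errno::ENOENT` in that session follows from how open-uri hooks `Kernel#open`. As far as I know, in Ruby 1.8 it treats a string as a URL only when it starts with a scheme followed by `://` (and then parses it with `URI.parse`, which is where `InvalidURIError` is raised); anything else goes to the ordinary file `open`. Once `CGI.escape` has turned `://` into `%3A%2F%2F`, the string no longer looks like a URL, so Ruby looks for a local file with that name. The check below imitates that prefix test; the regex is modeled on open-uri's rather than copied from a specific version. On Ruby 3.0 and later, `Kernel#open` no longer routes URLs to open-uri at all, and `URI.open` must be called explicitly.

```ruby
# Imitation of open-uri's "does this string look like a URL?" prefix test.
URL_SCHEME_PREFIX = %r{\A[A-Za-z][A-Za-z0-9+\-.]*://}

# :uri when open-uri would fetch over the network,
# :file when the string falls through to the plain file open.
def open_target(name)
  URL_SCHEME_PREFIX.match?(name) ? :uri : :file
end

p open_target("http://www.google.com")         # prints :uri
p open_target("http%3A%2F%2Fwww.google.com")   # prints :file, hence Errno::ENOENT
```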

I don’t know if I’m doing this the correct way, but it’s working so
far (got about 60 files).

Elliot