Forum: Ruby open-uri error

Elliot Temple (Guest)
on 2006-06-07 08:50
(Received via mailing list)
I am writing a script to download Warcraft 3 replays for me. It got a
few (which work), then hit an error:

URI::InvalidURIError: bad URI(is not URI?): http://ftp.replays.net/w3g/060607/060606_mYm]Lucifer(UD)_vs_mTw-LasH(Hum)_TwistedMeadows_RN.w3g

The URL works in Safari, so I'm not sure what's going on. My wild
guess is that Safari accepts technically invalid URLs. Hopefully
someone knowledgeable can tell me what the issue is. Here's my code:


require "open-uri"
path = "/Applications/Warcraft\ III/Replay/auto/"
files = Dir.glob "#{path}*"
count = 0

urls = `lynx -dump http://war3.replays.net/`.split "\n"
urls = urls.select { |url| url =~ %r-\d{1,3}\. http://ftp.replays.net/w3g- }
urls = urls.collect { |url| url.sub(%r-\s*\d{1,3}\.\s*-, "")}

urls.each do |url|
   filename = url.sub(%r-http://ftp.replays.net/w3g/\d*/-, "")
   if not files.include?(filename)
     open(url) do |remote_file|
       File.open(path + filename, "w") do |local_file|
         local_file.write remote_file.read
         count += 1
       end
     end
   end
end

puts "I got #{count} files!"
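
The error can be reproduced offline, because the exception is raised while the URL string is parsed, before any request is sent. Here is a minimal sketch, assuming a current Ruby whose URI.parse follows RFC 3986 (the script above ran on Ruby 1.8). The helper name parseable? is made up for illustration and is not part of the script:

```ruby
require "uri"

# Hypothetical helper (not from the thread): true if URI.parse accepts the string.
def parseable?(url)
  URI.parse(url)
  true
rescue URI::InvalidURIError
  false
end

bad = "http://ftp.replays.net/w3g/060607/060606_mYm]Lucifer(UD)_vs_mTw-LasH(Hum)_TwistedMeadows_RN.w3g"
# The same URL with only the "]" percent-encoded.
good = bad.sub("]", "%5D")

puts parseable?(bad)
puts parseable?(good)
```

The only difference between the two strings is the "]", which suggests that character is what the parser rejects. Browsers tend to be more lenient about such characters than a strict parser.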



-- Elliot Temple
http://www.curi.us/blog/
Elliot Temple (Guest)
on 2006-06-07 20:26
(Received via mailing list)
On Jun 6, 2006, at 11:47 PM, Elliot Temple wrote:

> I am writing a script to download warcraft 3 replays for me. It got
> a few (which work) then had an error:
>
> URI::InvalidURIError: bad URI(is not URI?): http://ftp.replays.net/w3g/060607/060606_mYm]Lucifer(UD)_vs_mTw-LasH(Hum)_TwistedMeadows_RN.w3g

I fixed my problem. The key change is:

url = URI.escape(url)


Here's the current version of the code:


require "open-uri"
path = "/Applications/Warcraft\ III/Replay/auto/"
Dir.chdir path
files = Dir.glob "*"
count = 0

urls = `lynx -dump http://war3.replays.net/`.split "\n"
urls = urls.select {|url| url =~ %r-\d{1,3}\. http://ftp.replays.net/w3g-}
urls = urls.collect {|url| url.sub(%r-\s*\d{1,3}\.\s*-, "")}.uniq

puts "I found #{urls.length} replays!"

urls.each do |url|
   filename = url.sub(%r-http://ftp.replays.net/w3g/\d*/-, "")
   url = URI.escape(url)
   if not files.include?(filename)
     puts "Count is #{count}. Getting #{url}"
     open(url) do |remote_file|
       File.open(path + filename, "w") do |local_file|
         local_file.write remote_file.read
         count += 1
       end
     end
   end
end

puts "I got #{count} files!"

-- Elliot Temple
http://www.curi.us/blog/
Elliot Temple (Guest)
on 2006-06-08 10:19
(Received via mailing list)
On Jun 7, 2006, at 11:23 AM, Elliot Temple wrote:

> I fixed my problem. The key change is:
>
> url = URI.escape(url)

Oops, that didn't work for URLs with [ or ] in them. Now I've added this
code:

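     # Assumes require "cgi" at the top of the script; get_replay is a
     # helper whose definition is not shown in this post.
     begin
       get_replay url, filename
     rescue URI::InvalidURIError
       # Retry with only the filename escaped, keeping the directory part as-is.
       url = url.scan(%r-http://ftp.replays.net/w3g/\d*/-)[0] + CGI.escape(filename)
       begin
         get_replay url, filename
       rescue URI::InvalidURIError
         STDERR.puts $!
       end
     end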

CGI.escape encodes [ and ], but it isn't safe to use on the entire URL (it
encodes the colon and slashes as well). Observe:

irb(main):013:0> x = CGI.escape "http://www.google.com"
=> "http%3A%2F%2Fwww.google.com"
irb(main):014:0> open(x)
Errno::ENOENT: No such file or directory - http%3A%2F%2Fwww.google.com
         from /usr/local/lib/ruby/1.8/open-uri.rb:88:in `initialize'
         from /usr/local/lib/ruby/1.8/open-uri.rb:88:in `open'
         from (irb):14
irb(main):015:0> open "http://www.google.com"
=> #<StringIO:0x585b74>

I don't know if I'm doing this the correct way, but it's working so
far (got about 60 files).
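
For anyone reading this on a newer Ruby: URI.escape was removed in Ruby 3.0, so the approach above needs a different call there. Here is a rough sketch of the same idea (escape only the filename, never the whole URL), assuming ERB::Util.url_encode from the standard library is acceptable for the last path segment. The helper name escape_filename_in_url is made up for illustration, and nothing here touches the network:

```ruby
require "erb"
require "uri"

# Hypothetical helper (not from the thread): percent-encode only the last
# path segment, leaving the scheme, host and directory part untouched.
def escape_filename_in_url(url)
  base, slash, filename = url.rpartition("/")
  base + slash + ERB::Util.url_encode(filename)
end

raw = "http://ftp.replays.net/w3g/060607/060606_mYm]Lucifer(UD)_vs_mTw-LasH(Hum)_TwistedMeadows_RN.w3g"
safe = escape_filename_in_url(raw)

puts safe
# The escaped string is accepted by the parser, so the host can be read back.
puts URI.parse(safe).host
```

Splitting on the last slash keeps "http://" and the directory slashes intact, which is the problem CGI.escape ran into on the full URL. This sketch assumes the URL has no query string. On Ruby 3.0 and later the download call itself would be URI.open(safe) rather than a bare open(safe).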

Elliot