_ in URI subdomain problem

varac · August 15, 2006, 1:22am

Hi,

I am currently working with some code for fetching webpages, and I have
run into a problem. The current implementation does not fetch webpages
with _ in the subdomain, for example http://a_b.google.com.

I have poked around the forum posts and read that the _ in the subdomain
violates an RFC standard, but in my case it is necessary to retrieve
those pages regardless. Before I dive a bit more into this code that I
inherited, has anyone successfully retrieved such pages?

The code uses URI.parse for URI parsing and Net::HTTP for page
retrieval. Currently the code breaks at the URI.parse. Will it suffice
just to rewrite the URI.parse or do I need to find an alternative to
Net::HTTP as well?

Thanks in advance.

varac · August 15, 2006, 2:20pm

The code uses URI.parse for URI parsing and Net::HTTP for page
retrieval. Currently the code breaks at the URI.parse. Will it suffice
just to rewrite the URI.parse or do I need to find an alternative to
Net::HTTP as well?

Should be able to extend URI.parse and have it work there. Good luck!

varac · August 15, 2006, 7:52pm

I think I found the answer in URI.escape after discovering the goldmine
that is the searchable Ruby list.

http://blade.nagaokaut.ac.jp/ruby/ruby-talk/index.shtml#ruby_talk

Aredridel wrote:

The code uses URI.parse for URI parsing and Net::HTTP for page
retrieval. Currently the code breaks at the URI.parse. Will it suffice
just to rewrite the URI.parse or do I need to find an alternative to
Net::HTTP as well?

Should be able to extend URI.parse and have it work there. Good luck!
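
On the question of whether Net::HTTP itself needs replacing: Net::HTTP.new takes a plain host string and port, so one possible workaround is to skip URI.parse for these URLs and split them by hand. The following is only a minimal sketch under that assumption, not the fix either poster ended up using. The names SIMPLE_HTTP_URL, split_http_url and http_client_for are hypothetical, and the pattern deliberately ignores userinfo and IPv6 literals.

```ruby
require "net/http"

# Hypothetical pattern (not part of the uri library): scheme, host,
# optional port, and everything up to the fragment. It puts no
# restriction on "_" in the host.
SIMPLE_HTTP_URL = %r{\A(https?)://([^/:?#]+)(?::(\d+))?([^#]*)}i

# Hypothetical helper: split an http(s) URL without calling URI.parse.
# Returns [scheme, host, port, path_with_query].
def split_http_url(url)
  m = SIMPLE_HTTP_URL.match(url)
  raise ArgumentError, "not an http(s) URL: #{url}" unless m

  scheme = m[1].downcase
  port   = m[3] ? m[3].to_i : (scheme == "https" ? 443 : 80)
  path   = m[4].empty? ? "/" : m[4]
  path   = "/" + path unless path.start_with?("/")
  [scheme, m[2], port, path]
end

# Hypothetical helper: build a Net::HTTP client and a GET request for
# the URL. Nothing is sent until http.request(req) is called.
def http_client_for(url)
  scheme, host, port, path = split_http_url(url)
  http = Net::HTTP.new(host, port)
  http.use_ssl = true if scheme == "https"
  [http, Net::HTTP::Get.new(path)]
end

# Usage (performs a real network request, so it is left commented out):
#   http, req = http_client_for("http://a_b.google.com/")
#   res = http.request(req)
#   puts res.code
```

Whether the request then succeeds depends on DNS and on the server, not on Ruby: the underscore is only rejected at the parsing step, so the rest of the fetching code may be able to stay on Net::HTTP.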