Justin C. points out the greatest difficulty with the situation,
i.e. that when dealing with a country code TLD, one may well have a
different number of parts (e.g. example.co.uk) than when dealing with
a gTLD (example.com).
The only solution that has occurred to me is to keep a list of known
TLDs and second-level domains (e.g. co.uk) that are not specific enough
on their own, so that one more label is needed to identify the site.
The problem is that this requires maintenance as well as initial research.
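
To make the list idea concrete, here is a minimal sketch. The suffix list is a tiny hand-picked sample and `split_host` is a hypothetical helper name, not something from an existing library; a real list would be much longer and would need the maintenance mentioned above.

```ruby
require "set"

# Hypothetical, hand-maintained sample; a real list would be far longer.
KNOWN_SUFFIXES = Set.new(%w[com org net uk co.uk org.uk ac.uk]).freeze

# Returns [subdomain, registrable_domain, suffix],
# or nil when the host is itself a suffix or no known suffix matches.
def split_host(host)
  labels = host.downcase.split(".")
  return nil if KNOWN_SUFFIXES.include?(labels.join("."))

  # Walk from the longest candidate suffix to the shortest,
  # so "co.uk" is found before "uk". Starting at 1 leaves at
  # least one label for the domain itself.
  (1...labels.length).each do |i|
    suffix = labels[i..-1].join(".")
    next unless KNOWN_SUFFIXES.include?(suffix)

    domain = "#{labels[i - 1]}.#{suffix}"
    subdomain = labels[0...(i - 1)].join(".")
    return [subdomain, domain, suffix]
  end
  nil
end

p split_host("www.example.co.uk")
p split_host("example.com")
```

The lookup itself is cheap; the cost is in keeping `KNOWN_SUFFIXES` current.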
As the poster pointed out, matching everything leads to a huge regex,
which is likely to cause maintenance problems (though he indicated
that they started generating the regex from other data to address
that) and would make me concerned about resource allocation, though I
couldn’t find anything in the core Ruby Doc about a max length for a
regex.
On the other hand it might be more performant than looping through a
bunch of substring matches or matching against database records. I
sense some testing in my future.
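
A rough way to run that test, as a sketch only: build the regex from a list (so the list, not the regex, is what gets maintained) and time it against a plain loop of `end_with?` checks. The suffix sample, host names, method names, and iteration count are all made up for illustration, and timings will vary by machine and Ruby version.

```ruby
require "benchmark"

# Small made-up sample; a real list would have thousands of entries.
SUFFIX_SAMPLE = %w[co.uk org.uk ac.uk com org net uk].freeze

# Generate the regex from the list. Longest alternatives go first
# so "co.uk" is tried before "uk" at any given position.
SUFFIX_RE = /\.(#{Regexp.union(SUFFIX_SAMPLE.sort_by { |s| -s.length }).source})\z/

def suffix_by_regex(host)
  m = SUFFIX_RE.match(host)
  m && m[1]
end

def suffix_by_loop(host)
  # Check every suffix and keep the longest one that matches.
  SUFFIX_SAMPLE.select { |s| host.end_with?(".#{s}") }.max_by(&:length)
end

hosts = %w[www.example.co.uk example.com mail.example.org.uk localhost]
n = 10_000

regex_time = Benchmark.realtime { n.times { hosts.each { |h| suffix_by_regex(h) } } }
loop_time  = Benchmark.realtime { n.times { hosts.each { |h| suffix_by_loop(h) } } }

puts format("regex: %.4fs  loop: %.4fs", regex_time, loop_time)
```

With only seven suffixes this says little about a list of thousands, so the sample would need to be scaled up before drawing conclusions.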
For a minute, I thought your reply was generated by a porn spam bot
until I saw github in the URL.
For those reading the thread, this is a gem that uses
http://publicsuffix.org/ to parse domain names and identify the suffix
(e.g. “com” or “co.uk”), domain, subdomains, etc. It extends
Addressable::URI.
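
This is not the gem's code, just a sketch of how the rules in the publicsuffix.org list can be applied, for anyone curious about what such a library has to do. The list has plain rules, wildcard rules (`*.ck`) and exception rules (`!www.ck`). The sample rules and the method names below are assumptions for illustration.

```ruby
# Tiny inline sample in the list's format; the real file is much longer.
SAMPLE_RULES = <<~LIST
  // comments start with two slashes
  com
  uk
  co.uk
  *.ck
  !www.ck
LIST

def parse_rules(text)
  text.each_line.map(&:strip).reject { |l| l.empty? || l.start_with?("//") }
end

# Does a rule match the host's trailing labels? "*" matches any one label.
def rule_matches?(rule_labels, host_labels)
  return false if rule_labels.length > host_labels.length

  tail = host_labels.last(rule_labels.length)
  rule_labels.zip(tail).all? { |r, h| r == "*" || r == h }
end

def public_suffix(host, rules)
  host_labels = host.downcase.split(".")
  exceptions, normal = rules.partition { |r| r.start_with?("!") }

  exceptions.each do |rule|
    labels = rule[1..-1].split(".")
    # An exception rule wins; the suffix is the rule minus its leftmost label.
    return labels.drop(1).join(".") if rule_matches?(labels, host_labels)
  end

  best = normal.map { |r| r.split(".") }
               .select { |labels| rule_matches?(labels, host_labels) }
               .max_by(&:length)
  # With no matching rule, the default rule "*" applies: the last label.
  n = best ? best.length : 1
  host_labels.last(n).join(".")
end

rules = parse_rules(SAMPLE_RULES)
p public_suffix("www.example.co.uk", rules)
p public_suffix("foo.bar.ck", rules)
```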
On Mon, Aug 23, 2010 at 10:40 AM, Charles C. wrote:
http://mxr.mozilla.org/mozilla/source/netwerk/dns/src/effective_tld_names.dat