Extract Domain name (url)

Hello Friends,
I need to write a regular expression that extracts and returns the
domain name.

For example, if a user enters any of the URLs below, it should save
only foo.com:

http://www.foo.com/
http://www.foo.com/something
http://foo.com/
https://something.foo.com/

Thanks for any help…

Thanks
abhis

A good way to start is to try it out in an online regular expression
editor:

http://rubular.com

Hey Srinivas,

Thanks for the reply.

I am able to get the output, but the only problem is that I have to
list every TLD in the pattern (co.uk|com|net|org|in).

So I am trying to figure out the best way to get the output without
doing that.

url_pattern =
  /^(?:.+?\.)+(.+?\.(?:co\.uk|com|net|org|in))(:[0-9]{2,5})?(?:\/.*)?$/i
url = "http://www.foo.com"
url_pattern.match(url)
$1 #=> "foo.com"

Thanks
Abhishek
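
One way to avoid listing every TLD in the pattern is to capture only the last two dot-separated labels of the host. The sketch below is an illustration under assumptions, not the pattern from the post above: `domain_from` is a hypothetical helper name, it only handles http/https URLs with a plain host name, and it knows nothing about multi-part suffixes, so a host such as www.foo.co.uk yields co.uk rather than foo.co.uk.

```ruby
# Sketch only: capture the last two dot-separated labels of the host.
# Assumptions: http/https scheme, no user info, no IPv6 literal,
# and the host is followed by a port, a slash, or the end of the string.
def domain_from(url)
  pattern = %r{\Ahttps?://(?:[^/:]+\.)?([^/.:]+\.[^/.:]+)(?::\d+)?(?:/|\z)}i
  match = pattern.match(url)
  match && match[1] # nil when the URL does not fit the assumptions
end

["http://www.foo.com/", "http://www.foo.com/something",
 "http://foo.com/", "https://something.foo.com/"].each do |url|
  puts domain_from(url)
end
```

For the four example URLs in the original question this prints foo.com each time. The trade-off is that the pattern no longer needs a TLD list, but it also cannot tell a two-part suffix from a domain.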

On Wed, Nov 11, 2009 at 10:25 PM, Abhishek shukla [email protected] wrote:

https://something.foo.com/


require 'uri'

urls = ["http://www.foo.com/", "http://www.foo.com/something",
        "http://foo.com/", "https://something.foo.com/"]

urls.each { |url| puts URI.parse(url).host.split(".")[-2, 2].join(".") }

Good luck,

-Conrad
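
The split-and-join approach above always keeps the last two labels, so a host like www.foo.co.uk would come out as co.uk. A minimal sketch of one way to handle that is shown here. The method name `registered_domain` and the short suffix list are hypothetical and chosen only for illustration; a real application would need a complete list of public suffixes rather than these three entries.

```ruby
require 'uri'

# Sketch only: keep three labels when the host ends in a known
# two-part suffix, otherwise keep two.
# The default suffix list is a tiny illustrative assumption, not complete.
def registered_domain(url, multi_part_suffixes = %w[co.uk org.uk co.in])
  host = URI.parse(url).host
  return nil if host.nil? # e.g. a relative URL with no host part

  host = host.downcase
  multi_part = multi_part_suffixes.any? { |suffix| host.end_with?(".#{suffix}") }
  take = multi_part ? 3 : 2
  host.split('.').last(take).join('.')
end

urls = ["http://www.foo.com/", "http://www.foo.com/something",
        "http://foo.com/", "https://something.foo.com/",
        "http://www.foo.co.uk/"]
urls.each { |url| puts registered_domain(url) }
```

This prints foo.com for the four URLs from the question and foo.co.uk for the last one. Passing the suffix list as an argument keeps the sketch easy to extend without touching the method body.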

Hi Abhishek

You can try the Addressable gem for this.

Step 1: Install the Addressable gem with the following command:

      $ sudo gem install addressable

Step 2: I will demonstrate in IRB; you can try it there and then
integrate it into your Rails application.

        $ irb
         > require 'rubygems'
         > require 'addressable/uri'
         > uri = Addressable::URI.parse("http://google.com")
         => #<Addressable::URI:0xfdb9aee5c URI:http://google.com>

Step 3: Extract just the host with the following command:

        > uri.host
         => "google.com"

There are many other options you can explore:
http://addressable.rubyforge.org/api/classes/Addressable/URI.html

Hope this helps!

Best regards,
Srinivas I.

http://twitter.com/srinivasiyermv

On Thu, 2009-11-12 at 02:31 -0800, Conrad T. wrote:

irb(main):002:0> require 'addressable/uri'
=> true
irb(main):003:0> uri = Addressable::URI.parse("http://www.usc.edu/home.html")
=> #<Addressable::URI:0x90e89c URI:http://www.usc.edu/home.html>
irb(main):004:0> uri.host
=> "www.usc.edu"


uri.host.split('.')[0]

Craig



# Given a URL, return a domain
def self.url_to_domain(url)
  begin
    host = URI.parse(self.fix_url(url)).host
    host.gsub(/\Awww\./, "")
  rescue
    ""
  end
end

On Wed, Nov 11, 2009 at 11:46 PM, Srinivas I. [email protected] wrote:

your rails application .
=> "google.com"


Hi, the Addressable gem's host method returns the full host name, not
just the domain part of the web address.
For example,

irb(main):002:0> require 'addressable/uri'
=> true
irb(main):003:0> uri = Addressable::URI.parse("http://www.usc.edu/home.html")
=> #<Addressable::URI:0x90e89c URI:http://www.usc.edu/home.html>
irb(main):004:0> uri.host
=> "www.usc.edu"

-Conrad

Oops, forgot to add the other function I was using:

# Prepend URL with http if necessary
def self.fix_url(u)
  !!( u !~ /\A(?:http:\/\/|https:\/\/)/i ) ? "http://#{u}" : u
end

Note that you need to require uri:

require 'uri'

I put this in a module called Utilities, so the whole thing is:

require 'uri'

module Utilities

  # Given a URL, return a domain
  def self.url_to_domain(url)
    begin
      host = URI.parse(self.fix_url(url)).host
      host.gsub(/\Awww\./, "")
    rescue
      ""
    end
  end

  # Prepend URL with http if necessary
  def self.fix_url(u)
    !!( u !~ /\A(?:http:\/\/|https:\/\/)/i ) ? "http://#{u}" : u
  end

end

And you call it with Utilities::url_to_domain(u)
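
As a rough usage sketch, the module can be exercised as below. This is a slightly condensed restatement of the Utilities module above, not a replacement for it: it rescues `URI::InvalidURIError` explicitly and uses `to_s` to cover a missing host, both of which are choices made here for illustration. Note that only a leading www. is stripped, so other subdomains are kept.

```ruby
require 'uri'

module Utilities
  # Prepend http:// when the string has no http/https scheme.
  def self.fix_url(u)
    u =~ /\Ahttps?:\/\//i ? u : "http://#{u}"
  end

  # Return the host without a leading "www.", or "" if the URL cannot be parsed.
  def self.url_to_domain(url)
    host = URI.parse(fix_url(url)).host
    host.to_s.sub(/\Awww\./, "")
  rescue URI::InvalidURIError
    ""
  end
end

puts Utilities.url_to_domain("www.foo.com/something")      # foo.com
puts Utilities.url_to_domain("https://something.foo.com/") # something.foo.com
puts Utilities.url_to_domain("http://bad url/").inspect    # ""
```

If the goal is foo.com for every subdomain, as in the original question, the host would still need to be trimmed to its last labels after this step.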