Get title from URL?

Does anybody have a code snippet that extracts the title from the
<title> tag of a given URL?

require 'hpricot'
require 'open-uri'

url = "http://www.ruby-lang.org/"
doc = Hpricot(open(url))
title = (doc/"title").inner_text

=> "Ruby Programming Language"

2009/4/24 Cisco Ri:

Does anybody have a code snippet that extracts the title from the
<title> tag of a given URL?

require 'rubygems'
require 'mechanize'
title = WWW::Mechanize.new.get('http://google.com').title
=> "Google"

Regards,
Park H.

Thanks, both work great.

Cisco Ri wrote:

Does anybody have a code snippet that extracts the title from the
<title> tag of a given URL?

Without installing anything extra:

require 'open-uri'
open('http://google.com/').read =~ /<title>(.*?)<\/title>/
p $1
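
The one-line regex above only matches a lowercase, single-line <title> with no attributes. A slightly more tolerant variant might look like the sketch below. The helper name extract_title is made up for illustration, and it works on an HTML string you have already fetched, so it makes no network request itself.

```ruby
# Hypothetical helper: return the text of the first <title> element
# in an HTML string, or nil if there is none.
def extract_title(html)
  # i: also match <TITLE>; m: let . span newlines inside the title.
  # \b[^>]* tolerates attributes on the opening tag.
  match = html.match(/<title\b[^>]*>(.*?)<\/title>/im)
  return nil unless match
  # Collapse runs of whitespace and trim both ends.
  match[1].gsub(/\s+/, ' ').strip
end

p extract_title("<html><head><TITLE>\n  Ruby Programming\n  Language </TITLE></head></html>")
# prints "Ruby Programming Language"
```

This is still a regex, not an HTML parser, so it can be fooled by a <title> inside a comment or script. For anything beyond a quick script, a real parser is probably the safer choice.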

Heesob P. wrote:

2009/4/24 Cisco Ri:

Does anybody have a code snippet that extracts the title from the
<title> tag of a given URL?

require 'rubygems'
require 'mechanize'
title = WWW::Mechanize.new.get('http://google.com').title
=> "Google"

Regards,
Park H.

I used this method for a while, and it was fine for most sites.
However, with wikipedia.org it failed with a 403 Forbidden error.
The Hpricot/open-uri method works for most sites, including
wikipedia.org, but for thesixtyone.com (a JavaScript-intensive site) it
fails with a 500 Internal Server Error.

I haven’t tried the open-uri-only method yet.

Thanks for the help everyone.

2009/4/28 Cisco Ri:

I used this method for a while, and it was fine for most sites.
However, with wikipedia.org it failed with a 403 Forbidden error.
The Hpricot/open-uri method works for most sites, including
wikipedia.org, but for thesixtyone.com (a JavaScript-intensive site) it
fails with a 500 Internal Server Error.

You can work around the 403 by setting a browser-like user agent:

require 'rubygems'
require 'mechanize'
agent = WWW::Mechanize.new
agent.user_agent_alias = 'Mac Safari'
title = agent.get('http://wikipedia.org').title

Regards,
Park H.
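
For the open-uri route, the same idea (send a User-Agent header, and report HTTP errors instead of crashing) could be sketched roughly as follows. The method name fetch_title and the USER_AGENT string are hypothetical, and whether a given site accepts that header is an assumption. The sketch calls URI.open, which is what newer Ruby versions expect; older versions used a plain open(url, ...) as in the snippets above. The optional block is there only so the fetch step can be swapped out, for example to try the error handling without touching the network.

```ruby
require 'open-uri'

# Hypothetical User-Agent value; any browser-like string is an assumption.
USER_AGENT = 'Mozilla/5.0 (compatible; title-fetcher-sketch)'

# Returns [title, nil] on success, or [nil, message] when the server
# answers with an HTTP error such as "403 Forbidden".
# If a block is given, it is used to fetch the HTML instead of open-uri.
def fetch_title(url)
  html = if block_given?
           yield(url)
         else
           URI.open(url, 'User-Agent' => USER_AGENT).read
         end
  # Take capture group 1 of the first <title>...</title> match, if any.
  title = html[/<title\b[^>]*>(.*?)<\/title>/im, 1]
  [title && title.strip, nil]
rescue OpenURI::HTTPError => e
  # open-uri raises this for 4xx/5xx responses; the message carries the status.
  [nil, e.message]
end
```

Calling fetch_title('http://wikipedia.org') would perform a real request and return a two-element array, so the caller can tell a missing title from a refused request.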