Anybody have a code snippet that extracts the title from the <title> tag
of a page at a given URL?
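require 'hpricot'
require 'open-uri'
url = "http://www.ruby-lang.org/"
# Fetch the page and parse it into an Hpricot document
doc = Hpricot(open(url))
# Select the <title> element and take its text
title = (doc/"title").inner_text
=> "Ruby Programming Language"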
2009/4/24 Cisco Ri:
Anybody have a code snippet that extracts the title from the <title> tag
from a given URL?
require 'rubygems'
require 'mechanize'
title = WWW::Mechanize.new.get('http://google.com').title
=> "Google"
Regards,
Park H.
Thanks, both work great.
Cisco Ri wrote:
Anybody have a code snippet that extracts the title from the <title> tag
from a given URL?
Without installing special things:
require 'open-uri'
open('http://google.com/').read =~ /<title>(.*?)<\/title>/
p $1
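The one-liner above misses titles written as <TITLE> or split across several lines. A slightly more defensive variant might look like the sketch below. It works on an HTML string you have already fetched, and extract_title is just a name made up for this example, not part of open-uri or any library.

```ruby
# Sketch: pull the <title> text out of an HTML string with a regex.
# extract_title is a hypothetical helper name, not a library method.
def extract_title(html)
  # /i matches <TITLE> as well; /m lets . span newlines inside the title
  match = html.match(/<title[^>]*>(.*?)<\/title>/im)
  return nil unless match
  # Trim the ends and collapse inner whitespace runs into single spaces
  match[1].strip.gsub(/\s+/, ' ')
end

html = "<html><head><TITLE>\n  Ruby Programming\n  Language </TITLE></head></html>"
p extract_title(html)                                      # => "Ruby Programming Language"
p extract_title("<html><body>no title here</body></html>") # => nil
```

A regex is still a shortcut here; for messy markup a real parser such as the Hpricot approach earlier in the thread is likely more robust.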
Heesob P. wrote:
2009/4/24 Cisco Ri:
Anybody have a code snippet that extracts the title from the <title> tag
from a given URL?
require 'rubygems'
require 'mechanize'
title = WWW::Mechanize.new.get('http://google.com').title
=> "Google"
Regards,
Park H.
I used this method for a while, and it was fine for most sites.
However, with wikipedia.org it failed with a 403 Forbidden error.
The Hpricot/open-uri method works for most sites, including
wikipedia.org, but for thesixtyone.com (a JavaScript-intensive site) it
fails with a 500 Internal Server Error.
I haven’t tried the open-uri-only method yet.
Thanks for the help everyone.
2009/4/28 Cisco Ri:
Regards,
Park H.
I used this method for a while, and it was fine for most sites.
However, with wikipedia.org it failed with a 403 Forbidden error.
The Hpricot/open-uri method works for most sites, including
wikipedia.org, but for thesixtyone.com (a JavaScript-intensive site) it
fails with a 500 Internal Server Error.
You can work around it like this:
require 'rubygems'
require 'mechanize'
agent = WWW::Mechanize.new
# Identify as a regular browser; some sites reject the default user agent
agent.user_agent_alias = 'Mac Safari'
title = agent.get('http://wikipedia.org').title
Regards,
Park H.
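The same idea should carry over to plain open-uri, which accepts request headers as a hash. The sketch below sends a browser-like User-Agent and rescues OpenURI::HTTPError, so a 403 or 500 gives nil instead of aborting the script. Treat it as an untested-against-those-sites sketch: fetch_title and the User-Agent string are made up for this example, and on current Rubies the call is URI.open rather than the bare open used earlier in the thread.

```ruby
require 'open-uri'

# Hypothetical helper: fetch a page with a custom User-Agent header and
# return the <title> text, or nil if the server answers with an HTTP error.
# The default user_agent string is an arbitrary placeholder.
def fetch_title(url, user_agent = 'Mozilla/5.0 (title-fetcher sketch)')
  html = URI.open(url, 'User-Agent' => user_agent).read
  # String#[] with a regex and group index returns the capture or nil
  html[/<title[^>]*>(.*?)<\/title>/im, 1]&.strip
rescue OpenURI::HTTPError => e
  # e.message carries the status line, e.g. "403 Forbidden"
  warn "#{url}: #{e.message}"
  nil
end

# Usage (needs network access, so left commented out):
# p fetch_title('http://wikipedia.org')
```

Whether a given site accepts the request still depends on what that site checks, so a different User-Agent value may be needed.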