Forum: Ruby Get title from URL?

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- and Ruby-related community platforms.
Cisco Ri (ciscor)
on 2009-04-24 10:03
Anybody have a code snippet that extracts the title from the <title> tag
from a given URL?
Dan Diebolt (dandiebolt)
on 2009-04-24 10:17
(Received via mailing list)
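require 'rubygems'
require 'hpricot'    # the library file is lowercase; 'Hpricot' fails on case-sensitive filesystems
require 'open-uri'

url = "http://www.ruby-lang.org/"
doc = Hpricot(open(url))           # on Ruby 2.5+ write URI.open(url); Kernel#open stopped fetching URLs in 3.0
title = (doc/"title").inner_text   # doc/"title" searches for the <title> element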

=> "Ruby Programming Language"
Heesob Park (phasis)
on 2009-04-24 10:27
(Received via mailing list)
2009/4/24 Cisco Ri <cisco.riordan@gmail.com>:
> Anybody have a code snippet that extracts the title from the <title> tag
> from a given URL?

require 'rubygems'
require 'mechanize'
# current mechanize releases name the class Mechanize (no WWW:: prefix)
title = WWW::Mechanize.new.get('http://google.com').title
=> "Google"


Regards,
Park Heesob
Cisco Ri (ciscor)
on 2009-04-24 10:31
Thanks, both work great.
Rüdiger Brahns (Guest)
on 2009-04-27 14:07
(Received via mailing list)
Cisco Ri wrote:
> Anybody have a code snippet that extracts the title from the <title> tag
> from a given URL?

Without installing any extra gems:

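require 'open-uri'

# /i also matches <TITLE>; /m lets .*? span line breaks inside the title.
# On Ruby 2.5+ use URI.open; Kernel#open stopped fetching URLs in Ruby 3.0.
open('http://google.com/').read =~ /<title>(.*?)<\/title>/im
p $1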
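A regex stays fragile for HTML in general, but a slightly more defensive stdlib-only sketch can also tolerate attributes on the tag (`<title lang="en">`), any letter case, titles split across lines, and entities such as `&amp;`. It assumes reasonably well-formed HTML; `extract_title` is a hypothetical helper name, not part of any library:

```ruby
require 'cgi'

# Hypothetical helper: return the text of the first <title> element in an
# HTML string, or nil if there is none. Uses only the standard library.
def extract_title(html)
  # \b stops <titles> from matching; [^>]* skips attributes such as lang="en".
  # i: case-insensitive tag name; m: let .*? cross line breaks.
  match = html.match(/<title\b[^>]*>(.*?)<\/title\s*>/im)
  return nil unless match

  # Decode entities like &amp; and collapse runs of whitespace to one space.
  CGI.unescapeHTML(match[1]).gsub(/\s+/, ' ').strip
end

html = "<html><HEAD>\n<Title lang=\"en\">\n  Ruby &amp; Rails\n</title></head></html>"
p extract_title(html)   # prints "Ruby & Rails"
```

For a live page, pass it the body of the download, e.g. `extract_title(URI.open(url).read)`.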
Cisco Ri (ciscor)
on 2009-04-27 17:29
Heesob Park wrote:
> require 'rubygems'
> require 'mechanize'
> title = WWW::Mechanize.new.get('http://google.com').title

I used this method for a while, and it was fine for most sites.
However, wikipedia.org answered with a 403 Forbidden error.
The Hpricot/open-uri method works for most sites, including
wikipedia.org, but for thesixtyone.com (a JavaScript-heavy site) it
fails with a 500 Internal Server Error.

I haven't tried the open-uri-only method yet.

Thanks for the help everyone.
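With open-uri, HTTP failures like the 403 and 500 responses above are raised as `OpenURI::HTTPError`, whose message carries the status line (for example "403 Forbidden"). A minimal sketch that logs the error and returns `nil` instead, so one bad site does not abort a whole batch; `safe_title` and its injectable `opener` parameter are hypothetical, the latter added only so the method can be exercised offline with a stub:

```ruby
require 'open-uri'

# Hypothetical helper: fetch a page and return its stripped <title> text,
# or nil when the server answers with an HTTP error status (403, 500, ...).
# `opener` defaults to URI.open and exists only to allow offline testing.
def safe_title(url, opener: URI.method(:open))
  html = opener.call(url, &:read)
  # Same regex idea as the open-uri one-liner, with /im; entities are not decoded here.
  html[/<title\b[^>]*>(.*?)<\/title\s*>/im, 1]&.strip
rescue OpenURI::HTTPError => e
  warn "#{url}: #{e.message}"   # the message is the HTTP status line
  nil
end
```

Only HTTP status errors are rescued; network problems such as an unknown host still raise, which is usually what you want while debugging.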
Heesob Park (phasis)
on 2009-04-28 02:55
(Received via mailing list)
2009/4/28 Cisco Ri <cisco.riordan@gmail.com>:
> I used this method for a while, and it was fine for most sites.
> However, with wikipedia.org it errored out with a 403 Forbidden error.
> The Hpricot/open-uri method works for most sites, including
> wikipedia.org, but for thesixtyone.com (Javascript intensive site) it
> errors out with a 500 Internal Server error.

You can work around it by sending a browser User-Agent:

require 'rubygems'
require 'mechanize'
agent = WWW::Mechanize.new              # current releases: Mechanize.new
agent.user_agent_alias = 'Mac Safari'   # send a browser User-Agent instead of Mechanize's default
title = agent.get('http://wikipedia.org').title


Regards,
Park Heesob
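The same User-Agent workaround is available without Mechanize: open-uri sends String keys in the options hash passed to `URI.open` as HTTP request headers. A minimal sketch, assuming a site that rejects the default Ruby User-Agent as described for wikipedia.org; the browser string is an arbitrary example and not the exact value Mechanize's 'Mac Safari' alias sends, and `title_with_user_agent` and its `opener` parameter are hypothetical (the latter only allows offline testing):

```ruby
require 'open-uri'

# Arbitrary browser-like User-Agent (an assumption; not Mechanize's 'Mac Safari' value).
BROWSER_UA = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) ' \
             'AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15'

# Hypothetical helper: open-uri treats String keys in the options hash as
# request headers, so this sends "User-Agent: <user_agent>" with the GET.
# `opener` defaults to URI.open and exists only so a stub can replace it.
def title_with_user_agent(url, user_agent: BROWSER_UA, opener: URI.method(:open))
  html = opener.call(url, 'User-Agent' => user_agent, &:read)
  html[/<title\b[^>]*>(.*?)<\/title\s*>/im, 1]&.strip
end
```

Usage against a live site would be `title_with_user_agent('https://www.wikipedia.org/')`; whether a given site accepts the request still depends on that site's own policy.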