Forum: Ruby Get title from URL?

Cisco R. (Guest)
on 2009-04-24 12:03
Anybody have a code snippet that extracts the title from the <title> tag
from a given URL?
Dan D. (Guest)
on 2009-04-24 12:17
(Received via mailing list)
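require 'rubygems'
require 'hpricot'   # lowercase: 'Hpricot' fails to load on case-sensitive filesystems
require 'open-uri'

url = "http://www.ruby-lang.org/"
doc = Hpricot(open(url))            # fetch the page and parse it
title = (doc/"title").inner_text    # text inside the <title> element

=> "Ruby Programming Language"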
Heesob P. (Guest)
on 2009-04-24 12:27
(Received via mailing list)
2009/4/24 Cisco Ri <removed_email_address@domain.invalid>:
> Anybody have a code snippet that extracts the title from the <title> tag
> from a given URL?

require 'rubygems'
require 'mechanize'
title = WWW::Mechanize.new.get('http://google.com').title
=> "Google"


Regards,
Park H.
Cisco R. (Guest)
on 2009-04-24 12:31
Thanks, both work great.
Rüdiger Brahns (Guest)
on 2009-04-27 16:07
(Received via mailing list)
Cisco Ri wrote:
> Anybody have a code snippet that extracts the title from the <title> tag
> from a given URL?

Without installing any extra gems:

require 'open-uri'
open('http://google.com/').read =~ /<title>(.*?)<\/title>/
p $1
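
The one-liner above only matches a lowercase <title> tag that has no attributes and sits on a single line. A slightly more forgiving variant, as a minimal sketch: the method names extract_title and title_from_url are made up for illustration, and URI.open assumes Ruby 2.5 or later (on older Rubies, plain open(url) as used above behaves the same). It is still a regex, not an HTML parser, so it does not decode entities or cope with badly broken markup.

```ruby
require 'open-uri'

# Hypothetical helper (not from the thread): pull the <title> text out of an
# HTML string using only a regex.
def extract_title(html)
  # i: also match <TITLE>; m: let . span newlines; [^>]* tolerates attributes.
  match = html.match(/<title\b[^>]*>(.*?)<\/title>/im)
  return nil unless match

  # Collapse runs of whitespace (including newlines) into single spaces.
  match[1].gsub(/\s+/, ' ').strip
end

# Hypothetical wrapper: fetch the page with open-uri, then extract the title.
def title_from_url(url)
  extract_title(URI.open(url).read)
end

# Usage (needs network access):
#   p title_from_url('http://google.com/')
```

Keeping the parsing in its own method means it can be tried on a plain string without touching the network.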
Cisco R. (Guest)
on 2009-04-27 19:29
Heesob P. wrote:
> 2009/4/24 Cisco Ri <removed_email_address@domain.invalid>:
>> Anybody have a code snippet that extracts the title from the <title> tag
>> from a given URL?
>
> require 'rubygems'
> require 'mechanize'
> title = WWW::Mechanize.new.get('http://google.com').title
> => "Google"
>
>
> Regards,
> Park H.

I used this method for a while, and it was fine for most sites.
However, with wikipedia.org it failed with a 403 Forbidden error.
The Hpricot/open-uri method works for most sites, including
wikipedia.org, but for thesixtyone.com (a JavaScript-intensive site) it
fails with a 500 Internal Server Error.

I haven't tried the open-uri-only method yet.

Thanks for the help everyone.
Heesob P. (Guest)
on 2009-04-28 04:55
(Received via mailing list)
2009/4/28 Cisco Ri <removed_email_address@domain.invalid>:
>>
>> Regards,
>> Park H.
>
> I used this method for a while, and it was fine for most sites.
> However, with wikipedia.org it failed with a 403 Forbidden error.
> The Hpricot/open-uri method works for most sites, including
> wikipedia.org, but for thesixtyone.com (a JavaScript-intensive site) it
> fails with a 500 Internal Server Error.
You can work around the 403 by sending a browser-like User-Agent:

require 'rubygems'
require 'mechanize'
agent = WWW::Mechanize.new
agent.user_agent_alias = 'Mac Safari'
title = agent.get('http://wikipedia.org').title


Regards,
Park H.
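
The same idea should carry over to the open-uri approach, since open-uri accepts request headers as string-keyed options. A rough sketch, assuming Ruby 2.5 or later for URI.open: the method name title_with_user_agent and the default User-Agent string are invented for illustration, and whether a given site accepts that string is not something this thread confirms. It also rescues OpenURI::HTTPError so that a 403 or 500 is reported instead of aborting the script.

```ruby
require 'open-uri'

# Hypothetical helper (not from the thread): fetch a page with open-uri while
# sending a custom User-Agent header, then extract the title with a regex.
# The default User-Agent below is an arbitrary example value.
def title_with_user_agent(url, user_agent = 'Mozilla/5.0 (compatible; TitleFetcher/0.1)')
  html = URI.open(url, 'User-Agent' => user_agent).read
  match = html.match(/<title\b[^>]*>(.*?)<\/title>/im)
  # nil when the page has no <title>; otherwise the whitespace-normalized text.
  match && match[1].gsub(/\s+/, ' ').strip
rescue OpenURI::HTTPError => e
  # e.io.status is a [code, message] pair such as ["403", "Forbidden"].
  warn "#{url}: HTTP #{e.io.status.join(' ')}"
  nil
end

# Usage (needs network access):
#   p title_with_user_agent('http://wikipedia.org')
```

Returning nil on an HTTP error lets the caller decide whether to retry with a different User-Agent or skip the URL.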