Forum: Ruby regexp html scraping

Arun Kumar (arun_nss)
on 2009-03-18 06:56
Hi,
I have to extract the full HTML from a website URL using regular
expressions or 'net/http'. Can anybody help me with the code to extract
the full HTML content of a website? I need to use only regexps or
'net/http'.

Thanks
Arun Kumar
7stud -- (7stud)
on 2009-03-18 10:41
Arun Kumar wrote:
> Hi,
> I have to extract the full HTML from a website URL using regular
> expressions or 'net/http'. Can anybody help me with the code to extract
> the full HTML content of a website? I need to use only regexps or
> 'net/http'.
>

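require 'net/http'

# Open a connection to the host and send a GET request for the root path.
Net::HTTP.start("www.google.com") do |http|
  resp = http.get("/")
  # resp.body holds the full HTML; only the first 101 characters are printed here.
  puts resp.body[0..100]
end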

--output:--
<html><head><meta http-equiv="content-type" content="text/html;
charset=ISO-8859-1"><title>Google</ti
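<title>Google</ti">
The snippet above prints only the first 101 characters of the body. To get the full HTML from an arbitrary URL string, a minimal sketch could parse the URL with `URI` and follow redirects, since many sites answer the first request with a 301 or 302. The helper name `fetch_html` and the redirect limit of 5 are hypothetical choices, not something from the thread; it assumes a reasonably recent Ruby with the standard `net/http` and `uri` libraries.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper (not from the thread): return the full HTML body of
# `url`, following at most `limit` HTTP redirects.
def fetch_html(url, limit = 5)
  raise ArgumentError, "too many redirects" if limit.zero?

  uri = URI(url) # accepts a String or an existing URI object
  response = Net::HTTP.get_response(uri)

  case response
  when Net::HTTPSuccess
    response.body # the complete page, not truncated
  when Net::HTTPRedirection
    # The Location header may be relative, so resolve it against the
    # current URI before following it.
    fetch_html(uri + response["location"], limit - 1)
  else
    response.value # raises an exception for any other status code
  end
end
```

Usage would be `html = fetch_html("http://www.google.com/")`, after which `html` holds the whole document rather than a slice of it.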
Robert Klemme (Guest)
on 2009-03-18 11:13
(Received via mailing list)
2009/3/18 Arun Kumar <arunkumar@innovaturelabs.com>:
> I have to extract the full HTML from a website URL using regular
> expressions or 'net/http'.

What kind of question is that? Use net/http OR regular expressions?
I mean, the two serve totally different purposes: net/http fetches the
page, while regular expressions only search text you already have. You
cannot exchange one for the other. You'll have difficulty obtaining the
content using regular expressions alone...
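To illustrate the division of labour: net/http downloads the page, and a regexp can then pick pieces out of the resulting string. The sketch below works on a sample string shaped like the Google output earlier in the thread; the method names `extract_title` and `extract_links` and the sample markup are hypothetical. Regexps are fragile on real-world HTML, so treat this as quick-and-dirty extraction, not a parser.

```ruby
# Regular expressions do not download anything; they only search text
# that is already in memory, e.g. a body returned by Net::HTTP.
def extract_title(html)
  # /m lets . match newlines, /i ignores tag case, .*? is non-greedy.
  match = html.match(%r{<title[^>]*>(.*?)</title>}mi)
  match && match[1].strip
end

def extract_links(html)
  # scan with one capture group yields [[href], ...]; flatten to [href, ...].
  html.scan(/<a\s[^>]*href\s*=\s*["']([^"']+)["']/i).flatten
end

sample = '<html><head><meta http-equiv="content-type" content="text/html; charset=ISO-8859-1"><title>Google</title></head></html>'
puts extract_title(sample) # prints Google

links = extract_links('<p><a href="/intl/en/about.html">About</a> <A class="x" HREF=\'/search\'>Search</A></p>')
p links # prints ["/intl/en/about.html", "/search"]
```

In practice the two steps would be chained, for example `extract_title(resp.body)` inside the `Net::HTTP.start` block.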

Wondering...

robert