Forum: Ruby trying to use regex

merrittr (Guest)
on 2007-06-20 10:15
(Received via mailing list)
Hi, I am trying to extract the text between the body tags, but when I run
the script I get:

rob@rob-laptop:~/ruby$ ./html2.rb
./html2.rb:14: unknown regexp options - bdy
./html2.rb:14: unterminated string meets end of file
./html2.rb:14: parse error, unexpected tSTRING_END, expecting tSTRING_CONTENT or tREGEXP_END or tSTRING_DBEG or tSTRING_DVAR




#! /usr/bin/ruby

 @h = File.open "test.html"
 @response = @h.gets

 text = @response.scan(/<body[^>]*>(.+?)</body>/)[0]
 puts text
Alex Gutteridge (Guest)
on 2007-06-20 10:42
(Received via mailing list)
On 20 Jun 2007, at 17:15, merrittr wrote:

>
>
> #! /usr/bin/ruby
>
>  @h = File.open "test.html"
>  @response = @h.gets
>
>  text = @response.scan(/<body[^>]*>(.+?)</body>/)[0]
>  puts text

You need to escape the '/' in your regexp, and unless your HTML file
is a single line you may also need to add the multiline option:

text = @response.scan(/<body[^>]*>(.+?)<\/body>/m)[0]
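One more thing to watch: gets returns only the first line of the file, so the m option has nothing to work on unless the whole file is in the string. A minimal sketch, assuming the file is small enough to read in one go; extract_body is a hypothetical helper name and the inline string stands in for test.html:

```ruby
# Hypothetical helper: returns the text between <body ...> and </body>,
# or nil when the input has no body element.
def extract_body(html)
  # The / in the closing tag is escaped as \/ so it does not end the literal.
  # The m option lets . match newlines as well.
  match = html.scan(/<body[^>]*>(.+?)<\/body>/m)[0]
  match && match[0]
end

# Stand-in for the file contents. In the real script this would be
#   html = File.read("test.html")
# which returns the whole file as one string (gets returns a single line).
html = "<html>\n<body class=\"main\">\nHello,\nworld\n</body>\n</html>\n"

puts extract_body(html)
```

scan returns one array per match, holding the capture groups, so [0] is the first match and the second [0] picks the single group out of it.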

Alex Gutteridge

Bioinformatics Center
Kyoto University
Rob Biedenharn (Guest)
on 2007-06-20 16:37
(Received via mailing list)
On Jun 20, 2007, at 4:41 AM, Alex Gutteridge wrote:
>> #! /usr/bin/ruby
> text = @response.scan(/<body[^>]*>(.+?)<\/body>/m)[0]
>
> Alex Gutteridge
>
> Bioinformatics Center
> Kyoto University

Or you can use the %r{} form of a Regexp literal:

text = @response.scan(%r{<body\b.*?>(.*?)</body>}mi)[0]

\b matches a "word boundary"
m is the multi-line option that causes . to match newlines, too
i is the case insensitive option (so BODY would also be matched)
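To see what each piece does, here is a small sketch run against made-up input. The sample strings and variable names are assumptions for illustration, not from the original script:

```ruby
# Same pattern as above; inside %r{} the / needs no escaping.
body_re = %r{<body\b.*?>(.*?)</body>}mi

# Hypothetical input: upper-case tags, with the body spread over several lines.
html = "<HTML>\n<BODY bgcolor=\"white\">\nSome text\n</BODY>\n</HTML>"

captures = html.scan(body_re)[0]
puts captures[0] if captures        # prints "Some text" with a blank line before it

# Dropping m: . no longer matches newlines, so nothing is found.
no_m = html.scan(%r{<body\b.*?>(.*?)</body>}i)
puts no_m.inspect                   # []

# Dropping i: the upper-case BODY tag is not matched.
no_i = html.scan(%r{<body\b.*?>(.*?)</body>}m)
puts no_i.inspect                   # []

# \b stops the pattern from matching a longer tag name that starts with "body".
other = "<bodyguard>x</bodyguard><body>y</body>"
puts other.scan(body_re)[0].inspect # ["y"]
```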

-Rob

Rob Biedenharn    http://agileconsultingllc.com
Rob@AgileConsultingLLC.com
Drew Olson (dfg59)
on 2007-06-20 16:42
merrittr wrote:
> Hi, I am trying to extract the text between the body tags, but when I run
> the script I get:

HTML parsing can get quite complicated. Why not use a library? I've
heard great things about Hpricot: http://code.whytheluckystiff.net/hpricot/