Regular expression

arunvoip · March 23, 2009, 7:52am

Hi,
I know that what I’m about to ask has a simple solution, but as I’m new
to Ruby I have not learnt much about regular expressions in Ruby.

Can anybody tell me how to extract all the contents included
inside the ‘<html>’ and ‘</html>’ tags, and also how to extract the text
between the ‘<a>’ and ‘</a>’ tags, using a regular expression? I know it
can be extracted using the ‘scan’ method, but I don’t know what the
matching patterns or expressions should be. Can anybody please help me?

Regards
Arun

arunvoip · March 23, 2009, 9:21am

Ryan D. wrote:

On Mar 22, 2009, at 23:49, Arun K. wrote:

can be extracted using the ‘scan’ method, but I don’t know what the
matching patterns or expressions should be. Can anybody please help me?

regexps are about the worst thing to use in this case. Look at this
instead:

http://mechanize.rubyforge.org/files/GUIDE_txt.html

I know that using mechanize or hpricot is a far better option in this
case, but I’m just asking out of curiosity, to learn about regexps.

Regards
ArunKumar

arunvoip · March 23, 2009, 8:24am

On Mar 22, 2009, at 23:49, Arun K. wrote:

can be extracted using the ‘scan’ method, but I don’t know what the
matching patterns or expressions should be. Can anybody please help me?

regexps are about the worst thing to use in this case. Look at this
instead:

http://mechanize.rubyforge.org/files/GUIDE_txt.html

arunvoip · March 23, 2009, 10:40am

Arun K. wrote:

Hi,
I know that what I’m about to ask has a simple solution, but as I’m new
to Ruby I have not learnt much about regular expressions in Ruby.

Can anybody tell me how to extract all the contents included
inside the ‘<html>’ and ‘</html>’ tags, and also how to extract the text
between the ‘<a>’ and ‘</a>’ tags, using a regular expression? I know it
can be extracted using the ‘scan’ method, but I don’t know what the
matching patterns or expressions should be. Can anybody please help me?

Regards
Arun

s = "hello world"
# remove anything that looks like a tag: "<", then as few characters as possible, then ">"
new_s = s.gsub(/<.*?>/, "")
puts new_s

--output:--
hello world
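
To see the non-greedy quantifier doing real work, here is a minimal sketch on a string that actually contains markup. The sample string and the helper name strip_tags are made up for illustration; they are not from the post above.

```ruby
# Hypothetical helper: remove anything that looks like <...> from a string.
def strip_tags(text)
  # .*? is non-greedy, so each match stops at the first ">" it finds.
  text.gsub(/<.*?>/, "")
end

# Hypothetical sample input.
sample = "<p>hello <b>world</b></p>"

puts strip_tags(sample)               # non-greedy: only the tags are removed
puts sample.gsub(/<.*>/, "").inspect  # greedy: one match swallows the whole string
```

The greedy form matches from the first "<" to the last ">", which is why the ? matters here. Neither form copes with a ">" inside an attribute value or a tag split across lines, so treat this as a quick hack rather than an HTML parser.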

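html = DATA.read()
# (.*) captures everything between the opening and closing html tags
regex = Regexp.new("<html>(.*)</html>", Regexp::MULTILINE)
puts html[regex, 1]

__END__
<html>

html page

hello

world

goodbye

</html>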

--output:--

html page

hello

world

goodbye

In the expression:

html[regex, 1]

The 1 says to return the first parenthesized group in the regex.
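
A small sketch of how the index argument of String#[] behaves with more than one group. The sample string and pattern are invented for illustration.

```ruby
# Hypothetical sample string and a pattern with three parenthesized groups.
LINE  = "posted: March 23, 2009"
REGEX = /(\w+) (\d+), (\d+)/

puts LINE[REGEX, 0]          # index 0: the whole match
puts LINE[REGEX, 1]          # index 1: text matched by the first group
puts LINE[REGEX, 3]          # index 3: text matched by the third group
p    LINE[/no such text/, 1] # nil when the regex does not match at all
```

Checking for nil before using the result is worth the extra line, since a failed match does not raise an error.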

arunvoip · March 23, 2009, 10:44am

7stud – wrote:

regex = Regexp.new("<html>(.*)</html>", Regexp::MULTILINE)

…oh, yeah. Normally, a . matches any character except a newline. The
regex .* matches any character 0 or more times, but to get it to match
newlines as well, you have to specify Regexp::MULTILINE.
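
A quick way to convince yourself of this, as a sketch with a made-up two-line string:

```ruby
# Hypothetical input containing a newline between the tags.
ML_TEXT = "<b>hello\nworld</b>"

p ML_TEXT[/<b>(.*)<\/b>/, 1]   # no /m: "." stops at the newline, so there is no match
p ML_TEXT[/<b>(.*)<\/b>/m, 1]  # /m: "." also matches "\n"

# Regexp.new with the Regexp::MULTILINE constant sets the same option as /m.
same = Regexp.new("<b>(.*)</b>", Regexp::MULTILINE)
p ML_TEXT[same, 1]
```

The first line prints nil; the other two print the captured text including the embedded newline.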

arunvoip · March 23, 2009, 11:03am

On Mon, Mar 23, 2009 at 9:49 AM, Arun K. wrote:

Can anybody tell me how to extract all the contents included
inside the ‘<html>’ and ‘</html>’ tags, and also how to extract the text
between the ‘<a>’ and ‘</a>’ tags, using a regular expression? I know it
can be extracted using the ‘scan’ method, but I don’t know what the
matching patterns or expressions should be. Can anybody please help me?

Let’s assume we have the following content:

<html>Want a Ruby regular expression editor? Check out <a href="...">Rubular</a>.</html>

Here are two quick and dirty regexps:

/<html>(.*)<\/html>/m
This regexp will capture anything between an opening html tag and a
closing one. The /m option specifies multiline mode: “.” will match any
character, including a newline.
For our content, it will capture:

Want a Ruby regular expression editor? Check out <a href="...">Rubular</a>.

/<a.*>(.*)<\/a>/
This regexp will capture the text between an opening anchor element and
a closing one. The first “.*” is there to deal with href and any other
attribute. You might wanna throw the /m option in there too.
For our content, it will capture:

Rubular
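
Since the original question mentioned scan, here is a rough sketch that puts both ideas together. The page text and the helper names html_body and link_texts are hypothetical. Note that the anchor pattern swaps in [^>]* and a non-greedy .*? instead of the plain .* from the post, because with two links and /m the greedy version runs across both anchors.

```ruby
# Hypothetical page with two links; not the content from the post.
PAGE = <<~HTML
  <html>
  <a href="http://www.rubular.com/">Rubular</a> and
  <a href="http://www.ruby-lang.org/">Ruby</a>
  </html>
HTML

# Everything between <html> and </html>; /m lets "." cross line breaks.
def html_body(page)
  page[/<html>(.*)<\/html>/m, 1]
end

# scan returns one array per match when the pattern has a group, hence flatten.
# [^>]* and .*? keep each match inside a single <a>...</a> pair.
def link_texts(page)
  page.scan(/<a[^>]*>(.*?)<\/a>/m).flatten
end

puts html_body(PAGE)
p link_texts(PAGE)

# The greedy pattern with /m finds a single match and keeps only the last link text.
p PAGE.scan(/<a.*>(.*)<\/a>/m).flatten
```

This is still quick and dirty: nested tags, comments, or unusual attribute quoting will trip it up, which is where a real parser earns its keep.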

On Mon, Mar 23, 2009 at 11:18 AM, Arun K. wrote:

I know that using mechanize or hpricot is a far better option in this
case, but I’m just asking out of curiosity, to learn about regexps.

Dare I say, a man should use regexps if only to satisfy his curiosity.

Regards,
Yaser

arunvoip · March 23, 2009, 10:47am

7stud – wrote:

In the expression:

html[regex, 1]

The 1 says to return the first parenthesized group in the regex.

To be a little clearer, the 1 says to return whatever matched the first
parenthesized group in the regex.

arunvoip · March 23, 2009, 1:22pm

Check out the site http://www.rubular.com/
It is very helpful for solving regex problems.

ciao,
Arjun

twitter.com/arjunghosh

On Mon, Mar 23, 2009 at 12:19 PM, Arun K.