Ruby regular expression

woow · September 17, 2010, 8:39pm

(21:15:32:873) SMART>> Starting Process '/abs/nc/qwcy/xyz'. pid 11560

I want to parse the following pieces out of the string above:

(21:15:32:873)
xyz
pid 11560

and output them as below:

(21:15:32:873), xyz, pid 11560

I am stuck and would appreciate it if someone could give me some help. I
am a newbie to Ruby. Thanks!

woow · September 17, 2010, 9:20pm

http://www.ruby-doc.org/docs/ProgrammingRuby/ is a good start. Read the
bits about regular expressions.

You can match the line with a regular expression, and use capture groups
(in parentheses) to pick out the bits you want.

str = "hello/to/you"
=> "hello/to/you"
str =~ %r{(.*)/(.*)}
=> 0
$1
=> "hello/to"
$2
=> "you"
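The same split can also be done without the `$1`/`$2` globals by calling `String#match`, which returns a `MatchData` object (or `nil` when nothing matches). This is a minimal sketch that reuses the `"hello/to/you"` string from the irb session above. `m` is just an illustrative variable name.

```ruby
str = "hello/to/you"

# String#match returns a MatchData object, or nil if there is no match.
m = str.match(%r{(.*)/(.*)})

if m
  puts m[1]      # first group: .* is greedy, so it runs up to the last "/"
  puts m[2]      # second group: everything after that last "/"
  p m.captures   # all capture groups as an array
end
```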

woow · September 17, 2010, 9:27pm

On 9/17/2010 1:39 PM, Ruwan B. wrote:

> (21:15:32:873), xyz, pid 11560
>
> I am stuck and I would appreciate if someone could give me some help. I
> am newbie to ruby…thanks

The following example will work for your single example, but it’s
likely that it won’t work for your most general case, since it assumes a
fair bit about the contents of the lines to parse. You’ll need to read
up on regular expressions in general if you need to edit this.
Fortunately, there are plenty of decent references online:

Ruby Regexp Class - Regular Expressions in Ruby
Ruby - Regular Expressions
http://www.zenspider.com/Languages/Ruby/QuickRef.html#12
or just google for ruby regexp.

Example:

Assume str holds the string you want to parse.

str = …

str =~ %r{(\(.*\)).*/(.*)'..(.*)}
puts [$1, $2, $3].join(', ')

This regexp uses the parentheses around the first field of interest as
the first marker and captures the text within and including the
parentheses. Then it chews up everything following that until the last
/, which is the second marker. It captures the following text until the
', the third marker. Finally it throws away the next 2 characters and
captures the rest of the string.

It has captured 3 matches which become available in the special
variables $1, $2, and $3 based on the order in which they were matched
in the string. For convenience, those are put into an array, joined
with ', ', and then printed out.
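Putting the pieces together, here is a self-contained sketch that runs the regexp above against the line from the question. It assumes the quote characters in the log line are plain ASCII `'` quotes. If the real log uses typographic quotes, the `'` in the pattern would need to change to match them. The variable names `line` and `result` are illustrative.

```ruby
line = "(21:15:32:873) SMART>> Starting Process '/abs/nc/qwcy/xyz'. pid 11560"

# (timestamp)  noise up to last /  (name)  ' plus 2 chars  (rest of line)
if line =~ %r{(\(.*\)).*/(.*)'..(.*)}
  result = [$1, $2, $3].join(', ')
  puts result
else
  puts "no match"
end
```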

HTH.

-Jeremy

woow · September 17, 2010, 10:10pm

Another answer is:

s = "(21:15:32:873) SMART>> Starting Process '/abs/nc/qwcy/xyz'. pid 11560"
s =~ /^(\(.*?\)).*'\/.*?\/.*?\/.*?\/(.*?)'.*(pid \d+)/
puts $1
puts $2
puts $3

The parentheses that capture the text are called "capturing parentheses".

So the regular expression /what(.*)is(.*)this/ would successfully match
the string "what mamamea this is crazy this", and the first captured
group would be " mamamea " and the second captured group would be
" crazy ".

Hopefully that makes sense.

woow · September 17, 2010, 10:13pm

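Sorry, I was wrong: the first captured group would be " mamamea this ". The first .* is greedy, so it gives back characters only until it reaches the last "is" that still lets the rest of the pattern match.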
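The correction comes down to greediness. `.*` takes as much as it can and gives characters back only until the rest of the pattern can match. The lazy form `.*?` takes as little as possible. The sketch below compares the two on the same string. The lazy pattern is an added illustration and was not part of the original answer.

```ruby
text = "what mamamea this is crazy this"

greedy = text.match(/what(.*)is(.*)this/)   # .* grabs as much as possible
lazy   = text.match(/what(.*?)is(.*?)this/) # .*? grabs as little as possible

p greedy.captures   # => [" mamamea this ", " crazy "]
p lazy.captures     # => [" mamamea th", " is crazy "]
```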

woow · September 17, 2010, 9:50pm

Jeremy, thanks so much for your help. It works.

Could you please explain further how %r{(\(.*\)).*/(.*)'..(.*)}
captures the 3 parts I'm interested in from the string?

I know you have explained it, but if you could kindly explain it in an
easier way, I think it would help me learn.

Thanks again.

Jeremy B. wrote:

> Fortunately, there are plenty of decent references online:
>
> Ruby Regexp Class - Regular Expressions in Ruby
> Ruby - Regular Expressions
> http://www.zenspider.com/Languages/Ruby/QuickRef.html#12
> or just google for ruby regexp.
> [...]
> str =~ %r{(\(.*\)).*/(.*)'..(.*)}
> puts [$1, $2, $3].join(', ')

woow · September 17, 2010, 10:23pm

On 9/17/2010 2:50 PM, Ruwan B. wrote:

> could you please explain further how %r{(\(.*\)).*/(.*)'..(.*)}
>
> captures the 3 interested part from the string…

Sure thing. Especially when you’re learning about regular expressions,
it really helps to start by breaking them down. You often have to do
that even when you’re experienced and looking at someone else’s work.
Be aware that I’m going to gloss over quite a bit here, so you
still need to learn some basics.

We can break down this regexp into 5 parts:
(\(.*\))  <- The first capture
.*/       <- Noise
(.*)      <- The second capture
'..       <- More noise
(.*)      <- The last capture
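Ruby's extended mode (the `x` flag) ignores whitespace and allows `#` comments inside a pattern. That means the same five-part breakdown can be written directly into the regexp. This sketch assumes the same log line as before, with plain `'` quotes. The constant name `LOG_LINE` is only an illustrative choice.

```ruby
# The five parts of %r{(\(.*\)).*/(.*)'..(.*)}, one per line.
# In x mode, whitespace in the pattern is ignored and # starts a comment.
LOG_LINE = %r{
  (\(.*\))   # first capture: the timestamp, brackets included
  .*/        # noise: everything up to and including the last slash
  (.*)       # second capture: the process name
  '..        # more noise: the single quote and the next two characters
  (.*)       # last capture: the rest of the line
}x

line = "(21:15:32:873) SMART>> Starting Process '/abs/nc/qwcy/xyz'. pid 11560"
m = LOG_LINE.match(line)
puts m.captures.join(', ') if m
```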

Anything within the unescaped parentheses will be captured and assigned
to one of the $1, $2, etc. global variables. In the first capture, we
need to capture the literal parentheses which appear in the string, so
we escape these with backslashes in order to remove their special
meaning to the regular expression engine. Then we want to slurp up any
text between the parentheses, so we use “.*”.

The first bit of noise absorbs the uninteresting bits of the string
which follow, up to and including the forward slash. The * operator is
greedy and dot (.) matches any single character, so .* will match
everything, including forward slashes. It only stops because we
specifically said that this chunk of the regexp must end with a forward
slash. Since it would be impossible to consume the last forward slash
and still match one more, this noise accumulation stops at the last one.
It’s not within parentheses, so it’s not captured either.
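The claim that `.*/` stops at the last forward slash is easy to check in isolation. This sketch uses `String#[]` with a regexp, which returns the matched text or `nil`. The lazy `.*?/` variant is added only for contrast, and `path` is a hypothetical fragment taken from the question's line.

```ruby
path = "'/abs/nc/qwcy/xyz'"

greedy = path[%r{.*/}]    # .* takes everything, then backs up to a "/"
lazy   = path[%r{.*?/}]   # .*? takes as little as possible before a "/"

p greedy   # => "'/abs/nc/qwcy/"
p lazy     # => "'/"
```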

The second capture cooperates with the following noise section. The
capture part would greedily consume everything much like the first noise
section; however, the second noise section requires that a single quote
be available to start it. There must be at least 1 single quote left in
the string, so the second capture stops when there is only 1 left in
this case.

The second noise section goes on to consume 2 more characters after the
single quote. This is because of the dot dot (..) in it. Each dot
matches any single character.
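The cooperation between the second capture and the `'..` noise is easiest to see on just the tail of the line. In this sketch, `tail` is a hypothetical fragment cut from the question's string.

```ruby
tail = "xyz'. pid 11560"

# (.*) is greedy but must leave a ' for the next part of the pattern;
# '.. then consumes the quote plus ". " as noise.
if tail =~ /(.*)'..(.*)/
  p $1   # => "xyz"
  p $2   # => "pid 11560"
end
```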

Finally the last capture section is left to consume the rest of the
string exactly as the second capture would have done had nothing else
followed it in the regexp.

When building regexps, it’s almost always a good idea to start small and
build up. I like using irb for this since it gives me rapid results.
You’ll need more basics than I can provide here though if you want to
become proficient. Good luck!

-Jeremy