Regular expressions question

akoSS · December 14, 2005, 10:04pm

hello,

i need to capture all matches for a group. for example if

'ab c' =~ /^(.)*$/

i would like to get array [ 'a', 'b', ' ', 'c' ]

could not figure out how to do it in ruby. String#scan did not seem to
be the right thing. please help.

thanks
konstantin
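For reference, here is what Ruby records for the example above, as a small sketch. A capture group inside a repeated construct is overwritten on every iteration, so once the match finishes, group 1 holds only the last character.

```ruby
# The * repeats group 1 once per character; each pass overwrites it.
md = 'ab c'.match(/^(.)*$/)

whole = md[0]   # the entire matched string
last  = md[1]   # only the final iteration of (.) survives

p whole   # "ab c"
p last    # "c"
```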

akoSS · December 14, 2005, 10:17pm

On Dec 14, 2005, at 3:02 PM, ako… wrote:

hello,

i need to capture all matches for a group. for example if

'ab c' =~ /^(.)*$/

i would like to get array [ 'a', 'b', ' ', 'c' ]

could not figure out how to do it in ruby. String#scan did not seem to
be the right thing. please help.

When using scan(), you need to remove the anchoring:

"ab c".scan(/./)
=> ["a", "b", " ", "c"]

Hope that helps.

James Edward G. II
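You can see why the anchors get in the way by running both patterns through `scan`. With `^...$` the pattern matches only once, across the whole string, and because it contains a group, `scan` reports only that group's final value. Without anchors or groups, every character is a separate match. A small sketch:

```ruby
anchored   = "ab c".scan(/^(.)*$/)  # one match; group 1 keeps only its last value
unanchored = "ab c".scan(/./)       # one match per character

p anchored     # [["c"]]
p unanchored   # ["a", "b", " ", "c"]
```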

akoSS · December 14, 2005, 10:35pm

On Wed, 14 Dec 2005 21:00:56 -0000, ako… wrote:

i need to capture all matches for a group. for example if

'ab c' =~ /^(.)*$/

i would like to get array [ 'a', 'b', ' ', 'c' ]

You could try:

irb(main):001:0> "ab c".split('') # split on nothing
=> ["a", "b", " ", "c"]

irb(main):002:0> "ab c".split(//) # same again
=> ["a", "b", " ", "c"]

irb(main):003:0> "ab c".scan(/./) # scan on any single char
=> ["a", "b", " ", "c"]
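The three calls agree on this input, but they are not fully interchangeable. `.` does not match a newline unless the regexp carries Ruby's `m` flag, while splitting on the empty string keeps every character. A quick comparison on a string that contains `\n`:

```ruby
s = "a\nb"

by_split     = s.split('')    # every character, newline included
by_scan      = s.scan(/./)    # "." does not match "\n" without /m
by_scan_mult = s.scan(/./m)   # /m lets "." match "\n" as well

p by_split       # ["a", "\n", "b"]
p by_scan        # ["a", "b"]
p by_scan_mult   # ["a", "\n", "b"]
```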

akoSS · December 14, 2005, 10:38pm

thank you. this was just an example. in general, is it possible to get
a collection of captures for a group without having to write custom
code?

akoSS · December 14, 2005, 11:05pm

thank you. the question is general.

if i wanted to parse a list of letters separated by spaces and commas:

'a , b,c' =~ /^(?:(\w)\s*,\s*)*(\w)$/

i need to get ['a','b'] in group 1 and ['c'] in group 2. yes, i know i
can split, then massage the result some more and get the final result.
is there a way to get to groups’ captures after a regex match? like in
microsoft’s .net?

akoSS · December 14, 2005, 11:05pm

On Wed, 14 Dec 2005 21:34:52 -0000, ako… wrote:

thank you. this was just an example. in general, is it possible to get
a collection of captures for a group without having to write custom
code?

Have to admit I’m not exactly a regex whiz, but I imagine it can be done
somehow. I assume you mean having a repeated capturing group append to
an array any number of times?

But I still think scan is a good tool for the job, since it can take any
regexp. I don’t think a single regexp is really intended for a variable
number of captures anyway (?).

irb(main):054:0> "ab c".scan(/\w|\s/)
=> ["a", "b", " ", "c"]

or

irb(main):052:0> "this is a test".scan(/\w+/)
=> ["this", "is", "a", "test"]

or even

irb(main):053:0> "this is a test".scan(/\w+|\s/)
=> ["this", " ", "is", " ", "a", " ", "test"]

Cheers,
Ross
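One detail matters when you lean on `scan` like this. Once the regexp contains a capture group, `scan` returns the groups rather than the whole match, one array per match. That is why the lookahead example later in the thread ends with `.flatten`. A sketch:

```ruby
words  = "this is a test".scan(/\w+/)     # no groups: array of strings
groups = "this is a test".scan(/(\w+)/)   # one group: array of 1-element arrays

p words                     # ["this", "is", "a", "test"]
p groups                    # [["this"], ["is"], ["a"], ["test"]]
p groups.flatten == words   # true
```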

akoSS · December 14, 2005, 11:23pm

On Dec 14, 2005, at 4:03 PM, ako… wrote:

thank you. the question is general.

if i wanted to parse a list of letters separated by spaces and commas:

'a , b,c' =~ /^(?:(\w)\s*,\s*)*(\w)$/

i need to get ['a','b'] in group 1 and ['c'] in group 2. yes, i know i
can split, then massage the result some more and get the final result.
is there a way to get to groups’ captures after a regex match? like in
microsoft’s .net?

Perl-style variables:

"abc" =~ /(.)(.)(.)/
=> 0

p [$1, $2, $3]
["a", "b", "c"]
=> nil

Or object oriented:

md = "abc".match(/(.)(.)(.)/)
=> #<MatchData:0x325dc8>

p [md[1], md[2], md[3]]
["a", "b", "c"]
=> nil

Hope that helps.

James Edward G. II
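`MatchData` can also return several groups at once instead of indexing them one by one. A sketch using `captures` (groups only), `to_a` (whole match first) and `values_at`:

```ruby
md = "abc".match(/(.)(.)(.)/)

caps       = md.captures         # groups only, no whole match
everything = md.to_a             # whole match at index 0, then the groups
some       = md.values_at(1, 3)  # pick specific groups by index

p caps         # ["a", "b", "c"]
p everything   # ["abc", "a", "b", "c"]
p some         # ["a", "c"]
```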

akoSS · December 15, 2005, 1:00am

On Wed, 14 Dec 2005 21:59:27 -0000, ako… wrote:

I don’t really get what you mean. I don’t understand the rules that got
a and b into one group and c into another. When you say it’s a general
question, do you mean you just want access to the captures from some
regexp match?

irb(main):009:0> "a , b,c" =~ /(\w\s*?,\s*?\w)\s*?,\s*?(\w)/
=> 0
irb(main):010:0> $1
=> "a , b"
irb(main):011:0> $2
=> "c"
irb(main):012:0> $~[1]
=> "a , b"
irb(main):013:0> $~[2]
=> "c"
irb(main):014:0> md = /(\w\s*?,\s*?\w)\s*?,\s*?(\w)/.match("a, b,c")
=> #<MatchData:0xb7a47860>
irb(main):015:0> md[1]
=> "a, b"
irb(main):016:0> md.captures[1]
=> "c"
irb(main):017:0> $~.inspect
=> "#<MatchData:0xb7a47860>"

(and others…)

Hope that helps,
Ross
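Here is a sketch of how these forms relate. `$~` refers to the `MatchData` from the most recent match in the current scope, `Regexp.last_match` returns that same object, and `$1`, `$2` are shorthands for its groups.

```ruby
"a , b,c" =~ /(\w\s*?,\s*?\w)\s*?,\s*?(\w)/

md       = Regexp.last_match      # the MatchData that $~ refers to
perlish  = [$1, $2]               # Perl-style globals
oo_style = [md[1], md[2]]         # the same groups via MatchData
second   = Regexp.last_match(2)   # shorthand for Regexp.last_match[2]

p perlish    # ["a , b", "c"]
p oo_style   # ["a , b", "c"]
p second     # "c"
```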

akoSS · December 15, 2005, 1:18am

You should be able to tell who this message is meant for:

PLEASE stop sending out code that uses any of the perl ${x} variables …

They are ugly and have no place in Ruby … they are only provided to
make the transition of Perl people easier …

Please teach people to use MatchData objects …

my_regex = /(\w\s*?.\s*?\w)\s*?.\s*?(\w)/

matches = my_regex.match( "a , b,c" )

element 0 of the matches object will contain the complete matched
string.

each element after that will map to one of the groups you defined …

so:

matches[0] will be the whole string: "a , b,c"
matches[1] will be your first group: "a , b"
matches[2] will be your second group: "c"

… seriously, we’re not helping people make cleaner code when we show
approval for the ugly/evil ${x} warts we’ve kept from Perl.

… show people the beauty and cleanliness of using an OOP solution …

I hope you agree.

j.

On 12/14/05, Ross B. wrote:

irb(main):010:0> $1
=> "a, b"

–
“Remember. Understand. Believe. Yield! → http://ruby-lang.org”

Jeff W.
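Besides the group strings, a `MatchData` object records where each group sits in the input. A sketch with Jeff's regexp, using `offset`, `begin` and `end`:

```ruby
my_regex = /(\w\s*?.\s*?\w)\s*?.\s*?(\w)/
matches  = my_regex.match("a , b,c")

whole    = matches[0]          # whole match
g1_span  = matches.offset(1)   # [start, end) of group 1
g2_start = matches.begin(2)    # where group 2 starts
g2_end   = matches.end(2)      # one past where group 2 ends

p whole      # "a , b,c"
p g1_span    # [0, 5]
p g2_start   # 6
p g2_end     # 7
```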

akoSS · December 15, 2005, 2:03am

ako… wrote:

thank you. the question is general.

if i wanted to parse a list of letters separated by spaces and commas:

'a , b,c' =~ /^(?:(\w)\s*,\s*)*(\w)$/

i need to get ['a','b'] in group 1 and ['c'] in group 2. yes, i know i
can split, then massage the result some more and get the final result.
is there a way to get to groups’ captures after a regex match? like in
microsoft’s .net?

t = 'a , b,c'.split( /\s*,\s*/ )   # => ["a", "b", "c"]
group1 = t[0..-2]                  # all but the last => ["a", "b"]
group2 = t[-1,1]                   # last element, as an array => ["c"]
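On Ruby 1.9 and later (newer than anything in this thread), a splat in multiple assignment splits off the last element in one step. Note that `last` is then a plain string, whereas `t[-1,1]` above gives a one-element array. A sketch under that version assumption:

```ruby
# Requires Ruby 1.9+: the leading splat collects everything but the last element.
*group1, last = 'a , b,c'.split(/\s*,\s*/)

p group1   # ["a", "b"]
p last     # "c"
```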

akoSS · December 15, 2005, 2:27am

From: “Jeff W.”

PLEASE stop sending out code that uses any of the perl ${x} variables …

They are ugly and have no place in Ruby … they are only provided to
make the transition of Perl people easier …

Thankfully, this is Ruby, and not Python with its rigid
Only One Way mentality.

Myself, though I’ve been aware of MatchData for going on
five years now, I find I don’t use it that often. The
$1…$n variables are perfectly legible to me. They have
a fine history too: not just Perl but awk, and Unix shell
programming . . .

Regards,

Bill

akoSS · December 15, 2005, 2:09am

On Thu, 15 Dec 2005 00:16:52 -0000, Jeff W. wrote:

You should be able to tell who this message is meant for:

Why not just address me directly?

PLEASE stop sending out code that uses any of the perl ${x} variables …

Well, okay. No need to shout though, is there?

Just trying to put a bit back, you know?

akoSS · December 15, 2005, 2:30am

Ross B. wrote:

Well, okay. No need to shout though, is there?

Just trying to put a bit back, you know?

Ross, don’t pay too much attention to unreasonable fanatics.

The first edition of the Pickaxe says:

“Having said all this, we have to 'fess up. Andy and Dave normally
use the $-variables rather than worrying about MatchData objects.
For everyday use, they just end up being more convenient.
Sometimes we just can’t help being pragmatic.”

akoSS · December 15, 2005, 2:54am

On 12/14/05, Jeff W. wrote:

I hope you agree.

Regular expressions are the only area where I still use Perl magic
variables, because they’re concise, readable, and work well in that
context. They feel like a regexp standard to me.

The other magic variables I’ve dispensed with.

Nick

akoSS · December 15, 2005, 3:06am

Hi,

From: “ako…”

i give up. there seems to be no way to get all the captures for a
group. the corresponding $ variable just has the last one.

Could you help us to understand why #scan didn’t meet your needs?

Called without a block, #scan returns an array of matches:

"abc--------abc--------abc".scan(/(a)(b)(c)/)
=> [["a", "b", "c"], ["a", "b", "c"], ["a", "b", "c"]]

Called with a block, #scan calls your block each time a match is
found:

"abc--------abc--------abc".scan(/(a)(b)(c)/) { puts "#$1, #$2, #$3" }
a, b, c
a, b, c
a, b, c

Hope this helps,

Bill
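In the block form, `scan` also sets `$~` to the current match, so the same loop can be written against a `MatchData` object, which also gives each match's position. A sketch. The input is built with explicit dash counts (8) so the offsets are easy to check:

```ruby
str = "abc" + "-" * 8 + "abc" + "-" * 8 + "abc"

found = []
str.scan(/(a)(b)(c)/) do
  md = Regexp.last_match              # MatchData for the current match
  found << [md.captures, md.begin(0)] # groups plus start offset
end

found.each { |caps, pos| puts "#{caps.join(', ')} at #{pos}" }
# a, b, c at 0
# a, b, c at 11
# a, b, c at 22
```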

akoSS · December 15, 2005, 3:15am

Bill,

scan does not help because it can match a portion of the source string,
and what is in between the matches is skipped. so scan is just a
special case of the functionality that i was looking for. i need to
make sure the whole string has a defined structure and get parts of it
as groups.

konstantin
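Konstantin's point can be made concrete with a made-up malformed input. `scan` steps over the `;` without complaint, while a pattern anchored at both ends rejects the string as a whole. A small illustration:

```ruby
good = 'a , b,c'
bad  = 'a , b ; c'   # ';' is not an allowed separator

list = /^(?:\w\s*,\s*)*\w$/   # the whole string must be a comma list

skipped = bad.scan(/\w/)   # scan silently skips the ';'
ok_good = good =~ list     # structure accepted
ok_bad  = bad =~ list      # structure rejected

p skipped   # ["a", "b", "c"]
p ok_good   # 0
p ok_bad    # nil
```

With multi-line input, `\A` and `\z` are the stricter anchors, since Ruby's `^` and `$` match at line boundaries.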

akoSS · December 15, 2005, 3:00am

i give up. there seems to be no way to get all the captures for a
group. the corresponding $ variable just has the last one. thanks to
everyone who responded. sorry, did not mean to start a war over
people’s coding styles.

konstantin

akoSS · December 15, 2005, 3:24am

William J. wrote:

Ross, don’t pay too much attention to unreasonable fanatics.

The first edition of the Pickaxe says:

“Having said all this, we have to 'fess up. Andy and Dave normally
use the $-variables rather than worrying about MatchData objects.
For everyday use, they just end up being more convenient.
Sometimes we just can’t help being pragmatic.”

How convenient that you quote that without quoting the drawbacks listed
first…

Sheesh, if you want perl, use perl.

–
Neil S.

‘A republic, if you can keep it.’ – Benjamin Franklin

akoSS · December 15, 2005, 3:46am

From: “ako…”

scan does not help because it can match a portion of the source string,
and what is in between the matches is skipped. so scan is just a
special case of the functionality that i was looking for. i need to
make sure the whole string has a defined structure and get parts of it
as groups.

Ah, OK thanks. From your earlier post:

if i wanted to parse a list of letters separated by spaces and commas:

'a , b,c' =~ /^(?:(\w)\s*,\s*)*(\w)$/

i need to get ['a','b'] in group 1 and ['c'] in group 2.

What about:

'a , b,c' =~ /^((?:\w\s*,\s*)*)(\w)$/
last_match = $2
first_matches = $1.scan(/\w/)

Since we first verified the whole string conforms to the required
pattern, we can then safely perform the scan on the captured group
to obtain the individual matches.
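Bill's validate-then-scan approach can be packaged as a small method. The name `parse_letter_list` and its return shape are invented for this sketch. It returns `nil` when the string lacks the required structure:

```ruby
# Hypothetical helper: returns [first_matches, last_match], or nil if the
# string is not a comma-separated list of single word characters.
def parse_letter_list(str)
  md = /^((?:\w\s*,\s*)*)(\w)$/.match(str)
  return nil unless md
  # The prefix in group 1 has already been validated by the anchored
  # match, so scanning it cannot skip over unexpected text.
  [md[1].scan(/\w/), md[2]]
end

p parse_letter_list('a , b,c')     # [["a", "b"], "c"]
p parse_letter_list('a , b ; c')   # nil
```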

Or we could write the scan using look-ahead assertions, as another
way to prevent the skipping of in-between parts:

str = 'a , b,c'

# first verify whole pattern matches, and get final match group
if str =~ /^(?:\w\s*,\s*)*(\w)$/
  last_match = $1
  first_matches =
    str.scan(/(?:(\w)\s*,\s*)(?=(?:\w\s*,\s*)*\w$)/).flatten
end

last_match      # => "c"

first_matches   # => ["a", "b"]

HTH,

Bill
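Another route, not used in this thread, is the standard library's `StringScanner` (from `strscan`). Its `scan` method only matches at the scanner's current position, so a loop can collect every repetition of a group while refusing to skip anything. That comes close to the per-group capture collection Konstantin asked about. A minimal sketch. The method name `letters_in_list` is hypothetical:

```ruby
require 'strscan'

# Hypothetical helper: collects every repetition of the (\w) group in a
# comma-separated list, or returns nil unless the whole string conforms.
def letters_in_list(str)
  ss = StringScanner.new(str)
  firsts = []

  # StringScanner#scan only matches at the current position, so no text
  # between items can be skipped the way String#scan would skip it.
  while ss.scan(/(\w)\s*,\s*/)
    firsts << ss[1]   # capture from this repetition
  end

  last = ss.scan(/\w/)   # the final letter, with no separator after it
  return nil unless last && ss.eos?

  [firsts, last]
end

p letters_in_list('a , b,c')     # [["a", "b"], "c"]
p letters_in_list('a , b ; c')   # nil
p letters_in_list('a,b,')        # nil
```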

akoSS · December 15, 2005, 4:37am

On Dec 14, 2005, at 6:16 PM, Jeff W. wrote:

You should be able to tell who this message is meant for:

Yes, I recognize that you are probably speaking at least in part to
me, since I did that in this very thread. You can call me by name if
you like. I’m a big boy and I can take it.

PLEASE stop sending out code that uses any of the perl ${x}
variables …

Hang on there, Mr. Code Police. Let’s not lay down the law too
heavily before we get into this…

They are ugly and have no place in Ruby … they are only provided to
make the transition of Perl people easier …

I seriously doubt those variables were invented in Perl. They are a
common feature of many regular expression implementations, and I’m not
sure they are even that ugly. $1 holds what was grabbed by the first
set of parentheses. Fairly logical.

Please teach people to use MatchData objects …

I also showed a MatchData example.

I’ve used them a time or two, but honestly, they just don’t feel
right to me. I’ve stopped using the default variable, I’m using a
two-space tab, etc. I’m Ruby assimilated, but I just like the Regexp-
linked variables.

I see a lot of code running the Ruby Q. and I feel quite confident
saying that the Regexp variables are far more common than MatchData.
I don’t think that says anything bad about the latter, but it does
tell me that you are in the minority.

We won’t yell at you for using MatchData, if you’ll provide the same
consideration…

James Edward G. II