Forum: Ruby regex returning fewer groups than expected - where is the error?

Kimie Nakahara (knakahara)
on 2009-03-16 03:13
Hello!

I'm trying to build a regex to count the number of files modified per
commit. For example, for the following svn log output:

------------------------------------------------------------------------
r727998 | bentmann | 2008-12-19 21:50:48 +1100 (Fri, 19 Dec 2008) | 4 lines
Changed paths:
   A /maven/components/branches/maven-2.0.x/maven-core/src/main/resources/org/apache/maven/messages/messages_de.properties
   M /maven/components/branches/maven-2.0.x/maven-core/src/main/resources/org/apache/maven/messages/messages_en.properties

[MNG-3451] [MNG-3790] German localization for maven-core
Submitted by: Christian Schulte

o Applied with minor modifications
------------------------------------------------------------------------
r727871 | brett | 2008-12-19 11:48:07 +1100 (Fri, 19 Dec 2008) | 2 lines
Changed paths:
   M /maven/components/branches/maven-2.0.x
   M /maven/components/branches/maven-2.0.x/maven-core/src/main/java/org/apache/maven/cli/MavenCli.java

[MNG-1830] use ISO 8601 format (not combined for readability)

------------------------------------------------------------------------

It needs to return the revision line and each modified file, so the
result of the match would be something like:

Result 1:
     1. r727998 | bentmann | 2008-12-19 21:50:48 +1100 (Fri, 19 Dec 2008) | 4 lines
     2. A /maven/components/branches/maven-2.0.x/maven-core/src/main/resources/org/apache/maven/messages/messages_de.properties
     3. M /maven/components/branches/maven-2.0.x/maven-core/src/main/resources/org/apache/maven/messages/messages_en.properties
Result 2:
     1. r727871 | brett | 2008-12-19 11:48:07 +1100 (Fri, 19 Dec 2008) | 2 lines
     2. M /maven/components/branches/maven-2.0.x
     3. M /maven/components/branches/maven-2.0.x/maven-core/src/main/java/org/apache/maven/cli/MavenCli.java

I tried many variations of the following regular expression:

/(^r\d+.*?)(?:^Changed paths:\n)(^\s*[MDA]\s(?:\/[\w.-]+)+)/m

but it returns incomplete results, such as:

Result 1

1. r727998 | bentmann | 2008-12-19 21:50:48 +1100 (Fri, 19 Dec 2008) | 4 lines
2. A /maven/components/branches/maven-2.0.x/maven-core/src/main/resources/org/apache/maven/messages/messages_de.properties

Result 2

1. r727871 | brett | 2008-12-19 11:48:07 +1100 (Fri, 19 Dec 2008) | 2 lines
2. M /maven/components/branches/maven-2.0.x

Would someone be able to help me on this? Thanks in advance!!

Kimie
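
A note on why the third group never shows up: the number of capture groups in a match is fixed by the parentheses in the pattern, so a pattern with two groups yields at most two captures per match, however many path lines follow. The sketch below assumes the pattern from the post was meant to end with a closing parenthesis for its second group, and that the log is fed in unwrapped, as svn prints it. The variable names are illustrative only.

```ruby
log = <<'EOS'
------------------------------------------------------------------------
r727998 | bentmann | 2008-12-19 21:50:48 +1100 (Fri, 19 Dec 2008) | 4 lines
Changed paths:
   A /maven/components/branches/maven-2.0.x/maven-core/src/main/resources/org/apache/maven/messages/messages_de.properties
   M /maven/components/branches/maven-2.0.x/maven-core/src/main/resources/org/apache/maven/messages/messages_en.properties

[MNG-3451] [MNG-3790] German localization for maven-core
Submitted by: Christian Schulte

o Applied with minor modifications
------------------------------------------------------------------------
r727871 | brett | 2008-12-19 11:48:07 +1100 (Fri, 19 Dec 2008) | 2 lines
Changed paths:
   M /maven/components/branches/maven-2.0.x
   M /maven/components/branches/maven-2.0.x/maven-core/src/main/java/org/apache/maven/cli/MavenCli.java

[MNG-1830] use ISO 8601 format (not combined for readability)

------------------------------------------------------------------------
EOS

# The pattern from the post, with the second group closed (an assumption).
attempt = /(^r\d+.*?)(?:^Changed paths:\n)(^\s*[MDA]\s(?:\/[\w.-]+)+)/m

# Two groups in the pattern, so each match carries exactly two captures:
# the revision line and the first path line only.
results = log.scan(attempt)
results.each_with_index do |groups, i|
  puts "Result #{i + 1}: #{groups.size} groups"
  groups.each { |g| puts "  #{g.strip}" }
end

# Repeating a capturing group does not add groups either: only the last
# repetition is kept.
last_only = "abc".match(/(\w)+/)[1]
puts last_only
```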
Heesob Park (phasis)
on 2009-03-16 04:05
(Received via mailing list)
Hi,

2009/3/16 Kimie Nakahara <kikacilda@yahoo.com.br>:
> Would someone be able to help me on this? Thanks in advance!!
>
I guess you want this:
/(^r\d+.*?)(?:^Changed paths:\n)((?:^\s*[MDA]\s\/[\w.\/-]+\n)+)/m
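
A minimal sketch of how this pattern could be tried out, assuming it is used with String#scan and that the log is passed in unwrapped, as svn prints it. The variable names are illustrative, not from the thread.

```ruby
log = <<'EOS'
------------------------------------------------------------------------
r727998 | bentmann | 2008-12-19 21:50:48 +1100 (Fri, 19 Dec 2008) | 4 lines
Changed paths:
   A /maven/components/branches/maven-2.0.x/maven-core/src/main/resources/org/apache/maven/messages/messages_de.properties
   M /maven/components/branches/maven-2.0.x/maven-core/src/main/resources/org/apache/maven/messages/messages_en.properties

[MNG-3451] [MNG-3790] German localization for maven-core
Submitted by: Christian Schulte

o Applied with minor modifications
------------------------------------------------------------------------
r727871 | brett | 2008-12-19 11:48:07 +1100 (Fri, 19 Dec 2008) | 2 lines
Changed paths:
   M /maven/components/branches/maven-2.0.x
   M /maven/components/branches/maven-2.0.x/maven-core/src/main/java/org/apache/maven/cli/MavenCli.java

[MNG-1830] use ISO 8601 format (not combined for readability)

------------------------------------------------------------------------
EOS

pattern = /(^r\d+.*?)(?:^Changed paths:\n)((?:^\s*[MDA]\s\/[\w.\/-]+\n)+)/m

# scan returns one [header, paths] pair per commit; the second group holds
# every path line of that commit as one multi-line string.
results = log.scan(pattern)
results.each do |header, paths|
  puts header.strip
  puts paths
end
```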

Regards,

Park Heesob
Kimie Nakahara (knakahara)
on 2009-03-16 04:51
Hi Park, thank you for the quick answer. Using your regex, I get the
following result:

Result 1

1. r727998 | bentmann | 2008-12-19 21:50:48 +1100 (Fri, 19 Dec 2008) | 4 lines
2. A /maven/components/branches/maven-2.0.x/maven-core/src/main/resources/org/apache/maven/messages/messages_de.properties
   M /maven/components/branches/maven-2.0.x/maven-core/src/main/resources/org/apache/maven/messages/messages_en.properties

Result 2

1. r727871 | brett | 2008-12-19 11:48:07 +1100 (Fri, 19 Dec 2008) | 2 lines
2. M /maven/components/branches/maven-2.0.x
   M /maven/components/branches/maven-2.0.x/maven-core/src/main/java/org/apache/maven/cli/MavenCli.java

It returns all the modified lines, but in the same group. So, for example,
to know how many files were modified in revision 727998, I would have to
work on result[0][1] and result[1][1] (or 1.2 and 2.2 in the example
above) to extract the number of modified files, maybe by splitting the
text and counting the lines (one modified file per line). Actually I tried
splitting by \n, but it didn't work. So what I was wondering is whether it
is possible to return each modified line as a different group, as I said
in the first email.

Or, if you can suggest a way to break up the lines of results[0][1] and
results[1][1], that would work too!

Thank you!
Kimie
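
Splitting the captured block on newlines should work in principle. One possible reason for the failed attempt, which is only a guess because the thread does not show the code, is a single-quoted '\n': in Ruby that is a backslash followed by the letter n, not a newline. A minimal sketch, with the captured group written out by hand and hypothetical variable names:

```ruby
# Group 2 for r727998 as one multi-line string, written out by hand here.
paths = <<'EOS'
   A /maven/components/branches/maven-2.0.x/maven-core/src/main/resources/org/apache/maven/messages/messages_de.properties
   M /maven/components/branches/maven-2.0.x/maven-core/src/main/resources/org/apache/maven/messages/messages_en.properties
EOS

files = paths.lines.map(&:strip)        # one entry per changed path
same  = paths.split("\n").map(&:strip)  # double quotes: a real newline
wrong = paths.split('\n')               # single quotes: backslash + n, nothing to split on

puts files.size     # number of changed paths in this commit
puts same == files  # both approaches give the same list
puts wrong.size     # the string comes back in one piece
```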
Heesob Park (phasis)
on 2009-03-16 05:47
(Received via mailing list)
2009/3/16 Kimie Nakahara <kikacilda@yahoo.com.br>:
> Actually I tried splitting by \n, but it didn't work. So what I was
> wondering is whether it is possible to return each modified line as a
> different group, as I said in the first email.
>
> Or, if you can suggest a way to break up the lines of results[0][1] and
> results[1][1], that would work too!
>
Or try this simpler regex:
/(^r\d+.*?)(?:^Changed paths:\n)|(^\s*[MDA]\s\/[\w.\/-]+\n)/m
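
With the alternation, each match fills only one of the two groups, so the pattern is presumably meant to be used with String#scan, walking the matches in order. A sketch under that assumption, with the log unwrapped as svn prints it and a hypothetical counts hash:

```ruby
log = <<'EOS'
------------------------------------------------------------------------
r727998 | bentmann | 2008-12-19 21:50:48 +1100 (Fri, 19 Dec 2008) | 4 lines
Changed paths:
   A /maven/components/branches/maven-2.0.x/maven-core/src/main/resources/org/apache/maven/messages/messages_de.properties
   M /maven/components/branches/maven-2.0.x/maven-core/src/main/resources/org/apache/maven/messages/messages_en.properties

[MNG-3451] [MNG-3790] German localization for maven-core
Submitted by: Christian Schulte

o Applied with minor modifications
------------------------------------------------------------------------
r727871 | brett | 2008-12-19 11:48:07 +1100 (Fri, 19 Dec 2008) | 2 lines
Changed paths:
   M /maven/components/branches/maven-2.0.x
   M /maven/components/branches/maven-2.0.x/maven-core/src/main/java/org/apache/maven/cli/MavenCli.java

[MNG-1830] use ISO 8601 format (not combined for readability)

------------------------------------------------------------------------
EOS

pattern = /(^r\d+.*?)(?:^Changed paths:\n)|(^\s*[MDA]\s\/[\w.\/-]+\n)/m

counts = {}
current = nil
# Each match is either a revision header (path is nil) or a single path
# line (header is nil), in the order they appear in the log.
log.scan(pattern) do |header, path|
  if header
    current = header[/\Ar\d+/]  # e.g. "r727998"
    counts[current] = 0
  elsif current
    counts[current] += 1
  end
end

counts.each { |revision, n| puts "#{revision}: #{n} changed paths" }
```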

Regards,

Park Heesob
Kimie Nakahara (knakahara)
on 2009-03-17 05:56
Yep, the simpler regex does the job. Thanks a lot!