Regex returning fewer groups than expected - where is the error?


#1

Hello!

I’m trying to build a regex to count the number of modified files by
commit. For example, for the following svn log output:


r727998 | bentmann | 2008-12-19 21:50:48 +1100 (Fri, 19 Dec 2008) | 4
lines
Changed paths:
A
/maven/components/branches/maven-2.0.x/maven-core/src/main/resources/org/apache/maven/messages/messages_de.properties
M
/maven/components/branches/maven-2.0.x/maven-core/src/main/resources/org/apache/maven/messages/messages_en.properties

[MNG-3451] [MNG-3790] German localization for maven-core
Submitted by: Christian Schulte

o Applied with minor modifications

r727871 | brett | 2008-12-19 11:48:07 +1100 (Fri, 19 Dec 2008) | 2 lines
Changed paths:
M /maven/components/branches/maven-2.0.x
M
/maven/components/branches/maven-2.0.x/maven-core/src/main/java/org/apache/maven/cli/MavenCli.java

[MNG-1830] use ISO 8601 format (not combined for readability)


It needs to return the revision number and the number of modified files,
so the result of the match would be something like:

Result 1:
1. r727998 | bentmann | 2008-12-19 21:50:48 +1100 (Fri, 19 Dec 2008) |
4 lines
2. A
/maven/components/branches/maven-2.0.x/maven-core/src/main/resources/org/apache/maven/messages/messages_de.properties
3. M
/maven/components/branches/maven-2.0.x/maven-core/src/main/resources/org/apache/maven/messages/messages_en.properties
Result 2:
1. r727871 | brett | 2008-12-19 11:48:07 +1100 (Fri, 19 Dec 2008) |
2 lines
2. M /maven/components/branches/maven-2.0.x
3. M
/maven/components/branches/maven-2.0.x/maven-core/src/main/java/org/apache/maven/cli/MavenCli.java

I tried many variations of the following regular expression:

/(^r\d+.*?)(?:^Changed paths:\n)(^\s[MDA]\s(?:/[\w.-]+)+)/m

but it returns incomplete results, such as:

Result 1

  1. r727998 | bentmann | 2008-12-19 21:50:48 +1100 (Fri, 19 Dec 2008) | 4
    lines
  2. A
    /maven/components/branches/maven-2.0.x/maven-core/src/main/resources/org/apache/maven/messages/messages_de.properties

Result 2

  1. r727871 | brett | 2008-12-19 11:48:07 +1100 (Fri, 19 Dec 2008) | 2
    lines
  2. M /maven/components/branches/maven-2.0.x

Would someone be able to help me on this? Thanks in advance!!

Kimie


#2

Hi,

2009/3/16 Kimie N. removed_email_address@domain.invalid:

Would someone be able to help me on this? Thanks in advance!!

I guess you want this:
/(^r\d+.*?)(?:^Changed paths:\n)((?:^\s[MDA]\s/[\w./-]+\n)+)/m

Regards,

Park H.
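
For readers following along, here is a minimal sketch of how the grouped pattern above behaves with String#scan. It assumes Ruby (the thread's regex literals and result[0][1] indexing point that way, but no post says so), `.*?` after `\d+` so the header is skipped lazily up to `Changed paths:`, and input where each path sits on one line behind a single leading space. GROUPED_LOG, GROUPED_PATTERN and revisions_with_paths are names made up for this illustration.

```ruby
# Sample input copied from the first post. Assumption: in the real file each
# path is on a single line with exactly one leading space.
GROUPED_LOG = <<LOG
r727998 | bentmann | 2008-12-19 21:50:48 +1100 (Fri, 19 Dec 2008) | 4 lines
Changed paths:
 A /maven/components/branches/maven-2.0.x/maven-core/src/main/resources/org/apache/maven/messages/messages_de.properties
 M /maven/components/branches/maven-2.0.x/maven-core/src/main/resources/org/apache/maven/messages/messages_en.properties

[MNG-3451] [MNG-3790] German localization for maven-core
Submitted by: Christian Schulte

o Applied with minor modifications

r727871 | brett | 2008-12-19 11:48:07 +1100 (Fri, 19 Dec 2008) | 2 lines
Changed paths:
 M /maven/components/branches/maven-2.0.x
 M /maven/components/branches/maven-2.0.x/maven-core/src/main/java/org/apache/maven/cli/MavenCli.java

[MNG-1830] use ISO 8601 format (not combined for readability)
LOG

# %r{...} is used so the slashes in the pattern need no escaping.
# Group 1: header text up to "Changed paths:". Group 2: ALL path lines.
GROUPED_PATTERN = %r{(^r\d+.*?)(?:^Changed paths:\n)((?:^\s[MDA]\s/[\w./-]+\n)+)}m

# Returns one [header, paths_block] pair per revision.
def revisions_with_paths(log)
  log.scan(GROUPED_PATTERN)
end

revisions_with_paths(GROUPED_LOG).each do |header, paths|
  # A repeated group still yields a single capture, so every path line of a
  # revision arrives in one string.
  puts "#{header[/\Ar\d+/]}: #{paths.lines.size} path line(s) in one group"
end
```

A regex has a fixed number of capture groups, so a repeated group cannot fan out into one capture per path; the lines have to be separated afterwards.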


#3

Hi Park, thank you for the quick answer. Using your regex, I get the
following result:

Result 1

  1. r727998 | bentmann | 2008-12-19 21:50:48 +1100 (Fri, 19 Dec 2008) | 4
    lines
  2. A
    /maven/components/branches/maven-2.0.x/maven-core/src/main/resources/org/apache/maven/messages/messages_de.properties
    M
    /maven/components/branches/maven-2.0.x/maven-core/src/main/resources/org/apache/maven/messages/messages_en.properties

Result 2

  1. r727871 | brett | 2008-12-19 11:48:07 +1100 (Fri, 19 Dec 2008) | 2
    lines
  2. M /maven/components/branches/maven-2.0.x
    M
    /maven/components/branches/maven-2.0.x/maven-core/src/main/java/org/apache/maven/cli/MavenCli.java

It returns all the modified lines, but in the same group. So, for example,
to know how many files were modified in revision 727998, I would have to
work on result[0][1] and result[1][1] (or 1.2 and 2.2 as in the
example above) to extract the number of modified files, maybe by splitting
the text and counting how many files were modified (1 per line).
Actually I tried splitting by \n, but it didn’t work. So what I
was wondering is whether it is possible to return each modified line as a
different group, as I said in the first email.

Or, if you can suggest a way to break up the lines of results[0][1] and
results[1][1], that would work too!

Thank you!
Kimie
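
One way to break such a group into lines can be sketched as follows, assuming Ruby and assuming the captured text looks like group 2 above with each path on one line. The thread does not show the split that failed, so the single-quote comparison at the end is only a guess at one possible cause. CHANGED, changed_paths and count_by_action are hypothetical names.

```ruby
# Group 2 for r727998 as described in the post above (paths unwrapped onto
# single lines, which is an assumption about the real input).
CHANGED = " A /maven/components/branches/maven-2.0.x/maven-core/src/main/resources/org/apache/maven/messages/messages_de.properties\n" \
          " M /maven/components/branches/maven-2.0.x/maven-core/src/main/resources/org/apache/maven/messages/messages_en.properties\n"

# Splits the captured block into one trimmed entry per changed path.
def changed_paths(block)
  block.lines.map(&:strip).reject(&:empty?)
end

# Tallies entries by their action letter (A, M or D).
def count_by_action(paths)
  paths.each_with_object(Hash.new(0)) { |line, counts| counts[line.split.first] += 1 }
end

paths = changed_paths(CHANGED)
puts paths.size               # number of changed files
p count_by_action(paths)

# In Ruby, "\n" in double quotes is a newline, while '\n' in single quotes is
# a backslash followed by the letter n, so the second split finds nothing.
p CHANGED.split("\n").size    # 2
p CHANGED.split('\n').size    # 1
```

String#lines avoids the quoting question entirely, which is why the helper uses it instead of split.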


#4

2009/3/16 Kimie N. removed_email_address@domain.invalid:

So what I was wondering is whether it is possible to return each
modified line as a different group, as I said in the first email.

Or, if you can suggest a way to break up the lines of results[0][1] and
results[1][1], that would work too!

Or try the simpler regex:
/(^r\d+.*?)(?:^Changed paths:\n)|(^\s[MDA]\s/[\w./-]+\n)/m

Regards,

Park H.
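
A minimal sketch of how the alternation pattern could be used to count files per revision, again assuming Ruby, `.*?` after `\d+`, and one path per line behind a single leading space. With the alternation, each scan result fills either the header group or the path group, never both, so the paths have to be attributed to the most recent header. ALTERNATION_LOG, ALTERNATION_PATTERN and files_per_revision are hypothetical names.

```ruby
# Sample input copied from the first post, with each path unwrapped onto one
# line and a single leading space (an assumption about the real file).
ALTERNATION_LOG = <<LOG
r727998 | bentmann | 2008-12-19 21:50:48 +1100 (Fri, 19 Dec 2008) | 4 lines
Changed paths:
 A /maven/components/branches/maven-2.0.x/maven-core/src/main/resources/org/apache/maven/messages/messages_de.properties
 M /maven/components/branches/maven-2.0.x/maven-core/src/main/resources/org/apache/maven/messages/messages_en.properties

[MNG-3451] [MNG-3790] German localization for maven-core
Submitted by: Christian Schulte

o Applied with minor modifications

r727871 | brett | 2008-12-19 11:48:07 +1100 (Fri, 19 Dec 2008) | 2 lines
Changed paths:
 M /maven/components/branches/maven-2.0.x
 M /maven/components/branches/maven-2.0.x/maven-core/src/main/java/org/apache/maven/cli/MavenCli.java

[MNG-1830] use ISO 8601 format (not combined for readability)
LOG

# Either a revision header (group 1) or a single path line (group 2).
ALTERNATION_PATTERN = %r{(^r\d+.*?)(?:^Changed paths:\n)|(^\s[MDA]\s/[\w./-]+\n)}m

# Walks the matches in order and credits each path line to the revision
# header that most recently preceded it.
def files_per_revision(log)
  counts = {}
  current = nil
  log.scan(ALTERNATION_PATTERN) do |header, path|
    if header
      current = header[/\Ar\d+/]   # keep just the revision id, e.g. "r727998"
      counts[current] = 0
    elsif path && current
      counts[current] += 1
    end
  end
  counts
end

files_per_revision(ALTERNATION_LOG).each do |revision, count|
  puts "#{revision}: #{count} file(s) changed"
end
```

One caveat with this approach: a commit message line that happens to look like a path entry would be counted as well, so real logs may need a stricter pattern.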


#5

Yep, the simpler regex does the job. Thanks a lot!