Problem modifying captured regexp results

casper_the_ghost · October 20, 2006, 6:00pm

Hello,

I’m using Ruby to automatically generate Fortran95 code, and I’m using a
regular expression to parse the following type of definition line:

REAL(fp), DIMENSION(Dim1,Dim2) :: Arr2 ! Description of Arr2

The regexp I’m using works fine and I build an array of hashes, one per
definition, i.e.

 if line =~ componentRegexp
   # We have matched an array component definition
   arrayList<<{"type"=>$1,
               "param"=>$2,
               "dimlist"=>$3,
               "name"=>$4,
               "description"=>$5}
   puts(arrayList.last.inspect)
 else
   # No match, so raise an error ($~ is nil here, so report the line itself)
   raise StandardError, "Invalid array definition: #{line}"
 end

which works fine. The inspect output gives me:

 {"name"=>"Arr2", "type"=>"REAL", "description"=>"Description of Arr2",
  "param"=>"fp", "dimlist"=>"Dim1,Dim2"}

However, what I want to do is modify the dimlist in the hash so it is an
array of strings,

   "dimlist"=>["Dim1","Dim2"]

rather than a single string,

   "dimlist"=>"Dim1,Dim2"

Because the number of dimensions in the dimlist can vary from 1 to 7,
rather than do the splitting in the regexp, I tried doing it in the
arrayList concatenation using the split method like so,

   arrayList<<{"type"=>$1,
               "param"=>$2,
               "dimlist"=>$3.split(/\s*,\s*/), # <--- split dimlist on ","
               "name"=>$4,
               "description"=>$5}

but I’ve found that the above operation on the $3 captured result
appears to “wipe” the subsequent entries $4 (name) and $5 (description).
For example, the output of

   puts(arrayList.last.inspect)

on the above gives me,

 {"name"=>nil, "type"=>"REAL", "description"=>nil, "param"=>"fp",
  "dimlist"=>["Dim1", "Dim2"]}

Note that the “dimlist” is how I want it, but “name” and “description”
entries are now nil.

So can someone elaborate on why the above split operation on captured
regexp results seems to bugger up the other captured results? Does this
issue extend to any operation on captured regexp results?

I’ve looked through the pickaxe and cookbook, but no information on this
was immediately apparent.

Thanks for any info.

cheers,

paulv

casper_the_ghost · October 20, 2006, 6:24pm

Paul van Delst wrote:

but I’ve found that the above operation on the $3 captured result
appears to “wipe” the subsequent entries $4 (name) and $5 (description).

Paul, the problem is that #split with a Regexp internally executes some
Regexp matches which change $1, $2 etc. You have to capture the results
of the first match before executing the split.
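
One way to do that is to keep the MatchData object in a local variable
and index into it, instead of relying on the $1..$5 globals. A minimal
sketch follows. The pattern is only a hypothetical stand-in, since the
original componentRegexp was not posted, and the method name
parse_component is made up for illustration:

```ruby
# Hypothetical stand-in for componentRegexp (the real one was not shown).
component_regexp =
  /(\w+)\((\w+)\),\s*DIMENSION\(([^)]*)\)\s*::\s*(\w+)\s*!\s*(.*)/

# Illustrative helper: parse one definition line into a hash.
def parse_component(line, pattern)
  # Regexp#match returns a MatchData (or nil). The local variable m is
  # not touched by any later regexp operation, unlike $1, $2, ...
  m = pattern.match(line)
  raise ArgumentError, "Invalid array definition: #{line}" unless m

  { "type"        => m[1],
    "param"       => m[2],
    "dimlist"     => m[3].split(/\s*,\s*/), # safe: m is unaffected by split
    "name"        => m[4],
    "description" => m[5] }
end

sample = "REAL(fp), DIMENSION(Dim1,Dim2) :: Arr2 ! Description of Arr2"
component = parse_component(sample, component_regexp)
puts component.inspect
```

With this, the position of the split inside the hash literal no longer
matters.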

Regards,
Pit

casper_the_ghost · October 20, 2006, 7:06pm

Pit C. wrote:

but I’ve found that the above operation on the $3 captured result
appears to “wipe” the subsequent entries $4 (name) and $5 (description).

Paul, the problem is that #split with a Regexp internally executes some
Regexp matches which change $1, $2 etc. You have to capture the results
of the first match before executing the split.

Aha! That is the answer to the question (see my other post).

Bewdy. Thanks Pit and Gavin.

cheers,

paulv

casper_the_ghost · October 21, 2006, 9:22pm

On 10/20/06, Pit C. wrote:

but I’ve found that the above operation on the $3 captured result
appears to “wipe” the subsequent entries $4 (name) and $5 (description).

Paul, the problem is that #split with a Regexp internally executes some
Regexp matches which change $1, $2 etc. You have to capture the results
of the first match before executing the split.

In this case, it’s really easy: just reorder the lines so that the one
containing split will be the last (hash changes the order anyway):

   arrayList<<{"type"=>$1,
               "param"=>$2,
               "name"=>$4,
               "description"=>$5,
               "dimlist"=>$3.split(/\s*,\s*/)}

This works because by the time split overwrites $1..$5, the other
captures have already been copied into the hash, so you don’t need them
any more. Obviously this would not work if there were more than one
split.
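
For the case with more than one split, one option is to copy all the
captures into local variables right after the match, before calling
anything else that uses a regexp. This is just a sketch: the pattern is
a hypothetical stand-in for componentRegexp, and the extra "words" entry
is invented purely to show a second split.

```ruby
line = "REAL(fp), DIMENSION(Dim1,Dim2) :: Arr2 ! Description of Arr2"

# Hypothetical stand-in for componentRegexp (the real one was not shown).
pattern = /(\w+)\((\w+)\),\s*DIMENSION\(([^)]*)\)\s*::\s*(\w+)\s*!\s*(.*)/

if line =~ pattern
  # Grab every capture at once; the locals are independent of $~ from here on.
  type, param, dims, name, description = $~.captures

  entry = { "type"        => type,
            "param"       => param,
            "dimlist"     => dims.split(/\s*,\s*/),   # first split
            "name"        => name,
            "words"       => description.split(/\s+/), # second split (made-up field)
            "description" => description }
  puts entry.inspect
end
```

Since nothing after the destructuring line reads $1..$5, the order of
the hash entries and the number of splits stop being a concern.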