String gsub to replace named capture groups

I have a regexp that looks like this (simplified a bit):

 pattern = /
 (?<lq>
   %7B                        # single left brace, followed by
   %(?:25|(?![A-Fa-f0-9]{2})) # plaintext '%' or encoded as %25
 )
 |                            # or
 (?<rq>
   %(?:25|(?![A-Fa-f0-9]{2})) # plaintext '%' or encoded as %25
   %7D                        # followed by right brace
 )
 /x

(I know this looks similar to CGI.unescape, but there are special
cases.)

I want to use this to replace the captured values in a string. Using
gsub with a hash seems like the closest thing, but I actually want to
replace based on the capture group names, not the captured values.
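String#gsub with a Hash looks up the matched text itself, which is why group names can't be used as keys. The sketch below is simplified to the plaintext-'%' spellings only, so it does not cover the %25-encoded variants that the real pattern handles:

```ruby
text = "%7B% test %%7D"

# Keys are the literal matched text, so this works for the plain spellings...
by_text = text.gsub(/%7B%|%%7D/, { "%7B%" => "{%", "%%7D" => "%}" })

# ...but keys naming a group are never looked up. The hash returns nil for
# each match, gsub turns that into "", and both delimiters disappear.
by_name = text.gsub(/%7B%|%%7D/, { '\k<lq>' => "{%", '\k<rq>' => "%}" })

p by_text # prints "{% test %}"
p by_name # prints " test "
```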

 string = "%7B% test %%7D"
 replacements = {'\k<lq>' => '{{', '\k<rq>' => '}}'}
 expected = "{% test %}"

 string.gsub(pattern, replacements)

Is it possible?

Andrew

On 14-04-15, 16:28, Andrew V. wrote:

 replacements = {'\k<lq>' => '{{', '\k<rq>' => '}}'}
 expected = "{% test %}"

Oops: replacements = {'\k<lq>' => '{%', '\k<rq>' => '%}'}

Andrew V.

You could take advantage of the fact that the match sets the (thread-local)
$~ (also known as $LAST_MATCH_INFO if you use the English module), so this
seems to work:

#!/usr/bin/env ruby

require 'English'

pattern = /
  (?<lq>
    %7B                        # single left brace, followed by
    %(?:25|(?![A-Fa-f0-9]{2})) # plaintext '%' or encoded as %25
  )
  |                            # or
  (?<rq>
    %(?:25|(?![A-Fa-f0-9]{2})) # plaintext '%' or encoded as %25
    %7D                        # followed by right brace
  )
/x

string = "%7B% test %%7D"
replacements = { 'lq' => '{%', 'rq' => '%}' }

puts string.gsub(pattern) {
  matched_name = $LAST_MATCH_INFO.names.find { |n| $LAST_MATCH_INFO[n] }
  replacements[matched_name]
}

produces:

~/tmp ∙ ruby try.rb
{% test %}

Hope this helps,

Mike

On Apr 15, 2014, at 7:32 PM, Andrew V. wrote:

On 14-04-15, 16:28, Andrew V. wrote:

 replacements = {'\k<lq>' => '{{', '\k<rq>' => '}}'}
 expected = "{% test %}"

Oops: replacements = {'\k<lq>' => '{%', '\k<rq>' => '%}'}

Andrew V.

Mike S.
http://www.stok.ca/~mike/

The "`Stok' disclaimers" apply.

On 14-04-15, 18:01, Mike S. wrote:

puts string.gsub(pattern) {
  matched_name = $LAST_MATCH_INFO.names.find { |n| $LAST_MATCH_INFO[n] }
  replacements[matched_name]
}

+1000 ruby points!

I was actually just arriving at a very similar solution myself:

 replacements = { 'lq' => '{%', 'rq' => '%}' }

 string.gsub(pattern) {
   i = $~.captures.index { |c| !c.nil? }
   k = $~.names[i]
   replacements[k]
 }

(I wonder why there is no better syntax for the MatchData#find…)
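For what it's worth, MatchData#named_captures (Ruby 2.4+) gets close to such a "find": it maps every group name to its captured text, or to nil for a group that did not participate. A sketch, where matched_group_name and unescape_braces are hypothetical helper names and only the lq/rq groups from this thread are handled:

```ruby
PATTERN = /
  (?<lq> %7B %(?:25|(?![A-Fa-f0-9]{2})) )  # left brace, then '%' or %25
  |
  (?<rq> %(?:25|(?![A-Fa-f0-9]{2})) %7D )  # '%' or %25, then right brace
/x

REPLACEMENTS = { "lq" => "{%", "rq" => "%}" }.freeze

# Hypothetical helper: name of the first named group that took part in the
# match, or nil if none did.
def matched_group_name(md)
  name, _text = md.named_captures.find { |_name, text| !text.nil? }
  name
end

def unescape_braces(str)
  # gsub sets $~ to the current MatchData before each yield.
  str.gsub(PATTERN) { REPLACEMENTS.fetch(matched_group_name($~)) }
end

p unescape_braces("%7B% test %%7D")     # prints "{% test %}"
p unescape_braces("%7B%25 test %25%7D") # prints "{% test %}"
```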

My previous attempt used a case statement instead of the lookup hash.
I’m not sure which is more performant yet:

 string.gsub(pattern) {
   case
   when $~['ll'] then '{{'
   when $~['rr'] then '}}'
   when $~['lq'] then '{%'
   when $~['rq'] then '%}'
   when $~['sp'] then ' '
   end
 }

Thanks!
Andrew V.

On Wed, Apr 16, 2014 at 3:43 AM, Andrew V. wrote:

I was actually just arriving at a very similar solution myself:
(I wonder why there is no better syntax for the MatchData#find…)

Just a caveat: this approach works only because the text captured by
your groups spans the whole match of the expression. It will break if
the pattern also matches text before or after the group that should
stay as is. In that case you would have to either use lookaround, if
possible, OR introduce groups for the prefix and/or suffix and use
them to construct the replacement.
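To illustrate the caveat, here is a sketch with a hypothetical input in which a ':' must precede the left delimiter and has to survive the replacement (the ':' prefix and the simplified %7B% pattern are assumptions for the example, not part of the original problem):

```ruby
text = "key:%7B% x"

# The ':' is part of the overall match, so replacing the whole match drops it.
naive = text.gsub(/:(?<lq>%7B%)/) { "{%" }

# Option 1: a lookbehind requires the ':' without consuming it.
lookbehind = text.gsub(/(?<=:)(?<lq>%7B%)/) { "{%" }

# Option 2: capture the prefix in its own group and put it back.
prefix_group = text.gsub(/(?<pre>:)(?<lq>%7B%)/) { "#{$~[:pre]}{%" }

p naive        # prints "key{% x"
p lookbehind   # prints "key:{% x"
p prefix_group # prints "key:{% x"
```

Lookbehind keeps the replacement code simple, but Onigmo only allows lookbehind patterns of bounded length, so the prefix-group approach is the more general fallback.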

If there were only one match, you could use String#[]= for that, e.g.

irb(main):001:0> s="foobar"
=> "foobar"
irb(main):002:0> s[/fo+(b)/, 1]="X"
=> "X"
irb(main):003:0> s
=> "fooXar"

I guess this does not apply in your case.


As for which is faster, the hash lookup or the case statement: you'd have
to benchmark. This is fairly easy with the Benchmark module. :-)
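A sketch of such a comparison for the two-group pattern, assuming a synthetic input built by repeating the sample string. The method names via_hash and via_case are made up for this example, and the timings will depend on your Ruby version and real data:

```ruby
require "benchmark"

PATTERN = /
  (?<lq> %7B %(?:25|(?![A-Fa-f0-9]{2})) )
  |
  (?<rq> %(?:25|(?![A-Fa-f0-9]{2})) %7D )
/x

REPLACEMENTS = { "lq" => "{%", "rq" => "%}" }.freeze

# Variant 1: find the participating group's name, then look it up.
def via_hash(str)
  str.gsub(PATTERN) { REPLACEMENTS[$~.names.find { |n| $~[n] }] }
end

# Variant 2: test each group directly in a case statement.
def via_case(str)
  str.gsub(PATTERN) do
    case
    when $~[:lq] then "{%"
    when $~[:rq] then "%}"
    end
  end
end

# Synthetic input: the sample string repeated to make a large document.
INPUT = "%7B% test %%7D " * 10_000

Benchmark.bm(4) do |x|
  x.report("hash") { 3.times { via_hash(INPUT) } }
  x.report("case") { 3.times { via_case(INPUT) } }
end
```

Checking that both variants return the same string before comparing their timings is worthwhile, since a faster but wrong variant is no help.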

Kind regards

robert

On 14-04-16, 2:20, Robert K. wrote:

Just a caveat: this approach works only because the text captured by
your groups spans the whole match of the expression. It will break if
the pattern also matches text before or after the group that should
stay as is. In that case you would have to either use lookaround, if
possible, OR introduce groups for the prefix and/or suffix and use
them to construct the replacement.

I guess this does not apply in your case.

Right, that makes sense; in this case my pattern is just a list of named
captures union’d together.

The previous implementation used a series of gsubs to replace each
pattern one at a time (like your String#[]= suggestion), but this was
slow because it had to traverse a large string several times and, I
suspect, also allocated a whole new string on each pass.
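Since the pattern is just a list of named captures joined with `|`, one way to keep the single pass while keeping each rule in one place is to generate the union from a table. This is only a sketch: RULES, PCT and unescape_braces are hypothetical names, and only the lq/rq rules from this thread are filled in:

```ruby
# '%' in plaintext or encoded as %25 (same sub-pattern as in the thread).
PCT = '%(?:25|(?![A-Fa-f0-9]{2}))'

# Hypothetical rule table: group name => [sub-pattern source, replacement].
RULES = {
  "lq" => ["%7B#{PCT}", "{%"],
  "rq" => ["#{PCT}%7D", "%}"],
}.freeze

# One alternation of named groups, in table order, so gsub walks the string
# once instead of once per rule.
PATTERN = Regexp.new(
  RULES.map { |name, (source, _)| "(?<#{name}>#{source})" }.join("|")
)
REPLACEMENTS = RULES.transform_values { |(_, replacement)| replacement }.freeze

def unescape_braces(str)
  str.gsub(PATTERN) { REPLACEMENTS[$~.names.find { |n| $~[n] }] }
end

puts unescape_braces("%7B% test %%7D") # prints {% test %}
```

Because a Ruby Hash keeps insertion order, the order of the alternatives (and thus which rule wins when two could match at the same position) follows the order of the table.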

Thanks!
Andrew V.

If I understand the problem correctly, I believe you can solve it by
passing a block to gsub.

You can indeed do that with regular (numbered) captures:

str.gsub(regexp) do
  if $1
    '{%' # the first group matched
  else
    '%}' # the second group matched
  end
end

That is based on the fact that groups are numbered strictly left to
right, in the order their opening parentheses appear in the regexp, and
a group that didn't take part in the match (as may happen in an
alternation) evaluates to nil.

With named captures you'd check $~[:lq] and $~[:rq] instead; the same
principle regarding nil applies.
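With the pattern from the start of the thread, the named-capture variant might look like this. It is a sketch that assumes only the two groups lq and rq exist, so a nil lq means rq must have matched:

```ruby
pattern = /
  (?<lq> %7B %(?:25|(?![A-Fa-f0-9]{2})) )  # left brace, then '%' or %25
  |
  (?<rq> %(?:25|(?![A-Fa-f0-9]{2})) %7D )  # '%' or %25, then right brace
/x

result = "%7B% test %%7D".gsub(pattern) do
  # Only one alternative can match, so the other group is nil.
  $~[:lq] ? "{%" : "%}"
end

p result # prints "{% test %}"
```

In Ruby, named groups are numbered as well, so `$1` would refer to lq here too. The names just make the block easier to read.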