Concatenating Regex Smartly

sshaikh · April 8, 2009, 11:46am

Is there a way of quickly concatenating two full string patterns in a
way that takes into account the boundaries? So for example:

\A\d+\Z and \A[a-z]+\Z

would give:

\A\d+[a-z]+\Z

?

Or is this a context sensitive situation where I’d have to parse and
join it myself? If so, what is the best way to “tokenise” a pattern?

Shak

sshaikh · April 8, 2009, 12:01pm

On 08.04.2009 11:47, Shak Shak wrote:

Is there a way of quickly concatenating two full string patterns in a
way that takes into account the boundaries? So for example:

\A\d+\Z and \A[a-z]+\Z

IIRC the “Z” must be lower case.

would give:

\A\d+[a-z]+\Z

?

Or is this a context sensitive situation where I’d have to parse and
join it myself? If so, what is the best way to “tokenise” a pattern?

Why do you have to parse them? There is a bit of context missing, but
without further facts I would recommend keeping the individual patterns
without the start and end anchors and applying those only after
constructing the full regexp that you want to use. My 0.02 EUR…

Kind regards

robert

sshaikh · April 8, 2009, 3:23pm

Shak Shak wrote:

\A\d+\Z and \A[a-z]+\Z

These are two regular expressions both anchored to the start and end of
the string.

If you want to match one or the other:

re1 = /\A\d+\z/
re2 = /\A[a-z]+\z/

re3 = /#{re1}|#{re2}/
=> /(?-mix:\A\d+\z)|(?-mix:\A[a-z]+\z)/
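
For what it's worth, the same alternation can also be built with Ruby's built-in Regexp.union, which joins its arguments with "|". A minimal sketch (the variable name `either` is only for illustration):

```ruby
re1 = /\A\d+\z/
re2 = /\A[a-z]+\z/

# Regexp.union joins the given patterns with "|".
either = Regexp.union(re1, re2)

p(either =~ "123")    # => 0   (all digits)
p(either =~ "abc")    # => 0   (all lowercase letters)
p(either =~ "123abc") # => nil (neither pattern matches the whole string)
```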

But to “concatenate” in the sense of making a regexp which matches
digits followed by letters, you need to remove the anchors.

re1 = /\d+/
re2 = /[a-z]+/

re3 = /\A#{re1}#{re2}\z/
=> /\A(?-mix:\d+)(?-mix:[a-z]+)\z/

Note that #{re1} and #{re2} are each surrounded by a non-capturing group
(?-mix:...) when they are interpolated into re3. So it should also work
properly for more complex REs, e.g.

re1 = /a|b/
re2 = /c|d/
re3 = /\A#{re1}#{re2}\z/
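
To see why that automatic grouping matters, here is a small sketch comparing interpolation of the Regexp objects with interpolation of their raw source text. The names `grouped` and `ungrouped` are made up for this illustration.

```ruby
re1 = /a|b/
re2 = /c|d/

# Interpolating the Regexp objects keeps each one in its own (?-mix:...) group.
grouped = /\A#{re1}#{re2}\z/

# Interpolating the raw source text drops the groups, so the pattern becomes
# /\Aa|bc|d\z/: "a" at the start, OR "bc" anywhere, OR "d" at the end.
ungrouped = /\A#{re1.source}#{re2.source}\z/

p(grouped =~ "ac")     # => 0
p(grouped =~ "a")      # => nil
p(ungrouped =~ "a")    # => 0   (matches via the first alternative only)
p(ungrouped =~ "xbcx") # => 1   (matches "bc" in the middle, unanchored)
```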

But if you want to be extra-certain that it’s done correctly, you can
always add your own additional layer of grouping:

re3 = /\A(?:#{re1}#{re2})\z/
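
If the anchored patterns are all you have, one rough option is to strip a leading \A and a trailing \z or \Z from each pattern's source before joining. The sketch below assumes exactly that shape of input. The helper names `strip_anchors` and `concat_anchored` are hypothetical, and this is plain text substitution, not a regexp parser: it will misfire on a source that ends in an escaped backslash followed by a literal "Z", and it changes the meaning of patterns with a top-level alternation inside the anchors.

```ruby
# Hypothetical helper: remove one leading \A and one trailing \z or \Z
# from a pattern's source, keeping the /i, /x and /m options.
def strip_anchors(re)
  src = re.source.sub(/\A\\A/, "").sub(/\\[zZ]\z/, "")
  flags = re.options & (Regexp::IGNORECASE | Regexp::EXTENDED | Regexp::MULTILINE)
  Regexp.new(src, flags)
end

# Hypothetical helper: join the stripped patterns and anchor the result once.
def concat_anchored(*res)
  # Regexp#to_s wraps each part in its own (?-mix:...) group.
  body = res.map { |re| strip_anchors(re).to_s }.join
  /\A#{body}\z/
end

re = concat_anchored(/\A\d+\Z/, /\A[a-z]+\Z/)
p(re =~ "123abc") # => 0
p(re =~ "123")    # => nil
```

Keeping the fragments unanchored in the first place, as suggested earlier in the thread, avoids these edge cases entirely.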

sshaikh · April 8, 2009, 3:30pm

On Wed, Apr 8, 2009 at 12:00 PM, Robert K. wrote:

On 08.04.2009 11:47, Shak Shak wrote:

Is there a way of quickly concatenating two full string patterns in a
way that takes into account the boundaries? So for example:

\A\d+\Z and \A[a-z]+\Z

Not that I am aware of (regarding the “Z” having to be lower case): both
\Z and \z are valid, but their semantics are slightly different:

irb(main):001:0> "abc\n" =~ /.\Z/ # \Z matches just before the trailing \n
=> 2
irb(main):002:0> "abc\n" =~ /.\z/ # \z does not match before the \n, and . does not match the \n
=> nil
irb(main):003:0> "abc\n" =~ /.\z/m # Now, in multiline mode, the . matches the \n
=> 3

Now, this is for 1.9; it may not hold for 1.8.
Cheers
R.
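
As a small illustration of why the \Z versus \z choice matters for the full-string patterns in the original question, here is a sketch using an input with a trailing newline (the variable name `input` is only for illustration):

```ruby
input = "123\n"

p(input =~ /\A\d+\Z/) # => 0   (\Z tolerates one trailing newline)
p(input =~ /\A\d+\z/) # => nil (\z requires the true end of the string)
```

So for strict validation of a whole string, \z is usually the safer anchor.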