I’m trying to process LaTeX files. Is there a trick to remove occurrences
of {\large … } from strings like this one:
{\large this is a fa\c{c}ade,} this too: {\large façade}
so that it becomes:
this is a fa\c{c}ade, this too: façade
2012/3/2 Wybo D.:
I’m trying to process LaTeX files. Is there a trick to remove occurrences
of {\large … } from strings
I assume that curly braces can be nested arbitrarily deep within the
\large sections. In that case the general problem can’t be solved with
plain regular expressions. You’d have to actually parse the LaTeX, or
use a crazy regex flavor, such as .NET’s, that actually allows it.
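The "actually parse it" route does not need a full LaTeX parser for this one task. Here is a minimal sketch in Ruby that counts braces, so nesting depth no longer matters. The name `strip_large` is a hypothetical helper, not something from this thread, and the sketch assumes the opener is written exactly as `{\large ` with one space. It also ignores escaped braces such as `\{`, comments and verbatim blocks.

```ruby
# Hypothetical helper: removes every "{\large ...}" wrapper but keeps its
# contents, by tracking brace depth instead of using a regex.
def strip_large(text)
  marker = '{\large '   # assumption: opener is written exactly like this
  out = +''
  closers = []          # stack: true if the matching "}" closes a \large group
  i = 0
  while i < text.length
    if text[i, marker.length] == marker
      closers.push(true)              # drop the opener, remember to drop its closer
      i += marker.length
    elsif text[i] == '{'
      closers.push(false)             # ordinary group: keep both braces
      out << '{'
      i += 1
    elsif text[i] == '}'
      # pop returns nil for an unbalanced "}", which is then kept as-is
      out << '}' unless closers.pop
      i += 1
    else
      out << text[i]
      i += 1
    end
  end
  out
end

puts strip_large('{\large this is a fa\c{c}ade,} this too: {\large façade}')
# => this is a fa\c{c}ade, this too: façade
```

Because the stack remembers which kind of group each closing brace belongs to, something like `{\large a {b {c} d} e}` comes out as `a {b {c} d} e`.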
However, if there’s a limit on the nesting (for example, you know that
nowhere in the text are curlies within \large nested more than one
level deep), there are solutions. Let’s start with the simplest regex:
/{\\large [^{}]+}/. This works for your second example, but fails for
the first. But we can fix it. Consider this regex:
/{\\large (?:[^{}]+|{[^{}]+})+}/. The parenthesized part matches either
a run of non-brace characters, or a run of non-brace characters
surrounded by one pair of braces, and then repeats; it works for both
of your examples. If you want to go deeper ;), you can stick another
alternative into the branch with the curlies. This works to any fixed
depth (I’ll leave it as an exercise for the reader). Of course, input
can always be constructed on which this method fails, so you have to
either accept that it sometimes stops working correctly, or somehow
control your input beforehand.
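To turn the one-level regex into an actual removal, the repeated part can be wrapped in a capture group and substituted back. The following is a sketch of one way to do that in Ruby, not necessarily what the poster had in mind. The names `ONE_LEVEL_LARGE` and `strip_large_regex` are made up for illustration, and the braces are escaped only to keep the pattern unambiguous.

```ruby
# Allows at most one level of nested braces inside the \large group.
# Group 1 captures the contents, so only "{\large " and the final "}" are removed.
ONE_LEVEL_LARGE = /\{\\large ((?:[^{}]+|\{[^{}]*\})+)\}/

def strip_large_regex(text)
  # The block form inserts the captured text literally, so backslashes in
  # the LaTeX source (such as \c) are not treated as replacement escapes.
  text.gsub(ONE_LEVEL_LARGE) { Regexp.last_match(1) }
end

puts strip_large_regex('{\large this is a fa\c{c}ade,} this too: {\large façade}')
# => this is a fa\c{c}ade, this too: façade
```

With two levels of nesting, as in `{\large a {b {c} d} e}`, this pattern does not match at all and the string is returned unchanged, which is the limitation described above.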
When constructing regexes like this, you have to be careful to avoid
catastrophic backtracking; otherwise matching can take exponential
time in the worst case.
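One common way to guard against this in Ruby is an atomic group, `(?>...)`. The sketch below is a variation on the one-level regex from this thread, with hypothetical names `LARGE_ATOMIC` and `strip_large_atomic`. The risk in the plain version is the nested quantifier: `[^{}]+` inside a repeated group lets the engine split one run of text in many ways when the closing brace is missing.

```ruby
# (?>...) is an atomic group: once it has matched, the engine does not
# backtrack into it to try other ways of splitting the same run of text.
LARGE_ATOMIC = /\{\\large ((?>[^{}]+|\{[^{}]*\})+)\}/

def strip_large_atomic(text)
  text.gsub(LARGE_ATOMIC) { Regexp.last_match(1) }
end

# An opener with no closing brace: the atomic version gives up quickly
# instead of retrying every possible split of the run of "a"s.
broken = '{\large ' + 'a' * 5000
puts LARGE_ATOMIC.match?(broken)   # => false

puts strip_large_atomic('{\large this is a fa\c{c}ade,} this too: {\large façade}')
# => this is a fa\c{c}ade, this too: façade
```

On well-formed input the atomic version matches the same text as the non-atomic one, because each alternative already consumes a maximal run before the next one starts.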
Here’s a nice tool to try these regexps: http://regexpal.com/
– Matma R.
On 2012-03-02 15:47, Bartosz Dziewoński wrote:
Let’s start with the simplest regex:
/{\\large [^{}]+}/. This works for your second example, but fails for
the first. But we can fix it.
Consider this regex: /{\\large (?:[^{}]+|{[^{}]+})+}/.
Thanks! This is very useful, albeit a braintwister…
And I like http://regexpal.com/