Regex simplifier?

rogerdpack · October 5, 2009, 2:28pm

Question.
Currently I am somewhat of a novice with regexes.
For example, I can’t remember what \d means versus \D – which one is a
digit, and which one isn’t?

I’m wondering if anyone knows of a tool out there that
simplifies this, like
DIGIT* (in the regex) or something so that it’s easier to understand
what you’re doing.

Thanks.
-r

rogerdpack · October 5, 2009, 3:08pm

2009/10/5 Roger P.:

Question.
Currently I am somewhat of a novice with regexes.
For example, I can’t remember what \d means versus \D – which one is a
digit, and which one isn’t?

I’m wondering if anyone knows of a tool out there that
simplifies this, like
DIGIT* (in the regex) or something so that it’s easier to understand
what you’re doing.

I tried to cook something up in the past. You can find it when
searching the archives for subject “Alternate Regular Expressions?”.
HTH

Kind regards

robert

rogerdpack · October 5, 2009, 3:27pm

Roger P. wrote:

Question.
Currently I am somewhat of a novice with regexes.
For example, I can’t remember what \d means versus \D – which one is a
digit, and which one isn’t?

Thanks.
-r

Roger,

After you play with it for a while, it starts to follow a fairly
consistent pattern.

Usually, a capital letter implies negation, for example:
\D <- NOT a digit
\S <- NOT a whitespace character
\W <- NOT a word character…

etc…
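Ruby follows the same convention. A minimal sketch that scans one arbitrary sample string with each lowercase class and its uppercase counterpart, so the complementary results sit side by side:

```ruby
sample = "a1 b_2"

# Each lowercase class and its uppercase negation divide the characters between them.
pairs = {
  "d" => [/\d/, /\D/], # digit / non-digit
  "s" => [/\s/, /\S/], # whitespace / non-whitespace
  "w" => [/\w/, /\W/]  # word character / non-word character
}

pairs.each do |letter, (lower, upper)|
  puts "\\#{letter}  #{sample.scan(lower).inspect}"
  puts "\\#{letter.upcase}  #{sample.scan(upper).inspect}"
end
```

Every character of the sample lands in exactly one list of each pair.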

rogerdpack · October 5, 2009, 3:28pm

On Monday 05 October 2009 09:28:55 Roger P. wrote:

Question.
Currently I am somewhat of a novice with regexes.
For example, I can’t remember what \d means versus \D – which one is a
digit, and which one isn’t?

I’m wondering if anyone knows of a tool out there that
simplifies this, like
DIGIT* (in the regex) or something so that it’s easier to understand
what you’re doing.

Thanks.
-r

From man perlre

\d Match a digit character
\D Match a non-digit character

Regular expressions typically use a lowercase letter for the direct sense
and the uppercase letter for the opposite: for example, \s matches
whitespace, \S matches non-whitespace, \w a word character, \W a non-word character, etc.

–
angico

Site: angico.org
Blog: angico.org/blog

GNU/Linux, FLOSS, Spiritism, and me by myself 8^I

== Cooperation is much better than competition ==

rogerdpack · October 5, 2009, 6:49pm

I tried to cook something up in the past. You can find it when
searching the archives for subject “Alternate Regular Expressions?”.
HTH

Very interesting thread. Did anything come of it? (Florian Gross’s
Regexp::English also looks nice [2]. Florian?)

I like this syntax (this example matches things like “-2.718 + 3.14i”):

PAT.float['re'] + REP0.whitespace + ALT("+", "-")['op'] +
REP0.whitespace + PAT.float['im'] + 'i' [1]

However, I’m not sure how to do nested matches, optional parts, and so on.

I suppose that’s the equivalent of (in 1.9)
float = /[-+]?\d+\.\d+/
whitespace = /\s*/

%r{(?<re>#{float})#{whitespace}(?<op>[-+])#{whitespace}(?<im>#{float})i}
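A runnable sketch of that equivalence, assuming Ruby 1.9 or later (needed for named groups). The group names re, op and im are taken from the PAT example; the dot is escaped so it only matches a literal decimal point, and \s* is used on the assumption that REP0 means "zero or more". Interpolating a Regexp object embeds its to_s, a group such as (?-mix:\s*) that keeps the original flags.

```ruby
float      = /[-+]?\d+\.\d+/   # optional sign, digits, literal dot, digits
whitespace = /\s*/             # zero or more whitespace characters

# Interpolating a Regexp inserts its #to_s, e.g. "(?-mix:\s*)", so each
# piece keeps its own flags inside the larger pattern.
complex = %r{(?<re>#{float})#{whitespace}(?<op>[-+])#{whitespace}(?<im>#{float})i}

m = complex.match("-2.718 + 3.14i")
puts m[:re] # prints -2.718
puts m[:op] # prints +
puts m[:im] # prints 3.14
```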

But the kicker is still how to remember that \s is whitespace, not \w.
And we’ve been forced to do some tiptoeing around the complexities of
regex in order to make it readable.

I had thought of a regex-builder helper:

float = reg { optional(/[-+]/) + 'DIGIT+ . DIGIT+' }

The benefit is not having to remember what \u or \g does, or
what /(something)?/ means.

Then you could mix it into 1.9-style regexes the same way.
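One way such a helper could work, as a minimal sketch: Pattern, RegHelpers and reg are hypothetical names invented here, not an existing library, and the only readable tokens are DIGIT, SPACE and WORD. The block is evaluated with an optional helper in scope, token strings are translated into regex syntax, and the result is a plain Regexp, so it can still be interpolated into 1.9-style literals.

```ruby
# Hypothetical sketch of the proposed helper; not part of Ruby or any gem.
class Pattern
  # Readable names for a few backslash classes.
  NAMES = { "DIGIT" => '\d', "SPACE" => '\s', "WORD" => '\w' }.freeze

  attr_reader :source

  def initialize(source)
    @source = source
  end

  # Concatenate with another Pattern, a Regexp, or a token string.
  def +(other)
    Pattern.new(source + Pattern.from(other).source)
  end

  def to_regexp
    Regexp.new(source)
  end

  def self.from(obj)
    case obj
    when Pattern then obj
    when Regexp  then new(obj.source) # note: drops flags such as /i
    when String  then new(translate(obj))
    else raise ArgumentError, "cannot build a pattern from #{obj.inspect}"
    end
  end

  # "DIGIT+ . DIGIT+" becomes \d+\.\d+ : known names are expanded, any other
  # token is matched literally, and a trailing *, + or ? stays a quantifier.
  def self.translate(str)
    str.split.map do |token|
      name, quantifier = token.match(/\A(.+?)([*+?]?)\z/).captures
      NAMES.fetch(name) { Regexp.escape(name) } + quantifier
    end.join
  end
end

module RegHelpers
  # Wrap a piece in a non-capturing group and make it optional.
  def optional(piece)
    Pattern.new("(?:#{Pattern.from(piece).source})?")
  end
end

# Evaluate the block with the helpers in scope and return a real Regexp.
def reg(&block)
  Pattern.from(Object.new.extend(RegHelpers).instance_eval(&block)).to_regexp
end

float = reg { optional(/[-+]/) + 'DIGIT+ . DIGIT+' }
p float
```

Going through Regexp#source discards flags such as /i; a fuller version would have to carry those along.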

Thoughts?
-r

[1]

[2]
http://markmail.org/message/rzudqptkuls7dncy?q=Regexp::English+gross&page=1&refer=cuj6ru2rprrvh2sm

rogerdpack · October 5, 2009, 7:23pm

Hi –

On Tue, 6 Oct 2009, Roger P. wrote:

REP0.whitespace + PAT.float['im'] + 'i' [1]
And we’ve been forced to do some tiptoeing around the complexities of
regex in order to make it readable.

I had thought of a regex creator helper

float = reg { optional(/[-+]/) + 'DIGIT+ . DIGIT+' }

The benefit is not having to remember what \u or \g does, or
what /(something)?/ means.

I’m going to put in a plug for learning regular expression syntax
itself in a reasonably thorough way. I won’t try to make a case about
what’s more readable, since that clearly depends on the reader, but I
do strongly recommend that everyone take the time to become regex-literate.
The programming world in general is not going to convert to
English-language-based regex wrappers. Though some such projects are
interesting, that is a mercy, because such wrappers could easily start
proliferating and competing with each other, turning the whole thing
into yet another notation soup. So the only way to participate fully in
the use of regular expressions is to be conversant with the actual notation.

David

rogerdpack · October 5, 2009, 7:05pm

Roger P. wrote:

The benefit is not having to remember what \u or \g does, or
what /(something)?/ means.

Roger,

Do you have to remember how to talk every day, or how to add two
numbers? Regexes will stick in your memory like everything else,
and there will be no need to worry about remembering them. Just use
them and they will stick, just as you already knew that 5 + 3 = 8
without cross-referencing a table.

Remembering the name of the “help” utility, however, would be a pain in
the butt…

ilan

rogerdpack · October 5, 2009, 8:30pm

[…]

so the only way to
participate fully in the use of regular expressions is to be
conversant with the actual notation.

signed.

I really liked this site: http://www.regular-expressions.info/
It helped me understand how regular expressions work. So did my
text editor’s regular expression search and replace. Once you know
how it works, you won’t stop using it.

And by the way, I still have to
man perlre
every time I want to use lookaround or similarly complex
stuff, so it’s good to know where to get the answers.
Yet I’d never use something like
lookahead(…)
simply because looking it up in the man page is just about
as time-consuming as typing “lookahead” all the time,
and (?=) is much more compact.
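For comparison, the compact lookaround forms in Ruby (Oniguruma, 1.9+) look like this; the sample text is made up:

```ruby
text = "10kg 20lb 30kg, $5 and 7 dollars"

# (?=...) positive lookahead: digits followed by "kg"; "kg" itself is not consumed.
p text.scan(/\d+(?=kg)/)

# (?!...) negative lookahead: digit runs NOT followed by another digit or "kg".
p text.scan(/\d+(?!\d|kg)/)

# (?<=...) positive lookbehind: digits preceded by "$".
p text.scan(/(?<=\$)\d+/)
```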

Greetz!

rogerdpack · October 5, 2009, 10:16pm

On 10/05/2009 06:49 PM, Roger P. wrote:

I tried to cook something up in the past. You can find it when
searching the archives for subject “Alternate Regular Expressions?”.
HTH

Very interesting thread. Did anything come of it? (Florian Gross’s
Regexp::English also looks nice [2]. Florian?)

Yes, one of my postings had a file attached which contained an
implementation. I believe Ari also created a project on RubyForge. We
certainly did some more polishing of the code, but unfortunately I don’t
have the latest version handy.

Actually, it’s available as a gem, but it’s definitely not the latest
version that I wrote, because it does not contain the optimization for
multiple fixed strings in an alternation. I have to see whether I can find
that version somewhere.
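To illustrate what optimizing an alternation of fixed strings can mean, here is a minimal sketch that factors shared prefixes through a trie, so foo|foobar|fob becomes fo(?:o(?:bar)?|b). This is one common technique, not necessarily what TextualRegexp did; FixedAlternation is a hypothetical name. (Ruby's built-in Regexp.union escapes and joins strings with |, but does not factor prefixes.)

```ruby
# Illustrative only; not taken from TextualRegexp.
module FixedAlternation
  module_function

  # Returns a regex source string matching exactly one of +words+.
  def source(words)
    trie = {}
    words.each do |word|
      node = trie
      word.each_char { |ch| node = (node[ch] ||= {}) }
      node[:end] = true # a complete word ends at this node
    end
    node_source(trie)
  end

  def node_source(node)
    branches = node.reject { |key, _| key == :end }.map do |ch, child|
      Regexp.escape(ch) + node_source(child)
    end
    return "" if branches.empty?

    if node.key?(:end)
      # A word may stop here, so everything after this point is optional.
      "(?:#{branches.join('|')})?"
    elsif branches.size == 1
      branches.first
    else
      "(?:#{branches.join('|')})"
    end
  end
end

src = FixedAlternation.source(%w[foo foobar fob])
pattern = /\A(?:#{src})\z/
p src
```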

%r{(?<re>#{float})#{whitespace}(?<op>[-+])#{whitespace}(?<im>#{float})i}

Personally, I do not like the approach with string interpolation. I’d
rather extend the approach of TextualRegexp to include human-readable
variants of these meta sequences via method calls.

But the kicker is still how to remember that \s is whitespace, not \w.
And we’ve been forced to do some tiptoeing around the complexities of
regex in order to make it readable.

Actually, once you have got used to them and take a bit of care, they are
pretty readable. For example, /x goes a long way toward making complex
expressions readable by letting you insert whitespace and comments.
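For instance, the complex-number pattern from earlier in the thread could be written with /x like this (a sketch; the constant name COMPLEX is arbitrary):

```ruby
COMPLEX = /
  (?<re> [-+]? \d+ \. \d+ )   # real part, e.g. -2.718
  \s*                         # optional spaces
  (?<op> [-+] )               # sign joining the two parts
  \s*
  (?<im> [-+]? \d+ \. \d+ )   # imaginary part, e.g. 3.14
  i                           # the literal letter i
/x

m = COMPLEX.match("-2.718 + 3.14i")
puts m[:re], m[:op], m[:im]
```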

Kind regards

robert

rogerdpack · October 5, 2009, 10:25pm

On 10/05/2009 03:27 PM, Ilan B. wrote:

Roger P. wrote:

Currently I am somewhat of a novice with regexes.
For example, I can’t remember what \d means versus \D – which one is a
digit, and which one isn’t?

After you play with it for a while, it starts to follow a fairly
consistent pattern.

Usually, a capital letter implies negation, for example:
\D <- NOT a digit
\S <- NOT a whitespace character
\W <- NOT a word character…

etc…

The fact that there apparently are not many libraries for readable
regular expressions in use seems to indicate that people are mostly
using the native syntax of a regexp engine. Apparently it’s not that
hard.

Roger, I recommend “Mastering Regular Expressions”: it’s a really
good book on the matter, and it covers the topic quite well without
delving too deeply into the theory of formal languages.

Kind regards

robert

rogerdpack · October 6, 2009, 2:53pm

Hi –

On Tue, 6 Oct 2009, Roger P. wrote:

The fact that there apparently are not many libraries for readable
regular expressions in use seems to indicate that people are mostly
using the native syntax of a regexp engine. Apparently it’s not that
hard.

OK, OK, I’ll concede, suck it up, and learn regular expressions. My
only misgiving is that regular expressions are typically so “write-only”
that they don’t feel very Ruby-like.
Thanks for the pointers!

I think it’s just practice, like learning musical notation or
whatever (only it’s not as elaborate as musical notation). It’s all
about atoms and quantifiers. Cling to that and you’ll be fine.
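A small Ruby illustration of the atoms-and-quantifiers idea: each quantifier binds to the one atom right before it, and a group or a character class is how you make a bigger atom.

```ruby
# A quantifier (+, *, ?, {n,m}) applies to the one atom right before it.
p "ababab"[/ab+/]      # + repeats only the b: "ab"
p "abbb"[/ab+/]        # "abbb"

# A group turns several atoms into one, so the quantifier repeats the lot.
p "ababab"[/(?:ab)+/]  # "ababab"

# A character class is also a single atom.
p "abba c"[/[ab]+/]    # "abba"
```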

David

rogerdpack · October 6, 2009, 1:23pm

The fact that there apparently are not many libraries for readable
regular expressions in use seems to indicate that people are mostly
using the native syntax of a regexp engine. Apparently it’s not that
hard.

OK, OK, I’ll concede, suck it up, and learn regular expressions. My
only misgiving is that regular expressions are typically so “write-only”
that they don’t feel very Ruby-like.
Thanks for the pointers!
-r

rogerdpack · February 17, 2011, 12:22am

%r{(?<re>#{float})#{whitespace}(?<op>[-+])#{whitespace}(?<im>#{float})i}

But the kicker is still how to remember that \s is whitespace, not \w.
And we’ve been forced to do some tiptoeing around the complexities of
regex in order to make it readable.

I had thought of a regex-builder helper:

float = reg { optional(/[-+]/) + 'DIGIT+ . DIGIT+' }

I just found something related, for followers:

POSIX character classes are apparently supported too:

[:blank:]

like

' a '.scan(/[[:blank:]]/)
=> [" ", " "]
' a '.scan(/[^[:blank:]]/)
=> ["a"]

ref: http://psoug.org/reference/regexp.html
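Ruby supports the other POSIX bracket classes as well, which gives some of the readable names asked for at the start of the thread. A short sketch with a made-up string; per Ruby's Regexp documentation, [[:digit:]] also matches non-ASCII Unicode digits, whereas \d matches only ASCII 0-9.

```ruby
text = "Room 42, floor\t7"

p text.scan(/[[:digit:]]+/)  # runs of digits
p text.scan(/[[:alpha:]]+/)  # runs of letters
p text.scan(/[[:blank:]]/)   # spaces and tabs only
p text.scan(/[[:punct:]]/)   # punctuation
```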

Though I still wouldn’t be averse to something like
Regexp::DIGIT => '\d'

So one can do "#{Regexp::DIGIT}*"
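Nothing like Regexp::DIGIT ships with Ruby, but the idea is easy to approximate in user code. A minimal sketch with a hypothetical Readable module (defining the constants in your own namespace rather than reopening Regexp); because an interpolated Regexp is wrapped in a group, a quantifier placed after it applies to the whole constant:

```ruby
# Hypothetical module: Ruby itself defines no constants like these.
module Readable
  DIGIT      = /\d/
  WHITESPACE = /\s/
  WORD_CHAR  = /\w/
end

# An interpolated Regexp becomes a group like (?-mix:\d), so the trailing
# + repeats the whole constant, not just its last character.
digits_only = /\A#{Readable::DIGIT}+\z/

p digits_only =~ "2011"  # => 0
p digits_only =~ "20x1"  # => nil
```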

rogerdpack · February 17, 2011, 10:53am

On Thu, Feb 17, 2011 at 12:22 AM, Roger P. wrote:

I just found something related, for followers:

ref: http://psoug.org/reference/regexp.html

Though I still wouldn’t be averse to something like
Regexp::DIGIT => '\d'

So one can do "#{Regexp::DIGIT}*"

Did I mention that what Ari and I cooked up a while ago is a gem?

http://rubygems.org/gems/TextualRegexp

Cheers

robert

rogerdpack · February 18, 2011, 1:25am

So one can do "#{Regexp::DIGIT}*"

Did I mention that what Ari and I cooked up a while ago is a gem?

http://rubygems.org/gems/TextualRegexp

Cool, but the default RDoc doesn’t explain how to use it… perhaps
there is a URL I could refer to?
-r

rogerdpack · February 18, 2011, 9:13am

On Fri, Feb 18, 2011 at 1:25 AM, Roger P. wrote:

So one can do "#{Regexp::DIGIT}*"

Did I mention that what Ari and I cooked up a while ago is a gem?

http://rubygems.org/gems/TextualRegexp

Cool, but the default RDoc doesn’t explain how to use it… perhaps
there is a URL I could refer to?

Errr, the project stalled along the way, as there did not seem to be
much interest at the time. It was more me toying around. IMHO
it should not be too hard to figure out by looking at the source code.

There’s a mini example
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/263918

If I find the time I might come up with more thorough documentation.

Cheers

robert

rogerdpack · February 17, 2011, 10:59am

Hey gang,
This is my first response. I don’t know if this helps, but I use this
site quite a bit to clarify regular expressions when I am working with them.

Regards,
Eben Smith

On Thu, Feb 17, 2011 at 2:53 AM, Robert K.
