RegExp help

cpjolicoeur · December 12, 2006, 7:26pm

I need some help with a regular expression for a validates_format_of
statement in my model. I have a user login field, and I only want to
allow alphanumeric characters and the underscore in it
( a-z, A-Z, 1-9, _ ). Those are the only characters I want to allow.

What is the proper Ruby RegExp for this, to use in the
:with => // option of validates_format_of?

cpjolicoeur · December 12, 2006, 7:33pm

Hi –

On Tue, 12 Dec 2006, Craig J. wrote:

I need some help with a regular expression for a validates_format_of
statement in my model. I have a user login field, and I only want to
allow alphanumeric characters and the underscore in it
( a-z, A-Z, 1-9, _ ). Those are the only characters I want to allow.

What is the proper Ruby RegExp for this, to use in the
:with => // option of validates_format_of?

The \w character class is all alphanumerics plus underscore – and the
\W character class is the opposite. Assuming you really don’t want to
allow zero, you could do:

:with => /[^\W0]+/

i.e., no character (that’s the ^) that is either \W or 0.

Note, however, that there’s been some flux in the question of whether
or not your regex gets automatically wrapped by beginning and
end-of-string anchors. That regex assumes that the anchors are added
(though I hope in the long run they aren’t). Try some tests, and if
you need to, you can wrap it in anchors like this:

/\A[^\W0]+\z/
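
A minimal sketch of how the negated class behaves once anchored, in plain Ruby without Rails. The constant and method names are hypothetical, chosen only for illustration; the pattern is the anchored one quoted above.

```ruby
# Hypothetical names; the pattern is the anchored regex from the post.
LOGIN_WITHOUT_ZERO = /\A[^\W0]+\z/

# Returns true when every character is a word character other than "0".
def login_without_zero?(login)
  !!(login =~ LOGIN_WITHOUT_ZERO)
end

puts login_without_zero?("craig_j")  # true
puts login_without_zero?("craig0")   # false: "0" is excluded by the class
puts login_without_zero?("craig j")  # false: a space is a \W character
```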

David

–
Q. What’s a good holiday present for the serious Rails developer?
A. RUBY FOR RAILS by David A. Black (Ruby for Rails)
aka The Ruby book for Rails developers!
Q. Where can I get Ruby/Rails on-site training, consulting, coaching?
A. Ruby Power and Light, LLC (http://www.rubypal.com)

cpjolicoeur · December 12, 2006, 7:41pm

The \w character class is all alphanumerics plus underscore – and the
\W character class is the opposite. Assuming you really don’t want to
allow zero, you could do:

:with => /[^\W0]+/

i.e., no character (that’s the ^) that is either \W or 0.

OK, my mistake. The number 0 is fine; I should have said a-z, A-Z, 0-9,
_

Your RegExp didn’t work. I don’t want to allow spaces in the login field.
With your RegExp, spaces don’t get flagged as invalid. I ONLY want a-z,
A-Z, 0-9, _. Nothing else, not even spaces.

cpjolicoeur · December 12, 2006, 9:16pm

While I’m a little boggled by David’s answers, I think this is what
you’re looking for:

:with => /^[A-Za-z0-9_]+$/

cpjolicoeur · December 12, 2006, 8:26pm

Hi –

On Tue, 12 Dec 2006, Craig J. wrote:

OK, my mistake. The number 0 is fine; I should have said a-z, A-Z, 0-9,
_

Your RegExp didn’t work. I don’t want to allow spaces in the login field.
With your RegExp, spaces don’t get flagged as invalid. I ONLY want a-z,
A-Z, 0-9, _. Nothing else, not even spaces.

That’s because of the anchoring thing I mentioned. Try the anchored
version, which, including 0, would simply be:

/\A\w+\z/
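
To see why the unanchored version lets spaces through, here is a small plain-Ruby sketch (no Rails involved). The constant and helper names are made up for the example; it only assumes that an unanchored pattern succeeds as soon as any substring matches.

```ruby
# Hypothetical names for illustration.
UNANCHORED = /\w+/       # succeeds if ANY run of word characters exists
ANCHORED   = /\A\w+\z/   # the whole string must be word characters

def matches?(pattern, login)
  !!(pattern =~ login)
end

["craig_j", "craig j", "craig!"].each do |login|
  puts "#{login.inspect}: unanchored=#{matches?(UNANCHORED, login)} " \
       "anchored=#{matches?(ANCHORED, login)}"
end
```

Only "craig_j" passes the anchored pattern, while all three pass the unanchored one.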

David


cpjolicoeur · December 12, 2006, 9:22pm

I wrote:

While I’m a little boggled by David’s answers

Should have been more specific here. I haven’t seen \A and \z; I have
always used ^ and $.

I think this is what
you’re looking for:

:with => /^[A-Za-z0-9_]+$/

More simply put:

:with => /^\w+$/

Mark.

cpjolicoeur · December 12, 2006, 9:27pm

Mark T. wrote:

I wrote:

While I’m a little boggled by David’s answers

Should have been more specific here. I haven’t seen \A and \z; I have
always used ^ and $.

I think this is what
you’re looking for:

:with => /^[A-Za-z0-9_]+$/

More simply put:

:with => /^\w+$/

Mark.

Thanks Mark. Both yours and David’s answers seem to work, but I’m
using yours as it is more the style I’m used to seeing as well.

cpjolicoeur · December 12, 2006, 10:06pm

On 12/12/06, Craig J. wrote:

using yours as it is more the style I’m used to seeing as well.

irb(main):006:0> "!@#$%(\nAAAAA" =~ /^\w+$/
=> 8
irb(main):007:0> "!@#$%(\nAAAAA" =~ /\A\w+\Z/
=> nil

^ and $ match beginning and end of line, \A and \Z match beginning and
end of string. You want \A and \Z.

cpjolicoeur · December 12, 2006, 10:10pm

Hi –

On Tue, 12 Dec 2006, Craig J. wrote:

:with => /^[A-Za-z0-9_]+$/

More simply put:

:with => /^\w+$/

Thanks Mark. Both yours and David’s answers seem to work, but I’m
using yours as it is more the style I’m used to seeing as well.

It’s not a style matter; they do different things. ^ and $ anchor to
beginning and end of a line, whereas \A and \z match beginning and end
of string.

If you use ^ and $, you’ll want to be absolutely certain that no one
can ever submit a multi-line answer:

puts "Match" if /^\w+$/.match("This is\nnot\nwhat you want!")
=> Match

If you anchor to the beginning and end of the string:

puts "Match" if /\A\w+\z/.match("This is\nnot\nwhat you want!")
=> nil

which is almost certainly better.

David


cpjolicoeur · December 13, 2006, 8:39pm

^ and $ match beginning and end of line, \A and \Z match beginning and
end of string. You want \A and \Z.

I’d go for \z, because \Z discounts a final newline:

Thanks for the info. I must have missed the memo about Ruby regexes
being different from Perl. Are there other differences, and is this
documented anywhere?

Thanks.

Mark.

cpjolicoeur · December 12, 2006, 10:10pm

Hi –

On Tue, 12 Dec 2006, Jeremy E. wrote:

I think this is what

Thanks Mark. Both yours and David’s answers seem to work, but I’m
using yours as it is more the style I’m used to seeing as well.

irb(main):006:0> "!@#$%(\nAAAAA" =~ /^\w+$/
=> 8
irb(main):007:0> "!@#$%(\nAAAAA" =~ /\A\w+\Z/
=> nil

^ and $ match beginning and end of line, \A and \Z match beginning and
end of string. You want \A and \Z.

I’d go for \z, because \Z discounts a final newline:

irb(main):005:0> /abc\z/.match("abc\n")
=> nil
irb(main):006:0> /abc\Z/.match("abc\n")
=> #<MatchData:0xb7eaf2d8>

Might as well close that loophole too

David


cpjolicoeur · December 13, 2006, 9:15pm

Hi –

On Wed, 13 Dec 2006, Mark T. wrote:

^ and $ match beginning and end of line, \A and \Z match beginning and
end of string. You want \A and \Z.

I’d go for \z, because \Z discounts a final newline:

Thanks for the info. I must have missed the memo about Ruby regexes
being different from Perl. Are there other differences, and is this
documented anywhere?

I think the memo would have been if they were exactly the same as
Perl’s. The anchors should be documented in most or all extended
discussions of Ruby regexes (though they may or may not mention how
these compare to Perl). I’ve seen the second edition of the Friedl
book but don’t own it, and I don’t remember how detailed it gets in
its Ruby comparisons.

One area to focus on in the Perl/Ruby comparison is the modifiers.
Since Ruby has anchors for both line and string, it doesn’t need the
/m modifier as it’s defined in Perl. Ruby’s /m modifier is like
Perl’s /s: it adds newline to the . character class.
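
A short sketch of that point, in plain Ruby: /m changes only what . matches, while ^ and $ remain line anchors with or without it. The constant names are hypothetical.

```ruby
# Hypothetical names for illustration.
TEXT      = "xyz\nABC"
WITHOUT_M = /xyz.ABC/    # . does not match the newline
WITH_M    = /xyz.ABC/m   # /m lets . match the newline

p(WITHOUT_M =~ TEXT)   # nil
p(WITH_M =~ TEXT)      # 0

# ^ and $ anchor to lines even without any modifier.
p(/^ABC$/ =~ TEXT)     # 4
```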

David


cpjolicoeur · December 13, 2006, 11:28pm

Hi –

On Wed, 13 Dec 2006, Rob B. wrote:

I’d go for \z, because \Z discounts a final newline:
its Ruby comparisons.

One area to focus on in the Perl/Ruby comparison is the modifiers.
Since Ruby has anchors for both line and string, it doesn’t need the
/m modifier as it’s defined in Perl. Ruby’s /m modifier is like
Perl’s /s: it adds newline to the . character class.

David

FYI, Perl has \A, \Z, and \z, too.

Interesting. I don’t know whether my memory is faulty or Perl didn’t
have them when I was using it (late 1990s mostly).

David


cpjolicoeur · December 13, 2006, 10:29pm

On Dec 13, 2006, at 3:13 PM, David wrote:

Thanks for the info. I must have missed the memo about Ruby regexes
One area to focus on in the Perl/Ruby comparison is the modifiers.
Since Ruby has anchors for both line and string, it doesn’t need the
/m modifier as it’s defined in Perl. Ruby’s /m modifier is like
Perl’s /s: it adds newline to the . character class.

David

FYI, Perl has \A, \Z, and \z, too. In Perl, the meaning of ^ and $
changes with the use of the /m modifier, and that’s why it’s common to
see /ms or /xms on Perl regexps. With Ruby, I’d expect to see /m or
/xm on most complex patterns.

I was surprised at how hard it was to find the modifiers in Ruby
listed in the Pickaxe, but they’re in chapter 22 (“The Ruby
Language”) starting on page 324.

The other significant way that the Perl and Ruby (1.8) regexps differ
is in the semantics of executing code during the match. Perl allows
code in the replacement text with the /e modifier on a substitution,
where Ruby just passes the match off to a block.
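
As a rough illustration of the Ruby side, String#gsub accepts a block whose return value becomes the replacement for each match. The method name below is made up for the example.

```ruby
# Hypothetical helper: doubles every run of digits in the text.
# The block receives each matched substring and returns its replacement.
def double_numbers(text)
  text.gsub(/\d+/) { |digits| (digits.to_i * 2).to_s }
end

puts double_numbers("3 apples and 10 pears")  # 6 apples and 20 pears
```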

-Rob

Rob B. http://agileconsultingllc.com

cpjolicoeur · December 14, 2006, 6:54pm

OK, so we have

Ruby /\Axyz\z/ is the same as Perl /^xyz$/,
Ruby /^xyz$/ is the same as Perl /^xyz$/m,
Ruby /^xyz$/m is the same as Perl /^xyz$/ms,

is this correct?

cpjolicoeur · December 14, 2006, 11:39pm

On Dec 14, 2006, at 12:52 PM, Mark T. wrote:

OK, so we have

Ruby /\Axyz\z/ is the same as Perl /^xyz$/,
Ruby /^xyz$/ is the same as Perl /^xyz$/m,
Ruby /^xyz$/m is the same as Perl /^xyz$/ms,

is this correct?

I think you’ve got it. Here are some examples of perl and ruby with
some similar regexps to demonstrate.

$ perl -e '$string = "uvw\nxyz\nABC"; if ($string =~ /^xyz$/) { print
"match\n" } else { print "nope\n" }'
nope

$ ruby -e 'string = "uvw\nxyz\nABC"; if (string =~ /\Axyz\z/) then
print "match\n" else print "nope\n" end'
nope

$ perl -e '$string = "uvw\nxyz\nABC"; if ($string =~ /^xyz$/m)
{ print "match\n" } else { print "nope\n" }'
match

$ ruby -e 'string = "uvw\nxyz\nABC"; if (string =~ /^xyz$/) then
print "match\n" else print "nope\n" end'
match

$ perl -e '$string = "uvw\nxyz\nABC"; if ($string =~ /^xyz…$/m)
{ print "match\n" } else { print "nope\n" }'
nope

$ perl -e '$string = "uvw\nxyz\nABC"; if ($string =~ /^xyz…$/ms)
{ print "match\n" } else { print "nope\n" }'
match

$ ruby -e 'string = "uvw\nxyz\nABC"; if (string =~ /^xyz…$/m) then
print "match\n" else print "nope\n" end'
match

$ ruby -e 'string = "uvw\nxyz\nABC"; if (string =~ /^xyz…\z/m)
then print "match\n" else print "nope\n" end'
match

$ ruby -e 'string = "uvw\nxyz\nABC"; if (string =~ /\A…xyz…\z/m)
then print "match\n" else print "nope\n" end'
match

$ ruby -e 'string = "uvw\nxyz\nABC"; if (string =~ /\A…xyz…\z/)
then print "match\n" else print "nope\n" end'
nope

$ ruby -e 'string = "uvw\nxyz\nABC"; if (string =~ /^xyz…\z/) then
print "match\n" else print "nope\n" end'
nope
