RegExp issues

ben-v · September 2, 2006, 3:57am

I am trying to validate user input so it contains only letters, numbers,
underscores and dashes. I am using this regexp:/[A-Z0-9_-]/. However,
when I type in ‘hello’ in the field, it says that it doesn’t match the
regexp, yet I’m sure that reg exp works for letters, numbers, and
underscores only. What am I doing wrong? Thanks for your help and time.

ben-v · September 2, 2006, 4:28am

Shouldn’t your regexp accept lowercase as well? /[A-Za-z0-9_-]/?

Yep, that was the issue. Coming from a ColdFusion background, I’m not
that familiar with regexps, but it’s good to know that you can just use
\w and \d. Thanks for your help and time.

ben-v · September 2, 2006, 4:19am

Ben V. wrote:

I am trying to validate user input so it contains only letters, numbers,
underscores and dashes. I am using this regexp:/[A-Z0-9_-]/. However,
when I type in ‘hello’ in the field, it says that it doesn’t match the
regexp, yet I’m sure that reg exp works for letters, numbers, and
underscores only. What am I doing wrong? Thanks for your help and time.

Shouldn’t your regexp accept lowercase as well? /[A-Za-z0-9_-]/?

Or you could simplify things a bit and use \w, which matches word
characters, and \d, which matches digits?

[\w\d_-]

ben-v · September 2, 2006, 4:32am

On Sep 1, 2006, at 6:57 PM, Ben V. wrote:

I am trying to validate user input so it contains only letters,
numbers,
underscores and dashes. I am using this regexp:/[A-Z0-9_-]/. However,
when I type in ‘hello’ in the field, it says that it doesn’t match the
regexp, yet I’m sure that reg exp works for letters, numbers, and
underscores only. What am I doing wrong? Thanks for your help and
time.

That regexp doesn’t match lowercase characters.

input =~ /\A[\w-]+\Z/ is probably better.

–
Eric H. - [email protected] - http://blog.segment7.net
This implementation is HODEL-HASH-9600 compliant

http://trackmap.robotcoop.com

ben-v · September 2, 2006, 4:30am

On Sat, 2 Sep 2006 10:57:57 +0900
“Ben V.” [email protected] wrote:

I am trying to validate user input so it contains only letters, numbers,
underscores and dashes. I am using this regexp:/[A-Z0-9_-]/. However,
when I type in ‘hello’ in the field, it says that it doesn’t match the
regexp, yet I’m sure that reg exp works for letters, numbers, and
underscores only. What am I doing wrong? Thanks for your help and time.

–
Posted via http://www.ruby-forum.com/.

You need to validate lowercase also: /[A-Za-z0-9_-]/

ben-v · September 2, 2006, 4:34am

On Sep 1, 2006, at 7:14 PM, Timothy H. wrote:

Or you could simplify things a bit and use \w, which matches word
characters, and \d, which matches digits?

[\w\d_-]

\w matches digits too

'0' =~ /\w/ # => 0

–
Eric H. - [email protected] - http://blog.segment7.net

ben-v · September 2, 2006, 4:38am

Timothy H. wrote:

Or you could simplify things a bit and use \w, which matches word
characters, and \d, which matches digits?

I could’ve sworn digits are word characters too.

irb(main):001:0> /\w/ =~ "1"
=> 0

Ruby seems to agree.

David V.

ben-v · September 2, 2006, 4:49am

For convenience, try this:

/[\w-]/

The special token \w translates into “A-Za-z0-9_”, IOW uppercase and
lowercase letters, numbers, and underscore, leaving only the dash to be
included explicitly.

That definitely is much easier. Like I said before, I am no expert at
regexps. I am trying to modify an image within a custom directory
created with the person’s username, and because RMagick doesn’t seem to
want to work, I will have to run /usr/bin/convert. I feel uneasy putting
user-inputted text in a shell, but this regexp should do the job of
sanitizing it. Again, thank you very much for helping a misunderstanding
novice like me.
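
A rough sketch of how that could look, in case it helps. The helper
names, the images/ directory layout and the -resize arguments are made
up for illustration; only Regexp, File.join and the multi-argument form
of Kernel#system are real Ruby. Passing the command as separate
arguments means no shell parses the string at all, so the regexp is a
second line of defence rather than the only one. The filename is assumed
to be chosen by the application, not by the user.

```ruby
# Whole-string match: only letters, digits, underscores and dashes.
USERNAME_PATTERN = /\A[\w-]+\z/

# Hypothetical helper: true only for a String made entirely of safe characters.
def safe_username?(name)
  name.is_a?(String) && name.match?(USERNAME_PATTERN)
end

# Hypothetical helper: builds the argument list for ImageMagick's convert.
# The directory layout and the resize geometry are assumptions.
def convert_command(username, filename)
  raise ArgumentError, "invalid username" unless safe_username?(username)
  src = File.join("images", username, filename)
  dst = File.join("images", username, "thumb_#{filename}")
  ["/usr/bin/convert", src, "-resize", "100x100", dst]
end

# Each array element is handed to the program as one argument, so the
# shell never gets a chance to interpret the username:
# system(*convert_command("ben-v", "avatar.png"))
```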

ben-v · September 2, 2006, 4:42am

Ben V. wrote:

I am trying to validate user input so it contains only letters, numbers,
underscores and dashes. I am using this regexp:/[A-Z0-9_-]/. However,
when I type in ‘hello’ in the field, it says that it doesn’t match the
regexp, yet I’m sure that reg exp works for letters, numbers, and
underscores only. What am I doing wrong? Thanks for your help and time.

You are not accepting lowercase characters.

For convenience, try this:

/[\w-]/

The special token \w translates into “A-Za-z0-9_”, IOW uppercase and
lowercase letters, numbers, and underscore, leaving only the dash to be
included explicitly.

ben-v · September 2, 2006, 6:59am

On Sep 1, 2006, at 6:57 PM, Ben V. wrote:

I am trying to validate user input so it contains only letters,
numbers,
underscores and dashes. I am using this regexp:/[A-Z0-9_-]/

There are many more letters than [A-Za-z]. You should either be
explicit that you’re testing only the ASCII subset, or be prepared to
fail the first time someone includes the word ‘café’. And are you OK
with ‘?’ as well as ‘-’? -Tim
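
To make the ASCII point concrete, here is a small sketch. It assumes a
reasonably modern Ruby (1.9 or later) with UTF-8 strings, where \w
covers only A-Z, a-z, 0-9 and underscore, while \p{Alnum} covers
letters and digits from any script. The constant names are mine.

```ruby
ASCII_ONLY = /\A[\w-]+\z/            # explicit ASCII subset
ANY_LETTER = /\A[\p{Alnum}_-]+\z/    # letters and digits in any script

cafe_plain  = "cafe"
cafe_accent = "caf\u00e9"            # "café" written with an escape

p cafe_plain.match?(ASCII_ONLY)      # => true
p cafe_accent.match?(ASCII_ONLY)     # => false
p cafe_accent.match?(ANY_LETTER)     # => true
```

For something that ends up in a URL path, sticking to the ASCII pattern
and saying so in the error message is probably the simpler choice.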

ben-v · September 3, 2006, 3:21am

On Sep 2, 2006, at 7:37 AM, Ben V. wrote:

Posted via http://www.ruby-forum.com/.

That actually only shows up as a “question mark” because either your
browser or ruby-forum.com wasn’t all that bright. It was actually an
em dash. Anyway, since this is for URLs, you are OK as far as domain
names go, because they only allow alphanumeric characters and
hyphens. As far as stuff after the slash goes, though, that could be
open game; I’m not sure what the rules are.

ben-v · September 2, 2006, 1:36pm

And are you OK

with ‘?’ as well as ‘-’? -Tim

Good point. Actually, this will go in a URL: for example, if I enter
mypage, I can access my member page at mydomain.com/mypage. Allowing
question marks would be confusing because, as you know, question marks
have special status in URLs for passing data.

ben-v · September 4, 2006, 2:43pm

Ben V. <comprug gmail.com> writes:

I am trying to validate user input so it contains only letters, numbers,
underscores and dashes. I am using this regexp:/[A-Z0-9_-]/. However,
when I type in ‘hello’ in the field, it says that it doesn’t match the
regexp, yet I’m sure that reg exp works for letters, numbers, and
underscores only. What am I doing wrong? Thanks for your help and time.

No one else seems to have mentioned that that regex will match all
strings /containing/ a letter, digit, hyphen or underscore:

irb(main):001:0> /[A-Z0-9_-]/ =~ "£$%^"
=> nil
irb(main):002:0> /[A-Z0-9_-]/ =~ "£$0%^"
=> 2

You probably want something like

/^[\w-]+$/

Gareth
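
One caveat worth spelling out with a sketch: in Ruby, ^ and $ anchor at
line boundaries, not at the ends of the string, so multi-line input can
still get through the pattern above if its first line is clean. The
constant names below are only for illustration.

```ruby
UNANCHORED  = /[A-Za-z0-9_-]/   # matches if ANY character is allowed
LINE_ANCHOR = /^[\w-]+$/        # one whole LINE must be allowed
STR_ANCHOR  = /\A[\w-]+\z/      # the whole STRING must be allowed

evil = "ok\n; rm -rf /"

p '$0%^' =~ UNANCHORED          # => 1   (the "0" is enough to match)
p evil =~ LINE_ANCHOR           # => 0   (first line "ok" satisfies ^...$)
p evil =~ STR_ANCHOR            # => nil (rejected as a whole)
p "my-page_1" =~ STR_ANCHOR     # => 0
```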

ben-v · September 4, 2006, 2:43pm

Hi –

On Sat, 2 Sep 2006, Eric H. wrote:

input =~ /\A[\w-]+\Z/ is probably better.

\Z will allow a final newline character, though, so you’d probably
want to use \z.

David
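
A quick sketch of the difference, using the same character class as
above (the constant names are mine):

```ruby
UPPER_Z = /\A[\w-]+\Z/   # \Z: end of string, or just before a final newline
LOWER_Z = /\A[\w-]+\z/   # \z: the very end of the string, nothing after

p "hello\n" =~ UPPER_Z   # => 0   (trailing newline slips through)
p "hello\n" =~ LOWER_Z   # => nil
p "hello"   =~ LOWER_Z   # => 0
```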