Regular expression for string.anotherstring

bartb · September 15, 2006, 7:07pm

I’m trying to validate a user mail address for a fixed domain with the rule
string =~ /\w+.\w+/
This matches firstname.lastname, which is what I want.
But it also matches [email protected], which is not what I
want. Why does that match? I’m using this in Rails, but I guess the
quoting will not escape @ signs?

Bart

bartb · September 16, 2006, 6:03am

On 9/15/06, Bart B. wrote:

I’m trying to validate a user mail address for a fixed domain with the rule
string =~ /\w+.\w+/
this matches firstname.lastname, which is what I want.
But it also matches [email protected], which is not what I
want. Why does that match? I’m having this in Rails, but I guess the
quoting will not escape @ signs?

Bart

I’m no regex master but what about:
/^[^@]+/

One or more things that aren’t an ‘@’ at the beginning of the string?

hth,
-Harold
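
A quick way to check what /^[^@]+/ actually accepts is to look at the text it matches. This is only a sketch; the sample strings and the `strict` variant (anchored at both ends with \A and \z) are assumptions added for illustration, not part of the suggestion above.

```ruby
# /^[^@]+/ anchors only the start, so it still finds a match
# in a string that contains '@' later on.
pattern = /^[^@]+/

with_at    = "foo.bar@baz"
without_at = "foo.bar"

matched_with_at    = with_at[pattern]     # the text the pattern matched
matched_without_at = without_at[pattern]

p matched_with_at     # => "foo.bar"
p matched_without_at  # => "foo.bar"

# Hypothetical stricter variant: anchoring both ends rejects an '@'
# anywhere in the string.
strict = /\A[^@]+\z/
p with_at =~ strict     # => nil
p without_at =~ strict  # => 0
```

So the pattern on its own succeeds for both inputs; it only tells you the string does not start with ‘@’.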

bartb · October 18, 2006, 10:43am

Bart B. wrote:

I’m trying to validate a user mail address for a fixed domain with the
rule string =~ /\w+.\w+/
this matches firstname.lastname, which is what I want.
But it also matches [email protected], which is not what I
want. Why does that match? I’m having this in Rails, but I guess the
quoting will not escape @ signs?

I think you want this: “string.string”, but you don’t want this:
“string.string@domain”. Yes?

Have you tried:

string =~ /^\w+.\w+$/

“^” means match at the beginning of the string, “$” means match at the
end.
IOW the example must match the entire string. Is this what you wanted?

bartb · October 18, 2006, 10:43am

On 9/15/06, Paul L. wrote:

Bart B. wrote:

I’m trying to validate a user mail address for a fixed domain with the
rule string =~ /\w+.\w+/
this matches firstname.lastname, which is what I want.
But it also matches [email protected], which is not what I
want.

Have you tried:

string =~ /^\w+.\w+$/

“^” means match at the beginning of the string, “$” means match at the end.
IOW the example must match the entire string. Is this what you wanted?

Actually “$” matches at the end of the string or at the first line break
(‘\n’), whichever comes first.

\z matches only at the very end of the string, whereas
\Z matches at the end of the string, unless the string ends with
‘\n’, in which case it matches just before that final ‘\n’.

And if you want to match just before the first of an optional series
of trailing ‘\n’s, I think that this:

\n*\Z

works as the end of the RE.

Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/
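
The differences between $, \z and \Z are easier to see side by side. The following is a small sketch, not anything from the posts above: the sample strings are made up, and the dot is escaped (\.) on the assumption that a literal ‘.’ is intended, since a bare . in a regex matches any character except a newline.

```ruby
s = "abc.def\nghi"      # a second line follows
t = "abc.def\n"         # one trailing newline
u = "abc.def\n\n\n"     # several trailing newlines

dollar  = /\w+\.\w+$/
small_z = /\w+\.\w+\z/
big_z   = /\w+\.\w+\Z/

# $ matches at the end of any line, so the first line of s is enough.
r_dollar = s[dollar]
p r_dollar          # => "abc.def"

# \z matches only at the very end of the string.
r_small_s = s[small_z]
r_small_t = t[small_z]
p r_small_s         # => nil
p r_small_t         # => nil (the trailing "\n" is in the way)

# \Z also accepts the position just before one final "\n".
r_big_t = t[big_z]
r_big_u = u[big_z]
p r_big_t           # => "abc.def"
p r_big_u           # => nil (more than one trailing "\n")

# \n*\Z tolerates a whole run of trailing newlines; the capture
# group keeps them out of the extracted text.
r_run = u[/(\w+\.\w+)\n*\Z/, 1]
p r_run             # => "abc.def"
```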

bartb · October 18, 2006, 10:43am

Paul L. wrote:

I think you want this: “string.string”, but you don’t want this:
“string.string@domain”. Yes?

That’s right.

Have you tried:

string =~ /^\w+.\w+$/

“^” means match at the beginning of the string, “$” means match at the
end. IOW the example must match the entire string. Is this what you
wanted?

That’s a good suggestion, I’ll give it a shot.
One question though: why does \w match with ‘@’? According to the
documentation \w == [a…zA…Z]??

Bart

bartb · October 18, 2006, 10:43am

Bart B. wrote:

One question though: why does \w match with ‘@’? According to the
documentation \w == [a…zA…Z]??

\w doesn’t match @, it’s just that your expression was too general:

test = lambda { |x| p $1 if x =~ /(\w+.\w+)/ }
test.call('[email protected]') # => "fraggle.roc"
test.call('foo.bar@baz') # => "foo.bar"

I.e., you were matching on either side of the @

Regards,
Jordan
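
To see exactly which part of the input an unanchored pattern picks up, MatchData#pre_match and MatchData#post_match are handy. A minimal sketch with made-up sample strings; the dot is escaped here on the assumption that a literal ‘.’ is wanted.

```ruby
pattern = /\w+\.\w+/

# The match sits before the '@'; everything after it is simply ignored.
m1 = pattern.match("foo.bar@baz")
p m1[0]          # => "foo.bar"
p m1.pre_match   # => ""
p m1.post_match  # => "@baz"

# The match sits after the '@'; the regex engine skips ahead to find it.
m2 = pattern.match("foo@bar.baz")
p m2[0]          # => "bar.baz"
p m2.pre_match   # => "foo@"
p m2.post_match  # => ""
```

In both cases =~ reports success, because it only asks whether the pattern matches somewhere in the string.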

bartb · October 18, 2006, 10:43am

MonkeeSage wrote:

I.e., you were matching on either side of the @

Oh boy, you are right! Thanks for the explanation.
I will try it with:
string =~ /^\w+.\w+\z/

But is there a ruby operator to look for matches on the entire string?

Thanks for your help
Bart

bartb · October 18, 2006, 10:43am

Paul L. schrieb:

factor, you can always split on newlines in advance of this test to be sure
you are matching the entire string unambiguously.

This would be unnecessarily complex. Just use Bart’s regexp (with \z
instead of $). It matches the entire string.

Regards,
Pit

bartb · October 18, 2006, 10:43am

On 9/17/06, Bart B. wrote:

string =~ /^\w+.\w+$/

“^” means match at the beginning of the string, “$” means match at the
end. IOW the example must match the entire string. Is this what you
wanted?

That’s a good suggestion, I’ll give it a shot.
One question though: why does \w match with ‘@’? According to the
documentation \w == [a…zA…Z]??

Bart

Bart-

Maybe I’m off base here, but are you trying to determine if a string
matches a pattern, or are you trying to capture a matching substring
from a string?

-Alex

bartb · October 18, 2006, 10:43am

Bart B. wrote:

MonkeeSage wrote:

I.e., you were matching on either side of the @

Oh boy, you are right! Thanks for the explanation.
I will try it with:
string =~ /^\w+.\w+\z/

But is there a ruby operator to look for matches on the entire string?

string =~ /^\w+.\w+$/

Must match the entire string. Because of multiline issues, where that is a
factor, you can always split on newlines in advance of this test to be sure
you are matching the entire string unambiguously.

bartb · October 18, 2006, 10:43am

Hi –

On Mon, 18 Sep 2006, Bart B. wrote:

MonkeeSage wrote:

I.e., you were matching on either side of the @

Oh boy, you are right! Thanks for the explanation.
I will try it with:
string =~ /^\w+.\w+\z/

"Are you sure?\nabc.def" =~ /^\w+.\w+\z/ # => 14

Keep in mind that ^ matches the beginning of any line.

But is there a ruby operator to look for matches on the entire string?

Do you mean scanning the string repeatedly? If so, then you can use
String#scan. If you mean anchoring a regex to the beginning of the
string, then you can use \A and \z (or \Z if you want to discount a
possible ending newline).

David
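
The two readings of “matches on the entire string” can be sketched like this. The sample inputs are invented for illustration, and the dot is escaped on the assumption that a literal ‘.’ is meant.

```ruby
input = "Are you sure?\nabc.def"

line_anchored   = /^\w+\.\w+\z/    # ^ matches at the start of any line
string_anchored = /\A\w+\.\w+\z/   # \A matches only at the start of the string

pos_line   = input =~ line_anchored
pos_string = input =~ string_anchored
pos_plain  = "abc.def" =~ string_anchored

p pos_line     # => 14 (the second line matches)
p pos_string   # => nil
p pos_plain    # => 0

# String#scan collects every non-overlapping match instead.
found = "a.b c.d\ne.f".scan(/\w+\.\w+/)
p found        # => ["a.b", "c.d", "e.f"]
```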

bartb · October 18, 2006, 10:43am

Hi –

On Tue, 19 Sep 2006, Paul L. wrote:

string =~ /^\w+.\w+$/

#!/usr/bin/ruby

s = "this is\na test\nstring."

a = []

a << s.sub(/(^.*$)/,"\\1")

That replaces “this is” with “this is”.

a << s.sub(/(^.*\z)/,"\\1")

That replaces “string.” with “string.”.

a << s.sub(/(^.*$)/m,"\\1")

That replaces “this is\na test\nstring.” with “this is\na test\nstring.”

a << s.sub(/(^.*\z)/m,"\\1")

That does the same thing as the previous one.

In all of your examples, you’re just replacing what was matched with
what was matched. That doesn’t tell you anything about what was
matched.

Just use Bart’s regexp (with \z
instead of $). It matches the entire string.

My point is that, if there are embedded linefeeds and they are an issue,
they must be dealt with. Also, I don’t immediately see a difference in
behavior between \z and $, contrary to the documentation’s specification
that one matches the end of a line and the other matches the entire string.

You don’t see it because you’re not looking for it. Look at the
difference in what gets matched:

irb(main):011:0> s =~ /(^.*$)/
=> 0
irb(main):012:0> $1
=> "this is"
irb(main):013:0> s =~ /(^.*\z)/
=> 15
irb(main):014:0> $1
=> "string."
irb(main):015:0> s =~ /(^.*\z)/m
=> 0
irb(main):016:0> $1
=> "this is\na test\nstring."

David

bartb · October 18, 2006, 10:43am

Paul L. wrote:

Also, I don’t immediately see a difference in
behavior between \z and $, contrary to the documentation’s specification
that one matches the end of a line and the other matches the entire string.

The \z matches up to the terminus of the entire string, including any
newlines that come between. It’s different from $ with multiline
because that will match every line end. To see it, change your example
to

a << s.sub(/(^.*$)/,'REPLACE')
a << s.sub(/(^.*\z)/,'REPLACE')
a << s.sub(/(^.*$)/m,'REPLACE')
a << s.sub(/(^.*\z)/m,'REPLACE')

Regards,
Jordan
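
For anyone who wants to run the comparison directly, here is a self-contained sketch of the same four substitutions with the results spelled out. The variable names are arbitrary; only the sample string comes from the thread.

```ruby
s = "this is\na test\nstring."

first_line = s.sub(/(^.*$)/,   'REPLACE')  # $ stops at the first line end
last_line  = s.sub(/(^.*\z)/,  'REPLACE')  # \z forces the match onto the last line
multi_eol  = s.sub(/(^.*$)/m,  'REPLACE')  # with /m, . also matches "\n"
multi_eos  = s.sub(/(^.*\z)/m, 'REPLACE')

p first_line  # => "REPLACE\na test\nstring."
p last_line   # => "this is\na test\nREPLACE"
p multi_eol   # => "REPLACE"
p multi_eos   # => "REPLACE"
```

Replacing with a fixed marker makes it visible which part of the string each pattern consumed, which substituting the match for itself cannot show.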

bartb · October 18, 2006, 10:43am

[email protected] wrote:

But is there a ruby operator to look for matches on the entire string?

Do you mean scanning the string repeatedly? If so, then you can use
String#scan. If you mean anchoring a regex to the beginning of the
string, then you can use \A and \z (or \Z if you want to discount a
possible ending newline).

I meant that I want to do input validation on the entire string. It must
correspond to the format defined by the regular expression, no more, no
less.
That’s why the newline part is not important here: I can’t permit
newlines.
You are right about the ^ at the beginning, I will be using
test =~ /\A\w+.\w+\z/

Thanks a lot for the great input!
Bart
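
As a rough sketch, the whole-string check could be wrapped in a small helper. The names LOCAL_PART and valid_local_part? are hypothetical, and the dot is escaped here on the assumption that a literal ‘.’ is required; as written in the thread, a bare . would accept any single character in that position. Note also that \w covers digits and the underscore as well as letters.

```ruby
# Hypothetical helper: accepts only "word.word", nothing before or after.
LOCAL_PART = /\A\w+\.\w+\z/

def valid_local_part?(input)
  !(input =~ LOCAL_PART).nil?
end

p valid_local_part?("firstname.lastname")  # => true
p valid_local_part?("foo.bar@baz")         # => false
p valid_local_part?("foo.bar\n")           # => false (\z rejects the newline)
p valid_local_part?("fooXbar")             # => false (no literal dot)
```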

bartb · October 18, 2006, 10:43am

Pit C. wrote:

Must match the entire string. Because of multiline issues, where that is
a factor, you can always split on newlines in advance of this test to be
sure you are matching the entire string unambiguously.

This would be unnecessarily complex.

Not really.

#!/usr/bin/ruby

s = "this is\na test\nstring."

a = []

a << s.sub(/(^.*$)/,"\\1")

a << s.sub(/(^.*\z)/,"\\1")

a << s.sub(/(^.*$)/m,"\\1")

a << s.sub(/(^.*\z)/m,"\\1")

a.each do |item|
  puts "[#{item}]"
end

Output:

[this is
a test
string.]
[this is
a test
string.]
[this is
a test
string.]
[this is
a test
string.]

Just use Bart’s regexp (with \z
instead of $). It matches the entire string.

My point is that, if there are embedded linefeeds and they are an issue,
they must be dealt with. Also, I don’t immediately see a difference in
behavior between \z and $, contrary to the documentation’s specification
that one matches the end of a line and the other matches the entire
string.