Email Address Regex

From: [email protected]

For example, the vast majority of email addresses on this
mailing list are of the form:

Bill K. <[email protected]>

In some GUI environments it is harder to select the portion
between the <>'s than to select the entire address.

I’d agree that ideally my web form should be smart enough to
handle that. I started out with no validation at all, and
only added the <> rejection after observing the occasional
submit with that syntax causing a bounced email.
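
A form that wanted to be liberal about this could strip the display name and brackets instead of rejecting the input. The following is only a rough sketch in Ruby; the helper name `extract_address` and the placeholder address are assumptions for illustration, not code from the form discussed here, and it does not try to handle brackets inside a quoted local part.

```ruby
# Hypothetical helper (not from the thread): accept either a bare address
# or the "Display Name <address>" form and return just the address part.
def extract_address(input)
  text = input.to_s.strip
  # If the input ends with <...>, keep only what sits between the brackets.
  if (m = text.match(/<([^<>]*)>\s*\z/))
    m[1].strip
  else
    text
  end
end

puts extract_address("Bill K. <user@example.com>")  # prints user@example.com
puts extract_address("user@example.com")            # prints user@example.com
```

Whatever comes out of this step would still go through the usual validation; the sketch only removes the one syntax that was causing bounces.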

As I recall, when I asked my then-employer what degree of
thoroughness he wanted me to invest in coding the email
validation logic, the answer was something like, “Just give
them the chance to enter it again, properly, in the manner
requested. If they can’t follow simple directions then I
don’t think we want them using our software.” Woot! :wink:

From RFC 1123:

At every layer of the protocols, there is a general rule whose
application can lead to enormous benefits in robustness and
interoperability:

"Be liberal in what you accept, and conservative
 in what you send"

Agreed in general, but I don’t think everyone is in agreement
that browsers’ willingness to render this muck has resulted in
a net benefit for mankind, although I have heard it argued
both ways.

:slight_smile:

Regards,

Bill

On Thu, Jan 05, 2006 at 08:48:48AM +0900, Bill K. wrote:

Agreed in general, but I don’t think everyone is in agreement
that browsers’ willingness to render this muck has resulted in
a net benefit for mankind, although I have heard it argued
both ways.

You could always just specifically disallow non-standards-compliant
(X)HTML, though depending on how you handle that it might end up
rejecting a lot of stuff meant for IE and OE that could be of use to you
(depending on what you find useful).


Chad P. [ CCD CopyWrite | http://ccd.apotheon.org ]


On 1/4/06, Andreas S. [email protected] wrote:

Jacob F. wrote:

The only reason I defended the regex was because you claimed it was
invalid.

I don’t remember that. I dislike complex solutions like this regex
because they are error-prone (as proved by your correction for Tim’s
rfc822.rb). I didn’t claim yours was invalid.

Ok, checking back on the flow here, this is what I saw:

[Jacob] Be careful with email validation via regex, it’s harder than
you might think:

[Andreas] It is trivial to create a formally correct address that
makes absolutely no sense, so what’s the point of doing such a
complicated and error-prone validation?

[Tim] By “error prone” do you mean that it won’t detect addresses
that don’t exist?

[Andreas] No, I mean that it might declare some addresses invalid
although they aren’t.

In my mind, due to the use of pronouns, I believed the “error-prone
validation [that] might declare some addresses invalid” referred to my
example regex. Apparently they referred instead to inadequate regex
validation in general. Sorry for the confusion.

Jacob F.

Hello.

Jacob F.:

Be careful with email validation via regex, it’s harder than you might
think[1][2]:

/^([a-zA-Z0-9&_?\/`!|#*$^%=~{}+'-]+|"([\x00-\x0C\x0E-\x21\x23-\x5B\x5D-\x7F]|\\[\x00-\x7F])*")(\.([a-zA-Z0-9&_?\/`!|#*$^%=~{}+'-]+|"([\x00-\x0C\x0E-\x21\x23-\x5B\x5D-\x7F]|\\[\x00-\x7F])*"))*@([a-zA-Z0-9&_?\/`!|#*$^%=~{}+'-]+|\[([\x00-\x0C\x0E-\x5A\x5E-\x7F]|\\[\x00-\x7F])*\])(\.([a-zA-Z0-9&_?\/`!|#*$^%=~{}+'-]+|\[([\x00-\x0C\x0E-\x5A\x5E-\x7F]|\\[\x00-\x7F])*\]))*$/

It does match
" spaces! @s! \"escaped quotes!\" "@shot.pl
and it’s the first one doing this that I know of, kudos!

Unfortunately, it does not match ‘international’ domains, so
it wouldn’t pass addresses in the domain of, say, gżegżółka.pl
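
Both observations can be checked with a short Ruby script. The pieces below are one reading of the expression quoted in this thread, split into named parts for legibility; treat the transcription as an assumption rather than Jacob's verbatim text. The constant names are made up for this sketch, the groups are non-capturing, and `\A`/`\z` are used instead of `^`/`$` so that the whole string has to match.

```ruby
# Building blocks transcribed from the expression quoted in this thread.
# Assumption: this is a reconstruction; all constant names are hypothetical.
ATOM    = /[a-zA-Z0-9&_?\/`!|#*$^%=~{}+'-]+/
QUOTED  = /"(?:[\x00-\x0C\x0E-\x21\x23-\x5B\x5D-\x7F]|\\[\x00-\x7F])*"/
LITERAL = /\[(?:[\x00-\x0C\x0E-\x5A\x5E-\x7F]|\\[\x00-\x7F])*\]/

LOCAL_WORD  = /(?:#{ATOM}|#{QUOTED})/    # one dot-separated piece before the @
DOMAIN_WORD = /(?:#{ATOM}|#{LITERAL})/   # one dot-separated piece after the @

# \A and \z anchor the whole string; the quoted expression uses ^ and $.
ADDRESS = /\A#{LOCAL_WORD}(?:\.#{LOCAL_WORD})*@#{DOMAIN_WORD}(?:\.#{DOMAIN_WORD})*\z/

samples = [
  '" spaces! @s! \"escaped quotes!\" "@shot.pl',  # quoted local part
  'user@gżegżółka.pl',                            # non-ASCII domain
  'nobodyqexample.com'                            # no @ at all
]
samples.each { |s| puts "#{s.inspect}: #{ADDRESS.match?(s)}" }
```

Under this reading the quoted local part is accepted, while the non-ASCII domain and the string without an @ are rejected, since every character class is limited to 0x00-0x7F.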

Cheers,
– Shot

Hi!

At Wed, 4 Jan 2006 21:47:34 +0900, Andreas S. wrote:

It is trivial to create a formally correct address that makes
absolutely no sense, so what’s the point of doing such a complicated
and error-prone validation?

To give one example: On German keyboards “@” is entered using
“AltGr-q”. If one releases “AltGr” before pushing “q” (which may well
happen if you type the quick-and-dirty way) “nobody@example.com”
becomes “nobodyqexample.com”.
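
A slip like this does not need the full grammar to be caught. As a rough sketch in Ruby, and assuming a hypothetical helper named `plausible_address?`, a check that only insists on something before and after the last "@" already rejects it, without declaring unusual but valid addresses invalid:

```ruby
# Hypothetical check (an assumption, not code from the thread): require a
# non-empty part on each side of the last "@" and nothing more.
def plausible_address?(input)
  local, at, domain = input.to_s.strip.rpartition("@")
  !at.empty? && !local.empty? && !domain.empty?
end

puts plausible_address?("nobody@example.com")   # prints true
puts plausible_address?("nobodyqexample.com")   # prints false
```

This deliberately says nothing about whether the address exists or is well formed; it only catches the missing or misplaced "@".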

Also one should keep in mind the three commandments of distrust:

  1. He who inputs is guilty.

  2. He who inputs remains guilty unless he proves that he is not
    guilty.

  3. If the proof under rule 2 leaves any doubt (no matter how tiny it
    may be) the first rule applies.

In short: Input is evil unless you know for sure that it is not.

Josef ‘Jupp’ Schugt

On 1/6/06, Shot - Piotr S. [email protected] wrote:

|#*$^%=~{}+'-]+|\[([\x00-\x0C\x0E-\x5A\x5E-\x7F]|\\[\x00-\x7F])*\])(\.([a-zA-Z0-9&_?\/`!|#*$^%=~{}+'-]+|\[([\x00-\x0C\x0E-\x5A\x5E-\x7F]|\\[\x00-\x7F])*\]))*$/

It does match
" spaces! @s! \"escaped quotes!\" "@shot.pl
and it’s the first one doing this that I know of, kudos!

Not the first, I’ve been preceded by others that are even more correct
(and complex) :). Particularly:

Mail::RFC822::Address

Unfortunately, it does not match ‘international’ domains, so
it wouldn’t pass addresses in the domain of, say, gżegżółka.pl

Good point. When I wrote this expression, I was only considering ASCII
characters in the range 0x00-0x7F (0-127 decimal, which doesn’t include
extended characters). Looking back at RFC822, it looks like that RFC
is likewise limited. It has no support for extended ASCII or UNICODE.
This is reasonable, based on the age of the RFC (1982).
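
For anyone who wants to tell "outside the 7-bit range" apart from "malformed" before applying an ASCII-only expression, a byte-level pre-check is enough. This is only a sketch in Ruby; the helper name `seven_bit?` and the sample addresses are assumptions for illustration.

```ruby
# Hypothetical pre-check: is every byte in the 7-bit range 0x00-0x7F?
def seven_bit?(address)
  address.bytes.all? { |b| b <= 0x7F }
end

puts seven_bit?("user@example.com")    # prints true
puts seven_bit?("user@gżegżółka.pl")   # prints false
```

For these inputs Ruby's built-in `String#ascii_only?` gives the same answers; the explicit loop just makes the 0x7F limit visible.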

As I understand from Yohanes’ post in this thread, RFC2822 (2001)
supersedes RFC822, so I assume RFC2822 probably takes extended ASCII
– and hopefully UNICODE, as well – into account. Time to update the
regex! I’ll leave it to someone else, however. :wink:

Jacob F.

The full RFC2822 regex is too big, but RMail has a parser for it.


On Jan 4, 2006, at 12:12 PM, [email protected] wrote:

Quoting “Andreas S.” [email protected]:

It is trivial to create a formally correct address that makes
absolutely no sense, so what’s the point of doing such a
complicated and error-prone validation?

For example, a friend of mine has the email address:

?@hisdomain.net

(The domain above was changed to protect his privacy. But the single
question mark as the ‘username’ is all that he has :slight_smile:

On 1/9/06, Gavin K. [email protected] wrote:

(The domain above was changed to protect his privacy. But the single
question mark as the ‘username’ is all that he has :slight_smile:

And my regex matches that address. :slight_smile:

Jacob F.