Email Address verifier--eyes needed

dirtytree · May 18, 2006, 9:33pm

Folks:

I’ve posted a really basic e-mail address verifier to

http://wiki.rubygarden.org/Ruby/page/show/VerifyEmailAddress

I’d appreciate folks who understand DNS and SMTP having a look at it
to see if it looks reasonable. You could comment here or, possibly
more usefully, comment on the wiki page itself.

Thanks

Dave

dirtytree · May 19, 2006, 1:36am

Hi Dave,

I’ve posted a really basic e-mail address verifier to

http://wiki.rubygarden.org/Ruby/page/show/VerifyEmailAddress

I’d appreciate folks who understand DNS and SMTP having a look at
it to see if it looks reasonable. You could comment here or,
possibly more usefully, comment on the wiki page itself.

I think this may fail with email addresses that have quoted
local parts containing “@”. Here’s a more restrictive regexp
that handles quoted local parts:

http://tfletcher.com/lib/rfc822.rb

My only problem with this regexp is that it tests the address against
the RFC 822 spec, which is looser than what is allowed in real life.
For example, this regexp allows a host with any TLD to match, but
only a limited number of TLDs have been issued by ICANN.
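To illustrate the quoted local part problem: a minimal sketch that separates the domain by splitting on the last “@” rather than the first. The method name extract_domain is made up for this example, and it assumes the domain is an ordinary host name (not a bracketed domain literal). It is not a validator, only a way to get at the domain part.

```ruby
# Return the domain part of +address+, or nil if there is no usable "@".
# Splitting on the LAST "@" keeps a quoted local part such as "a@b" intact.
# Assumption: the domain is a plain host name, not a [domain-literal].
def extract_domain(address)
  local, at, domain = address.rpartition("@")
  return nil if at.empty? || local.empty? || domain.empty?
  domain
end
```

With this, extract_domain('"a@b"@example.com') yields the part after the final “@”, where a split on the first “@” would hand back b"@example.com.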

Also, the part of the code that checks the MX and A records can
probably be shortened to something like:

  mx_hosts = dns.getresources(domain, Resolv::DNS::Resource::IN::MX) rescue []

  mx_hosts.sort_by { |mx| mx.preference }.map { |mx| mx.exchange }.push(domain).each do |host|
    a_records = dns.getresources(host.to_s, Resolv::DNS::Resource::IN::A) rescue []
    return false if check_hosts(a_records)
  end
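For anyone who wants to try the ordering step on its own, here is a rough self-contained sketch. The method name candidate_hosts is hypothetical; it only mirrors the sort-by-preference and push(domain) logic from the snippet above and leaves the actual lookup to the caller, so it can be exercised without network access.

```ruby
require 'resolv'

# Order the hosts to try for +domain+: MX exchanges by ascending preference,
# followed by the bare domain itself, as in the snippet above.
# +mx_records+ is anything responding to #preference and #exchange,
# such as Resolv::DNS::Resource::IN::MX objects.
def candidate_hosts(mx_records, domain)
  mx_records.sort_by { |mx| mx.preference }
            .map { |mx| mx.exchange.to_s }
            .push(domain)
end

# Possible use against a live resolver (needs network access):
#   Resolv::DNS.open do |dns|
#     mx = dns.getresources("example.com", Resolv::DNS::Resource::IN::MX)
#     candidate_hosts(mx, "example.com").each { |host| puts host }
#   end
```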

Thanks,

Dan

Dan K.
Autopilot Marketing Inc.

Email: [email protected]
Phone: 1 (604) 820-0212
Web: http://autopilotmarketing.com/
vCard: http://autopilotmarketing.com/~dan.kubb/vcard


dirtytree · May 20, 2006, 9:37am

“Dave” == Dave T. [email protected] writes:

I’d appreciate folks who understand DNS and SMTP having a look at it
to see if it looks reasonable.

It has at least one fairly serious flaw, as far as I can see. A proper
MTA will queue messages to a domain that it can’t get a DNS response
for, while your code will reject it. This situation is reasonably
common, since it’ll occur every time all the DNS servers for a domain
become non-responsive.

Also, just seeing if something answers on port 25 seems a bit dodgy to
me. I know of plenty of places that have SSH servers on that port, for
example. There’s also a distinct possibility that while there is a
proper SMTP server there, all it’s going to tell you is “Bugger off!”,
since it doesn’t like where you connected from, or something. Since
you’re already connected, running through a HELO/MAIL FROM/RCPT
TO/QUIT sequence to see if it’d really accept mail from you would
increase the confidence in the answer considerably. In case of
failure, you’d also know if it was permanent (5xx reply) or temporary
(4xx reply).
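A rough sketch of what such a HELO/MAIL FROM/RCPT TO/QUIT probe could look like. All method names here (read_reply, classify, probe_recipient) are made up for illustration, and the code talks to any object with gets and write, so a TCPSocket is only one possible transport. Note that a 2xx answer to RCPT TO is a hint rather than a guarantee, since some servers accept every recipient and bounce later.

```ruby
require 'socket'

# Read one SMTP reply and return its three-digit code as an Integer.
# Continuation lines look like "250-text"; the last line is "250 text".
def read_reply(io)
  loop do
    line = io.gets
    raise EOFError, "connection closed before a complete reply" if line.nil?
    match = /\A(\d{3})([ -]?)/.match(line)
    # Anything that is not a reply line (an SSH banner, say) is rejected.
    raise ArgumentError, "not an SMTP reply: #{line.inspect}" unless match
    return match[1].to_i unless match[2] == "-"
  end
end

# Map a reply code onto the success / temporary / permanent classes.
def classify(code)
  case code
  when 200..299 then :accepted
  when 400..499 then :temporary_failure
  when 500..599 then :permanent_failure
  else :unknown
  end
end

# Walk through the dialogue, stopping at the first non-2xx reply.
def probe_recipient(io, helo_name, sender, recipient)
  code = read_reply(io) # server greeting
  ["HELO #{helo_name}",
   "MAIL FROM:<#{sender}>",
   "RCPT TO:<#{recipient}>"].each do |command|
    break unless classify(code) == :accepted
    io.write("#{command}\r\n")
    code = read_reply(io)
  end
  io.write("QUIT\r\n")
  classify(code)
end

# Possible use over a real connection (needs network access):
#   TCPSocket.open("mail.example.com", 25) do |sock|
#     probe_recipient(sock, "client.example.org",
#                     "probe@example.org", "someone@example.com")
#   end
```

An SSH server on port 25 would make read_reply raise on the banner line, which the caller can treat as "not a mail server".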

To match the way things work, your function really needs to return one
of three possible results: “can send to that domain”, “can’t send to
that domain at all” and “can’t send to that domain right now”.
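One way the three-way answer might be derived when a domain has several mail hosts, as a sketch only. The symbols and the method name domain_verdict are invented for this example. It assumes each host has already been reduced to :accepted, :temporary_failure or :permanent_failure, and it deliberately treats anything that is not a clear permanent failure (including "no answer at all") as temporary.

```ruby
# Combine per-host results into one verdict for the whole domain.
# +host_results+ is an Array of symbols, one per host that was tried.
def domain_verdict(host_results)
  if host_results.include?(:accepted)
    :can_send            # at least one host would take the mail
  elsif host_results.empty? ||
        host_results.any? { |result| result != :permanent_failure }
    :cannot_send_now     # assumption: unclear or missing answers are temporary
  else
    :cannot_send_at_all  # every host gave a permanent refusal
  end
end
```

Erring toward "right now" matches how an MTA behaves: it queues and retries unless it has been told definitively to give up.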

	     Calle D. <[email protected]>
	 http://www.livejournal.com/users/cdybedahl/
  "Women. They don't even make sense when you are one." -- babycola

dirtytree · May 21, 2006, 6:57am

On May 20, 2006, at 2:36 AM, Calle D. wrote:

since it doesn’t like where you connected from, or something. Since
you’re already connected, running through a HELO/MAIL FROM/RCPT
TO/QUIT sequence to see if it’d really accept mail from you would
increase the confidence in the answer considerably. In case of
failure, you’d also know if it was permanent (5xx reply) or temporary
(4xx reply).

To match the way things work, your function really needs to return one
of three possible results: “can send to that domain”, “can’t send to
that domain at all” and “can’t send to that domain right now”

I should probably explain the context in which it’s being used.

About one in 30 people signing up for a PDF from us mistype their
email addresses, and about 70% of those mistype the domain. So I
figured that a quick sanity check before I accepted the form might be
in order. In this case, I’m thinking I’m OK to pass any error on to
the end user: I’m not trying to be bulletproof as much as provide
them with a sanity check. Should they happen to have SSHD running on
25, then at least I know there’s a domain there.

But… do people think I should move on and do the full RCPT TO
sequence? Anyone happen to have the code that does it reliably?

Dave

dirtytree · May 21, 2006, 9:39am

“Dave” == Dave T. [email protected] writes:

I should probably explain the context in which it’s being used.

About one in 30 people signing up for a PDF from us mistype their
email addresses, and about 70% of those mistype the domain. So I
figured that a quick sanity check before I accepted the form might be
in order.

Ok, that makes sense, and for that use the code looks all right
(except possibly that DNS server timeouts might be too long).
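On the timeout point, a possible way to cap how long a form submission waits on DNS, sketched with the standard timeout library. The method name resolvable_mail_domain? and the limit: parameter are hypothetical, and dns can be a Resolv::DNS instance or anything else that responds to getresources. A nil result means "could not tell in time", which for a signup form is probably best treated as a pass.

```ruby
require 'resolv'
require 'timeout'

# true  : the domain has at least one MX or A record
# false : the resolver answered and found neither
# nil   : no answer within +limit+ seconds, so nothing can be concluded
def resolvable_mail_domain?(domain, dns, limit: 5)
  Timeout.timeout(limit) do
    [Resolv::DNS::Resource::IN::MX, Resolv::DNS::Resource::IN::A].any? do |type|
      !dns.getresources(domain, type).empty?
    end
  end
rescue Timeout::Error
  nil
end

# Possible use (needs network access):
#   Resolv::DNS.open do |dns|
#     puts resolvable_mail_domain?("example.com", dns, limit: 3).inspect
#   end
```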

The problem that the pessimist in me sees is that if you distribute
the code, people will use it without reading the documentation and
accept its answers as gospel truth. Experience from the Perl world
tells me that it would probably be a better thing in the long run to
have easily available, easy-to-use and as correct as humanly possible
mail address checking code out there.

Which also means that I should probably help write it…

	     Calle D. <[email protected]>
	 http://www.livejournal.com/users/cdybedahl/
   "Facts are for people with weak opinions." -- Lars Willför, I]M
