Forum: Ruby on Rails Email Address verifier--eyes needed...

Dave T. (Guest)
on 2006-05-18 23:33
(Received via mailing list)
Folks:

I've posted a really basic e-mail address verifier to

    http://wiki.rubygarden.org/Ruby/page/show/VerifyEmailAddress

I'd appreciate folks who understand DNS and SMTP having a look at it
to see if it looks reasonable. You could comment here or, possibly
more usefully, comment on the wiki page itself.


Thanks


Dave
Dan K. (Guest)
on 2006-05-19 03:36
(Received via mailing list)
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Dave,

> I've posted a really basic e-mail address verifier to
>
>    http://wiki.rubygarden.org/Ruby/page/show/VerifyEmailAddress
>
> I'd appreciate folks who understand DNS and SMTP having a look at
> it to see if it looks reasonable. You could comment here or,
> possibly more usefully, comment on the wiki page itself.

I think this may fail with email addresses that have quoted
local parts containing "@".  Here's a more restrictive regexp
that handles quoted local parts:

   http://tfletcher.com/lib/rfc822.rb
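
To make the quoted-local-part concern concrete: a quoted local part may itself contain "@", so code that splits on the first "@" picks the wrong domain. The sketch below is only an illustration, not the code on the wiki page and not the rfc822.rb regexp; `split_address` is a hypothetical helper name, and it assumes the domain is whatever follows the last "@".

```ruby
# Hypothetical helper (not from the wiki page): split an address into
# local part and domain at the LAST "@", so a quoted local part such as
# "user@home" does not confuse the domain extraction.
def split_address(address)
  local, at, domain = address.rpartition('@')
  # rpartition returns an empty separator when there is no "@" at all.
  return nil if at.empty? || local.empty? || domain.empty?
  [local, domain]
end

split_address('"user@home"@example.com')  # => ["\"user@home\"", "example.com"]
split_address('dave@example.com')         # => ["dave", "example.com"]
split_address('no-at-sign')               # => nil
```

This only finds the domain to look up; it does not validate the local part.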

My only problem with this regexp is that it tests the email according
to the RFC 822 spec, which is looser than what is allowed in real life.
For example, this regexp allows a host with any TLD to match, but
only a limited number of TLDs are issued by ICANN.

Also, the part of the code that checks the MX and A records can
probably be shortened to something like:

     mx_hosts = dns.getresources(domain, Resolv::DNS::Resource::IN::MX) rescue []

     mx_hosts.sort_by { |mx| mx.preference }.map { |mx| mx.exchange }.push(domain).each do |host|
       a_records = dns.getresources(host.to_s, Resolv::DNS::Resource::IN::A) rescue []
       return false if check_hosts(a_records)
     end
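
The ordering step in that snippet can be tried without touching the network. This is a rough sketch under assumptions: `FakeMX` and `hosts_to_try` are made-up names, the hostnames are invented, and the struct merely stands in for real MX records, which expose the same `preference` and `exchange` readers.

```ruby
# Stand-in for Resolv::DNS::Resource::IN::MX, which also exposes
# #preference and #exchange. The hostnames below are made up.
FakeMX = Struct.new(:preference, :exchange)

# Returns the hosts to try, lowest preference value first, with the
# bare domain appended as the last-resort fallback.
def hosts_to_try(mx_records, domain)
  mx_records.sort_by(&:preference).map { |mx| mx.exchange.to_s }.push(domain)
end

records = [FakeMX.new(20, 'backup.example.com'),
           FakeMX.new(10, 'mail.example.com')]

hosts_to_try(records, 'example.com')
# => ["mail.example.com", "backup.example.com", "example.com"]
hosts_to_try([], 'example.com')
# => ["example.com"]
```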

- --

Thanks,

Dan
__________________________________________________________________

Dan K.
Autopilot Marketing Inc.

Email: removed_email_address@domain.invalid
Phone: 1 (604) 820-0212
Web:   http://autopilotmarketing.com/
vCard: http://autopilotmarketing.com/~dan.kubb/vcard
__________________________________________________________________



-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (Darwin)

iD8DBQFEbQQ94DfZD7OEWk0RAqq+AJsHJaD2D4XoLCJ57iQu0PeIe91dbQCgszCl
1/tPyLvgy4iIg7MLyoZO2RM=
=4Oqi
-----END PGP SIGNATURE-----
Calle D. (Guest)
on 2006-05-20 11:37
(Received via mailing list)
>>>>> "Dave" == Dave T. <removed_email_address@domain.invalid> writes:

> I'd appreciate folks who understand DNS and SMTP having a look at it
> to see if it looks reasonable.

It has at least one fairly serious flaw, as far as I can see. A proper
MTA will queue messages to a domain that it can't get a DNS response
for, while your code will reject the address. This situation is
reasonably common, since it'll occur every time all the DNS servers
for a domain become non-responsive.

Also, just seeing if something answers on port 25 seems a bit dodgy to
me. I know of plenty of places that have SSH servers on that port, for
example. There's also a distinct possibility that while there is a
proper SMTP server there, all it's going to tell you is "Bugger off!",
since it doesn't like where you connected from, or something. Since
you're already connected, running through a HELO/MAIL FROM/RCPT
TO/QUIT sequence to see if it'd really accept mail from you would
increase the confidence in the answer considerably. In case of
failure, you'd also know if it was permanent (5xx reply) or temporary
(4xx reply).
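
A rough sketch of that HELO/MAIL FROM/RCPT TO/QUIT sequence follows. It is an illustration under assumptions, not tested against real mail servers: `read_reply`, `command` and `probe_recipient` are hypothetical names, and `io` is assumed to be an already-open connection (for instance a `TCPSocket` to port 25) that responds to `gets` and `write`.

```ruby
# Reads one SMTP reply from io and returns its numeric code.
# Multi-line replies use "250-text" on every line except the last,
# which uses "250 text".
def read_reply(io)
  loop do
    line = io.gets or raise EOFError, 'server closed the connection'
    return line[0, 3].to_i unless line[3] == '-'
  end
end

# Sends one command line and returns the code of the reply.
def command(io, text)
  io.write("#{text}\r\n")
  read_reply(io)
end

# Walks through greeting, HELO, MAIL FROM, RCPT TO and QUIT.
# Returns the first reply code of 400 or above, or the RCPT TO code
# when the server accepts every step.
def probe_recipient(io, helo_name:, sender:, recipient:)
  code = read_reply(io)            # the server's greeting, normally 220
  return code if code >= 400

  ["HELO #{helo_name}",
   "MAIL FROM:<#{sender}>",
   "RCPT TO:<#{recipient}>"].each do |cmd|
    code = command(io, cmd)
    break if code >= 400           # stop at the first refusal
  end

  command(io, 'QUIT') rescue nil   # be polite; ignore a dropped connection
  code
end
```

A code in the 5xx range means a permanent refusal and 4xx a temporary one. Note that some servers accept every RCPT TO and bounce later, so an accepting reply still does not prove the mailbox exists.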

To match the way things work, your function really needs to return one
of three possible results: "can send to that domain", "can't send to
that domain at all" and "can't send to that domain right now".
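
One way to express those three results in code is a small mapping from the outcome of the check to a symbol. This is a sketch only: the function name and the symbols are made up for illustration, and it assumes the caller passes an SMTP reply code, or nil when no answer could be obtained at all.

```ruby
# Maps the outcome of a check onto three possible answers.
# reply_code is an SMTP reply code, or nil when no answer was obtained
# (for example, every DNS server for the domain was unresponsive).
def deliverability(reply_code)
  case reply_code
  when nil      then :cannot_send_right_now  # an MTA would queue and retry
  when 200..399 then :can_send
  when 400..499 then :cannot_send_right_now  # temporary failure
  when 500..599 then :cannot_send_at_all     # permanent failure
  else raise ArgumentError, "unexpected SMTP reply code: #{reply_code.inspect}"
  end
end

deliverability(250)  # => :can_send
deliverability(450)  # => :cannot_send_right_now
deliverability(550)  # => :cannot_send_at_all
deliverability(nil)  # => :cannot_send_right_now
```

Treating "no answer" as temporary mirrors what a queueing MTA does, rather than rejecting the address outright.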
--
		     Calle D. <removed_email_address@domain.invalid>
		 http://www.livejournal.com/users/cdybedahl/
      "Women. They don't even make sense when you are one." -- babycola
Dave T. (Guest)
on 2006-05-21 08:57
(Received via mailing list)
On May 20, 2006, at 2:36 AM, Calle D. wrote:

> since it doesn't like where you connected from, or something. Since
> you're already connected, running through a HELO/MAIL FROM/RCPT
> TO/QUIT sequence to see if it'd really accept mail from you would
> increase the confidence in the answer considerably. In case of
> failure, you'd also know if it was permanent (5xx reply) or temporary
> (4xx reply).
>
> To match the way things work, your function really needs to return one
> of three possible results: "can send to that domain", "can't send to
> that domain at all" and "can't send to that domain right now"


I should probably explain the context in which it's being used.

About one in 30 people signing up for a PDF from us mistype their
email addresses, and about 70% of those mistype the domain. So I
figured that a quick sanity check before I accepted the form might be
in order. In this case, I'm thinking I'm OK to pass any error on to
the end user: I'm not trying to be bulletproof as much as provide
them with a sanity check. Should they happen to have SSHD running on
port 25, then at least I know there's a domain there.

But... do people think I should move on and do the full RCPT TO
sequence? Anyone happen to have the code that does it reliably?


Dave
Calle D. (Guest)
on 2006-05-21 11:39
(Received via mailing list)
>>>>> "Dave" == Dave T. <removed_email_address@domain.invalid> writes:

> I should probably explain the context in which it's being used.

> About one in 30 people signing up for a PDF from us mistype their
> email addresses, and about 70% of those mistype the domain. So I
> figured that quick sanity check before I accepted the form might be
> in order.

Ok, that makes sense, and for that use the code looks all right
(except possibly that DNS server timeouts might be too long).
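
On the timeout point, one possible way to cap how long a form submission waits on DNS is sketched below. It is a sketch under assumptions: `mx_records_within` is a hypothetical name, the 3-second default is arbitrary, and it relies on `Resolv::DNS#timeouts=`, which is only available on Ruby 2.0 and later (hence the `respond_to?` guard).

```ruby
require 'resolv'
require 'timeout'

# Looks up MX records but gives up after `limit` seconds overall, so a
# web form is not left hanging on unresponsive DNS servers.
# Returns an array of records, or nil if no answer arrived in time.
def mx_records_within(domain, limit: 3, dns: Resolv::DNS.new)
  # Per-query timeout, where the resolver supports setting one.
  dns.timeouts = limit if dns.respond_to?(:timeouts=)
  # Overall cap on the whole lookup.
  Timeout.timeout(limit) do
    dns.getresources(domain, Resolv::DNS::Resource::IN::MX)
  end
rescue Timeout::Error
  nil
end
```

A nil result here means "could not find out", which is different from an empty array meaning "the domain has no MX records".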

The problem that the pessimist in me sees is that if you distribute
the code, people will use it without reading the documentation and
accept its answers as gospel truth. Experience from the Perl world
tells me that it would probably be a better thing in the long run to
have easily available, easy-to-use and as correct as humanly possible
mail address checking code out there.

Which also means that I should probably help write it...
--
		     Calle D. <removed_email_address@domain.invalid>
		 http://www.livejournal.com/users/cdybedahl/
       "Facts are for people with weak opinions." -- Lars Willför, I]M