In praise of simple solutions
There's been a bit of discussion recently about validating email addresses. Attempting to do this using regular expressions is one of the best examples of a “now you have two problems” situation because you'll run into two unpleasant realities:
- Regular expressions are the wrong tool given the surprising latitude in RFC822, which is less surprising once you remember that the spec was written in 1982, before DNS existed and before the ARPANET had finished moving to TCP/IP.
- If you perfectly implement RFC822, you'll find that someone else didn't. When that someone else is a company like Microsoft, the odds approach certainty that you'll have a customer who complains about your software doing the correct thing. Sure, you can suggest that they upgrade to something better, but there's probably a reason why they are still using a hoary relic from 1995.
Fortunately we can avoid this hassle pretty easily by reminding ourselves of what the actual goal is: ensuring that we can send email which doesn't bounce. Writing a big scary regular expression is a tempting challenge, particularly if you've spent much time in the Perl community and want to show off. Sending a message using SMTP, on the other hand, is so easy that almost anyone can do it by hand using telnet.
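To give a sense of how little is involved, here is a minimal sketch of sending a confirmation message with Python's standard `smtplib` instead of typing the commands into telnet. The function names, the subject line, the message text and the `localhost` relay are all assumptions made for illustration; none of them come from the original code.

```python
import smtplib
from email.message import EmailMessage


def build_confirmation(sender: str, recipient: str) -> EmailMessage:
    """Build a short plain-text confirmation message (wording is illustrative)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Please confirm your subscription"
    msg.set_content("Reply to this message to confirm your address.\n")
    return msg


def send_confirmation(msg: EmailMessage, host: str = "localhost", port: int = 25) -> dict:
    """Hand the message to an SMTP server (the default host is an assumption).

    Returns the dict of refused recipients, which is empty when the server
    accepted them all. Raises an smtplib.SMTPException subclass when the
    server rejects the message outright.
    """
    with smtplib.SMTP(host, port, timeout=30) as smtp:
        return smtp.send_message(msg)
```

Keep in mind that a server accepting the message is not the same as the message arriving: some failures only show up later as a bounce to the sender address, so that address needs to be one you actually read.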
Back in 1999 this was of more than academic interest to me because we ran a mailing list for a large retail company and had to deal with a significant number of typos, usernames without a domain, etc. Being lazy, I took the simple approach: do only enough validation to attempt delivery of a real message. In this case, that meant checking that the address contained both a username and a domain, and that the domain had either an MX or A record so we could connect to it and send our confirmation email. This allowed us to check not only for correct syntax but also for various other delivery failures such as a full mailbox or an over-zealous spam filter, any of which would result in an unhappy customer when their promised information did not arrive. If we were lucky, they'd call us and correct it rather than simply disappearing.
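The checks described above fit in a few lines. What follows is a rough Python sketch of the idea, not a port of the original PHP. All of the function names are made up for this example. Python's standard library has no MX lookup, so the default domain check below only covers the address-record half; the `domain_ok` parameter is a hook where a real MX-or-A lookup from a DNS library could be plugged in.

```python
import socket
from typing import Callable, Optional, Tuple


def split_address(address: str) -> Optional[Tuple[str, str]]:
    """Return (username, domain), or None if either part is missing."""
    username, sep, domain = address.strip().rpartition("@")
    if not sep or not username or not domain:
        return None
    return username, domain


def resolves(domain: str) -> bool:
    """Address-record check using only the standard library.

    This does not look up MX records; a DNS library would be needed for
    that, so treat this default as half of the check described above.
    """
    try:
        socket.getaddrinfo(domain, 25)
        return True
    except (socket.gaierror, UnicodeError):
        return False


def worth_attempting(address: str,
                     domain_ok: Callable[[str], bool] = resolves) -> bool:
    """True if the address is plausible enough to send a confirmation to."""
    parts = split_address(address)
    return parts is not None and domain_ok(parts[1])
```

Anything that passes `worth_attempting` still has to survive the real test, which is the confirmation message itself. The point of the pre-check is only to avoid wasting a delivery attempt on addresses that cannot possibly work.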
If you're curious, a copy of the old code is available: php3domo.zip, thanks to archive.org. It could certainly use an upgrade since it dates back to the heady days of PHP 3.0, and these days it'd make sense to run addresses through PEAR Validate first. Still, I'm happy that it works out of the box on PHP5 and blocks most of the spam I see today, because a solid majority of spammers still use forged addresses despite anti-forgery efforts like SPF or DomainKeys.

