• You. Yes, you - stop trying to validate emails like that.
    41 replies, posted
Hey there Mr. Web Developer, working on that next big website? That's just great, I'm happy for you! Oh, you're working on the user registration form? Lovely! Let me try it out!

Name: Omniscient Voice
Favorite Color: Red
Email: [email]ominiscient.voice+feedmespam@gmail.com[/email]

That's it, [b]submit[/b].

[highlight]INVALID EMAIL TRY AGAIN[/highlight]

Uhm? Maybe I did something wrong, let me try that again...

[highlight]INVALID EMAIL, ATTACK DOGS HAVE BEEN DISPATCHED[/highlight]

My my, Mr. Web Developer, are you perchance using a regex pattern to validate emails? Maybe something like, say, this?
[code]^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})$[/code]
You are?
[img]http://media.comicvine.com/uploads/2/27874/600107-oh_you_super.jpg[/img]

Tell me now, is this a valid email?
[code]"John Doe"@google.com[/code]
What about this one?
[code]love\@urpantz@gmail.com[/code]
Are you sure? Last one:
[code]100%bacon!@gmail.com[/code]
If you answered [b]no[/b] to any of those, shame on you.
[img]http://1.bp.blogspot.com/_NTBMCDeBwj8/SaMclYNiDdI/AAAAAAAAAAc/swpkblzP_z0/s320/angry_pope.gif[/img]

Let me introduce you to [b][url=http://www.ietf.org/rfc/rfc2821.txt]RFC 2821[/url][/b], specifically section 2.3.10:
[quote] As used in this specification, an "address" is a character string that identifies a user to whom mail will be sent or a location into which mail will be deposited. The term "mailbox" refers to that depository. The two terms are typically used interchangeably unless the distinction between the location in which mail is placed (the mailbox) and a reference to it (the address) is important. An address normally consists of user and domain specifications. The standard mailbox naming convention is defined to be "local-part@domain": contemporary usage permits a much broader set of applications than simple "user names".
Consequently, and due to a long history of problems when intermediate hosts have attempted to optimize transport by modifying them, the local-part MUST be interpreted and assigned semantics only by the host specified in the domain part of the address. [/quote]
[b][url=http://tools.ietf.org/html/rfc2822]RFC 2822[/url][/b] goes into more detail in section 3.4.1:
[quote] An addr-spec is a specific Internet identifier that contains a locally interpreted string followed by the at-sign character ("@", ASCII value 64) followed by an Internet domain. The locally interpreted string is either a quoted-string or a dot-atom. If the string can be represented as a dot-atom (that is, it contains no characters other than atext characters or "." surrounded by atext [...] [/quote]
Section 3 of [b][url=http://tools.ietf.org/html/rfc3696]RFC 3696[/url][/b] gives you more examples of valid email addresses.

How's your little regex pattern looking right now? Not so good, that's right.

'Oh, Omniscient Voice, I am truly sorry - please, help me correct my mistakes, give me a regex pattern that works just right, I beg of you!', you say.
Sure, [b]here it is: [/b] [code](?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t] )+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?: \r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:( ?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\0 31]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\ ](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+ (?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?: (?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z |(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n) ?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\ r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n) ?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t] )*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])* )(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t] )+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*) *:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+ |\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r \n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?: \r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t ]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031 ]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\]( ?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] 
\000-\031]+(? :(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(? :\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(? :(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)? [ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]| \\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<> @,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|" (?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t] )*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\ ".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(? :[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[ \]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000- \031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|( ?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,; :\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([ ^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\" .\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\ ]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\ [\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\ r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\] |\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \0 00-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\ .|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@, ;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(? 
:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])* (?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\". \[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[ ^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\] ]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)(?:,\s*( ?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\ ".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:( ?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[ \["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t ])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t ])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(? :\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+| \Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?: [^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\ ]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n) ?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[" ()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n) ?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<> @,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@, ;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t] )*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\ ".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)? (?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\". 
\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?: \r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[ "()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t]) *))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]) +|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\ .(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z |(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:( ?:\r\n)?[ \t])*))*)?;\s*) [/code]
[url=http://www.ex-parrot.com/pdw/Mail-RFC822-Address.html]Source[/url]
[img]http://wackyiraqi.com/wtf/hnng.jpg[/img]
[i]Pictured: you.[/i]

Oh come on, Mr. Web Developer, it's not that bad. The truth is, [b]most, if not all, email providers enforce their own naming rules[/b]: most ask that your address begin with a letter, and most allow nothing but a very small set of characters. However, my first example is very common:
[b][email]ominiscient.voice+feedmespam@gmail.com[/email][/b]
Yet not only did it fail your validation, it also fails on [b]a worrying number of other websites[/b].

So, you ask, 'What should I do?' It's actually pretty simple:

[b]1.[/b] Apply some very loose validation on the [b]client side[/b]. Here are some examples:
[code]
frenchfries@myhost.com   PASS
pants_on_fire4u          FAIL
!"#3423@pants.com        PASS
uhoh@donuts              FAIL
[/code]
This is just there to deter drive-by form-fillers and, most importantly, to alert your users to obvious mistakes.
[b]2.[/b] Ask the user to verify their email address (for example, by following a confirmation link you send to it).
[b]3.[/b] Bask in the glorious feeling of knowing your stuff isn't broken (well, at least not this part).

That wasn't too hard, was it? Now, now, no need to thank me, Mr. Web Developer. I don't do this for you; I do it for the internet.
[img]http://kontraband.se/blog/wp-content/uploads/2009/07/jesususeme.jpg[/img]
[i]Pictured: Google told me this is the internet.[/i]

[b]Sources:[/b]
[url]http://www.faqs.org/rfcs/rfc822.html[/url]
[url]http://en.wikipedia.org/wiki/Email_address[/url]
[url]http://haacked.com/archive/2007/08/21/i-knew-how-to-validate-an-email-address-until-i.aspx[/url]
[url]http://tools.ietf.org/html/rfc2822[/url]
[url]http://tools.ietf.org/html/rfc3696[/url]

[b]Please direct any mistakes in this thread to my email address, stan\@ky"Joe ^.^"?!\@nuhuh@coolmail.com[/b]
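The OP's three steps can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the OP's code: `loosely_valid`, `start_verification`, `confirm` and the in-memory `PENDING` dict are hypothetical names, and actually sending the mail is left out (the commented-out helper is hypothetical too). The loose check only asks for something before the *last* "@" and a dot inside the domain, which reproduces the PASS/FAIL table above while still accepting the quoted, escaped and plus-addressed examples from the opening post.

```python
import secrets
from typing import Dict, Optional

# Hypothetical in-memory store mapping token -> address awaiting confirmation.
# A real site would keep this in its database, with an expiry time.
PENDING: Dict[str, str] = {}


def loosely_valid(address: str) -> bool:
    """Step 1: a deliberately loose sanity check, not RFC validation."""
    # Split on the *last* "@": a quoted or escaped local part may itself
    # contain "@", as in love\@urpantz@gmail.com.
    local, at, domain = address.rpartition("@")
    if not at or not local:
        return False  # no "@" at all, or nothing in front of it
    if any(ch.isspace() for ch in domain):
        return False
    # Require a dot somewhere inside the domain (not only at its edges),
    # so an obvious slip like "uhoh@donuts" is caught.
    return "." in domain.strip(".")


def start_verification(address: str) -> str:
    """Step 2: create a single-use token to mail to the address."""
    token = secrets.token_urlsafe(32)
    PENDING[token] = address
    # send_confirmation_mail(address, token)  # hypothetical mail helper
    return token


def confirm(token: str) -> Optional[str]:
    """Called when the user follows the link; returns the verified address."""
    return PENDING.pop(token, None)


if __name__ == "__main__":
    for addr in ["frenchfries@myhost.com", "pants_on_fire4u",
                 '!"#3423@pants.com', "uhoh@donuts"]:
        print(f"{addr:25} {'PASS' if loosely_valid(addr) else 'FAIL'}")

    token = start_verification("ominiscient.voice+feedmespam@gmail.com")
    print(confirm(token))  # the address: it has now been verified
    print(confirm(token))  # None: the token cannot be reused
```

The only real design decision is splitting on the last "@" rather than the first; everything stricter is left to step 2, where the mail server that owns the domain gets the final say.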
Informative. [editline]19th November 2010[/editline] But also, how many people are actually going to use addresses like these?
"John Doe"@google.com
love\@urpantz@gmail.com
100%bacon!@gmail.com
Very loose client-side checks, you say? How about: [code].+@.+\..{2,}[/code]
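A quick Python check of that pattern against the OP's PASS/FAIL examples and the odd addresses from the opening post, assuming it is applied with `re.fullmatch` (anchored at both ends). It behaves as a loose check should, with one caveat worth knowing: `.{2,}` only needs two characters after *some* dot following the "@", not after the last one.

```python
import re

# The loose pattern from the reply above; fullmatch anchors it at both ends.
LOOSE = re.compile(r".+@.+\..{2,}")

examples = {
    "frenchfries@myhost.com": True,
    "pants_on_fire4u": False,            # no "@"
    '!"#3423@pants.com': True,
    "uhoh@donuts": False,                # no dot after the "@"
    '"John Doe"@google.com': True,       # quoted local part still passes
    "love\\@urpantz@gmail.com": True,    # greedy '.+' backtracks to the last '@'
    "a@b.c": False,                      # only one character after the dot
    "a@b.cd.e": True,                    # the two characters may come from an earlier label
}

for address, expected in examples.items():
    got = LOOSE.fullmatch(address) is not None
    print(f"{address:30} {'PASS' if got else 'FAIL'}")
    assert got == expected
```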
I just use PHP's input_filter function for this.
[QUOTE=Fizzadar;26165110]I just use PHP's input_filter function for this.[/QUOTE] filter_var also works pretty nicely, but I was trying to keep the OP as language-independent as possible.
Most mail servers can't even handle such addresses, Mail Enable for example. I tried it and it just fails :P Edit: This post doesn't say that people shouldn't validate emails with that method. I just wanted to say that most ... oh well ;)
This is my email validation: [img]http://ahb.me/YEW[/img] [b]Bonus stats:[/b] Of the 8,954 users on AnyHub who have an email attached to their accounts (it's optional), about 20 have provided invalid email addresses.
What about [url=http://www.w3schools.com/PHP/func_filter_var.asp]filter_var($email, FILTER_VALIDATE_EMAIL)[/url]? Edit: Oh wait, it was briefly mentioned in an earlier post.
That's absurd! I just insert all raw data directly into the database!
Has anybody come across a top-level domain that is one character long?
[url]www.x.com[/url] Oops, you meant the domain itself! My mistake.
[QUOTE=Crhem van der B;26244479][url]www.x.com[/url] Oops, you meant the domain itself! My mistake.[/QUOTE] I meant the .com bit, the [url=http://en.wikipedia.org/wiki/Top-level_domain]TLD[/url].
Yes, I realized that right after I posted. My mistake.
[QUOTE=JimTools;26236683]Has anybody come across a top-level domain that is one character long?[/QUOTE] You don't simply "come across" new top-level domains like that :P [url]http://en.wikipedia.org/wiki/List_of_Internet_top-level_domains[/url]
If you have $200,000, then you can create a new one!
[QUOTE=cas97;26246625]If you have $200,000, then you can create a new one![/QUOTE] Huh, I thought it would cost more than that
Oy, get out of my thread with those domains and gamestations and facespaces, you damned kids!
I'm buying .charlie when I make my fortune
[QUOTE=pro ruby dev;26259106]I'm buying .charlie when I make my fortune[/QUOTE] Isn't that a bit long for a TLD?
[QUOTE=TehWhale;26259921]Isn't that a bit long for a TLD?[/QUOTE] Well, there's .museum.
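Which is exactly where the pattern mocked in the opening post falls over: it ends in `(\.[a-z]{2,3})$`, so a `.museum` address can never match, and its `[_a-z0-9-]` local-part class has no room for `+`. A small Python check (the `curator@example.museum` address is made up for illustration):

```python
import re

# The pattern quoted in the opening post, verbatim.
NAIVE = re.compile(r"^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})$")

tests = [
    "frenchfries@myhost.com",                  # accepted
    "curator@example.museum",                  # rejected: final label is longer than 3 letters
    "ominiscient.voice+feedmespam@gmail.com",  # rejected: '+' is not in [_a-z0-9-]
]
for address in tests:
    print(address, "->", "valid" if NAIVE.match(address) else "INVALID")
```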
[QUOTE=StankyJoe;26248519]Oy, get out of my thread with those domains and gamestations and facespaces, you damned kids![/QUOTE] I prefer "young adult", but good thread. Not that I would be using your recommended regular expression, though; it's just a bit too long.
Recommended? The expression is supposed to show that validating email addresses using regex is a bad idea, and is not meant for actual use.
What's your site, Stanky?
This is also wrong [php]if (strpos($email, '@') !== false) // ... [/php]
[QUOTE=compwhizii;26302910]This is also wrong [php]if (strpos($email, '@') !=== false) // ... [/php][/QUOTE] Even without the syntax error, emails don't necessarily have to contain an @ character so it's wrong in two ways
[QUOTE=Siemens;26302929]Even without the syntax error, emails don't necessarily have to contain an @ character so it's wrong in two ways[/QUOTE] What?
[QUOTE=compwhizii;26302947]What?[/QUOTE] IIRC, an email like foo! is equivalent to foo@localhost
Interesting.
Great thread, thanks for this. Will definitely remember this when I get to work on my email forms.
HTML5 specifies this format for client-side validation. [code]1*( atext / "." ) "@" ldh-str 1*( "." ldh-str )[/code] It will probably reject some of the stranger formats, but it should work fine for most email addresses (and it handles Unicode domains by converting them to Punycode before validation). [url]http://www.whatwg.org/specs/web-apps/current-work/multipage/states-of-the-type-attribute.html#e-mail-state[/url]
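The ABNF quoted above maps fairly directly onto a regular expression. Below is a Python sketch under stated assumptions: `atext` is taken to be the RFC 5322 set (letters, digits and ``!#$%&'*+-/=?^_`{|}~``), `ldh-str` the RFC 1034 run of letters, digits and hyphens, and the Punycode step uses Python's built-in `idna` codec (IDNA 2003) as a stand-in for whatever conversion a browser performs. It follows the grammar as quoted in this post; the linked page is a living standard, so check its current wording before relying on this. `html5_valid` and `to_ascii_domain` are hypothetical names.

```python
import re
from typing import Optional

# atext per RFC 5322, plus "." as the grammar allows; '-' is last so it is literal.
LOCAL = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~.-]+"
# ldh-str per RFC 1034: one or more letters, digits or hyphens.
LDH_STR = r"[A-Za-z0-9-]+"
# 1*( atext / "." ) "@" ldh-str 1*( "." ldh-str )
HTML5_EMAIL = re.compile(LOCAL + "@" + LDH_STR + r"(?:\." + LDH_STR + r")+")


def to_ascii_domain(address: str) -> Optional[str]:
    """Convert the domain part to Punycode; None if the codec rejects it."""
    local, at, domain = address.rpartition("@")
    if not at:
        return address  # no "@": leave it for the regex to reject
    try:
        return local + "@" + domain.encode("idna").decode("ascii")
    except UnicodeError:
        return None  # e.g. an empty or over-long label


def html5_valid(address: str) -> bool:
    ascii_form = to_ascii_domain(address)
    return ascii_form is not None and HTML5_EMAIL.fullmatch(ascii_form) is not None


if __name__ == "__main__":
    for addr in ["ominiscient.voice+feedmespam@gmail.com", "100%bacon!@gmail.com",
                 '"John Doe"@google.com', "love\\@urpantz@gmail.com",
                 "uhoh@donuts", "user@bücher.de"]:
        print(f"{addr:42} {html5_valid(addr)}")
```

As the post predicts, the plus-addressed and `100%bacon!` examples pass, while the quoted and backslash-escaped ones from the opening post are rejected.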