/^(?:(\d)[ \-\.]?)?(?:\(?(\d{3})\)?[ \-\.])?(\d{3})[ \-\.](\d{4})(?: ?x?(\d+))?$/
# expanded version w/ comments
/^
  (?:
    (\d)            (?# prefix digit)
    [ \-\.]?        (?# optional separator)
  )?
  (?:
    \(?(\d{3})\)?   (?# area code)
    [ \-\.]         (?# separator)
  )?
  (\d{3})           (?# trunk)
  [ \-\.]           (?# separator)
  (\d{4})           (?# line)
  (?:\ ?x?          (?# optional space, then optional 'x')
    (\d+)           (?# extension)
  )?
$/x
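Here is a minimal Ruby sketch of the one-line pattern in use, showing what "parsing" means here. `match` returns a `MatchData` (or `nil`), and `captures` lists the five groups in order, with `nil` for any optional section that is absent. The constant name `PHONE` and the sample numbers are made up for illustration.

```ruby
# The one-line pattern from the post, stored in a constant for reuse.
PHONE = /^(?:(\d)[ \-\.]?)?(?:\(?(\d{3})\)?[ \-\.])?(\d{3})[ \-\.](\d{4})(?: ?x?(\d+))?$/

m = PHONE.match("1 (555) 555-1234 x42")
# Unpack the five capture groups: prefix, area code, trunk, line, extension.
prefix, area, trunk, line, ext = m.captures
p [prefix, area, trunk, line, ext]  # => ["1", "555", "555", "1234", "42"]

# Optional sections that are absent come back as nil.
p PHONE.match("555-1234").captures  # => [nil, nil, "555", "1234", nil]

# A string that doesn't fit returns nil instead of a MatchData.
p PHONE.match("555-123")  # => nil

# Ruby's ^ and $ are line anchors, so a number on its own line inside
# a longer string still matches.
p PHONE.match("call me at\n555-1234").captures  # => [nil, nil, "555", "1234", nil]
```

If the whole string must be a phone number, `\A` and `\z` are the stricter anchors in Ruby.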
danielharan
October 31, 2008 16:56
Works well with North America:
http://en.wikipedia.org/wiki/North_American_Numbering_Plan
Once you get out of that region, things get weirder. If you're dealing with any kind of internationalization, it gets *much* harder.
There are a lot of () in your regexp, some of which don't seem strictly necessary. Is that just for style?
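To make the point about numbers outside North America concrete, here is a small Ruby check. The `parse` helper and the sample numbers are illustrative assumptions, not part of the original post. A NANP number splits cleanly and a UK number written with a leading `+44` is rejected, but the same London number in domestic `020 7946 0958` form still matches and is quietly split into the wrong pieces.

```ruby
PHONE = /^(?:(\d)[ \-\.]?)?(?:\(?(\d{3})\)?[ \-\.])?(\d{3})[ \-\.](\d{4})(?: ?x?(\d+))?$/

# Label the five groups so the results are easier to read.
GROUPS = %i[prefix areacode trunk line extension]

# Hypothetical helper: returns a Hash of group name => text, or nil on no match.
def parse(number)
  m = PHONE.match(number)
  m && GROUPS.zip(m.captures).to_h
end

p parse("1-555-555-1234")    # North American: every group lands where expected
p parse("+44 20 7946 0958")  # international form: no match, returns nil
p parse("020 7946 0958")     # UK domestic form: matches, but 020/7946/0958
                             # come back as trunk/line/extension
```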
halogenandtoast
October 31, 2008 18:07
Little less complex, same results
/^(\d[ \-\.]?)?(\d{3}[ \-\.]?)?\d{3}[ \-\.]?\d{4}(x\d+)?$/
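A quick Ruby comparison of the two patterns, as a sketch. `SHORT` is the shorter pattern with the hyphen escaped inside each character class, so `[ \-\.]` means space, hyphen or dot rather than the range from space to dot. Both accept a dotted number, but only the original hands back the individual pieces, and the shorter one has no `\(?`/`\)?`, so it rejects a parenthesized area code.

```ruby
ORIGINAL = /^(?:(\d)[ \-\.]?)?(?:\(?(\d{3})\)?[ \-\.])?(\d{3})[ \-\.](\d{4})(?: ?x?(\d+))?$/
SHORT    = /^(\d[ \-\.]?)?(\d{3}[ \-\.]?)?\d{3}[ \-\.]?\d{4}(x\d+)?$/

number = "1.555.555.1234"
p ORIGINAL.match(number).captures  # => ["1", "555", "555", "1234", nil]
p SHORT.match(number).captures     # => ["1.", "555.", nil]

# The shorter pattern has no optional parentheses around the area code.
p ORIGINAL.match?("(555) 555-1234")  # => true
p SHORT.match?("(555) 555-1234")     # => false
```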
Lex
October 31, 2008 20:55
@danielharan The capturing parens are there so that running the regex extracts each section of the phone number into its own capture. The (?: ) groups only make the prefix, area code and extension optional without capturing them, and \(? and \)? match optional literal parentheses. Also, I think the numbering schemes in different countries vary enough that a universal regex simply won't work. This one is just designed to work in the USA.
@halogenandtoast Your version does still validate, but it doesn't extract the sections. This regex is designed for parsing, not validation.
Try it out at http://www.rubular.com with different formats; the site also shows each capturing group. I designed this to be flexible. It should work on "1 (555) 555 5555", "1.555.555.5555 x5555", "555-5555" and "(555) 555.5555 5555". A run of bare digits such as "155555555555555" won't match as written, though, because the separator between the trunk and the line is required.
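If you'd rather check the formats above locally than on Rubular, a short Ruby loop does the same job. This is a sketch; `samples` simply holds the strings from this comment.

```ruby
PHONE = /^(?:(\d)[ \-\.]?)?(?:\(?(\d{3})\)?[ \-\.])?(\d{3})[ \-\.](\d{4})(?: ?x?(\d+))?$/

samples = [
  "1 (555) 555 5555",
  "1.555.555.5555 x5555",
  "555-5555",
  "(555) 555.5555 5555",
  "155555555555555",
]

# Print each sample next to its captures, or nil when it doesn't match.
samples.each do |s|
  m = PHONE.match(s)
  puts "#{s.ljust(22)} #{m ? m.captures.inspect : 'nil'}"
end
```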
Here's a refactoring with named captures, if you're using the Oniguruma engine (Ruby 1.9), which supports them.
/^
  (?:
    (?<prefix>\d)
    [ \-\.]?
  )?
  (?:
    \(?(?<areacode>\d{3})\)?
    [ \-\.]
  )?
  (?<trunk>\d{3})
  [ \-\.]
  (?<line>\d{4})
  (?:\ ?x?
    (?<extension>\d+)
  )?
$/x
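A minimal Ruby sketch of reading the named groups (Ruby 1.9 or later for the pattern; `MatchData#named_captures` needs Ruby 2.4+). To keep it short, the named-capture pattern is written on one line here rather than in the /x form; the sample numbers are illustrative.

```ruby
# The named-capture pattern, written on one line (no /x needed).
PHONE = /^(?:(?<prefix>\d)[ \-\.]?)?(?:\(?(?<areacode>\d{3})\)?[ \-\.])?(?<trunk>\d{3})[ \-\.](?<line>\d{4})(?: ?x?(?<extension>\d+))?$/

m = PHONE.match("1 (555) 555-1234 x42")
puts m[:areacode]   # prints 555
puts m[:extension]  # prints 42

# named_captures returns a Hash keyed by group name (as strings);
# groups that didn't participate in the match map to nil.
info = PHONE.match("555-1234").named_captures
puts info["trunk"]  # prints 555
p info["areacode"]  # prints nil
```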
I wrote this regex for parsing phone numbers. The one-line version is PCRE compatible, so it should work in Perl, Ruby, JavaScript and many other languages (JavaScript has no /x flag or (?# ) comments, so the expanded versions won't work there). I like Ruby best, so it's classified as Ruby. I'm sure it can be made tighter. What do you think?
EDIT: Left out a necessary space