I wrote this regex for parsing phone numbers. It's PCRE-compatible, so the one-line version should work with Perl, Ruby, JavaScript and many other languages, but I like Ruby best so it's classified as Ruby. I'm sure it can be made tighter. What do you think?

EDIT: Left out a necessary space

/^(?:(\d)[ \-\.]?)?(?:\(?(\d{3})\)?[ \-\.])?(\d{3})[ \-\.](\d{4})(?: ?x?(\d+))?$/

# expanded version w/ comments

/^
  (?:
    (\d)           (?# prefix digit)
    [ \-\.]?       (?# optional separator)
  )?
  (?:
    \(?(\d{3})\)?  (?# area code)
    [ \-\.]        (?# separator)
  )?
  (\d{3})          (?# trunk)
  [ \-\.]          (?# separator)
  (\d{4})          (?# line)
  (?:\ ?x?         (?# optional space or 'x')
    (\d+)          (?# extension)
  )?
$/x
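
For anyone who wants to see the captures in action, here is a minimal Ruby sketch using the one-line pattern. The constant name `PHONE`, the helper `parse_phone` and the sample numbers are made up for illustration and are not part of the original snippet.

```ruby
# The one-line pattern from the post, stored under a hypothetical constant name.
PHONE = /^(?:(\d)[ \-\.]?)?(?:\(?(\d{3})\)?[ \-\.])?(\d{3})[ \-\.](\d{4})(?: ?x?(\d+))?$/

# parse_phone is a hypothetical helper: it returns a hash of the five
# capture groups, or nil when the text does not match the pattern.
def parse_phone(text)
  match = PHONE.match(text)
  return nil unless match

  prefix, area_code, trunk, line, extension = match.captures
  { prefix: prefix, area_code: area_code, trunk: trunk, line: line, extension: extension }
end

p parse_phone("1 (555) 123-4567 x89")  # every group filled in
p parse_phone("123-4567")              # only trunk and line are set
p parse_phone("not a number")          # no match, so nil
```

Groups that are absent from the input come back as nil, which makes it easy to tell a seven-digit number from a full one.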

Refactorings

danielharan

October 31, 2008 16:56

Works well with North America:
http://en.wikipedia.org/wiki/North_American_Numbering_Plan

Once you get out of that region, things get weirder. If you're dealing with any kind of internationalization, it gets *much* harder.

There are a lot of () in your regexp, some of which do not seem strictly necessary. Is that just for style?

halogenandtoast

October 31, 2008 18:07

A little less complex, same results:

/^(\d[ \-\.]?)?(\d{3}[ \-\.]?)?\d{3}[ \-\.]?\d{4}(x\d+)?$/

Lex

October 31, 2008 20:55

@danielharan All the parens are for capture groups. The point is that when you run the regex it extracts each section of the phone number into a separate capture. Try it out at www.rubular.com. Also, I think the numbering schemes in different countries vary enough that a universal regex simply won't work. This one is just designed to work in the USA.

@halogenandtoast Your version does still validate, but it doesn't extract the sections. This regex is designed for parsing, not validation.
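
To make the parsing-versus-validation difference concrete, here is a rough Ruby comparison of the two patterns on one number. The constant names and the sample number are placeholders of mine, and the shorter pattern is written with the hyphen escaped so that it is not read as a character range.

```ruby
# Hypothetical constant names; PARSER is the original one-line pattern and
# VALIDATOR is the shorter refactoring (hyphen escaped inside the class).
PARSER    = /^(?:(\d)[ \-\.]?)?(?:\(?(\d{3})\)?[ \-\.])?(\d{3})[ \-\.](\d{4})(?: ?x?(\d+))?$/
VALIDATOR = /^(\d[ \-\.]?)?(\d{3}[ \-\.]?)?\d{3}[ \-\.]?\d{4}(x\d+)?$/

number = "1-555-123-4567"

# Both patterns accept the number, but the captures differ.
p PARSER.match(number).captures     # one clean capture per section
p VALIDATOR.match(number).captures  # separators are captured; trunk and line are not
```

The shorter pattern keeps the separator inside its groups and never captures the trunk or line, so it confirms the format without handing back the pieces.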

Try different formats at http://www.rubular.com; that site will also show each capturing group. I designed this to be flexible. It should work on "1 (555) 555 5555", "1.555.555.5555 x5555", "555-5555" or "(555) 555.5555 5555". A bare run of digits such as "155555555555555" would only match if the separators were made optional.
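
If you would rather check the formats from a script than in the browser, something like this Ruby sketch should do. The constant and variable names are placeholders of mine. Against the pattern exactly as posted, the separators after the area code and the trunk are required, so the bare-digit string is the one sample here that does not match.

```ruby
# Hypothetical constant name; the pattern is the one-line version from the post.
PHONE_FORMATS = /^(?:(\d)[ \-\.]?)?(?:\(?(\d{3})\)?[ \-\.])?(\d{3})[ \-\.](\d{4})(?: ?x?(\d+))?$/

samples = [
  "1 (555) 555 5555",
  "1.555.555.5555 x5555",
  "555-5555",
  "(555) 555.5555 5555",
  "155555555555555"
]

# Pair each sample with its captures, or nil when the pattern does not match.
results = samples.map do |sample|
  match = PHONE_FORMATS.match(sample)
  [sample, match && match.captures]
end

results.each do |sample, captures|
  puts "#{sample.inspect} => #{captures ? captures.inspect : 'no match'}"
end
```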

Here's a refactoring with named captures, if you're using the Oniguruma engine, which supports them.

/^
  (?:
    (?<prefix>\d)
    [ \-\.]?
  )?
  (?:
    \(?(?<areacode>\d{3})\)?
    [ \-\.]
  )?
  (?<trunk>\d{3})
  [ \-\.]
  (?<line>\d{4})
  (?:\ ?x?
    (?<extension>\d+)
  )?
$/x
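
A quick sketch of reading the named groups back out, assuming a Ruby whose regex engine supports named captures (1.9 or later). The constant name `NAMED_PHONE` and the sample number are illustrative only.

```ruby
# Same pattern as above, stored under a hypothetical constant name.
NAMED_PHONE = /^
  (?:
    (?<prefix>\d)
    [ \-\.]?
  )?
  (?:
    \(?(?<areacode>\d{3})\)?
    [ \-\.]
  )?
  (?<trunk>\d{3})
  [ \-\.]
  (?<line>\d{4})
  (?:\ ?x?
    (?<extension>\d+)
  )?
$/x

match = NAMED_PHONE.match("1-555-123-4567 x89")

# Each section can be looked up by name instead of by position.
puts match[:areacode]
puts match[:extension]
p match.names  # group names in the order they appear in the pattern
```

Looking groups up by name means the calling code does not break if another group is later added in front of them.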
