Parsing Street Addresses

Hello - I am interested in parsing arbitrary street addresses from
strings (semi-clean voter lists, mainly). These data may show up in
various formats, but there are several common patterns. Non-exhaustive
examples:

12-123 Washington Ave Minneapolis MN 12345
12/A-123 Washington Hwy Minneapolis Minnesota
12 Washington Dr Minneapolis Minn 12345
12 Washington Ridge St …
12/AB-123 Washington Blvd …
12/A-123 Washington Pl …
#12-123 Washington Rd E …
1234/A Washington Ave …
12B-123-A Washington St …
etc…

My question is this: before I start cooking up a complex regexp to
parse these strings into standard pieces (like state, city, street name,
street type, unit number, etc.), has someone already done this? Or is
there some kind of toolkit to assist the parsing of street addresses?
Surely this is a very common problem and it must have been solved many
times by now. Or perhaps this type of data is so irregular as to
preclude syntactical analysis?

Hi,

Why not just split the strings on spaces?
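
For what it is worth, here is a minimal Python sketch of the split-on-spaces idea. The function name `split_address` is only an illustration, and the ZIP code in the second line is made up to complete one of the examples above. It shows the tokens you get, and also why the position of a token alone does not say which field it belongs to once a street name has more than one word.

```python
def split_address(line):
    """Split an address line on runs of whitespace and return the raw tokens."""
    return line.split()


one_word_street = split_address("12-123 Washington Ave Minneapolis MN 12345")
two_word_street = split_address("12 Washington Ridge St Minneapolis MN 12345")

# The third token is the street type in the first line,
# but part of the street name in the second.
print(one_word_street[2])  # Ave
print(two_word_street[2])  # Ridge
```

So splitting is a reasonable first step, but some knowledge of street types or state names is still needed to decide where each field starts and ends.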

On Dec 27, 2006, at 11:01 PM, krst wrote:

The USPS might be able to help somewhat:

http://www.usps.com/business/addressverification/welcome.htm

Craig

I did some more searching and found a very useful website:

It stores hundreds of standard regular expressions for a variety of
purposes. I searched on “address”, “postal”, etc. and found some
patterns to start with. I think this site could be useful for anyone
who needs to get started quickly on regexp parsing.
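
As a rough illustration of the kind of starting pattern I mean, here is a small Python sketch. It is not taken from that site. The list of street types, the group names, and the function `parse_address` are my own assumptions, based only on the examples earlier in this thread. It pulls out the number/unit part, street name, street type, and an optional one-letter direction, and leaves the city, state, and ZIP as an unparsed remainder.

```python
import re

# Assumption: only the street types seen in the examples in this thread.
STREET_TYPES = ("Ave", "Blvd", "Dr", "Hwy", "Pl", "Rd", "St")
types = "|".join(STREET_TYPES)

ADDRESS_RE = re.compile(
    rf"""
    ^\s*
    (?P<number>\#?\d[0-9A-Za-z/-]*)      # 12, 12-123, 12/A-123, 12B-123-A, optional leading hash
    \s+
    (?P<street_name>.+?)                 # shortest run of text before a street type
    \s+
    (?P<street_type>{types})\b
    (?:\s+(?P<direction>[NSEW])\b)?      # optional one-letter direction
    (?:\s+(?P<rest>.*\S))?               # city, state, ZIP left unparsed
    \s*$
    """,
    re.VERBOSE,
)


def parse_address(line):
    """Return a dict of named pieces, or None if the line does not fit the pattern."""
    match = ADDRESS_RE.match(line)
    return match.groupdict() if match else None


print(parse_address("12-123 Washington Ave Minneapolis MN 12345"))
print(parse_address("12 Washington Ridge St"))
print(parse_address("#12-123 Washington Rd E"))
```

Lines that do not fit come back as None, so they can be set aside for manual review. Splitting the remainder into city and state is the harder part, since the state shows up as "MN", "Minn", or "Minnesota" in the data, and would probably need a lookup table rather than a longer regexp.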