Forum: Ruby on Rails Parsing Street Addresses

krst (Guest)
on 2006-12-31 01:40
(Received via mailing list)
Hello - I am interested in parsing arbitrary street addresses from
strings (semi-clean voter lists, mainly). These data may show up in
various formats, but there are several common patterns. Non-exhaustive
examples:

12-123 Washington Ave Minneapolis MN 12345
12/A-123 Washington Hwy Minneapolis Minnesota
12 Washington Dr Minneapolis Minn 12345
12 Washington Ridge St ...
12/AB-123 Washington Blvd ...
12/A-123 Washington Pl ...
#12-123 Washington Rd E ...
1234/A Washington Ave ...
12B-123-A Washington St ...
etc...

My question is this: before I start cooking up a complex regexp to
parse these strings into standard pieces (like state, city, street name,
street type, unit number, etc), has someone already done this? Or is
there some kind of toolkit to assist the parsing of street addresses?
Surely this is a very common problem and it must have been solved many
times by now. Or perhaps this type of data is so irregular as to
preclude syntactical analysis?
Elad M. (Guest)
on 2006-12-31 01:48
(Received via mailing list)
Hi,

Why not just split the strings on spaces?
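
For what it's worth, here is a rough sketch of what a whitespace split gives on two of the sample lines from the first post. The `tokenize` helper is just a name made up for illustration. It shows that the split itself is easy, but the tokens still have to be labeled: the street type sits at a different position as soon as the street name has more than one word.

```ruby
# Two of the sample lines quoted in the original post.
samples = [
  "12 Washington Dr Minneapolis Minn 12345",
  "12 Washington Ridge St Minneapolis MN 12345"
]

# Hypothetical helper: String#split with no argument splits on runs of whitespace.
def tokenize(line)
  line.split
end

samples.each { |line| p tokenize(line) }
# ["12", "Washington", "Dr", "Minneapolis", "Minn", "12345"]
# ["12", "Washington", "Ridge", "St", "Minneapolis", "MN", "12345"]
```

In the first line the street type ("Dr") is token 2, in the second ("St") it is token 3, so fixed positions alone will not identify the fields.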
Craig D. (Guest)
on 2006-12-31 06:33
(Received via mailing list)
On Dec 27, 2006, at 11:01 PM, krst wrote:

> 12/AB-123 Washington Blvd ...
> there some kind of toolkit to assist the parsing of street addresses?
> Surely this is a very common problem and it must have been solved many
> times by now. Or perhaps this type of data is so irregular as to
> preclude syntactical analysis?

The USPS might be able to help somewhat:

http://www.usps.com/business/addressverification/welcome.htm

Craig
krst (Guest)
on 2006-12-31 23:37
(Received via mailing list)
I did some more searching and found a very useful website:

http://regexlib.com

They store hundreds of standard regular expressions for a variety of
purposes. I searched on "address", "postal", etc. and found some
patterns to start with. I think this site could be useful for anyone
needing to get quickly started on regexp parsing.
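
As a starting point, a pattern for the street part of the samples above might look roughly like this. This is only a sketch, not something taken from regexlib.com: the `STREET_TYPES` list, the field names and the `parse_street` helper are all assumptions for illustration, the list only covers the abbreviations in the samples, and matching is case-sensitive. City, state and ZIP are left unparsed in `rest`.

```ruby
# Assumed list of street-type abbreviations, taken from the sample lines only.
STREET_TYPES = %w[Ave Hwy Dr St Blvd Pl Rd].freeze
TYPE_ALT = STREET_TYPES.join("|")

# number:      leading token such as 12, 12-123, 12/A-123, #12-123, 12B-123-A
# street_name: one or more words, matched lazily up to the first street type
# direction:   optional single-letter suffix (N, S, E, W)
# rest:        whatever follows (city, state, ZIP), not interpreted here
ADDRESS_RE = %r{\A\s*(?<number>\#?[\w/]+(?:-\w+)*)\s+(?<street_name>.+?)\s+(?<street_type>#{TYPE_ALT})\b(?:\s+(?<direction>[NSEW])\b)?\s*(?<rest>.*)\z}

# Hypothetical helper: returns a hash of fields, or nil if the line does not match.
def parse_street(line)
  m = ADDRESS_RE.match(line.strip)
  return nil unless m

  {
    number: m[:number],
    street_name: m[:street_name],
    street_type: m[:street_type],
    direction: m[:direction],
    rest: m[:rest]
  }
end

p parse_street("#12-123 Washington Rd E ...")
p parse_street("12 Washington Ridge St Minneapolis MN 12345")
```

The number part is captured as one uninterpreted string, since the meaning of forms like 12/A-123 depends on the data source. Names that themselves contain a street-type word (a city such as "St Paul", for example) would need extra handling.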