RegEx help to detect First and Last name

Hello,

I need some help with a RegEx to detect first and last names. This is
what I currently have:
/([A-Z]+[a-zA-Z]* [A-Z]+[a-zA-Z]*)/

It detects a first and last name as two adjacent words that each begin
with a capital letter, so it matches:
John S.
Jane Smith

I run into problems when the name is near the beginning of the
sentence:
Having John S. over for dinner. (matches “Having John”)
Getting Jane Smith ready for school. (matches “Getting Jane”)
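To see the problem concretely, here is the original pattern applied with `String#[]`, which returns the leftmost match (or nil). The constant name `NAME_RE` is only a label for this sketch.

```ruby
# The pattern from the original post: two adjacent words that each start
# with a capital letter.
NAME_RE = /([A-Z]+[a-zA-Z]* [A-Z]+[a-zA-Z]*)/

["John S.", "Jane Smith",
 "Having John S. over for dinner.",
 "Getting Jane Smith ready for school."].each do |s|
  # String#[] with a Regexp returns the leftmost match, or nil.
  puts "#{s.inspect} -> #{s[NAME_RE].inspect}"
end
```

The two sentences yield “Having John” and “Getting Jane”, because the leftmost pair of capitalized words wins. Note also that “John S.” yields “John S”: the period after the initial is not part of the match.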

Do you know how to write a RegEx that ignores the first word
whenever three capitalized words are next to each other? Thanks!
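One way to get that behavior without a second pattern is to match a whole run of capitalized words and keep only its last two words. This is a sketch under the assumption that a name is always the last two words of such a run; the names `CAP_WORD`, `CAP_RUN` and `candidate_names` are hypothetical.

```ruby
# A capitalized word, optionally followed by a period (for initials like "S.").
CAP_WORD = /[A-Z][a-zA-Z]*\.?/
# Two or more capitalized words separated by whitespace (no capture groups,
# so String#scan returns the matched strings directly).
CAP_RUN = /#{CAP_WORD}(?:\s+#{CAP_WORD})+/

# Returns the last two words of every run of capitalized words, so a leading
# capitalized word such as "Having" or "Getting" is dropped.
def candidate_names(text)
  text.scan(CAP_RUN).map { |run| run.split.last(2).join(" ") }
end

p candidate_names("Having John S. over for dinner.")
p candidate_names("Getting Jane Smith ready for school.")
p candidate_names("Jane Smith")
```

The trade-off is that a genuine three-word name loses its first word: `candidate_names("John Philip Sousa")` returns `["Philip Sousa"]`.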

-A

First you have to check whether there are three capitalized words or
two:

if str.match(/([A-Z]+[a-zA-Z]* [A-Z]+[a-zA-Z]* [A-Z]+[a-zA-Z]*)/)
  # three capitalized words in a row: do something
elsif str.match(/([A-Z]+[a-zA-Z]* [A-Z]+[a-zA-Z]*)/)
  # only two capitalized words: do something else
end

I hope this helps.

Thanks

Brijesh S.
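Filling in Brijesh's outline gives something like the following. The helper name `extract_name` is hypothetical, and "do something" is assumed to mean "return the name, dropping the first word when there are three". The three-word test has to come first, because the two-word pattern also matches the first two words of any three-word run.

```ruby
# Brijesh's two patterns, with each word captured separately.
THREE_CAPS = /([A-Z]+[a-zA-Z]*) ([A-Z]+[a-zA-Z]*) ([A-Z]+[a-zA-Z]*)/
TWO_CAPS   = /([A-Z]+[a-zA-Z]*) ([A-Z]+[a-zA-Z]*)/

# Hypothetical helper: returns the detected name, or nil if neither matches.
def extract_name(str)
  if (m = str.match(THREE_CAPS))
    "#{m[2]} #{m[3]}"   # three capitalized words: skip the first one
  elsif (m = str.match(TWO_CAPS))
    "#{m[1]} #{m[2]}"   # two capitalized words: keep both
  end
end

p extract_name("Getting Jane Smith ready for school.")
p extract_name("Jane Smith")
```

As with the original pattern, an initial loses its period: `extract_name("Having John S. over for dinner.")` returns `"John S"`.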

That’s close. You want something like

/\A([A-Z]+[a-zA-Z]*)\s+([A-Z]+[a-zA-Z]*)\s+([A-Z]+[a-zA-Z]*)/

Which gives you

irb(main):021:0> x
=> "Having Jane Smith"
irb(main):022:0> x =~ /\A([A-Z]+[a-zA-Z]*)\s+([A-Z]+[a-zA-Z]*)\s+([A-Z]+[a-zA-Z]*)/
=> 0
irb(main):023:0> $1
=> "Having"
irb(main):024:0> $2
=> "Jane"
irb(main):025:0> $3
=> "Smith"
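Wrapped in a method, the anchored pattern (with `*` after each `[a-zA-Z]`) can return the second and third words. This is a sketch; `name_after_leading_word` is a hypothetical name. Because `\A` pins the match to the start of the string, it only helps when the extra capitalized word is the very first word.

```ruby
# Three capitalized words at the very start of the string.
LEADING_THREE = /\A([A-Z]+[a-zA-Z]*)\s+([A-Z]+[a-zA-Z]*)\s+([A-Z]+[a-zA-Z]*)/

# Returns "second third" when the string starts with three capitalized
# words, otherwise nil.
def name_after_leading_word(sentence)
  m = LEADING_THREE.match(sentence)
  m && "#{m[2]} #{m[3]}"
end

p name_after_leading_word("Having Jane Smith")
p name_after_leading_word("We are getting Jane Smith ready.")
```

The second call returns nil: the name is not at the start of the string, so the anchored pattern never looks at it.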

On Thu, Mar 4, 2010 at 10:53 PM, Allan L. wrote:

I run into problems when the name is near the beginning of the
sentence:
Having John S. over for dinner. (matches “Having John”)
Getting Jane Smith ready for school. (matches “Getting Jane”)

Do you know how to write a RegEx that ignores the first word
whenever three capitalized words are next to each other? Thanks!

You know this is not something you’re going to solve with regular
expressions, though, right? :-)

“San Francisco’s Jane Smith, quoted in Broder’s Washington Post
article, said …”

You need a lot more heuristics than a simple RegEx to reliably find
names in a block of text.
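Running the two-word pattern from the original question over the example sentence above (with ASCII apostrophes) shows the problem: it finds three "names", and only one of them is a person.

```ruby
# Two adjacent words that each start with a capital letter.
PAIR_RE = /[A-Z]+[a-zA-Z]* [A-Z]+[a-zA-Z]*/

text = "San Francisco's Jane Smith, quoted in Broder's Washington Post article, said ..."
matches = text.scan(PAIR_RE)
p matches
```

The result is `["San Francisco", "Jane Smith", "Washington Post"]`: a city, a person and a newspaper, all indistinguishable to the pattern.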


Hassan S.
twitter: @hassan

On Fri, Mar 5, 2010 at 12:16 PM, Hassan S. wrote:

You know this is not something you’re going to solve with regular
expressions, though, right? :-)

“San Francisco’s Jane Smith, quoted in Broder’s Washington Post
article, said …”

You need a lot more heuristics than a simple RegEx to reliably find
names in a block of text.

Some other cases to consider

John Philip Sousa (or, if you’re a kid at heart, John Jacob Jingleheimer
Schmidt), not to mention Spanish names, which can have MANY parts.

Robert De Niro

Jesus Mary and Joseph

Surnames with origins in some languages don’t start with a capital
letter:

Michael H. de Young - Dutch

Wernher von Braun - German
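As a sketch of one small heuristic in that direction, the pattern can allow a lowercase surname particle before a capitalized word. The particle list below is a hypothetical, deliberately incomplete assumption, and the pattern still accepts non-names such as "Jesus Mary" out of "Jesus Mary and Joseph".

```ruby
# Hypothetical, incomplete list of lowercase surname particles.
PARTICLES = %w[de der van von da di du la le]

CAP   = /[A-Z][a-zA-Z]*\.?/          # "Robert", "De", "H."
PARTS = Regexp.union(PARTICLES)      # de|der|van|von|...
# A capitalized word followed by one or more capitalized words, each of
# which may be preceded by a single lowercase particle.
NAME_WITH_PARTICLES = /#{CAP}(?:\s+(?:#{PARTS}\s+)?#{CAP})+/

[
  "Robert De Niro",
  "Michael H. de Young",
  "Wernher von Braun",
  "Jesus Mary and Joseph"
].each { |s| p s.scan(NAME_WITH_PARTICLES) }
```

This prints `["Robert De Niro"]`, `["Michael H. de Young"]`, `["Wernher von Braun"]` and `["Jesus Mary"]`, which illustrates the point above: every added rule fixes some cases and leaves others wrong.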


Rick DeNatale

Blog: http://talklikeaduck.denhaven2.com/
Twitter: http://twitter.com/RickDeNatale
WWR: http://www.workingwithrails.com/person/9021-rick-denatale
LinkedIn: http://www.linkedin.com/in/rickdenatale

On Fri, Mar 5, 2010 at 10:06 PM, Allan L. wrote:

Thanks for the suggestions. I’m going to play around with this.

For the most part, I’m doing detection for scenarios with two names, so
names like Robert De Niro will not come up.

I’m pretty sure, though, that the actor would say he HAD two names, and
that his first name was “Robert” and his last name was “De Niro”.
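Under that reading, a naive way to split a detected name is to treat the first word as the first name and everything after it as the last name. This is only a sketch: `split_name` is a hypothetical helper, and the rule breaks on middle names and initials (it would give "H. de Young" as the surname of "Michael H. de Young").

```ruby
# Hypothetical helper: the first word is the first name, the rest is the
# last name. Returns [first, last].
def split_name(full_name)
  first, *rest = full_name.split
  [first, rest.join(" ")]
end

p split_name("Robert De Niro")
p split_name("Wernher von Braun")
```

These print `["Robert", "De Niro"]` and `["Wernher", "von Braun"]`.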


Rick DeNatale


Thanks for the suggestions. I’m going to play around with this.

For the most part, I’m doing detection for scenarios with two names, so
names like Robert De Niro will not come up.

-A
