Hello,
I need some help on RegEx to detect First and Last names. This is what I
currently have:
/([A-Z]+[a-zA-Z]* [A-Z]+[a-zA-Z]*)/
This detects a first and last name as two adjacent words that each begin
with a capital letter. So it will detect:
John S.
Jane Smith
I run into problems where the name is close to the beginning of the
sentence:
Having John S. over for dinner. (This matches “Having John”.)
Getting Jane Smith ready for school. (This matches “Getting Jane”.)
Do you know how to do a RegEx where it will ignore the first word
whenever three capitalized words are next to each other? Thanks!
-A
First you have to check whether there are three capitalized words in a
row, or only two:
if str.match(/([A-Z]+[a-zA-Z]* [A-Z]+[a-zA-Z]* [A-Z]+[a-zA-Z]*)/)
  # three capitalized words in a row: do something
elsif str.match(/([A-Z]+[a-zA-Z]* [A-Z]+[a-zA-Z]*)/)
  # two capitalized words in a row: do something
end
I hope this helps.
Thanks
Brijesh S.
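For anyone who wants to try the three-then-two check directly, here is a minimal sketch. The helper name extract_name and the constants are made up for illustration, and it assumes the name is the first run of capitalized words in the string.

```ruby
# One capitalized word. [A-Z][a-zA-Z]* accepts the same strings as the
# [A-Z]+[a-zA-Z]* used in the thread.
WORD = /[A-Z][a-zA-Z]*/

THREE_WORDS = /(#{WORD}) (#{WORD}) (#{WORD})/
TWO_WORDS   = /(#{WORD}) (#{WORD})/

# Hypothetical helper: returns [first, last], or nil when nothing matches.
def extract_name(str)
  if (m = str.match(THREE_WORDS))
    # Three capitalized words in a row: assume the first one is just the
    # start of the sentence and drop it.
    [m[2], m[3]]
  elsif (m = str.match(TWO_WORDS))
    [m[1], m[2]]
  end
end

p extract_name("Having John S. over for dinner.")
p extract_name("Jane Smith")
```

Note that the trailing period in "John S." is not part of the match, so the last name comes back as "S".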
That’s close. You want something like
/\A([A-Z]+[a-zA-Z]*)\s+([A-Z]+[a-zA-Z]*)\s+([A-Z]+[a-zA-Z]*)/
Which gives you
irb(main):021:0> x
=> "Having Jane Smith"
irb(main):022:0> x =~ /\A([A-Z]+[a-zA-Z]*)\s+([A-Z]+[a-zA-Z]*)\s+([A-Z]+[a-zA-Z]*)/
=> 0
irb(main):023:0> $1
=> "Having"
irb(main):024:0> $2
=> "Jane"
irb(main):025:0> $3
=> "Smith"
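The same idea can probably be folded into one pattern by making the leading capitalized word optional and non-capturing, so only the last two words are captured. This is a sketch, not a tested production pattern; CAP_WORD, NAME and find_names are names invented here.

```ruby
CAP_WORD = /[A-Z][a-zA-Z]*/

# Optional leading capitalized word (not captured), then two captured words.
# The optional group is tried first, so with three capitalized words in a
# row the first one is skipped.
NAME = /(?:\b#{CAP_WORD}\s+)?\b(#{CAP_WORD})\s+(#{CAP_WORD})/

# Hypothetical helper: returns an array of [first, last] pairs.
def find_names(text)
  text.scan(NAME)
end

p find_names("Having John S. over for dinner.")
p find_names("We met Jane Smith.")
```

It still cannot tell a sentence-initial word from a first name when only two capitalized words appear, so "Having John" on its own would be reported as a name.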
In reply to Allan L. (Thu, Mar 4, 2010 at 10:53 PM):
You know this is not something you’re going to solve with regular
expressions, though, right?
“San Francisco’s Jane Smith, quoted in Broder’s Washington Post
article, said …”
You need a lot more heuristics than a simple RegEx to reliably find
names in a block of text.
–
Hassan S. ------------------------ [email protected]
twitter: @hassan
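To see the problem concretely, here is a small sketch that runs the original two-word pattern over that example sentence. The apostrophes are typed as plain ASCII here, and the variable names are just for illustration.

```ruby
# The two-word pattern from the first post, without the outer capture group
# so that scan returns plain strings.
TWO_CAPS = /[A-Z]+[a-zA-Z]* [A-Z]+[a-zA-Z]*/

text = "San Francisco's Jane Smith, quoted in Broder's Washington Post " \
       "article, said"

matches = text.scan(TWO_CAPS)
p matches
# => ["San Francisco", "Jane Smith", "Washington Post"]
```

Only one of the three matches is a person, and nothing in the pattern can tell them apart.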
In reply to Hassan S. (Fri, Mar 5, 2010 at 12:16 PM):
Some other cases to consider
John Philip Sousa (or, if you’re a kid at heart, John Jacob Jingleheimer
Schmidt), not to mention Spanish names, which can have MANY parts.
Robert De Niro
Jesus Mary and Joseph
Surnames with origins in some languages don’t start with a capital letter:
Michael H. de Young - Dutch
Wernher von Braun - German
–
Rick DeNatale
Blog: http://talklikeaduck.denhaven2.com/
Twitter: http://twitter.com/RickDeNatale
WWR: http://www.workingwithrails.com/person/9021-rick-denatale
LinkedIn: http://www.linkedin.com/in/rickdenatale
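A rough sketch of how far a pattern can be stretched to cover these cases: allow middle words, initials such as "H.", and a short list of lowercase particles. The particle list (de, von, van) and all the names below are assumptions made for illustration, and the list is nowhere near complete.

```ruby
CAP      = /[A-Z][a-zA-Z]*/     # capitalized word
INITIAL  = /[A-Z]\./            # middle initial such as "H."
PARTICLE = /(?:de|von|van)/     # assumed, incomplete particle list

# First word, any number of middle parts, then a final capitalized word.
FULL_NAME = /\b#{CAP}(?:\s+(?:#{INITIAL}|#{PARTICLE}|#{CAP}))*\s+#{CAP}\b/

# Hypothetical helper: returns the first match as a string, or nil.
def full_name(text)
  text[FULL_NAME]
end

p full_name("Michael H. de Young")
p full_name("Wernher von Braun")
p full_name("Jesus Mary and Joseph")
```

The last call returns "Jesus Mary", so this still produces false positives, and it does nothing about a capitalized word at the start of a sentence.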
On Fri, Mar 5, 2010 at 10:06 PM, Allan L. [email protected]
wrote:
Thanks for the suggestions. I’m going to play around with this.
For the most part, I’m doing detection for scenarios with two names, so
names like Robert De Niro will not come up.
I’m pretty sure, though, that the actor would say he HAD two names, and
that his first name was “Robert” and his last name was “De Niro”.
–
Rick DeNatale