Forum: Ruby
The "Ruby way" to break apart a name?

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- and Ruby-related community platforms.
Jeff Cohen (jeff)
on 2005-12-21 02:56
Switching from C# to Ruby, and learning to write "the Ruby way"... is
there a better way to get the first and last names from a string?

Assume for simplicity that the first name is the text up to the
first space, and the last name is the text after the last space.

def split_name(fullname)
  parts = fullname.split(' ')
  [parts.first, parts.last]
end


This returns me an array so I can do this:

first, last = split_name("Donald P. Q. Duck")

first => "Donald"
last => "Duck"

(man, I love Ruby).

But something about split_name still feels a bit "wrong", like there's a
more succinct Ruby way to return the first and last elements of the
split() results.

Thanks
Jeff
Mark Hubbart (Guest)
on 2005-12-21 03:07
(Received via mailing list)
On 12/20/05, Jeff Cohen <cohen.jeff@gmail.com> wrote:
>
> But something about split_name still feels a bit "wrong", like there's a
> more succinct Ruby way to return the first and last elements of the
> split() results.

Perhaps:
  "Donald P. Q. Duck".split.values_at(0,-1)
    ==>["Donald", "Duck"]

... where 0 and -1 are array indices.
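A sketch comparing the values_at idiom with the explicit first/last version and with a middle-splat assignment. The splat form assumes Ruby 1.9 or later (it did not exist when this thread was written), and "Cher" is just a hypothetical one-word input showing where the idioms diverge.

```ruby
name = "Donald P. Q. Duck"

# Array#values_at accepts several indices; -1 counts from the end.
pair = name.split.values_at(0, -1)        # => ["Donald", "Duck"]

# The same result with explicit first/last calls.
parts = name.split
same_pair = [parts.first, parts.last]     # => ["Donald", "Duck"]

# Ruby 1.9+: a bare splat in the middle of a multiple assignment
# swallows every element between the first and the last.
first, *, last = name.split               # first == "Donald", last == "Duck"

# A one-word name is where these idioms differ.
single_pair = "Cher".split.values_at(0, -1)  # => ["Cher", "Cher"]
f, *, l = "Cher".split                       # f == "Cher", l == nil
```

values_at and first/last both repeat the single word, while the splat assignment leaves the last variable nil, so the choice decides what a one-word name means.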

cheers,
Mark
nobuyoshi nakada (Guest)
on 2005-12-21 03:17
(Received via mailing list)
Hi,

At Wed, 21 Dec 2005 10:57:01 +0900,
Jeff Cohen wrote in [ruby-talk:171830]:
> Assume for simplicity that the first name is the text up to the
> first space, and the last name is the text after the last space.

What should be returned if fullname has no space?

  def split_name(fullname)
    fullname.scan(/(\S+).*\s(\S+)/).first
  end
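One way to see nobuyoshi nakada's question in action: the regex needs at least one whitespace character, so a one-word name produces no match at all. The wrapper split_name_or_single is hypothetical and simply picks one possible answer (treat the single word as a first name with no last name).

```ruby
# The greedy .* runs to the end of the string and then backtracks to the
# last whitespace followed by a word, so group 2 captures the final word.
def split_name(fullname)
  fullname.scan(/(\S+).*\s(\S+)/).first
end

# Hypothetical wrapper: scan finds nothing for a single word, so .first
# returns nil and we fall back to [word, nil].
def split_name_or_single(fullname)
  split_name(fullname) || [fullname.strip, nil]
end

two_words = split_name("Donald P. Q. Duck")   # => ["Donald", "Duck"]
one_word  = split_name("Cher")                # => nil
fallback  = split_name_or_single("Cher")      # => ["Cher", nil]
```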
Dave Burt (Guest)
on 2005-12-21 03:44
(Received via mailing list)
Jeff Cohen wrote...
>
> ...
>
> But something about split_name still feels a bit "wrong", like there's a
> more succinct Ruby way to return the first and last elements of the
> split() results.

class String
  def split_name
    split.values_at(0, -1)
  end
end

Cheers,
Dave
Marcel Molina Jr. (Guest)
on 2005-12-21 07:02
(Received via mailing list)
On Wed, Dec 21, 2005 at 10:57:01AM +0900, Jeff Cohen wrote:
>
> more succinct Ruby way to return the first and last elements of the
> split() results.

In Ruby you can open up any class and modify/extend it at any time. Even
the built-in ones. So you could do:

  class String
    def name_parts(pattern = ' ', limit = 2)
      split(pattern, limit)
    end
  end

  'Marcel Molina Jr.'.name_parts
  => ["Marcel", "Molina Jr."]
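With the default limit of 2, "Molina Jr." ends up as the last name, which is a different rule from the original "text after the last space". A sketch comparing the two, assuming Ruby 1.9 or later for String#partition and String#rpartition:

```ruby
name = "Marcel Molina Jr."

# With a limit of 2, split stops after the first separator, so the
# rest of the string stays in one piece.
limited = name.split(" ", 2)    # => ["Marcel", "Molina Jr."]

# partition / rpartition split around the first or the last occurrence
# of the separator and return the separator as the middle element.
head = name.partition(" ")      # => ["Marcel", " ", "Molina Jr."]
tail = name.rpartition(" ")     # => ["Marcel Molina", " ", "Jr."]
```

rpartition reproduces the original "last word" rule, which here leaves "Jr." as the surname.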

marcel
Lyndon Samson (Guest)
on 2005-12-21 12:24
(Received via mailing list)
>
> In Ruby you can open up any class and modify/extend it at any time. Even the
> built in ones. So you could do:
>
Well, you can't after String.freeze :-)
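Lyndon's point can be checked directly. A minimal sketch, assuming Ruby 2.5 or later, where the error raised is FrozenError (older versions raised a different class); Greeter is a hypothetical stand-in so that String itself is not frozen for the rest of the program:

```ruby
class Greeter; end
Greeter.freeze

error = begin
  # Reopening the frozen class is allowed; defining a method in it is not.
  class Greeter
    def hello
      "hi"
    end
  end
  nil
rescue FrozenError => e
  e
end
```

After this runs, error holds the FrozenError and Greeter still has no hello method.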
Jeff Cohen (jeff)
on 2005-12-21 17:29
Thanks for the help, everyone.  All of the suggestions have been
helpful.

Jeff
Eero Saynatkari (rue)
on 2005-12-21 21:52
Lyndon Samson wrote:
>>
>> In Ruby you can open up any class and modify/extend it at any time. Even the
>> built in ones. So you could do:
>>
> Well, you cant after String.freeze :-)

Ha! Maybe you can do String = String.dup :)


E
mathew (Guest)
on 2005-12-21 23:20
(Received via mailing list)
Jeff Cohen wrote:
> Assume for simplicity that the first name is the text up to the
> first space, and the last name is the text after the last space.
[...]
> But something about split_name still feels a bit "wrong",

Well, I think the bigger issue is that your assumptions are wrong. :-)

In some countries, the surname is written first, then the 'first' name.
Japan is an example. Some Japanese write their names in reverse when
writing them transliterated to English, and some don't. (...which makes
me wonder which is the case for Matz...)

Also, the number of words in the full name can vary between 1 and a
fairly large integer. (I knew a guy with 6.)  The number of name words
required to actually route mail to a unique person can vary between 1
and (at least) 3, and compound names are not always hyphenated. Then
there are things like "Jr", and salutations that go after the name
rather than in front.

There are quite a few postings in comp.risks about this kind of thing.
In general it's very hard to do it right, and if (for example) you want
to produce a "Dear <salutation goes here>" header for a letter, it's
best to store the salutation as a separate field, rather than try to
guess what it might be from the name.
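A minimal sketch of the "store it separately" advice, using a hypothetical Recipient struct whose salutation is entered by a person rather than guessed from the name:

```ruby
# Hypothetical record type: the salutation is its own field instead of
# being derived from full_name.
Recipient = Struct.new(:full_name, :salutation)

def letter_header(recipient)
  "Dear #{recipient.salutation},"
end

jane = Recipient.new("Jane Q. Public", "Ms. Public")
header = letter_header(jane)   # => "Dear Ms. Public,"
```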

Of course, if you're working with a badly structured database someone
else has given you, you may not have the choice...


mathew
James Gray (bbazzarrakk)
on 2005-12-21 23:57
(Received via mailing list)
On Dec 21, 2005, at 4:17 PM, mathew wrote:

> Also, the number of words in the full name can vary between 1 and a
> fairly large integer.

Amen.  Names are much trickier than you think...

James Edward Gray II (married to Dana Ann Leslie Gray)
Greg Kujawa (gregarican)
on 2005-12-22 15:32
(Received via mailing list)
James Edward Gray II wrote:

> Amen.  Names are much trickier than you think...
>
>James Edward Gray II (married to Dana Ann Leslie Gray)

I second that sentiment. I recall a college buddy of mine. His full
name was Bradley Lee Bradley. Trying hard to think of a good regex
representation of that :-)
Christian Neukirchen (Guest)
on 2005-12-22 15:48
(Received via mailing list)
James Edward Gray II <james@grayproductions.net> writes:

> On Dec 21, 2005, at 4:17 PM, mathew wrote:
>
>> Also, the number of words in the full name can vary between 1 and a
>> fairly large integer.
>
> Amen.  Names are much trickier than you think...
>
> James Edward Gray II (married to Dana Ann Leslie Gray)

Full Ack, James II.
Jacob Fugal (Guest)
on 2005-12-22 19:01
(Received via mailing list)
On 12/22/05, Christian Neukirchen <chneukirchen@gmail.com> wrote:
>
> Full Ack, James II.

I thought that was a spoonerism at first and wondered what you had
against James' comments. Then I looked again and read "Full ACK", and
it made more sense. :)

Jacob Fugal
unknown (Guest)
on 2006-01-01 21:33
(Received via mailing list)
On Thu, 22 Dec 2005, James Edward Gray II wrote:

> On Dec 21, 2005, at 4:17 PM, mathew wrote:
>
>> Also, the number of words in the full name can vary between 1 and a fairly
>> large integer.
>
> Amen.  Names are much trickier than you think...
>
> James Edward Gray II (married to Dana Ann Leslie Gray)

And then there's...

Louis George Maurice Adolphe Roch Albert Abel Antonio Alexandre Noé
Jean Lucien Daniel Eugène Joseph-le-brun Joseph-Barême Thomas Thomas
Thomas-Thomas Pierre Arbon Pierre-Maurel Barthélemi Artus Alphonse
Bertrand Dieudonné Emanuel Josué Vincent Luc Michel
Jules-de-la-plane Jules-Bazin Julio César Jullie

... a 19th-century musician who had a lot of godfathers, all of whom
he was named after.  You gotta love the "Thomas"'s :-)

(And I'll rattle off that name from memory for a drink.)

Also, don't forget cases like Ralph Vaughan Williams, where the last
name is Vaughan Williams.
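A quick check, as a sketch, of how the two simple rules from earlier in the thread handle this name:

```ruby
full = "Ralph Vaughan Williams"

# Original rule: last name = text after the last space.
naive = full.split.last                         # => "Williams" (wrong for this name)

# Splitting once after the first word happens to work here...
after_first = full.split(" ", 2).last           # => "Vaughan Williams"

# ...but the same rule breaks the example from the first post.
duck = "Donald P. Q. Duck".split(" ", 2).last   # => "P. Q. Duck"
```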


David

--
David A. Black
dblack@wobblini.net

"Ruby for Rails", from Manning Publications, coming April 2006!
http://www.manning.com/books/black
Jeff Cohen (jeff)
on 2006-01-02 01:45
unknown wrote:
> On Thu, 22 Dec 2005, James Edward Gray II wrote:
>
>> On Dec 21, 2005, at 4:17 PM, mathew wrote:
>>
>>> Also, the number of words in the full name can vary between 1 and a fairly
>>> large integer.
>>
>> Amen.  Names are much trickier than you think...
>>

HEY YOU GUYS! :-)  I was originally asking for the best way to return
the first and last parts of an array.

Breaking apart a name was just an example because I had to make up an
example in order to ask the question.  Hence I said "Assume for
simplicity..."  OF COURSE dealing with names is not as trivial as my
example.

The initial replies helped me understand arrays in Ruby better.

Thanks
Jeff
Gerardo Santana Gómez Garrido (Guest)
on 2006-01-02 04:28
(Received via mailing list)
We had a similar problem at work.

In the Spanish-speaking world we use two last names: one from the
father's family (apellido paterno) and another from the mother's
family (apellido materno). For the "first name" there's no limit in
the number of names.

Fortunately for us, the names were stored in the database as:

<apellido paterno> <apellido materno> <nombres>

But there was a difficulty. In Spanish we have last names composed of
more than one word, like "de la Vega", "y Cruz", and "de las Casas".

Examples:

Cruz y Cruz María del Rosario
de la Vega Domínguez Jorge
Ponce de León Ernesto Zedillo

We couldn't avoid regular expressions:
http://santanatechnotes.blogspot.com/2005/12/match...
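The linked post is truncated above, so the following is not Gerardo's regex. It is a hypothetical token-grouping sketch: each word from a short, made-up list of particles is glued onto the word after it, and the stored order <apellido paterno> <apellido materno> <nombres> is assumed.

```ruby
# Hypothetical, deliberately short list of surname particles.
PARTICLES = %w[de del la las los y].freeze

# Attach each run of particles to the word that follows it, so that
# "de la Vega" becomes a single token.
def group_tokens(text)
  tokens = []
  pending = []
  text.split.each do |word|
    if PARTICLES.include?(word.downcase)
      pending << word
    else
      tokens << (pending + [word]).join(" ")
      pending = []
    end
  end
  tokens << pending.join(" ") unless pending.empty?
  tokens
end

# Assumes each surname is exactly one grouped token; everything after
# the first two tokens is treated as the given names.
def parse_stored_name(text)
  paterno, materno, *nombres = group_tokens(text)
  { paterno: paterno, materno: materno, nombres: nombres.join(" ") }
end

vega = parse_stored_name("de la Vega Domínguez Jorge")
cruz = parse_stored_name("Cruz y Cruz María del Rosario")
ponce = parse_stored_name("Ponce de León Ernesto Zedillo")
```

The first two examples split as expected, but for "Ponce de León Ernesto Zedillo" the sketch returns "Ponce" as the paternal surname, because the particle sits inside a multi-word surname; a particle list alone cannot handle that case.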

--
Gerardo Santana
"Between individuals, as between nations, respect for the rights of
others is peace" - Don Benito Juárez
http://santanatechnotes.blogspot.com/
Wilson Bilkovich (Guest)
on 2006-01-02 08:29
(Received via mailing list)
This reminds me of U.S. street addresses.  I (on and off) do work that
interfaces with a huge IBM mainframe application. That system has
many, many separate fields for street addresses:
Number, Direction, Street/Route, Quad, Suffix, Apartment, Line2, etc.
I laughed at the way the original engineers had overbuilt. Ha ha ha.
...
...
Then I had to write code that took a single address string and split
it into its component parts, and I stopped laughing.
e.g. 123 N. NESTOR LANE RD. SE #10B

Life is complicated, it turns out.
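Splitting the example above into fields can be sketched with a regular expression using named captures (Ruby 1.9+). The field names loosely follow the list of mainframe fields, and the suffix and quadrant alternatives are a small hypothetical subset, so this only covers addresses shaped like the example:

```ruby
# Hypothetical pattern for one shape of US street address; /x allows
# whitespace and line breaks, so the literal # must be escaped.
ADDRESS = /\A
  (?<number>\d+)\s+
  (?:(?<direction>[NSEW])\.?\s+)?
  (?<street>.+?)\s+
  (?<suffix>RD|ST|AVE|LN|BLVD|DR)\.?
  (?:\s+(?<quad>NE|NW|SE|SW))?
  (?:\s+\#(?<apartment>\w+))?
\z/x

m = ADDRESS.match("123 N. NESTOR LANE RD. SE #10B")
# The lazy street group stops at the first word that fits a suffix,
# so "LANE" stays part of the street name here.
```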
Christian Neukirchen (Guest)
on 2006-01-02 17:15
(Received via mailing list)
Wilson Bilkovich <wilsonb@gmail.com> writes:

>
> Life is complicated, it turns out.

And what's the point of storing that in different fields?
Wilson Bilkovich (Guest)
on 2006-01-02 17:15
(Received via mailing list)
On 1/2/06, Christian Neukirchen <chneukirchen@gmail.com> wrote:
> > it into its component parts, and I stopped laughing.
> > e.g. 123 N. NESTOR LANE RD. SE #10B
> >
> > Life is complicated, it turns out.
>
> And what's the point of storing that in different fields?
>
One reason is to be able to say things like "Who else lives on the
same street as this person" with a database query.
Another is that the Postal Service has a bunch of funky requirements
that must be met in order to qualify for discount mailing rates.
That being said, these days most big companies subscribe to an address
correction service, and just fix these things on the fly.  Saves a lot
of hassle.
Steve Litt (Guest)
on 2006-01-02 17:15
(Received via mailing list)
On Sunday 01 January 2006 10:25 pm, Gerardo Santana Gómez Garrido wrote:
> We had a similar problem at work.
>
> In the Spanish speaking world we use two last names: one from the
> father's family (apellido paterno) and another from the mother's
> family (apellido materno). For the "first name" there's no limit in
> the number of names.
>
> Fortunately for us, the names were stored in the database as:
>
> <apellido paterno> <apellido materno> <nombres>

I've always wondered about this, both in Spanish names and American
hyphenated names. When the mother and father have a baby, does it go like this:

baby.apellido_materno = mother.apellido_paterno
baby.apellido_paterno = father.apellido_paterno

Or is it done like this:

baby.apellido_materno = mother.apellido_materno
baby.apellido_paterno = father.apellido_paterno

I KNOW it can't be this:

baby.apellido_materno = mother.apellido_paterno + '-' + mother.apellido_materno
baby.apellido_paterno = father.apellido_paterno + '-' + father.apellido_materno

If the preceding were done, names would become huge and still growing.

SteveT

Steve Litt
http://www.troubleshooters.com
slitt@troubleshooters.com
Kevin Brown (Guest)
on 2006-01-02 17:15
(Received via mailing list)
On Monday 02 January 2006 01:28, Wilson Bilkovich wrote:
>
> Life is complicated, it turns out.

That's nothing.  Addresses here are based on landmarks:

De donde fue Texaco Viejo 1/2 C al E, 2 c al S
City Name, Department name (sometimes), Nicaragua

Break that one up.

(Literal translation: From where the old Texaco USED TO BE (it's a
Petronic now), 1/2 a block to the east and 2 blocks to the south.) That
is just how addresses are here.
Wilson Bilkovich (Guest)
on 2006-01-02 17:43
(Received via mailing list)
On 1/2/06, Kevin Brown <blargity@gmail.com> wrote:
> > e.g. 123 N. NESTOR LANE RD. SE #10B
> (literal translation is: From where the old texaco USED TO BE (it's a petronic
> now), 1/2 a block to the east and 2 blocks to the south)  That is just how
> addresses are here.
>

That's insane. You win. Wow.
Gerardo Santana Gómez Garrido (Guest)
on 2006-01-02 17:46
(Received via mailing list)
2006/1/2, Steve Litt <slitt@earthlink.net>:
> > <apellido paterno> <apellido materno> <nombres>
>
> I've always wondered about this, both in Spanish names and American hyphenated
> names. When the mother and father have a baby, does it go like this:
>
> baby.apellido_materno = mother.apellido_paterno
> baby.apellido_paterno = father.apellido_paterno

This is the one.
Father's last name prevails in each case. Very machista, eh? :)

apellido_materno would be the "mother's name" in the English-speaking
world, I believe (since mother's last name is her father's last name).
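The rule confirmed here, written out as a sketch with a hypothetical Person struct that reuses the field names from the pseudocode above. Because the child takes only each parent's apellido_paterno, every generation keeps exactly two surnames:

```ruby
# Hypothetical record type; field names follow the pseudocode above.
Person = Struct.new(:nombres, :apellido_paterno, :apellido_materno)

# Child's first surname = father's first surname;
# child's second surname = mother's first surname.
def child_of(father, mother, nombres)
  Person.new(nombres, father.apellido_paterno, mother.apellido_paterno)
end

father = Person.new("Jorge", "de la Vega", "Domínguez")
mother = Person.new("María del Rosario", "Cruz", "y Cruz")
baby = child_of(father, mother, "Ernesto")
# baby.apellido_paterno == "de la Vega", baby.apellido_materno == "Cruz"
```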

--
Gerardo Santana
"Between individuals, as between nations, respect for the rights of
others is peace" - Don Benito Juárez
http://santanatechnotes.blogspot.com/
Daniel Calvelo (Guest)
on 2006-01-02 18:29
(Received via mailing list)
And the only problems arise when one or both of paterno or materno is
itself composite and non hyphenated. Right, Gerardo?

In France, not only *can* you receive either your father's or your
mother's family name, but you can also change your family name when you
marry, divorce, remarry, and so on.
Keith Lancaster (klancaster)
on 2006-01-03 03:01
Christian Neukirchen wrote:
> Wilson Bilkovich <wilsonb@gmail.com> writes:
>
>>
>> Life is complicated, it turns out.
>
> And what's the point of storing that in different fields?

I'm working on a system that interfaces with a GIS mainframe system
(geographic information system). It does street address / keymap ops for
police/fire. It requires that all fields are separate, including things
like street prefix, postfix, type, yada yada. A real pain.

Keith
Ron M (Guest)
on 2006-01-05 19:57
(Received via mailing list)
Wilson Bilkovich wrote:
>>City Name, Department name (sometimes), Nicaragua
>>
>>Break that one up.
>>
>>(literal translation is: From where the old texaco USED TO BE (it's a petronic
>>now), 1/2 a block to the east and 2 blocks to the south)  That is just how
>>addresses are here.
>
> That's insane. You win. Wow.
>

Even in the US it gets tricky.  Consider what the US Postal Service
says about Puerto Rico, where two different houses, in two different
places in the same city, can share the same street name, house number,
and 5-digit ZIP code.
http://www.usps.com/ncsc/addressstds/prgeninfo.htm
It also notes that some streets there don't have names at all:
http://www.usps.com/ncsc/addressstds/addressformats.htm

And when you're geocoding it gets even worse.  In the rural US
you find lots of places where the mailboxes are at a different
location than the buildings (say, the mailboxes for a group of
farms are all at one central location), so geocoding for
emergency services should give different results than geocoding
for mail delivery.

Nothing really on topic here, except reiterating that the
original poster's request is a non-trivial problem.