Sorting Titles in Ruby

Before I start writing this, I thought I’d see if something already
exists, but I’m having trouble formulating a good Google search this
morning.

Let’s say I have a list of movies:

20,000 Leagues Under the Sea
A Fine Madness
Fanny
Marooned
The Man Who Came to Dinner

I’d like to sort those in title order, as if they were actually:

Fanny
Fine Madness, A
Man Who Came to Dinner, The
Marooned
Twenty-thousand Leagues Under the Sea

In other words, leading articles should be moved to the end, and
numbers should be spelled out before sorting.

It seems that sorting titles like this is fairly common, so maybe this
already exists in a usable form.


Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/

On Dec 24, 2007 6:11 PM, Rick DeNatale [email protected] wrote:

In other words, leading articles should be moved to the end, and
numbers should be spelled out before sorting.

You can use the Linguistics gem
(http://www.deveiate.org/projects/Linguistics/) to convert numbers
to words. I have never used it myself, but you will probably need a
simple regex to extract the numbers first.
Leading articles can easily be removed with a regex.
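
For what it's worth, the article half might look something like the sketch below. The names move_leading_article and ARTICLE are made up for illustration, and the pattern only knows about the English articles "a", "an", and "the". Requiring whitespace after the article keeps titles such as "Theater of Blood" from being mangled.

```ruby
# Hypothetical helper: move a leading English article to the end of a title.
# The \s+ after the article prevents matching words that merely start
# with "a", "an", or "the" (for example "Another" or "Theater").
ARTICLE = /\A(an?|the)\s+(.+)\z/i

def move_leading_article(title)
  title.strip.sub(ARTICLE) { "#{$2}, #{$1}" }
end

puts move_leading_article("The Man Who Came to Dinner")  # Man Who Came to Dinner, The
puts move_leading_article("A Fine Madness")              # Fine Madness, A
puts move_leading_article("Theater of Blood")            # Theater of Blood
```

The helper returns a new string and leaves its argument untouched, so it is safe to call on frozen strings.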

On Dec 24, 12:01 pm, Thomas W. [email protected]
wrote:

You can use the Linguistics gem
(http://www.deveiate.org/projects/Linguistics/) to convert numbers
to words. I have never used it myself, but you will probably need a
simple regex to extract the numbers first.
Leading articles can easily be removed with a regex.

Simple implementation of that…

# Glenn Parker's solution to Ruby Quiz #25 [1]
require "num2eng"

class String
  def title_case
    # from somewhere...can't recall
    gsub(/\b\w/) { $&.upcase }
  end

  def canonical_form
    # Spell out each run of digits (commas and underscores allowed).
    nums = scan(/[\d,_]+/)
    unless nums.empty?
      nums.each { | num |
        sub = num.gsub(",","").to_i.to_english.title_case
        gsub!(num, sub)
      }
    end
    # Move a leading article to the end; \s+ keeps words such as
    # "Another" or "Theater" from being treated as articles.
    gsub(/\A(An?|The)\s+(.*)/, '\2, \1').strip
  end
end

titles = [
  "20,000 Leagues Under the Sea",
  "A Fine Madness",
  "Fanny",
  "Marooned",
  "The Man Who Came to Dinner",
  "5 Days and 6 Nights of Fictitious Cinema"
]

puts titles.map { | title |
  title.canonical_form
}.sort

=>

Fanny
Fine Madness, A
Five Days and Six Nights of Fictitious Cinema
Man Who Came to Dinner, The
Marooned
Twenty Thousand Leagues Under the Sea

P.S. Corner cases like “20, 000 Leagues …” will break, but it’s a
start. :)

[1] http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/135449

Regards,
Jordan

On Dec 26, 3:51 am, MonkeeSage [email protected] wrote:

def canonical_form
  nums = scan(/[\d,_]+/)
  unless nums.empty?
    nums.each { | num |
      sub = num.gsub(",","").to_i.to_english.title_case
      gsub!(num, sub)
    }
  end

Err…that’s silly. I meant…

scan(/[\d,_]+/).each { | num |
  sub = num.gsub(",","").to_i.to_english.title_case
  gsub!(num, sub)
}

…too much eggnog this evening, heh. ;)

On Dec 24, 2007 9:11 PM, Rick DeNatale [email protected] wrote:

Before I start writing this, I thought I’d see if something already
exists, but I’m having trouble formulating a good Google search this
morning.

You can use my titlecase utility as a starting point:

http://zem.novylen.net/ruby/titlecase.rb

martin

On Dec 27, 2007 4:34 AM, Martin DeMello [email protected] wrote:

You can use my titlecase utility as a starting point:

http://zem.novylen.net/ruby/titlecase.rb

martin

I think that capitalizing a title “properly,” whatever that means, is
a separate issue. In fact, I had just done some Google searching on
phrases like “title capitalization,” and it seems that there are all
kinds of different styles. I also see that ActiveSupport has a
titlecase method on String which just capitalizes each word.

So for my application, I think I’ll leave the title capitalization to
the humans. I’m leaning towards sorting on a canonical form which
moves leading articles to the end of the title and converts numbers
(I’m just now thinking about Roman numerals) to words, then lowercases
everything to make the sort case-insensitive.
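
A rough sketch of that plan, leaving the stored titles alone and sorting on a derived key, might look like this. Everything named here (number_to_words, sort_key, TITLES) is hypothetical; number_to_words is only a small stand-in for a real library, covers 0 through 999,999, and does nothing about Roman numerals.

```ruby
ONES = %w[zero one two three four five six seven eight nine ten eleven twelve
          thirteen fourteen fifteen sixteen seventeen eighteen nineteen].freeze
TENS = %w[_ _ twenty thirty forty fifty sixty seventy eighty ninety].freeze

# Hypothetical stand-in for a number-to-words library (0..999_999 only).
def number_to_words(n)
  return ONES[n] if n < 20
  return [TENS[n / 10], (ONES[n % 10] if n % 10 > 0)].compact.join("-") if n < 100
  if n < 1000
    rest = n % 100
    return [ONES[n / 100], "hundred", (number_to_words(rest) if rest > 0)].compact.join(" ")
  end
  rest = n % 1000
  [number_to_words(n / 1000), "thousand", (number_to_words(rest) if rest > 0)].compact.join(" ")
end

# Build a lowercase key: numbers spelled out, leading article moved to the end.
def sort_key(title)
  key = title.gsub(/\d{1,3}(?:,\d{3})+|\d+/) { |num| number_to_words(num.delete(",").to_i) }
  key = key.sub(/\A(an?|the)\s+(.+)\z/i) { "#{$2}, #{$1}" }
  key.downcase
end

TITLES = [
  "20,000 Leagues Under the Sea",
  "A Fine Madness",
  "Fanny",
  "Marooned",
  "The Man Who Came to Dinner"
].freeze

# sort_by computes each key once and returns the original titles in key order.
puts TITLES.sort_by { |title| sort_key(title) }
```

Because the key is only used for comparison, the titles come back with whatever capitalization the humans gave them.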


Rick DeNatale


On Dec 26, 3:58 am, MonkeeSage [email protected] wrote:

sub = num.gsub(",","").to_i.to_english.title_case

What the heck…again…I meant this…

  gsub!(num) {
    num.gsub(",","").to_i.to_english.title_case
  }
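
One thing that may be worth checking with the scan-then-gsub! approach: when one number is a substring of another, the first replacement can clobber the second. A single gsub with a block sidesteps that, since each match is replaced exactly once. In the sketch below, WORDS is a hypothetical stand-in for to_english that only covers the two numbers in the example, and two_pass and single_pass are made-up names.

```ruby
# Hypothetical stand-in for to_english; only covers the numbers used below.
WORDS = { 5 => "Five", 15 => "Fifteen" }.freeze

# Scan first, then replace each number everywhere it occurs.
def two_pass(title)
  result = title.dup
  result.scan(/\d+/).each { |num| result.gsub!(num) { WORDS.fetch(num.to_i) } }
  result
end

# Replace each number in one pass over the string.
def single_pass(title)
  title.gsub(/\d+/) { |num| WORDS.fetch(num.to_i) }
end

puts two_pass("5 Days and 15 Nights")     # Five Days and 1Five Nights
puts single_pass("5 Days and 15 Nights")  # Five Days and Fifteen Nights
```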

…friends don’t let friends drink and code!