Proper Case (#89)

The three rules of Ruby Q.:

  1. Please do not post any solutions or spoiler discussion for this quiz
    until 48 hours have passed from the time on this message.

  2. Support Ruby Q. by submitting ideas as often as you can:

http://www.rubyquiz.com/

  3. Enjoy!

Suggestion: A [QUIZ] in the subject of emails about the problem helps everyone
on Ruby T. follow the discussion. Please reply to the original quiz message,
if you can.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

by elliot temple

sometimes i type in all or mostly lowercase. a friend of mine says it’s hard to
read essays with no capital letters. so the problem is to write a method which
takes a string (which could include many paragraphs), and capitalizes words that
should be capitalized. at minimum it should do the starts of sentences.

solutions could range from simple (a few regexes) to complex (lots of special
cases are possible, like abbreviations that use a period). an addition would be
using a dictionary to find proper nouns and capitalize those. it could also ask
the user about cases the program can’t figure out. or log them.

i can provide an example solution (regex based) and a list of reasons it doesn’t
work very well, if you want.
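
A minimal sketch of the "simple (a few regexes)" end of that range. It is not the example solution the author offers; the method name `proper_case` is hypothetical, and it assumes a sentence ends with ".", "!" or "?" followed by whitespace:

```ruby
# Capitalize the first letter of the text and of every sentence.
# Assumption: a sentence ends with ".", "!" or "?" followed by whitespace
# (spaces or line breaks), so paragraph breaks are covered as well.
def proper_case(text)
  text.gsub(/(\A\s*|[.!?]\s+)([a-z])/) { "#{$1}#{$2.upcase}" }
end

puts proper_case("this is a test. it works! does it? yes.")
# prints "This is a test. It works! Does it? Yes."
```

This version capitalizes after every period, so abbreviations such as "vs." would trigger a false sentence start.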

sample input:

  • this email itself works nicely

  • this one is hard. sometimes i might want to write about gsub vs. gsub! without
    the “.” or “!” causing any capitalization (or the punctuation in quotes).

one problem is maybe dealing with sentences that contain periods is too hard. i
don’t know.
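
One way to approach the harder sample is to walk the text word by word and keep a list of tokens that end in punctuation without ending a sentence. The sketch below assumes such a list; its contents (`vs.`, `e.g.`, `i.e.`, `gsub!`) and the method name `proper_case_with_exceptions` are illustrative choices, not part of the quiz:

```ruby
# Capitalize sentence starts, skipping a caller-supplied list of tokens
# that end in ".", "!" or "?" but do not end a sentence.
# Assumption: the default exception list is a hypothetical example.
def proper_case_with_exceptions(text, exceptions = %w[vs. e.g. i.e. gsub!])
  previous = nil # the previous word; nil at the very start of the text
  text.gsub(/\S+/) do |word|
    starts_sentence = previous.nil? ||
                      (previous.match?(/[.!?]\z/) &&
                       !exceptions.include?(previous.downcase))
    previous = word
    # Only words that begin with a lowercase ASCII letter are changed.
    starts_sentence ? word.sub(/\A[a-z]/) { |c| c.upcase } : word
  end
end

puts proper_case_with_exceptions(
  "this one is hard. sometimes i write about gsub vs. gsub! without trouble."
)
# prints "This one is hard. Sometimes i write about gsub vs. gsub! without trouble."
```

Punctuation wrapped in quotes, such as “.”, does not trigger capitalization here because the token ends with the closing quote. The trade-off is that a sentence which really does end in "gsub!" will be missed.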

Ruby Q. wrote:

  • this email itself works nicely

  • this one is hard. sometimes i might want to write about gsub vs. gsub! without
    the “.” or “!” causing any capitalization (or the punctuation in quotes).

one problem is maybe dealing with sentences that contain periods is too hard. i
don’t know.

It would be nice if you could assume two spaces after an end of sentence
with punctuation. Generally I think that’s correct grammar, although my
grammar stinks so I could easily be wrong. If you have to get into
parsing incorrect grammar it becomes much more difficult.

On Aug 4, 2006, at 8:20 AM, Mike H. wrote:

It would be nice if you could assume two spaces after an end of
sentence with punctuation. Generally I think that’s correct
grammar, although my grammar stinks so I could easily be wrong. If
you have to get into parsing incorrect grammar it becomes much more
difficult.

Actually, that’s an old typographical convention that we can’t seem
to shake. Here’s a report that talks a little about the issue:

http://webword.com/reports/period.html

Here’s an explanation from that report:

The only reason that two spaces were used after a period during the
‘typewriter’ age was because original typewriters had monospaced
fonts – the extra space was needed for the eye to pick up on the beginning
of a new sentence. That need is negated w/proportional space type, hence
[it is] the typographic standard.

James Edward G. II

Mike H. wrote:

  • this one is hard. sometimes i might want to write about gsub vs. gsub! without
    the “.” or “!” causing any capitalization (or the punctuation in quotes).

one problem is maybe dealing with sentences that contain periods is too hard. i
don’t know.

My day job is developing natural language processing apps, and we’ve had
to implement a similar case-correcting tool. What we found is that a
simple regex-based approach is correct about 90% of the time. When we
used machine learning to do the same thing, the results went up to about
95%. Compare this to human performance (i.e. have two or more people
manually correct a text, then compare how often their corrections were
in agreement), which was, IIRC, about 97%.
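
The post does not say how agreement was measured. As a rough sketch, assuming it is counted word by word over two corrections of the same text (the method name `agreement` and the word-level metric are both assumptions):

```ruby
# Fraction of words on which two case-corrected versions of the same text agree.
# Assumption: both versions contain the same words in the same order and
# differ only in capitalization.
def agreement(text_a, text_b)
  words_a = text_a.split
  words_b = text_b.split
  unless words_a.size == words_b.size
    raise ArgumentError, "texts must have the same number of words"
  end
  return 1.0 if words_a.empty?

  matches = words_a.zip(words_b).count { |a, b| a == b }
  matches.to_f / words_a.size
end

puts agreement("This is Ruby. It works.", "This is ruby. It works.")
# prints 0.8
```

The same function can score a program's output against a hand-corrected reference, which is one way to arrive at figures like the ones quoted above.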

It would be nice if you could assume two spaces after an end of sentence
with punctuation. Generally I think that’s correct grammar, although my
grammar stinks so I could easily be wrong. If you have to get into
parsing incorrect grammar it becomes much more difficult.

The two-spaces-after-period rule is not a grammatical one; it’s a
typographic convention that grew out of typewriter (i.e. monospaced)
fonts.

On Aug 4, 2006, at 14:20, Mike H. wrote:

  • this email itself works nicely

It would be nice if you could assume two spaces after an end of
sentence with punctuation. Generally I think that’s correct
grammar, although my grammar stinks so I could easily be wrong. If
you have to get into parsing incorrect grammar it becomes much more
difficult.

It’s not correct grammar, just a typographical convention; one which
is sort of semi-obsolete and regularly gives rise to great debate in
typographical circles over its perceived rightness, wrongness, and
pragmatic value.

That isn’t to say you shouldn’t use it, since it’ll be very accurate
in the general case, but redefining the problem to say “anything that
doesn’t use two spaces is wrong” is a bit of a dodge.


Matthew S.
Institute for Communicating and Collaborative Systems
University of Edinburgh
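
The two-space heuristic discussed above can be sketched as follows. This is only an illustration under the assumption that a sentence ends with ".", "!" or "?" followed by at least two spaces or a line break; the method name `proper_case_two_spaces` is hypothetical:

```ruby
# Capitalize sentence starts, treating punctuation as a sentence end only
# when it is followed by two or more spaces or by a line break.
# Assumption: the author consistently double-spaces between sentences.
def proper_case_two_spaces(text)
  text.gsub(/(\A|[.!?](?: {2,}|\s*\n\s*))([a-z])/) { "#{$1}#{$2.upcase}" }
end

puts proper_case_two_spaces("see gsub vs. gsub for details.  it works.  really.")
# prints "See gsub vs. gsub for details.  It works.  Really."

puts proper_case_two_spaces("one. two")
# prints "One. two"
```

The first call shows why the convention helps: "vs." is followed by a single space and is left alone. The second shows the cost: single-spaced text gets no sentence capitalization beyond its first word.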

It would be nice if you could assume two spaces after an end of sentence
with punctuation. Generally I think that’s correct grammar, although my
grammar stinks so I could easily be wrong. If you have to get into
parsing incorrect grammar it becomes much more difficult.

There’s an old typewriter convention to use two spaces, but I’d be
surprised if you can find a single printed English book that uses two
spaces after a sentence.

Paul.

On 04/08/06, Ruby Q. wrote:

sometimes i type in all or mostly lowercase. a friend of mine says it’s hard to
read essays with no capital letters. so the problem is to write a method which
takes a string (which could include many paragraphs), and capitalizes words that
should be capitalized. at minimum it should do the starts of sentences.

perhaps u could also correct rly annoying abbreviations used by ppl
for whom typing a few extra letters is 2 hard! thx!111

(Ugh - did I just type that?!)

Paul.

i say your friend is just being hard headed. we don’t need no stinking
caps! ;)

t.

James Edward G. II wrote:

[it is] the typographic standard.

James Edward G. II

I stand corrected.

Paul B. wrote:

for whom typing a few extra letters is 2 hard! thx!111

Joking aside, this kind of tool would have been most welcome when I
taught freshman-level programming a few years back. We’re showing our
age here.

James Edward G. II wrote:

http://webword.com/reports/period.html

Here’s an explanation from that report:

The only reason that two spaces were used after a period during the
‘typewriter’ age was because original typewriters had monospaced
fonts – the extra space was needed for the eye to pick up on the beginning
of a new sentence. That need is negated w/proportional space type, hence
[it is] the typographic standard.

Very interesting. It’s also very interesting to me that I spend most of
my time reading and writing in monospaced fonts and I think two spaces
looks worse in monospace, so I only ever use one. When typing in
proportional fonts I sometimes still do a double-space, but mostly I’ve
given up caring what others think and just do what I want (one space),
similar to the situation with punctuation inside or outside of quotation
marks. I blame LaTeX for my nonchalant attitude; however, no matter how
much I use LaTeX I will never fall for the horrendously wrong `` ''
convention.

James Edward G. II wrote:

The report mentions this as well:

In short, the “rivers” of whitespace, caused by using two spaces,
invariably annoy graphic designers and typographers.

That sounds like a noble cause. Maybe I’ll reconsider…

On 8/4/06, Hans F. wrote:


Rick DeNatale

IPMS/USA Region 12 Coordinator
http://ipmsr12.denhaven2.com/

Visit the Project Mercury Wiki Site
http://www.mercuryspacecraft.com/

On Aug 4, 2006, at 9:45 AM, Hans F. wrote:

Here’s an explanation from that report:
two spaces looks worse in monospace, so I only ever use one.

The report mentions this as well:

In short, the “rivers” of whitespace, caused by using two spaces,
invariably annoy graphic designers and typographers.

James Edward G. II

On Aug 4, 2006, at 6:36 PM, Rick DeNatale wrote:

The noble cause being to annoy graphic designers and typographers?

Or maybe you meant something else.

Sorry for the two empty replies. Gmail went crazy on me.

I thought you were trying to start on the noble cause, by adding to
the cause.

On 8/4/06, Hans F. wrote:

James Edward G. II wrote:

The report mentions this as well:

In short, the “rivers” of whitespace, caused by using two spaces,
invariably annoy graphic designers and typographers.

That sounds like a noble cause. Maybe I’ll reconsider…

The noble cause being to annoy graphic designers and typographers?

Or maybe you meant something else.

Sorry for the two empty replies. Gmail went crazy on me.

James Edward G. II wrote:

On Aug 4, 2006, at 8:20 AM, Mike H. wrote:

It would be nice if you could assume two spaces after an end of
sentence with punctuation. Generally I think that’s correct
grammar, although my grammar stinks so I could easily be wrong. If
you have to get into parsing incorrect grammar it becomes much more
difficult.

Actually, that’s an old typographical convention that we can’t seem
to shake.

What sort of perversion would make anyone want to shake
an old convention that is useful?

of a new sentence. That need is negated w/proportional space type, hence
[it is] the typographic standard.

Most people view the posts here in a monospaced font.
If they didn’t, source code would look too chaotic.

TeX and LaTeX, for example, quite properly put extra space
after the end of a sentence. Since what we type here will
usually be displayed monospaced, a sensible person who is
trying to make his message as readable as possible will put
two spaces between sentences.

William J. wrote:

TeX and LaTeX, for example, quite properly put extra space
after the end of a sentence. Since what we type here will
usually be displayed monospaced, a sensible person who is
trying to make his message as readable as possible will put
two spaces between sentences.

Two spaces are needed even when the posts are seen in
a proportional font; without them, there is no extra space
between sentences.

Matthew S. wrote:

On Aug 4, 2006, at 14:20, Mike H. wrote:

Ruby Q. wrote:

[…]

typographical circles over its perceived rightness, wrongness, and
pragmatic value.

That isn’t to say you shouldn’t use it, since it’ll be very accurate
in the general case, but redefining the problem to say “anything that
doesn’t use two spaces is wrong” is a bit of a dodge.

I have to add that I have never read anything about this kind of rule ! And I am
100% sure that this rule does not exist in French typography. I suspect that
every country has different spacing schemes around punctuation, and if you
intend to correct English written by foreigners (and a lot of it is) or, even
better, if you want your program to work with any language written in the Latin
alphabet, you’d better not rely on anything like that ! (I know that I make
loads of English typography errors because I naturally follow the French
rules… unless I make a special effort.)

On Aug 5, 2006, at 7:40, William J. wrote:

to shake.

What sort of perversion would make anyone want to shake
an old convention that is useful?

I would consider it a vast personal favour if we didn’t have to rehash
this never-ending argument in the quiz thread. A quick poke
around Google should familiarise anyone who’s interested with the
basic propositions for and against using two spaces at the end of a
sentence; Wikipedia makes a decent start.

matthew smillie.