Re: curly quotes and forcing ASCII


#1

Listers-

I'm doing some text munging for the obituaries to go online at the
newspaper I work for. I have a simple question: is there a way to
convert some text to plain ASCII? I mean, the text I am processing has
curly quotes and a few other rich-text chars in it. Is there a pure
Ruby way of converting these chars into their plain ASCII
counterparts, short of a regex for each char?

Thanks-
-Ezra


#2

Ezra Z. removed_email_address@domain.invalid writes:

Listers-

I’m doing some text munging for the obituaries to go online at
the newspaper i work for. I have a simple question. Is there
a way to convert some text to plain ascii? i mean the text I
am processing has curly quotes and a few other rich text
chars in it. Is there a pure ruby way of converting these
chars into their plain ascii counterparts short of a regex
for each char?

What format is the text in? Could you maybe post a snippet, if
possible?

(And, btw, can you tell me the URL of that newspaper? I’m very
interested in putting obituaries online, since that is basically the
only reason our local newspaper gets read at all…)


#3

On Jan 14, 2006, at 8:50 AM, Christian N. wrote:


Christian-

I won't be in the office until Tuesday, but I will post a sample
then. The URL of the newspaper is http://yakimaherald.com . The whole
site runs on Rails, and I have a ton of Ruby code that ties together
the different departments as well: lots of text processing between
classified/newsroom/web. Our entire intranet also runs on Ruby:
circulation, accounting, prepress, surveys and employee reviews.

Obituaries are a big traffic draw to our web site as well. That’s why
we are working on a better system. Right now the obits don’t make it
online until the day after they are in the paper, and that’s not right.
So instead of letting the obits make their way through the newsroom
database system, I am going to bypass it and send them straight to the
web. The text comes from a Mac OS 9 machine, so it
has \r for line endings and uses curly quotes and a few other chars
that don’t translate well to being displayed on the web. The database
that I pull them out of is an old proprietary BaseView db, and the
company is not very forthcoming in helping us use the system in ways
they didn’t already envision.
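
To make the clean-up above concrete, here is a rough sketch of the first pass. It assumes a Ruby with built-in encoding support (1.9 or later) and, more importantly, assumes the export is MacRoman-encoded, which is typical for Mac OS 9 but is not confirmed anywhere in this thread. The helper name mac_to_utf8 is made up for illustration.

```ruby
# Hypothetical helper (name is illustrative, not from any library).
# ASSUMPTION: the raw bytes are MacRoman, the usual Mac OS 9 encoding.
def mac_to_utf8(raw)
  # Relabel a copy of the bytes as MacRoman, then transcode to UTF-8.
  text = raw.dup.force_encoding("macRoman").encode("UTF-8")
  # Classic Mac files end lines with a bare \r; normalize \r and \r\n to \n.
  text.gsub(/\r\n?/, "\n")
end

# In MacRoman, bytes 0xD2 and 0xD3 are the left and right curly double quotes.
cleaned = mac_to_utf8("\xD2Hi\xD3\r".b)
# cleaned == "\u201CHi\u201D\n"
```

This only turns the text into valid UTF-8 with Unix line endings; the curly quotes are still curly afterwards and would need a separate step if plain ASCII is really required.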

Cheers-
-Ezra


#4

On 1/13/06, Ezra Z. removed_email_address@domain.invalid wrote:


I suspect you’ll have to define the mapping yourself… the Unicode
characters U+201C LEFT DOUBLE QUOTATION MARK and U+201D RIGHT DOUBLE
QUOTATION MARK don’t appear to have any defined Unicode
composition/decomposition mappings.

http://www.unicode.org/Public/UNIDATA/UnicodeData.txt for data,
http://www.unicode.org/Public/UNIDATA/UCD.html#Decomposition_Mapping
for further detail
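
A hand-rolled mapping along those lines could look something like the sketch below. The table contents and the names ASCII_FALLBACKS and to_ascii are my own picks for illustration, not anything standard, and it assumes the text is already UTF-8 and a Ruby new enough (1.9+) for gsub to accept a hash. It uses one regex built from the table, so there is no need for a regex per char.

```ruby
# Illustrative table: extend it with whatever characters show up in the obits.
ASCII_FALLBACKS = {
  "\u2018" => "'",    # left single quotation mark
  "\u2019" => "'",    # right single quotation mark / apostrophe
  "\u201C" => '"',    # left double quotation mark
  "\u201D" => '"',    # right double quotation mark
  "\u2013" => "-",    # en dash
  "\u2014" => "--",   # em dash
  "\u2026" => "..."   # horizontal ellipsis
}.freeze

# Hypothetical helper: replaces every character listed in the table
# with its ASCII fallback and leaves everything else untouched.
def to_ascii(text)
  pattern = Regexp.union(ASCII_FALLBACKS.keys)
  text.gsub(pattern, ASCII_FALLBACKS)
end

puts to_ascii("\u201CRest in peace\u2026\u201D")  # prints "Rest in peace..."
```

Anything not in the table passes through unchanged, so accented names in the obituaries would survive; whether they should also be flattened to ASCII is a separate decision.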

-A