Downcase part of a string

ilhamik · October 22, 2006, 1:26pm

hi,
I want to downcase a string, but leave specific parts untouched.
For example:
msg = "THIS is a Text and (NO Change HERE) HELP"

After downcasing, it should look like “this is a text and (NO Change HERE)
help”.

I don’t want to downcase the letters in parentheses.
How can I do that? I tried it with regular expressions but couldn’t get
it to work.

Thanks for any help

ilhamik · October 22, 2006, 2:00pm

ilhamik:

hi,
I want to downcase a string but without specific parts.
for example:
msg = “THIS is a Text and (NO Change HERE) HELP”

If the parentheses occur only once:

if msg =~ /\(.*?\)/
  $~.pre_match.downcase + $~[0] + $~.post_match.downcase
end

Kalman
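
For readers on Ruby 1.9 or later, the same single-match idea can probably be sketched with String#partition, which returns the text before the first match, the match itself, and the text after it. The variable names are only illustrative, and this still handles just one pair of parentheses.

```ruby
msg = "THIS is a Text and (NO Change HERE) HELP"

# partition splits on the first match: [before, match, after].
# If nothing matches it returns [msg, "", ""], so the whole string is downcased.
before, kept, after = msg.partition(/\(.*?\)/)
result = before.downcase + kept + after.downcase

puts result
```

Unlike the `$~` version, this needs no `if`, because the no-match case falls out of what partition returns.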

ilhamik · October 22, 2006, 2:14pm

ilhamik wrote:

hi,
I want to downcase a string but without specific parts.
for example:
msg = “THIS is a Text and (NO Change HERE) HELP”

Hi,

This is kind of old school and I am sure there are nicer rubyish
solutions for it, but at least it works for multiple parentheses as
well:

original = msg.scan(/\(.+?\)/)
msg.downcase!
altered = msg.scan(/\(.+?\)/)
original.each_with_index { |stuff, i| msg.sub!(altered[i], stuff) }

ilhamik · October 22, 2006, 2:32pm

Thanks Peter, it works fine.

ilhamik · October 22, 2006, 3:43pm

On Oct 22, 2006, at 7:30 AM, ilhamik wrote:

Thanks Peter, it works fine.

You missed Tim B.'s RubyConf talk. According to him, we should
never be using the case-changing methods. “Just don’t do it!”

James Edward G. II

ilhamik · October 22, 2006, 2:16pm

No, they can occur more than once.

ilhamik · October 22, 2006, 8:02pm

Certainly not pretty with that funky regex, but it works:

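msg = "THIS is a Text and (NO Change HERE) HELP (Not here Either)"

# m[0, 1] is the first character as a string on any Ruby version
# (m[0] == 40 only works on 1.8, where indexing returns a character code)
msg.gsub!(/([^(]*(?!\())|(\(.*?\))|(\)[^)]*)/) do |m|
  m[0, 1] == "(" ? m : m.downcase
end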

Scott
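
A simpler single-pass variant of the same gsub idea might look like the sketch below. The method name `downcase_outside_parens` is made up for illustration, and the pattern assumes parentheses are not nested.

```ruby
# Hypothetical helper name; not from the thread.
def downcase_outside_parens(text)
  # First alternative: a whole parenthesized group, returned untouched.
  # Second alternative: a run of text containing no "(", which is downcased.
  text.gsub(/\([^)]*\)|[^(]+/) do |chunk|
    chunk.start_with?("(") ? chunk : chunk.downcase
  end
end

msg = "THIS is a Text and (NO Change HERE) HELP (Not here Either)"
puts downcase_outside_parens(msg)
```

Because each alternative consumes at least one character, there are no empty matches to worry about, and a stray "(" with no closing ")" is simply left in place.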

ilhamik · October 22, 2006, 11:51pm

James Edward G. II wrote:

On Oct 22, 2006, at 7:30 AM, ilhamik wrote:

Thanks Peter, it works fine.

You missed Tim B.'s RubyConf talk. According to him, we should never
be using the case-changing methods. “Just don’t do it!”

James Edward G. II

Why not? What reason did he give?
Cheers

ilhamik · October 23, 2006, 3:17am

On 10/22/06, Mike D. wrote:

Why not? What reason did he give?

The problem is that proper upcasing and downcasing of characters is
locale-dependent, not just encoding or language-dependent.

As examples, he mentioned that the uppercase version of accented
characters varies from area to area in France. Also, in Turkish,
there are four different cases of ‘i’, not just two… and which is
correct depends on the jurisdiction.
Determining the locale in a correct way is really, really hard. Tim
Bray says it’s basically impossible. Also, all of these rules make
any decent upcase/downcase function ruinously slow.

He shared a story about the original version of XML. At the time, it
was case-insensitive. The very first XML library was running horribly
slow. After profiling, they found that it was spending 90% of its
time in the Java downcase routine. After that, XML was made
case-sensitive.
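
To make the Turkish “i” point concrete, here is a small sketch. It assumes Ruby 2.4 or later, which postdates this thread; from that version on, String#downcase and String#upcase accept a :turkic option. The variable names are illustrative only.

```ruby
# Default mapping: language-independent Unicode rules, no locale consulted.
default_lower = "I".downcase           # plain ASCII "i"

# Turkic mapping: capital I pairs with dotless small i (U+0131).
turkic_lower = "I".downcase(:turkic)

# And small i pairs with dotted capital I (U+0130).
turkic_upper = "i".upcase(:turkic)

puts default_lower == "i"
puts turkic_lower == "\u0131"
puts turkic_upper == "\u0130"
```

The caller has to state which rule applies; nothing in the string itself says whether it came from a Turkish user, which is the difficulty described above.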

ilhamik · October 23, 2006, 4:26am

Wilson B. wrote:

Bray says it’s basically impossible. Also, all of these rules make
any decent upcase/downcase function ruinously slow.

He shared a story about the original version of XML. At the time, it
was case-insensitive. The very first XML library was running horribly
slow. After profiling, they found that it was spending 90% of its
time in the Java downcase routine. After that, XML was made
case-sensitive.

Thanks Wilson, that explains everything. I’d never thought about
problems like that.
Cheers, Mike

ilhamik · October 24, 2006, 7:20pm

On Oct 22, 2006, at 8:16 PM, Wilson B. wrote:

James Edward G. II

Why not? What reason did he give?

The problem is that proper upcasing and downcasing of characters is
locale-dependent, not just encoding or language-dependent.

Yes, this is basically it.

Tim B. feels that case changing is more or less impossible in the
practical sense. When you get around to downcasing that string a
user entered into your web form a month back, are you going to know
if that string was encoded in a Turkish locale (critical info if it
contains an “i”)?

Even if it were possible, Tim suggests that it’s a performance
killer. See Java, which tries to address as many rules as it
possibly can, for proof.

James Edward G. II

ilhamik · October 24, 2006, 7:48pm

On Wed, 25 Oct 2006, James Edward G. II wrote:

James Edward G. II
one caveat that tim did not mention, and which is quite applicable to
many small sites, is that you simply don’t always have to care. for
instance, if your site is in english only, you don’t have to care. now,
i’m not saying that is a good idea - but a whole ton of successful
business models work that way: many successful newspapers, for example,
publish in english only. the trick is knowing up front if that’s what
you want. if that’s unacceptable, then it does seem like you’re screwed.

-a

ilhamik · October 24, 2006, 7:46pm

Le 23 octobre 2006 à 03:16, Wilson B. a écrit :

As examples, he mentioned that the uppercase version of accented
characters varies from area to area in France.

This is way off topic, but I’d like to know where he heard that. It’s
the first time for me, and I’m a native French speaker…

Fred

ilhamik · October 25, 2006, 3:33am

“no”.capitalize, Tim is right, but ruby is a “logical” language for
me. Trying to accommodate 6800 languages with various character
types is not logical. Unfortunately, not manufacturing bikinis
because some people can’t wear them doesn’t seem like the optimal
solution in my opinion. If someone wants to upcase Latin, write a ruby
library and share it.

ilhamik · October 25, 2006, 1:59am

F. Senault wrote:

Le 23 octobre 2006 à 03:16, Wilson B. a écrit :

As examples, he mentioned that the uppercase version of accented
characters varies from area to area in France.

This is way off topic, but I’d like to know where he heard that. It’s
the first time for me, and I’m a native French speaker…

That’s very interesting. So Tim is mistaken?

Hal

ilhamik · October 25, 2006, 5:55am

x1 wrote:

“no”.capitalize, Tim is right, but ruby is a “logical” language for
me. Trying to accommodate 6800 languages with various character
types is not logical. Unfortunately, not manufacturing bikinis
because some people can’t wear them doesn’t seem like the optimal
solution in my opinion. If someone wants to upcase Latin, write a ruby
library and share it.

I don’t think that addresses what I was asking about, i.e., whether
French accents on capital letters differ across France.

Hal

ilhamik · October 25, 2006, 5:23am

On Oct 24, 2006, at 4:57 PM, Hal F. wrote:

F. Senault wrote:

Le 23 octobre 2006 à 03:16, Wilson B. a écrit :

As examples, he mentioned that the uppercase version of accented
characters varies from area to area in France.

This is way off topic, but I’d like to know where he heard that. It’s
the first time for me, and I’m a native French speaker…

That’s very interesting. So Tim is mistaken?

I’ve been told that common usage differs in Québec. -Tim

ilhamik · October 25, 2006, 6:15am

On 10/24/06, Hal F. wrote:

It’s entirely possible I’m mis-remembering that part of Tim’s talk.
Anyone else remember exactly what he said? I know he had a slide that
had an accented ‘e’ character on it.

ilhamik · October 25, 2006, 6:29am

Wilson B. wrote:

On 10/24/06, Hal F. wrote:

I don’t think that addresses what I was asking about, i.e., whether
French accents on capital letters differ across France.

It’s entirely possible I’m mis-remembering that part of Tim’s talk.
Anyone else remember exactly what he said? I know he had a slide that
had an accented ‘e’ character on it.

That’s the way I remember it – he said that a lowercase accented
character was sometimes uppercased differently, and it varied
“from district to district.”

Earlier tonight I think he mentioned Quebec (but with a proper accent
that I don’t know how to type).

I wouldn’t be surprised if the French sometimes sneered a little at
the French spoken in Quebec, the way (sometimes) Brits make fun of
Americans, or Spanish (or Colombians) make fun of Mexicans.

But heck: Even if he was totally mistaken, his point still stands –
that capitalization is an unholy mess and is to be avoided. (Actually
he might have stated it more strongly.) Mistaken or not on that one
point, I thought the talk was excellent and informative.

Tim: Read my ch 4 when you can and give me your opinion.

Hal

ilhamik · October 25, 2006, 6:57am

Prettier regexp, paid for with two more steps:

msg = "THIS is a Text and (NO Change HERE) HELP (Not here Either)"

(")" + msg + "(").gsub(/\)(.*?)\(/) { |i| i.downcase }[1..-2]

martin