Downcase/uppercase for non-English characters


#1

Hello.

Can the above (see subject) be done for 1-byte (non-Unicode) encodings, particularly
windows-1251 (Russian)?

Are there some interpreter options or third-party libraries for
locale-specific operations?

Thanks.

Victor.


#2

Hi

As Turkish users we have the same problem, not only in windows-1254 but also
in UTF-8, and I think that Matz, as a Japanese user, has the same problem too. I
read a solution on a Turkish forum; maybe you can use it. They say that for
Unicode you should use jcode=u and write your own regex-based downcase/uppercase
functions. If you do that in your Ruby code, all libraries that use
those functions will use your versions. But this problem will be fixed in
version 2.0 of Ruby.
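
For illustration, a hand-written downcase/upcase pair for Turkish might look like the sketch below. The names `turkish_downcase` and `turkish_upcase` are hypothetical, and the sketch assumes UTF-8 strings and a UTF-8 source file on Ruby 2.0 or later, not the jcode setup mentioned above.

```ruby
# Hypothetical helpers, not taken from the forum solution mentioned above.
# The Turkish capitals are mapped explicitly with String#tr, so the dotted
# and dotless I pairs (İ/i and I/ı) come out right; the remaining ASCII
# letters are left to the built-in downcase/upcase.
TURKISH_UPPER = "ÇĞIİÖŞÜ"
TURKISH_LOWER = "çğıiöşü"

def turkish_downcase(str)
  str.tr(TURKISH_UPPER, TURKISH_LOWER).downcase
end

def turkish_upcase(str)
  str.tr(TURKISH_LOWER, TURKISH_UPPER).upcase
end
```

With these helpers, `turkish_downcase("IŞIK İSTANBUL")` should give `"ışık istanbul"`, where a plain ASCII-only downcase would turn `I` into `i`.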

Serbulent UNSAL


#3

As Turkish users we have the same problem, not only in windows-1254 but also
in UTF-8, and I think that Matz, as a Japanese user, has the same problem too.

Hmmm… It seems to me Japanese has no upper/lower case.

I read a solution on a Turkish forum; maybe you can use it. They say that for
Unicode you should use jcode=u and write your own regex-based downcase/uppercase
functions. If you do that in your Ruby code, all libraries that use
those functions will use your versions.

Yes, I know how to use Unicode (but I don’t want to), and I know how to write
a custom upper/downcase via String#tr (and I’ve already written one). What I
can’t handle is case-insensitive Regexp matching :-\
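
A tr-based mapping of the kind described might look like the following sketch. The helper names are hypothetical, and it assumes Ruby 1.9 or later, working on the raw bytes. The byte values are those of the windows-1251 code page (А..Я at 0xC0..0xDF, а..я at 0xE0..0xFF, Ё at 0xA8, ё at 0xB8); only the Russian alphabet is covered.

```ruby
# Hypothetical helpers, not the poster's actual code.
# Build the byte tables for the Russian letters in windows-1251.
CP1251_UPPER = ((0xC0..0xDF).to_a + [0xA8]).pack("C*")  # А..Я, Ё
CP1251_LOWER = ((0xE0..0xFF).to_a + [0xB8]).pack("C*")  # а..я, ё

def cp1251_downcase(str)
  bytes = str.dup.force_encoding(Encoding::BINARY)
  # Map the Cyrillic bytes with tr, then let downcase handle ASCII letters.
  bytes.tr(CP1251_UPPER, CP1251_LOWER).downcase.force_encoding(str.encoding)
end

def cp1251_upcase(str)
  bytes = str.dup.force_encoding(Encoding::BINARY)
  # Same idea in the other direction.
  bytes.tr(CP1251_LOWER, CP1251_UPPER).upcase.force_encoding(str.encoding)
end
```

Downcasing both the text and the pattern with such a helper before matching is one possible workaround for case-insensitive matching, though it leaves the behaviour of `/i` itself unchanged.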

But this problem will be fixed in
version 2.0 of Ruby.

I would hope so, but I can’t wait :)

In any case, thanks for your help.

Serbulent UNSAL

Victor.


#4
  • Victor S. wrote:

Hello.

Can the above (see subject) be done for 1-byte (non-Unicode) encodings, particularly
windows-1251 (Russian)?

Are there some interpreter options or third-party libraries for
locale-specific operations?

There is Unicode property support that can be compiled into PCRE. I know
there are PCRE bindings for Ruby as well, although I have no idea how
functional they are.

Obviously in order for this to work, you’d need to use Iconv to
convert the data into PCRE-friendly UTF-8.
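
The convert-then-match idea can be sketched as follows. Note the substitutions: this uses `String#encode` and Ruby's built-in regexp engine in place of Iconv and PCRE bindings, so it assumes Ruby 1.9 or later rather than the setup described above. The method name `cp1251_match_ci?` is hypothetical.

```ruby
# Hypothetical helper: case-insensitive search in windows-1251 data
# by transcoding it to UTF-8 first.
# Assumption: Ruby 1.9+, where String#encode does the conversion and the
# built-in regexp engine applies Unicode case folding to UTF-8 patterns.
def cp1251_match_ci?(cp1251_bytes, utf8_pattern)
  # Tag the raw bytes as windows-1251, then transcode to UTF-8.
  text = cp1251_bytes.dup.force_encoding("Windows-1251").encode("UTF-8")
  # Escape the pattern so it is matched literally, ignoring case.
  regexp = Regexp.new(Regexp.escape(utf8_pattern), Regexp::IGNORECASE)
  !!(regexp =~ text)
end
```

The pattern is written in UTF-8 here, so only the data needs converting; the result is a plain true/false.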