Unicode roadmap?

Hello, everyone. I am sorry, I was a bit embarrassed by the quantity of
text in this discussion and I may not have read it carefully enough to
figure out the answer, and the discussion itself seems to be a year
old, so I’ve decided to ask:

Finally, is there a convenient support for Unicode in Ruby? Or, if not,
when will it be?

I am going to develop an international website (with pages in some
European languages, including those using non-Latin alphabets). I think
it should prove to be a good idea to make such a website totally in
Unicode (probably UTF-16), without using any legacy encodings at all.
The DBMS I am going to use is Oracle 10g (Express edition until I run
into its limitations).

I would also like to ask when the next Ruby release is planned. If
it comes this year, I should probably try nightly builds as it seems to
be wise to start a new project targeting ea version of the next release.

Thanks in advance.

On Fri, Jun 01, 2007 at 05:29:31AM +0900, Ivan M. wrote:

Finally, is there a convenient support for Unicode in Ruby? Or, if not,
when will it be?

Well, Ruby 1.9 (which is due in December) will have some Unicode
support. (So you’ll have a chars method on strings, like with
Rails.) Matz is working on it right now even, as he posted that he
was tooling around with string.c earlier this week on his blog.

That is, nothing’s been checked in yet. Because he wants it to be
good, you see?

_why

On 2007-05-31 15:30:50 -0700, “Austin Z.” [email protected]
said:

There are a lot of answers to that question, and I strongly suggest
you search as this is a hotly debated discussion.

Google is more useful for searching this than ruby-forum.com. You will
find out when there will be a new release, and the current state of
Unicode.

If it helps any, I’ve moved ~2000 web pages in an internal work project
that had mixed UTF-8/cp-1252 (in the content, not just between pages)
and Ruby handled it very gracefully. I was using 1.8.5-p12 and Hpricot
(but not Hpricot’s encoding features, which last I checked are broken)
for the process.

While I’m certainly not an authority on the subject, I’ve thoroughly
battle-tested this and can say with a high degree of confidence that it
works. Certainly better than Perl and libxml2, which was our original
implementation.
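
For anyone reading this later: a minimal sketch of one way to clean up
mixed UTF-8/cp-1252 content. This is not the script described above; it
assumes a current Ruby (1.9.3 or later, with per-string encodings), and
the helper names normalize_to_utf8 and normalize_lines are made up for
the illustration. The heuristic is simply "valid UTF-8 stays as it is,
anything else is treated as Windows-1252".

```ruby
# Hypothetical helper: keep the bytes if they are already valid UTF-8,
# otherwise assume they are Windows-1252 and transcode them to UTF-8.
def normalize_to_utf8(raw)
  candidate = raw.dup.force_encoding("UTF-8")
  return candidate if candidate.valid_encoding?

  raw.dup.force_encoding("Windows-1252").encode("UTF-8")
end

# Work line by line, because a single page may mix both encodings.
def normalize_lines(raw_page)
  raw_page.b.each_line.map { |line| normalize_to_utf8(line) }.join
end

# First line is UTF-8 ("café"), second line is cp-1252 ("naïve").
MIXED_PAGE = "caf\xC3\xA9\n".b + "na\xEFve\n".b

puts normalize_lines(MIXED_PAGE)
```

Two caveats: a cp-1252 byte sequence that happens to be valid UTF-8
will be left alone, and the few byte values that Windows-1252 leaves
undefined will raise Encoding::UndefinedConversionError.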

On 5/31/07, Ivan M. [email protected] wrote:

Finally, is there a convenient support for Unicode in Ruby? Or, if not,
when will it be?

It depends on your definition of ‘convenient’.

The short answer is that Unicode applications can be made in Ruby,
particularly Web Apps. It is not especially difficult, but it is not
‘for free’ or seamless. You generally have to use an encoding-aware
string type, or modify the existing string class to support multi-byte
characters.

A longer answer would contain references to the fact that there are
multiple options here, that web apps (Rails in particular) are ahead
of pure Ruby in terms of Unicode, and that there are actually a lot
of projects to investigate.

The hardest part of Ruby and Unicode is that not all of the libraries
support it, or that some of the meta-hackery to the string class
could break libraries that expect chars.length to equal bytes.length
(there are other examples). Some of the more popular libraries are
like this, or they inherit the encoding from your O/S settings and
cannot be driven from an API.
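
To make the chars.length vs bytes.length point concrete, here is a small
sketch. It assumes Ruby 2.0 or later (where String#chars and
String#bytes return arrays), not the 1.8 series discussed in this
thread, and the helper name char_and_byte_counts is only for
illustration.

```ruby
# Returns [character_count, byte_count] for a UTF-8 string.
def char_and_byte_counts(text)
  [text.chars.length, text.bytes.length]
end

ASCII_WORD    = "resume"
ACCENTED_WORD = "r\u00e9sum\u00e9"                          # "résumé"
CYRILLIC_WORD = "\u041f\u0440\u0438\u0432\u0435\u0442"      # "Привет"

[ASCII_WORD, ACCENTED_WORD, CYRILLIC_WORD].each do |word|
  chars, bytes = char_and_byte_counts(word)
  puts "#{word}: #{chars} chars, #{bytes} bytes"
end
```

Only the pure ASCII word has equal counts, so any library that treats
the two numbers as interchangeable will only be correct for ASCII input.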

I am going to develop an international website (with pages in some
European languages, including those using non-Latin alphabets). I think
it should prove to be a good idea to make such a website totally in
Unicode (probably UTF-16), without using any legacy encodings at all.

Well yes, but I would use UTF-8 instead. It’s the Unicode encoding best
suited to the web (and UTF-16 is a bit weird in some ways - there are at
least 3 kinds of UTF-16 that I am aware of: UTF-16, UTF-16LE and
UTF-16BE).
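
A quick sketch of why the UTF-16 variants matter: the same character
comes out as different byte sequences depending on byte order. This
assumes a current Ruby with String#encode (1.9 or later); the helper
name bytes_in is hypothetical.

```ruby
# Returns the bytes of `text` after transcoding to the named encoding.
def bytes_in(text, encoding_name)
  text.encode(encoding_name).bytes
end

E_ACUTE = "\u00e9" # "é"

%w[UTF-8 UTF-16LE UTF-16BE].each do |name|
  hex = bytes_in(E_ACUTE, name).map { |byte| format("%02X", byte) }.join(" ")
  puts "#{name}: #{hex}"
end
```

Plain "UTF-16" without a suffix relies on a byte order mark to say which
of the two layouts follows, which is one more thing to get wrong between
systems. UTF-8 has a single byte layout and keeps ASCII as single bytes.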

Rails 1.2, the last release, introduced some pretty impressive support
for Unicode; all of the major i18n plugins should be compatible with
these changes by now.

I would also like to ask when the next Ruby release is planned. If
it comes this year, I should probably try nightly builds as it seems to
be wise to start a new project targeting ea version of the next release.

AFAIK there is no release schedule. YARV is basically Ruby 1.9, and it
is scheduled for release around the end of the year. However, there is
no firm commitment to make it the next Ruby version. Also, Ruby 1.9 is
going to break/deprecate stuff - I wouldn’t develop against it; it will
be a rough experience.

Ruby 1.9 is kind of a staging release; migrating from 1.8 → 1.9 is going
to be tricky, but 1.9 → 2.0 should be a drop-in; that’s the intention -
isolate the biggest changes to the 1.9 release.

If you are moving to Ruby 1.9, do it with a complete working
application. Or better still, develop against Rails versions, not Ruby
versions. Let the Rails team figure out the best Ruby migration strategy
for you.

Richard C. wrote:

It depends on your definition of ‘convenient’.

IMHO convenient is as in C#. There I don’t have to bother about how
strings are stored in memory, they just work and are international.

Well yes, but I would use UTF-8 instead.

Won’t there be a problem if the data is stored in UTF-16 (as far as I
know, Oracle’s NVARCHAR uses 16 bits per symbol)?

Also Ruby 1.9 is going to break/deprecate stuff - I wouldn’t develop against it
migrating from 1.8 -> 1.9 is going to be tricky

So why should anyone develop a new project against 1.8 if it is going to
be deprecated?

If you are moving to Ruby 1.9, do it with a complete working
application.

But isn’t it going to be tricky, as you’ve said?

I don’t have to move anything for now, as I have not a line of Ruby code
today (only an idea in my head). And no Ruby experience (I am a C++,
C#, Java and T-SQL developer). I’ve chosen Ruby as it seems almost good
and free.

Have I understood you correctly - you think I should make it Ruby 1.8
and then do a tricky move when it comes?

Or better still, develop against Rails versions, not Ruby versions.

This advice can prove useful. I’ll think about it.

On 5/31/07, Ivan M. [email protected] wrote:

Hello, everyone. I am sorry, I was a bit embarrassed by the quantity of
text in this discussion and I may not have read it carefully enough to
figure out the answer, and the discussion itself seems to be a year
old, so I’ve decided to ask:

Finally, is there a convenient support for Unicode in Ruby? Or, if not,
when will it be?

There are a lot of answers to that question, and I strongly suggest
you search as this is a hotly debated discussion.

Google is more useful for searching this than ruby-forum.com. You will
find out when there will be a new release, and the current state of
Unicode.

-austin

On Jun 1, 2007, at 9:23 AM, Richard C. wrote:

So every time you evaluate a Ruby library, Rails plugin or gem, you
have to do more homework than you would in the unicode centric Java or
C#.

Objective-C (through the Cocoa framework) also handles Unicode
superbly. Problem is, it is not cross-platform and is in fact
strictly OS X stuff, but you could indeed use those libraries
(NSString, etc…) through RubyCocoa, but of course that is far from
convenient or optimal for most purposes.

Ideally, if major OS vendors got behind Ruby full force and put their
Unicode know-how into the codebase, things would be smoother. They’re
the ones who really have already figured out pretty good ways to
handle that stuff, and all the major scripting languages could
benefit from it.

On 6/1/07, Ivan M. [email protected] wrote:

Richard C. wrote:

It depends on your definition of ‘convenient’.

IMHO convenient is as in C#. There I don’t have to bother about how
strings are stored in memory, they just work and are international.

It’s not that convenient. By default Ruby strings are 8-bit (plain byte
strings). You can make them Unicode-aware very easily through $KCODE and
the jcode library (IIRC), and they will behave as Unicode in a way that
you don’t have to think about. You don’t have to use a different string
type.

The problem occurs when you use code that you didn’t write that expects
strings to be single-byte. So every time you evaluate a Ruby library,
Rails plugin or gem, you have to do more homework than you would in the
Unicode-centric Java or C#.
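
As an illustration of the kind of single-byte assumption to look for
when evaluating a library, here is a sketch of a truncation helper
written two ways. It assumes Ruby 1.9.3 or later (String#byteslice and
per-string encodings), not 1.8; both helper names are invented for the
example.

```ruby
# Truncates by bytes: fine for ASCII, but it can cut a multi-byte
# character in half and leave an invalid UTF-8 string behind.
def truncate_by_bytes(text, limit)
  text.byteslice(0, limit)
end

# Truncates by characters: String#[] counts characters, not bytes.
def truncate_by_chars(text, limit)
  text[0, limit]
end

WORD = "r\u00e9sum\u00e9" # "résumé"

puts truncate_by_bytes(WORD, 2).valid_encoding? # the "é" was split
puts truncate_by_chars(WORD, 2).valid_encoding?
```

Code of the first kind passes every test written with ASCII fixtures,
which is why this sort of bug tends to show up only in production.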

Well yes, but I would use UTF-8 instead.

Won’t there be a problem if the data is stored in UTF-16 (as far as I
know, Oracle’s NVARCHAR uses 16 bits per symbol)?

Every database worth using lets you specify the encoding of your string
and character types. Check your manuals or the Oracle forums. Anything
that is in any way associated with web development supports UTF-8.
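
On the worry about UTF-16 storage: UTF-8 and UTF-16 encode the same set
of characters, so converting between them loses nothing. The sketch
below shows only the transcoding step in a current Ruby (1.9 or later);
it does not talk to Oracle, and where the conversion actually happens
(driver, client settings or database) is something to check in the
Oracle documentation. The helper names are hypothetical.

```ruby
# Stand-in for "what a 16-bit column might hold": the text as UTF-16LE.
def store_as_utf16(text)
  text.encode("UTF-16LE")
end

# Stand-in for reading the value back into a UTF-8 web application.
def load_as_utf8(stored)
  stored.encode("UTF-8")
end

GREETING = "\u041f\u0440\u0438\u0432\u0435\u0442" # "Привет"

stored   = store_as_utf16(GREETING)
restored = load_as_utf8(stored)

puts restored == GREETING
puts "UTF-8: #{GREETING.bytesize} bytes, UTF-16: #{stored.bytesize} bytes"
```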

Also Ruby 1.9 is going to break/deprecate stuff - I wouldn’t develop against it
migrating from 1.8 → 1.9 is going to be tricky

So why should anyone develop a new project against 1.8 if it is going to
be deprecated?

Okay, you misunderstood me. There is a feature roadmap towards Ruby 2.0,
where major changes are coming in; the two biggest that I recall are
Unicode support and native/pre-emptive threads. The only reasonable way
to implement them is by altering the behaviour of core classes and the
standard library.

This will mean that Ruby code of any sophistication written for Ruby
1.8, including many libraries, is likely to break.

Ruby 1.8 is not going away. Ruby is an open language, with a public
source repository, unlike, say, .NET, where Microsoft distributes the
runtime in binary-only form and can make older versions difficult to
get. You have no obligation to migrate to the most recent version, and
there is no technical reason that multiple (application-specific)
runtimes cannot co-exist on the same machine.

Chasing the latest release is really something that you only do with
commercial languages. It’s not something that is generally done with
open languages.

If you are moving to Ruby 1.9, do it with a complete working
application.

But isn’t it going to be tricky, as you’ve said?

It would be one hell of a lot easier than developing against a moving
target, not knowing if the issues in your code are your issues or
due to the latest release candidate.

Bleeding edge software development is for people who can spare a
lot of blood loss.

I don’t have to move anything for now, as I have not a line of Ruby code
today (only an idea in my head). And no Ruby experience (I am a C++,
C#, Java and T-SQL developer). I’ve chosen Ruby as it seems almost good
and free.

Yeah, it’s a great language. Make a point of checking out the JRuby
project. It’s an exceptionally well developed Ruby runtime; it is
considerably more than an interpreter or language bridge - the JRuby
guys have basically doubled the size of the Java platform (or Ruby
platform, depending on POV). Ruby is strong where Java is weak, and vice
versa.

Have I understood you correctly - you think I should make it Ruby 1.8
and then do a tricky move when it comes?

Use Rails, where the most compelling features in Ruby 1.9/2.0 are
already present: Unicode, native concurrency (via processes) and good
performance (via all those caching mechanisms). When the Rails guys go
to Ruby 1.9, you can too.

Or better still, develop against Rails versions, not Ruby versions.

This advice can prove useful. I’ll think about it.

regards,
Richard.

It is very interesting