Forum: Rails-core (closed, excessive spam)
Getting back to the old story of Unicode support

Julian 'Julik' Tarkhanov (Guest)
on 2006-06-11 09:10
(Received via mailing list)
I ranted and I raved and I tried and failed :-)

Unfortunately, my proposal to wreak havoc in the String class proved
futile - it surfaced bugs I had never seen before (among others, CGI
escaping and ERB broke, the latter in a rather intricate way).
Apart from these two problems the solution worked in my applications,
though I would not recommend using it in production until all the
implications become clear. The speed overhead also proved very
substantial. However, after some more tweaking I have come to the
simple idea of proxy access to the characters instead of
subclassing. Maybe this can be useful for ActiveSupport: it would
apply the more expensive routines only where characters are
involved, and when Matz finally comes up with a good M17N engine (doh)
it will be a matter of aliasing a method to self.
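The closing remark above, that an M17N-aware Ruby would reduce the proxy to "aliasing a method to self", could look roughly like this. It is a sketch assuming Ruby 1.9 or later, where String already counts and reverses by character for UTF-8 strings; the accessor name `u` comes from the post, the body is hypothetical.

```ruby
# Sketch for an encoding-aware Ruby (1.9+): String already works in
# characters, so the hypothetical `u` accessor can just return the receiver.
class String
  def u
    self
  end
end

s = "h\u00e9llo"
puts s.bytesize   # 6 bytes: "é" takes two bytes in UTF-8
puts s.u.length   # 5 characters
puts s.u.reverse  # olléh
```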

Works (in general) like this:

@some_unicode_string.u.length
@some_unicode_string.u.reverse

and so on, modifying the string in place where necessary.
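A minimal sketch of such a proxy, assuming the wrapped String holds raw UTF-8 bytes as it did under Ruby 1.8 (where `String#length` counted bytes). `Chars` and `String#u` mirror the names used in this thread, but the code is only an illustration, not the plugin's actual implementation; it decodes with `unpack("U*")` and re-encodes with `pack("U*")`.

```ruby
# Sketch of a character proxy over a byte string, as in Ruby 1.8 where
# String#length counted bytes. Chars and String#u follow the names in this
# thread; the implementation is hypothetical.
class Chars
  def initialize(string)
    @string = string
  end

  # Number of characters (code points), not bytes.
  def length
    decoded.length
  end

  # Reverse by character so multibyte sequences stay intact;
  # returns a new proxy and leaves the original string alone.
  def reverse
    Chars.new(decoded.reverse.pack("U*"))
  end

  # In-place variant: rewrites the wrapped string itself.
  def reverse!
    @string.replace(decoded.reverse.pack("U*"))
    self
  end

  def to_s
    @string
  end

  private

  # Decode UTF-8 bytes into Integer code points
  # (raises ArgumentError on malformed UTF-8).
  def decoded
    @string.unpack("U*")
  end
end

class String
  # Hypothetical accessor named after the `.u` in the examples above.
  def u
    Chars.new(self)
  end
end

raw = "h\xC3\xA9llo".b   # "héllo" as Ruby 1.8 saw it: 6 bytes
puts raw.length          # 6
puts raw.u.length        # 5
puts raw.u.reverse       # olléh
```

`reverse` returns a new proxy, while the hypothetical `reverse!` shows the in-place modification mentioned above.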

Here is the test suite:
http://julik.textdriven.com/svn/tools/rails_plugin...
test/t_string_overrides.rb

I think this can be refactored nicely into the string extensions in
ActiveSupport. After that, things like validates_length_of and
truncate will be able to address characters explicitly without
resorting to regexen, and users get a good head start on Unicode
in their apps from the get-go.
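To illustrate the difference for something like `truncate`, here is a hypothetical character-aware helper (`truncate_chars` is an invented name, not the Rails helper), again assuming raw UTF-8 bytes as under Ruby 1.8. Slicing bytes can cut "é" in half, while slicing the decoded code points cannot.

```ruby
# Hypothetical helper: truncate by characters rather than bytes.
# Assumes `text` holds UTF-8 bytes; unpack("U*") raises ArgumentError
# on malformed input.
def truncate_chars(text, length = 30, omission = "...")
  chars = text.unpack("U*")              # decode bytes into code points
  return text if chars.length <= length  # short enough: leave untouched
  keep = [length - omission.length, 0].max
  chars.first(keep).pack("U*") + omission
end

raw = "h\xC3\xA9llo w\xC3\xB6rld".b  # "héllo wörld": 11 characters, 13 bytes
puts truncate_chars(raw, 8)          # héllo...
p raw[0, 2]                          # byte slicing leaves half of "é": "h\xC3"
```

A character-aware length validation would compare against the same decoded count, `text.unpack("U*").length`, instead of the byte length.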

It has an implicit dependency on the Unicode gem for normalization
and capitalization, but that can easily be stubbed out with dummy
methods and made optional (with, for example, a warning in the
development log). Considering that Rails already has some implicit
dependencies of its own, I don't see that as too much of a burden, but
that is just my opinion.
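A minimal sketch of making that dependency optional. The module name `UnicodeBackend` is hypothetical, and the calls assume the Unicode gem's `Unicode.normalize_C` and `Unicode.capitalize` module functions; when the gem cannot be loaded, the methods fall back to pass-through stubs and a warning goes to stderr (in a Rails app this would be the development log).

```ruby
# Hypothetical wrapper: use the Unicode gem when it can be loaded, otherwise
# fall back to pass-through stubs and warn once at load time.
module UnicodeBackend
  begin
    require "unicode"
    AVAILABLE = true
  rescue LoadError
    AVAILABLE = false
    warn "unicode gem not found: normalization and capitalization are stubbed out"
  end

  module_function

  # Canonical composition (NFC) when the gem is present, identity otherwise.
  def normalize_c(str)
    AVAILABLE ? Unicode.normalize_C(str) : str
  end

  # Unicode-aware capitalization when the gem is present, identity otherwise.
  def capitalize(str)
    AVAILABLE ? Unicode.capitalize(str) : str
  end
end

puts UnicodeBackend.normalize_c("abc")  # abc (with or without the gem)
```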

I will gladly provide a patch should the core team find this
noteworthy.

--
Julian 'Julik' Tarkhanov
please send all personal mail to
me at julik.nl
Mislav Marohnić (mislav)
on 2006-06-11 14:58
(Received via mailing list)
That is some nice work, Julian. Proxy access makes it possible to
adopt this without breaking anything. I, too, would like to see this
someday _without_ a proxy.

I have a question and a request that aren't really about the hacks (I
would leave those to more experienced programmers) but about the plugin.
First, about 'db_unicode_client.rb': isn't this functionality (setting
the client encoding) already provided by the 'encoding' attribute in
database.yml?
Second: when setting the "content-type" header for output, could you
avoid forcing text/html and instead use a variable whose value
defaults to text/html, so we can provide 'application/xhtml+xml' or
other content types in specific controllers?
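One way the requested setting could work, sketched in plain Ruby under the assumption of a class-level attribute that subclasses inherit. `BaseController`, `content_type` and `content_type_header` are hypothetical names for illustration, not Rails API.

```ruby
# Hypothetical sketch: a class-level content type that defaults to
# text/html and can be overridden per controller. Plain Ruby, not Rails API.
class BaseController
  DEFAULT_CONTENT_TYPE = "text/html"

  class << self
    attr_writer :content_type

    # Use this class's setting, else inherit from the superclass,
    # else fall back to the default.
    def content_type
      return @content_type if @content_type
      if superclass.respond_to?(:content_type)
        superclass.content_type
      else
        DEFAULT_CONTENT_TYPE
      end
    end
  end

  def content_type_header(charset = "utf-8")
    "#{self.class.content_type}; charset=#{charset}"
  end
end

class PagesController < BaseController; end

class XhtmlPagesController < BaseController
  self.content_type = "application/xhtml+xml"
end

puts PagesController.new.content_type_header       # text/html; charset=utf-8
puts XhtmlPagesController.new.content_type_header  # application/xhtml+xml; charset=utf-8
```

Walking up the superclass chain means a controller only declares a content type when it differs from its parent.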

Sorry to bug you with such somewhat not very relevant issues, but I feel
that this needs to be a truly universally droppable plugin and these
kind of
minor tweaks will make it such.

Keep up the good i18n work,
-Mislav
Julian 'Julik' Tarkhanov (Guest)
on 2006-06-12 00:31
(Received via mailing list)
On 11-jun-2006, at 14:56, Mislav Marohnić wrote:

> html but put in a variable which value defaults to text/html so we
> can provide 'application/xhtml+xml' or other content types in
> specific controllers?
>
> Sorry to bug you with such somewhat not very relevant issues, but I
> feel that this needs to be a truly universally droppable plugin and
> these kind of minor tweaks will make it such.

unicode_hacks will have to go if the core team agrees to the chars proxy.
As to the encoding configuration, this has to be done in the
connection adapters. The reason is that I haven't yet met an
implementation (in Perl, PHP or Python, and I suspect AR is no
different) that would maintain the client encoding should the
connection "go away" (that is, re-issue the encoding query when it has
to reconnect). Rails uses persistent connections right now, which
means that without this I am not safe from having my NAMES reset to
something I really didn't want when the connection needs to be
reestablished (and it's common for SQL sockets on shared hosts to
time out). IMO this has to be handled by ActiveRecord's adapters (if
it isn't handled already).
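A hypothetical sketch of what "maintaining the client encoding" across reconnects means for an adapter: remember the encoding and replay a MySQL-style `SET NAMES` every time a fresh connection is opened. `EncodingAwareAdapter` and `FakeConnection` are invented names; the fake driver only records queries so the example runs without a database server.

```ruby
# Hypothetical sketch: an adapter that remembers the client encoding and
# re-applies it on every (re)connect. FakeConnection stands in for a real
# database driver so the example runs without a server.
class FakeConnection
  attr_reader :queries

  def initialize
    @queries = []
  end

  def query(sql)
    @queries << sql
  end
end

class EncodingAwareAdapter
  attr_reader :raw

  def initialize(encoding, &connector)
    @encoding = encoding
    @connector = connector
    connect
  end

  # Call this when the persistent connection has gone away (e.g. timed out).
  def reconnect!
    connect
  end

  def execute(sql)
    @raw.query(sql)
  end

  private

  # Open a fresh connection and immediately restore the client encoding;
  # without this step the new session would use the server default.
  def connect
    @raw = @connector.call
    @raw.query("SET NAMES #{@encoding}") if @encoding
  end
end

adapter = EncodingAwareAdapter.new("utf8") { FakeConnection.new }
adapter.reconnect!
adapter.execute("SELECT 1")
p adapter.raw.queries  # ["SET NAMES utf8", "SELECT 1"]
```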

As to the headers, Rails should simply default to UTF-8 charset headers
when $KCODE is set to UTF-8, for XML, RJS and HTML alike. See this ticket:

http://dev.rubyonrails.org/ticket/4975

All of this can be implemented and tested in a backwards-compatible
way that is friendly to, for example, Japanese developers who want
their $KCODE set to JIS and friends, or German developers who tend to
rely on ISO encodings. The Chars abstraction also caters for this
requirement by always checking $KCODE.
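A hypothetical sketch of a backwards-compatible default: derive the charset from a Ruby 1.8-style $KCODE value and add it to the Content-Type only when it is known, leaving other settings untouched. The Shift_JIS/EUC-JP mappings and the `text/javascript` type for RJS are assumptions; since $KCODE has no effect in Ruby 1.9+, the value is passed in explicitly here.

```ruby
# Hypothetical mapping from a Ruby 1.8-style $KCODE value to a charset.
# Ruby 1.8 only looked at the first letter of $KCODE (U, S, E or N).
KCODE_CHARSETS = {
  "U" => "utf-8",     # $KCODE = "UTF8" or "u"
  "S" => "shift_jis", # $KCODE = "SJIS"
  "E" => "euc-jp"     # $KCODE = "EUC"
}.freeze

MIME_TYPES = {
  html: "text/html",
  xml:  "application/xml",
  rjs:  "text/javascript"
}.freeze

def charset_for(kcode)
  key = kcode.to_s[0]
  key && KCODE_CHARSETS[key.upcase]
end

# Add a charset only when one is known; otherwise keep the old header.
def content_type_for(format, kcode)
  mime = MIME_TYPES.fetch(format)
  charset = charset_for(kcode)
  charset ? "#{mime}; charset=#{charset}" : mime
end

puts content_type_for(:rjs, "UTF8")  # text/javascript; charset=utf-8
puts content_type_for(:html, "NONE") # text/html
```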

--
Julian 'Julik' Tarkhanov
Charles O Nutter (Guest)
on 2006-06-12 02:21
(Received via mailing list)
I'll jump in to say that we on the JRuby team are also very interested
in finding a way to support Unicode. We're close to running Rails in
more general scenarios, and since we run on top of the JVM we obviously
have the potential for Unicode support out of the box. However, we have
held off on providing any API, hoping for Ruby proper to lead the way.
We're not interested in forking the community in any way or providing
incompatible functionality, but if an acceptable API comes out of Rails
that works well and feels right, it could be the answer.

I have not had a chance to look at Julian's work, but we'll be watching
these developments. One of the most frequent questions we get from
would-be JRuby users is "why don't you support Unicode?" We want to...
we really do.
Thijs van der Vossen (Guest)
on 2006-06-12 09:38
(Received via mailing list)
On 11 Jun 2006, at 09:09, Julian 'Julik' Tarkhanov wrote:
> @some_unicode_string.u.length
> @some_unicode_string.u.reverse

+1

This is very slick indeed. We at Fingertips would love to see this
added to the core.

Kind regards,
Thijs

--
Fingertips - http://www.fngtps.com

Phone: +31 (0)6 24204845
Skype: tvandervossen

MSN Messenger: thijs@fngtps.com
iChat/AOL:  t.vandervossen@mac.com
Jabber IM: thijs@jabber.org
Julian 'Julik' Tarkhanov (Guest)
on 2006-06-12 22:35
(Received via mailing list)
On 12-jun-2006, at 9:36, Thijs van der Vossen wrote:

> On 11 Jun 2006, at 09:09 , Julian 'Julik' Tarkhanov wrote:
>> @some_unicode_string.u.length
>> @some_unicode_string.u.reverse
>
> +1
>

Moved to http://julik.textdriven.com/svn/tools/rails_plugins/unicode_hacks/test/t_chars.rb

--
Julian 'Julik' Tarkhanov
Julian 'Julik' Tarkhanov (Guest)
on 2006-06-14 21:03
(Received via mailing list)
I have incorporated all of the above into

http://dev.rubyonrails.org/ticket/5396

Would love to get some feedback.
--
Julian 'Julik' Tarkhanov
PJ Hyett (Guest)
on 2006-06-16 22:57
(Received via mailing list)
Could you make this a plugin?

Thanks,
PJ
Julian 'Julik' Tarkhanov (Guest)
on 2006-06-17 00:34
(Received via mailing list)
On 16-jun-2006, at 22:55, PJ Hyett wrote:

> Could you make this a plugin?

It is a plugin already. Thanks for being helpful.

--
Julian 'Julik' Tarkhanov
PJ Hyett (Guest)
on 2006-06-18 06:16
(Received via mailing list)
I hadn't noticed the unicode_hacks plugin had been updated, thanks.

-PJ
Abdur-rahman Advany (abdurrahman)
on 2006-06-18 13:03
(Received via mailing list)
Really like this! Has there been any feedback from the core team?

PJ Hyett wrote:
>> It is a plugin already. Thanks for being helpful.


--
Abdur-Rahman Advany
http://blog.railsdevelopment.com/