I ranted and I raved and I tried and failed :-)

Unfortunately, my proposal to wreak havoc in the String class proved futile - it surfaced bugs that I had never seen before (among others, CGI escaping and ERB broke, the latter in a rather intricate way). Apart from these two problems the solution worked in my applications (but I would not recommend using it in production until all the implications become clear). The speed overhead also proved very substantial.

However, after some more tweaks I have come to the simple idea of having proxy access to the characters instead of subclassing. Maybe this can be useful for ActiveSupport (it will be able to invoke the more expensive routines only where characters are involved, and when Matz finally comes up with a good M17N engine - doh - it will be a matter of aliasing a method to self).

It works (in general) like this:

@some_unicode_string.u.length
@some_unicode_string.u.reverse

etc., with modifications to the string in place where necessary.

Here is the test suite:
http://julik.textdriven.com/svn/tools/rails_plugin... test/t_string_overrides.rb

I think this can be refactored nicely into the string extensions in ActiveSupport; after that, things like validates_length_of and truncate will be able to address the characters explicitly without resorting to regexen. And the users get a good head start on Unicode in their apps from the get-go.

It has an implicit dependency on the Unicode gem for normalization and capitalization, but that can easily be stubbed out to dummy methods and made optional (with, for example, an alert in the development log). Considering that Rails has some implicit dependencies of its own, I don't see that as too much of a burden, but this is my opinion of course.

I will gladly provide a patch should the core find this noteworthy.

--
Julian 'Julik' Tarkhanov
please send all personal mail to
me at julik.nl
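A minimal sketch of the proxy idea described above, for illustration only. The class name `CharsProxy` and its internals are assumptions and not the plugin's actual code; only the `String#u` accessor and the `length` and `reverse` calls come from the post. The sketch works on codepoints via `unpack("U*")`, so it does not handle combining sequences or normalization.

```ruby
# Hypothetical sketch of a character proxy; not the plugin's implementation.
class CharsProxy
  def initialize(string)
    @string = string
  end

  # Count Unicode codepoints rather than bytes.
  def length
    @string.unpack("U*").length
  end

  # Reverse by codepoint so multibyte sequences stay intact.
  def reverse
    @string.unpack("U*").reverse.pack("U*")
  end

  # Hand back the wrapped string unchanged.
  def to_s
    @string
  end
end

class String
  # Accessor named in the post: returns a proxy over this string's characters.
  def u
    CharsProxy.new(self)
  end
end
```

With this in place, a three-character Japanese string that occupies nine bytes in UTF-8 reports a `u.length` of 3, while the String class itself is left untouched apart from the one accessor.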
on 2006-06-11 09:10
on 2006-06-11 14:58
That is some nice work here, Julian. Proxy access makes it possible to adopt this without breaking anything - though I, too, would like to see this someday _without_ a proxy.

I have a question and a request that aren't really about the hacks (I will leave those to more experienced programmers) but about the plugin.

First, about 'db_unicode_client.rb' - isn't this functionality (setting the client encoding) already available by specifying the 'encoding' attribute in database.yml?

Second: when setting the "content-type" header for output, could you not force text/html but instead use a variable whose value defaults to text/html, so we can provide 'application/xhtml+xml' or other content types in specific controllers?

Sorry to bug you with such not very relevant issues, but I feel that this needs to be a truly universally droppable plugin, and these kinds of minor tweaks will make it such.

Keep up the good i18n work,
-Mislav
on 2006-06-12 00:31
On 11-jun-2006, at 14:56, Mislav Marohnić wrote:
> html but put in a variable which value defaults to text/html so we
> can provide 'application/xhtml+xml' or other content types in
> specific controllers?
>
> Sorry to bug you with such somewhat not very relevant issues, but I
> feel that this needs to be a truly universally droppable plugin and
> these kind of minor tweaks will make it such.

unicode_hacks has to go if the core agrees to the chars proxy.

As to the encoding configuration, this has to be done in the connection adapters - the reason being that I haven't yet met an implementation (in Perl, PHP or Python - and I suspect AR is no different) that maintains a client encoding should the connection "go away" (which means issuing another query whenever you have to reconnect). Rails uses persistent connections right now, meaning that without this I have no protection against my NAMES being reset to something I really didn't want when the connection needs to be reestablished (and it's common for SQL sockets on shared hosts to time out). This has to be handled by ActiveRecord's adapters IMO (if it's not handled already).

As to the headers, Rails should just default to UTF-8 headers when $KCODE is UTF, for XML, RJS and HTML alike. See a ticket on this:
http://dev.rubyonrails.org/ticket/4975

All of this can be implemented and tested in a backwards-compatible way that is friendly to (for example) Japanese folks who want their $KCODE set to JIS and friends, or German people who tend to rely on ISO. The Chars abstraction also caters for this requirement by always checking $KCODE.

--
Julian 'Julik' Tarkhanov
please send all personal mail to
me at julik.nl
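The reconnect concern above can be sketched as a wrapper that re-issues the encoding query every time a fresh connection is opened. This is an illustration under stated assumptions, not ActiveRecord's adapter code: the class name `EncodingAwareConnection`, the use of `IOError` to signal a dropped connection, and the raw connection's `execute` method are all hypothetical, and `SET NAMES` is MySQL syntax.

```ruby
# Hypothetical wrapper; the raw connection object is assumed to respond to
# #execute and to raise IOError when the server has gone away.
class EncodingAwareConnection
  def initialize(encoding, &connector)
    @encoding = encoding
    @connector = connector # block that opens a fresh raw connection
    connect
  end

  # Run a query, reconnecting once (and re-applying the client encoding)
  # if the connection has gone away.
  def execute(sql)
    @raw.execute(sql)
  rescue IOError
    connect
    @raw.execute(sql)
  end

  private

  # Open a new raw connection and set the client encoding on it right away,
  # so no query ever runs on a connection with the server's default.
  def connect
    @raw = @connector.call
    @raw.execute("SET NAMES #{@encoding}")
  end
end
```

Keeping the encoding query inside `connect` means a timed-out socket cannot silently fall back to another character set, which is the failure mode described in the post.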
on 2006-06-12 02:21
I'll jump in to say that we on the JRuby team are also very interested in finding a way to support unicode. We're close to running Rails in more general scenarios, and obviously running on top of the JVM we have the potential for unicode support out of the box. However, we have held off providing any API, hoping for Ruby proper to lead the way. We're not interested in forking the community in any way or providing incompatible functionality; but if there's an acceptable API coming out of Rails that works well and feels right, it could be the answer.

I have not had a chance to look at Julian's work, but we'll be watching these developments. One of the most frequent questions we get from would-be JRuby users is "why don't you support unicode?" We want to... we really do.
on 2006-06-12 09:38
On 11 Jun 2006, at 09:09 , Julian 'Julik' Tarkhanov wrote:
> @some_unicode_string.u.length
> @some_unicode_string.u.reverse

+1

This is very slick indeed. We at Fingertips would love to see this added to the core.

Kind regards,
Thijs

--
Fingertips - http://www.fngtps.com
Phone: +31 (0)6 24204845
Skype: tvandervossen
MSN Messenger: thijs@fngtps.com
iChat/AOL: t.vandervossen@mac.com
Jabber IM: thijs@jabber.org
on 2006-06-12 22:35
On 12-jun-2006, at 9:36, Thijs van der Vossen wrote:
> On 11 Jun 2006, at 09:09 , Julian 'Julik' Tarkhanov wrote:
>> @some_unicode_string.u.length
>> @some_unicode_string.u.reverse
>
> +1
>

Moved to
http://julik.textdriven.com/svn/tools/rails_plugins/unicode_hacks/test/t_chars.rb

--
Julian 'Julik' Tarkhanov
please send all personal mail to
me at julik.nl
on 2006-06-14 21:03
I have incorporated all of the above under
http://dev.rubyonrails.org/ticket/5396

Would love to have some feedback.

--
Julian 'Julik' Tarkhanov
please send all personal mail to
me at julik.nl
on 2006-06-17 00:34
On 16-jun-2006, at 22:55, PJ Hyett wrote:
> Could you make this a plugin?
It is a plugin already. Thanks for being helpful.
--
Julian 'Julik' Tarkhanov
please send all personal mail to
me at julik.nl
on 2006-06-18 13:03
Really like this! Has there been any feedback from the core?

PJ Hyett wrote:
>> It is a plugin already. Thanks for being helpful.

--
Abdur-Rahman Advany
http://blog.railsdevelopment.com/