Forum: Ruby Capital Cyrillic letter in Ruby class name (UTF-8)

Posted by Vladimir Kerimov (quazar)
on 2012-04-17 17:05
Hi matz,
I am not happy using Ruby.

All I want is to generate a class by name from an XML structure. The
name is a correct class name that starts with a capital Cyrillic
letter, but this is impossible.
Error:
class/module name must be CONSTANT

I already have a suitable binding of our XML structure in Python, and I
would like to have the same in Ruby. Is it so hard to allow people from
other countries to use their own language and its capital letters?

I don't want to create a class named RДокумент (an ASCII letter first,
then UTF-8 Cyrillic)!
I need a simple way to make classes like this: class Документ.
Class names that must begin with a capital Latin letter force developers
to switch the keyboard layout with endless Ctrl+Shift.
This makes developers completely unhappy, and so they move from the Ruby
binding to the Python one.

Our business logic is written mostly in C++ with bindings to Ruby/Python.
All we want is to load an XML structure of classes, methods and
properties from a file and generate the classes in Ruby/Python.
Posted by Marvin Gülker (quintus)
on 2012-04-17 17:36
(Received via mailing list)
On 17.04.2012 17:05, Vladimir Kerimov wrote:
> I need a simple way to make classes like this: class Документ.

It’s perfectly possible.

irb(main):001:0> Документ = Class.new
=> #<Class:0x00000001b5e0e0>
irb(main):002:0> obj = Документ.new
=> #<#<Class:0x00000001b5e0e0>:0x00000001b57e48>

You may want to override the class’ ::inspect method so that it displays
properly in IRB. But note that Документ is regarded as a local variable
rather than a constant here; you could use a namespace module with
module methods (which can be called using the :: syntax) to achieve what
you want, i.e. something like

=========================
module Namespace

  X = Class.new
  def X.inspect; "Документ"; end
  def X.name; "Документ"; end

  def self.Документ
    X
  end

end

obj = Namespace::Документ.new
=========================

Note that I assign the new class to a separate constant (X) instead of
just returning Class.new in the Namespace::Документ() method, because
that would return a new class each time you call it (making checks
with #kind_of? useless).

However, using names in a local language doesn’t seem like a good idea
to me at all: most programming languages are inspired by English, and
mixing in another language makes the code look weird. Even worse, a
framework like Rails expects English nouns to be used as class names,
because it automatically derives a whole bunch of further names from
them by inflecting the word according to English grammar rules. Names
in other languages break such frameworks completely.

This obviously requires Ruby 1.9.
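
A minimal sketch of how the namespace approach above might be extended
to a whole list of names, for example names read from an XML file. The
module name Generated, the CLASS_NAMES list and the define_class helper
are assumptions made for illustration only; none of them comes from a
library. The accessor is called with a dot so that the call is parsed as
a method call however the interpreter tokenizes the name.

```ruby
# encoding: utf-8

# Hypothetical input: stands in for class names parsed from the XML file.
CLASS_NAMES = ["Документ", "Пользователь"]

module Generated
  # Create one anonymous class per name and expose it through a module
  # method of the same (non-ASCII) name.
  def self.define_class(class_name)
    klass = Class.new
    # An anonymous class has no name of its own, so report the wanted one.
    klass.define_singleton_method(:name)    { class_name }
    klass.define_singleton_method(:inspect) { class_name }
    # The block closes over klass, so every call returns the same class
    # and #kind_of? checks keep working.
    define_singleton_method(class_name) { klass }
    klass
  end
end

CLASS_NAMES.each { |class_name| Generated.define_class(class_name) }

obj = Generated.Документ.new
puts obj.kind_of?(Generated.Документ)
puts Generated.Пользователь.name
```

Because each accessor returns a class created once and captured by the
block, repeated calls hand back the same object, which is the same
reason the original example stores the class in X.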

Vale,
Marvin

Posted by Vladimir Kerimov (quazar)
on 2012-04-17 17:47
Here Документ is only a local variable.
Also, the method definition looks like it came from hell.
I need to use UTF-8 names for the business logic of all the other
companies, which do not care about English. All they want is suitable
logic in Russian:
simply call Документ.Прочитать(ИдО=123),
or call Документ.Записать().
This works in Python, and it could work in Ruby, but there is a very
strange limitation: the first character must be an ASCII capital letter!
ASCII! Why do you allow UTF-8 encoding and yet forbid classes like this:

class Пользователь
    def Найти(...)
        ....
    end
end

generated from a similarly ordered XML structure?
It is a strange and unreasonable check.

By the way, we don't need ActiveRecord or Rails at all.
Posted by Bartosz Dziewoński (matmarex)
on 2012-04-17 18:17
(Received via mailing list)
On 17 April 2012 17:47, Vladimir Kerimov
<lists@ruby-forum.com> wrote:
> class Пользователь
>    def Найти(...)
>        ....
>    end
> end

Vladimir, unfortunately Unicode and internationalisation in general are
hard. In C code (that's what Ruby is written in), checking whether a
letter is an ASCII capital is dead simple. Checking whether it is a
Unicode capital requires decoding it from whichever variant of UTF it
was encoded in and looking it up in one of the huge tables defining
various letter properties to check whether it is a capital or not. You
would probably have to change parsing rules and a lot of internal code
to make it possible, since a capital first letter is the only difference
between a Ruby constant and a variable.

I think you should try posting to Ruby's bug and feature tracker, at
http://bugs.ruby-lang.org/.

-- Matma Rex
Posted by Brian Candler (candlerb)
on 2012-04-18 00:39
Bartosz Dziewoński wrote in post #1057002:
> Vladimir, unfortunately Unicode and internationalisation in general are
> hard. In C code (that's what Ruby is written in), checking whether a
> letter is an ASCII capital is dead simple. Checking whether it is a
> Unicode capital requires decoding it from whichever variant of UTF it
> was encoded in and looking it up in one of the huge tables defining
> various letter properties to check whether it is a capital or not. You
> would probably have to change parsing rules and a lot of internal code
> to make it possible, since a capital first letter is the only difference
> between a Ruby constant and a variable.

Please note that the above is untrue.

Ruby 1.9 already has built-in functions to determine whether a letter
is a Unicode capital or not; however, it explicitly accepts only ASCII
upper case at the start of constants. This was a conscious design
decision. See http://redmine.ruby-lang.org/issues/show/1853
and the other threads linked from that Redmine issue.
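
A small sketch of the difference between the two checks, seen from Ruby
code rather than from the C source. It relies on the regexp engine's
support for the Unicode property \p{Lu} (upper-case letter); the two
helper method names are made up for this example, and the code only
illustrates the distinction, not the parser's actual implementation.

```ruby
# encoding: utf-8

# True only for names starting with ASCII A-Z (the rule Ruby 1.9 applies
# to constant names).
def ascii_capital_start?(name)
  !!(name =~ /\A[A-Z]/)
end

# True for names starting with any Unicode upper-case letter
# (general category Lu), using the regexp engine's property support.
def unicode_capital_start?(name)
  !!(name =~ /\A\p{Lu}/)
end

["Document", "Документ", "документ", "Über_Alles"].each do |name|
  puts "#{name}: ascii=#{ascii_capital_start?(name)} " \
       "unicode=#{unicode_capital_start?(name)}"
end
```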

If you wanted to allow Unicode capitals as the start of constants, it 
would be a very simple patch to the C source.

What I've managed to gather about upper/lower case handling in Ruby 1.9
is documented at
https://github.com/candlerb/string19/blob/master/string19.rb
(section 11)

Regards,

Brian.

P.S. I'm not saying I think this is a good design decision - in fact I 
think the whole encoding aspect of ruby 1.9 is a dog's breakfast - but 
that's purely my opinion.
Posted by Vladimir Kerimov (quazar)
on 2012-04-20 07:26
So you are saying that if I want normal names for my classes, I need to
change the Ruby source code, recompile it and be happy?! :)
No, that is the wrong way.
All I need is an out-of-the-box solution: install Ruby and use it (and
be happy).
Also, the other companies that will use this binding are not allowed to
download a Ruby installer from anywhere except the official Ruby site,
which has no patch for capital Cyrillic letters.
No way. I need an official patch, or I have to wait for a fix in the next
release; for now I can't use Ruby as a high-level binding. So we are
still on Python only.
Posted by Peter Zotov (Guest)
on 2012-04-20 08:09
(Received via mailing list)
Vladimir Kerimov wrote on 17.04.2012 19:47:
> of first character must be ASCII capital letter! ASCII! Why you allow
>
> By the way, we don't need ActiveRecord or Rails at all.

If you want to use 1C, go and use 1C. While internationalized names per
se are a very opinionated topic, in the case of Ruby they are definitely
and explicitly bad, because all other parts of the ecosystem are already
in English and this will not change (not to mention that even if such a
patch gets accepted, and I hope it will not be, keywords will still be
in English for the foreseeable future).
Posted by Jan E. (jacques1)
on 2012-04-20 12:32
Vladimir Kerimov wrote in post #1057521:
> No way. I need official patch or wait to fix it in next release, for now
> I can't use Ruby as high-level binding. So we still only on Python.

Really, I don't get what the fuss is all about. If one idea doesn't
work, well, then try something else. You may not get the perfect
solution, but you'll certainly manage to circumvent this problem
(especially in Ruby!). Marvin's solution, for example, seems perfectly
acceptable to me. I see no problem with the variables, because there's
no real danger of overwriting them (they cannot be accessed from outside
the class).

But whining and complaining about a missing feature seems rather stupid
to me. Brian already told you that this was a conscious decision, so it
probably won't change in the near future.
Posted by Vladimir Kerimov (quazar)
on 2012-04-20 13:07
All I want is normal names for classes autogenerated from an XML
structure. The check that the first letter of a class name is an ASCII
capital must be removed, or fixed at least for the Russian and Greek
ranges of UTF-8.

class Документ
  def Провести(...)
    ...
  end
end

is absolutely correct under the condition that the first letter must be
a capital. So this check is unreasonable and incorrect.
I don't want to call my class RДокумент or use some hack.

Please allow us to use normal class names, not RКлиент.
It already works in the Python binding, so all I need is to do the same
in Ruby.

P.S. It is not "1C" and not related to it.
Posted by Jan E. (jacques1)
on 2012-04-20 13:23
*lol*

I'm looking forward to endless discussions whenever you think a certain 
(questionable) feature is missing:

"Hi Matz, we urgently need multiple inheritance. Please implement it as 
soon  as possible!"

"Hi Matz, please change the keywords to Russian!"

...
Posted by Vladimir Kerimov (quazar)
on 2012-04-20 14:19
It could be funny, but it makes us use a method in the place where we
need a class. Thank you for at least not forbidding method names that
start with a Cyrillic letter. This makes Ruby illogical and puts the
Ruby binding behind the current Python binding, which is entirely
logical.
Posted by Brian Candler (candlerb)
on 2012-04-21 21:57
Vladimir Kerimov wrote in post #1057566:
> It could be funny, but it is make us use method at the place where we
> need to use class. Thanks for you does not forbid method name starts
> from Cyrillic letter. This make Ruby illogical and move Ruby-binding
> behind current Python-binding, that is logical at all.

Matz is our benevolent dictator, Guido is Python's. Their decisions win.

You need to take each language as a whole. If on balance you prefer
Ruby, then use Ruby; if something else has the best combination of
features for you (or the fewest annoyances), then use that. Whichever
one gets the job done best for you.

The decision that constants must start with ASCII A-Z is fundamental and 
unlikely to be revisited.
Posted by Vladimir Kerimov (quazar)
on 2012-04-28 15:02
Brian Candler wrote in post #1057729:
> Vladimir Kerimov wrote in post #1057566:
>
> You need to take each language as a whole. If on balance you prefer
> Ruby, then use Ruby; if something else has the best combination of
> features for you (of the fewest annoyances), then use that. Whichever
> one gets the job done best for you.
>
> The decision that constants must start with ASCII A-Z is fundamental and
> unlikely to be revisited.

All I need is a suitable binding for a company that uses Ruby.
Requiring class names to start with ASCII A-Z in a language based on
UTF-8 is slightly strange, don't you think?
What about Japanese characters, are they in ASCII? Or maybe the Greek
alphabet?

Cyrillic and Greek both have proper capital letters in UTF-8.
The logic is no different from that for Latin capital letters.

So a language based on UTF-8 with the condition "classes and modules
must start with a capital letter" must handle at least Greek and
Cyrillic letters too.

That is how it is, if you want to stay logical.
Posted by Peter Zotov (Guest)
on 2012-04-28 17:40
(Received via mailing list)
Vladimir Kerimov wrote on 28.04.2012 17:02:
> Cyrillic and Greek letters both have a legal capital letters in UTF-8.
> This logic not differ from capital letter of Latin capital.
>
> So language based on UTF-8 with condition "classes and modules must
> start from capital letter" must handle at least Greek and Cyrillic
> letters too.
>
> Actually it is. If you want to stay in the logic way.

Okay, you really want this to be done and you don't want to use a
patched version of Ruby. (Besides that, even if you manage to get this
change into the core, which is not very likely due to technical reasons,
you'll have to wait for a point release, a year or something like that.)

As a Russian developer, I think the idea is stupid, but I'm a bit
curious because it is, on the other hand, somewhat hard. You can achieve
what you want by using the gem `polyglot' and some other suffix for your
"Russian-enhanced" files (like .rbr maybe), or by overriding Kernel#load
and checking for a magic comment (like
# encoding:utf-8; russian-identifiers:true). Then you'll have to rewrite
all constant accesses with Russian names to something with a prefix
(Документ->RДокумент) and load the modified file. This modification can
be done with a regexp, though I'd recommend using Ripper.
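
A rough sketch of the rewriting step only, using Ripper.lex from the
standard library to walk the token stream. The constant ASCII_PREFIX and
the method name prefix_non_ascii_constants are invented for this
example, and the rule "skip names right after def or a dot" is a
simplifying assumption, so treat it as a starting point rather than a
complete rewriter. Hooking the result into polyglot or Kernel#load is
not shown.

```ruby
# encoding: utf-8
require 'ripper'

ASCII_PREFIX = "R"  # as in Документ -> RДокумент

# Prefix every identifier that starts with a non-ASCII capital letter,
# except method names (directly after `def` or a dot).
# Depending on the Ruby version, such names are lexed as identifiers or
# as constants, so both token types are checked.
def prefix_non_ascii_constants(source)
  prev_type = nil
  prev_tok  = nil
  Ripper.lex(source).map { |_pos, type, tok|
    rewrite = [:on_ident, :on_const].include?(type) &&
              tok =~ /\A\p{Lu}/ && tok !~ /\A[A-Z]/ &&
              prev_type != :on_period &&
              !(prev_type == :on_kw && prev_tok == "def")
    # Remember the last token that was not plain whitespace.
    prev_type, prev_tok = type, tok unless type == :on_sp
    rewrite ? ASCII_PREFIX + tok : tok
  }.join
end

source = "class Документ\n  def Провести\n  end\nend\n\nДокумент.new.Провести\n"
puts prefix_non_ascii_constants(source)
```

Working on tokens instead of a plain regexp means that whitespace and
layout are carried through unchanged and method names are left alone.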

This is as far as you can go without modifying the interpreter. It's
easy to patch the constant name verification code, but, as I've already
said, the change won't appear in mainstream redistributables any time
soon.
Posted by Vladimir Kerimov (quazar)
on 2012-05-01 11:43
So your solution looks like an admission that the language is
incomplete.
I would rather use global functions with suitable names than create
patches or install additional gems on each server in the cloud.
We need the complete out-of-the-box version from the official site,
without any unstable gems.
All we need is simply to take the standard Ruby 1.9+ interpreter, use it
and be happy.
Right now this is partially possible; we use strange methods like this:

def Документ
  RДокумент
end

It looks strange, but it works without any patch/gem/whatever.
Here RДокумент is the class name; as you can see, we end up with an
illogical piece of code.
Very bad. But we have no choice.
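
A minimal sketch of how such accessor methods might be generated for a
whole list of names instead of being written by hand. The NAMES array is
hypothetical and stands in for whatever is read from the XML file; the
"R" prefix follows the RДокумент convention above. The accessors are
called with explicit parentheses so that the name is treated as a method
call and not as a constant lookup.

```ruby
# encoding: utf-8

# Hypothetical input: stands in for class names read from the XML file.
NAMES = ["Документ", "Клиент"]

NAMES.each do |name|
  # The real class gets an ASCII-prefixed constant name: RДокумент, RКлиент.
  klass = Object.const_set("R" + name, Class.new)
  # A global method with the unprefixed name returns that class.
  Object.send(:define_method, name) { klass }
end

# Explicit parentheses make sure the name is parsed as a method call.
doc = Документ().new
puts doc.class.name
puts Документ().equal?(RДокумент)
```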

Thanks to Matz for such an amazing limitation on CONSTANT in UTF-8,
which cuts off absolutely all UTF-8 class names beyond ASCII.

P.S. By the way, irb works really badly in the Windows console with
Cyrillic names. Take a look at Python's interactive mode: it works
perfectly in the Windows console with Russian identifiers and shows them
correctly (irb shows ????? symbols). Both irb and python use an
interpreter based on UTF-8.
Posted by Ryan Davis (Guest)
on 2012-05-01 12:26
(Received via mailing list)
On May 1, 2012, at 2:44, Vladimir Kerimov <lists@ruby-forum.com> wrote:

> Thank Matz for so amazing limitation of CONSTANT in UTF-8, cuts off
> absolutely all UTF-8 class names over ASCII.

Patches welcome. Put up or shut up.
Posted by Eric Christopherson (echristopherson)
on 2012-05-03 02:32
(Received via mailing list)
Maybe Ruby identifiers should be more like Java's:

<http://stackoverflow.com/questions/4838507/why-doe...
Posted by Marc Heiler (shevegen)
on 2012-05-03 04:48
Actually he has one point.

If internationalization were really needed, the first character should
not be limited to an ASCII check alone.

But then again, I don't use UTF myself, so I don't care about class
INSERT_FUNNY_RUSSIAN_CHARACTERS_HERE.

And, I have to say, Vladimir sounds a lot like a troll. Like Ilias.

I mean really, to use XML and then whine about limitations of ruby?

I doubt he even uses python lol.
Posted by Vladimir Kerimov (quazar)
on 2012-05-03 10:29
Actually I am a C++ developer, but I sometimes use Python.
I am not crying "give me something like: a = [x*x for x in values]".
All I want is to know "WHY?!" and to ask when Matz is planning to fix it.
Finally:
1. Why do you limit the FIRST character of a class name to A-Z in a
UTF-8 oriented language?
2. When are you planning to fix it for Cyrillic and Greek class names?

Answers like "patch it yourself" are bad: I am not a pure Ruby
developer, and my patch could be of bad quality. For Ruby I am just
creating a suitable binding between a company that uses Ruby and our
platform written in C++. All I need is to generate a set of classes with
Cyrillic names. Currently I use methods that hide classes with improper
names, just because method names aren't limited.
Posted by Jan E. (jacques1)
on 2012-05-03 13:19
Vladimir Kerimov wrote in post #1059387:
> Actually I am a C++ developer, but sometimes uses Python.
> I don't cry, "give me something like: a = [x*x for x in values]"
> All I want is to know "WHY?!" and ask when Matz planning to fix it.
> Finally:
> 1. Why you limit FIRST character of class by A-Z for UTF-8 oriented
> language?
> 2. When you planning to fix it for Cyrillic and Greek class names?

Is it really so hard to understand? Do we really have to repeat what we 
already said several times before?

Sorry, but I have to agree with Marc that you begin to sound like a 
troll. Nobody is *that* ignorant.

You've got the wrong guys, anyway. We don't make the decisions.
Posted by Vladimir Kerimov (quazar)
on 2012-05-12 12:54
Well, thank you all anyway.
Posted by Звонко Илић (zilic)
on 2013-02-12 08:52
Marc Heiler wrote in post #1059352:
> Actually he has one point.
>
> If the internationalization was really needed then the first character
> should be not limited to an ASCII-check alone.
>
> But then again, I dont use UTF myself so I dont care about class
> INSERT_FUNNY_RUSSIAN_CHARACTERS_HERE.
>
> And, I have to say, Vladimir sounds a lot like a troll. Like Ilias.
>
> I mean really, to use XML and then whine about limitations of ruby?
>
> I doubt he even uses python lol.


And a class like this is impossible, too.

# encoding: utf-8
class Über_Alles
end

Are these FUNNY_RUSSIAN_CHARACTERS???
I hope that you will think again about Vladimir's (Владимир) request!

There are more languages with non-ASCII characters!


Поздрав свима!
Regards to all :)