Hi,
I need to count the number of characters entered. As an example I used the
"clipboard.rb" that comes with gtk-demo.
If we enter UTF-8 characters (for example the character ¿) in the
first "Gtk::Entry", paste the contents into the second "Gtk::Entry" and
look at the encoding, we always see ASCII-8BIT. If we count the
number of characters in the string, Ruby returns 2 in this case.
If instead we call "force_encoding("utf-8")" on the pasted content and
count again, Ruby returns 1.
The question: is there any parameter, when we read the contents of a
"Gtk::Entry", so that by default the characters are not detected as
ASCII-8BIT?
Thanks
Rafael
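The two counts described above can be reproduced without GTK. This is only a minimal sketch for Ruby 1.9 or later: the byte string is built by hand to stand in for the text coming back from the widget, and the variable names are made up for the illustration.

```ruby
# Illustrative only: build the two UTF-8 bytes of "¿" as a raw byte string,
# standing in for text that arrives tagged as ASCII-8BIT.
raw = [0xC2, 0xBF].pack("C*")

tagged_binary = (raw.encoding == Encoding::ASCII_8BIT)
byte_count = raw.length          # bytes are counted here

# force_encoding only relabels the bytes; it does not convert them.
text = raw.dup.force_encoding(Encoding::UTF_8)
char_count = text.length         # characters are counted here

puts "binary? #{tagged_binary}, bytes: #{byte_count}, chars: #{char_count}"
```

The bytes are the same in both strings; only the encoding label changes, which is why the length goes from 2 to 1.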
on 2010-02-11 09:34
on 2010-02-11 11:24
If I remember correctly, Ruby 1.8 stores strings as bytes. If you
have a multibyte encoding like UTF-8, the size isn't counted
in characters. It is counted in bytes. So when you store
something like ¿ it takes 2 bytes and the length is 2.
In order to count the characters you need to know the encoding
of the string. There are several different ways to find the length
using different classes (all of which I have forgotten about, sorry).
But usually you want to index into the string by character.
You can't really do this with Unicode strings in Ruby 1.8. Instead I usually
split the string into an array like this:
  TO_A_RE = Regexp.new('\s*',nil,'U')
  string = "こんにちわ"
  a = string.split(TO_A_RE)
  size = a.size
The TO_A_RE regexp is set for Unicode (the 'U' option). You
can set it to other values for other encodings.
Now why is your Gtk::Entry getting 8-bit ASCII characters?
My guess is that the locale you are using is not Unicode,
but actually 8-bit ASCII. I don't know what platform you
are using, but if you are on Linux you could use
es_ES.utf8 (guessing that you are Spanish?). That
should fix the problem.
Note that Ruby 1.9 is completely different. If I understand
correctly, the encoding is stored with the string and you
can actually index characters in multibyte strings. But
I haven't used it yet.
I hope that helps. Using different string encodings in
Ruby 1.8 is considerably harder than it should be.
MikeC
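For comparison with the Ruby 1.8 split trick above, here is a small sketch of the Ruby 1.9 behavior MikeC mentions, where the encoding travels with the string. It assumes the source file is read as UTF-8 (hence the magic comment); the sample string and variable names are only illustrative.

```ruby
# encoding: utf-8
# Assumes Ruby 1.9 or later and a UTF-8 source file.
greeting = "こんにちわ"

char_count = greeting.length      # counted in characters
byte_count = greeting.bytesize    # counted in bytes (3 per character here)
first_char = greeting[0]          # indexing works per character
chars = greeting.chars.to_a       # array of one-character strings

puts "#{char_count} characters, #{byte_count} bytes, first: #{first_char}"
```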
on 2010-02-11 11:46
Sorry for not giving enough information about my environment:
---
 ruby 1.9.1p376 (2009-12-07 revision 26041) [i686-linux]
 LANG=es_ES.UTF-8
 LC_COLLATE=es_ES.UTF-8
---
The characters display correctly in the GTK application; the problem
appears when I check the encoding of the string from Ruby:
 string = "¿"
 string.encoding -> this always returns ASCII-8BIT instead of UTF-8
If we count the characters:
 string.length -> 2, but it should be 1
on 2010-02-11 12:55
On 11 February 2010 19:46, Rg Rg <ruby-forum-incoming@andreas-s.net> wrote:
> The characters display correctly in the GTK application; the problem
> appears when I check the encoding of the string from Ruby:
>  string = "¿"
>  string.encoding -> this always returns ASCII-8BIT instead of UTF-8
> If we count the characters:
>  string.length -> 2, but it should be 1

Ah... It looks like the encoding is being detected wrong. This
will be a Ruby 1.9 problem, not a GTK problem. I'm afraid I don't
know enough to help...

MikeC
on 2010-02-11 14:00
2010/2/11 Rg Rg <ruby-forum-incoming@andreas-s.net>:
> Sorry for not giving enough information about my environment:
> ---
>  ruby 1.9.1p376 (2009-12-07 revision 26041) [i686-linux]
>  LANG=es_ES.UTF-8
>  LC_COLLATE=es_ES.UTF-8
> ---
>

I don't have this behavior.

#$ ruby -v
ruby 1.9.1p376 (2009-12-07 revision 26041) [x86_64-linux]
#$ cat encoding.rb
str = "É"
puts str.encoding
puts str.size
puts str.length
#$ ruby encoding.rb
encoding.rb:1: invalid multibyte char (US-ASCII)
encoding.rb:1: invalid multibyte char (US-ASCII)
#$ ruby -Ku encoding.rb
UTF-8
1
1
#$

If you don't want to specify -Ku, you can add an "# encoding: utf-8"
comment on one of the first two lines, see
http://blog.nuclearsquid.com/writings/ruby-1-9-encodings

What you describe is the behavior of 1.8, not 1.9. Double check that.
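As a sketch of the alternative to -Ku mentioned above, this is roughly what encoding.rb could look like with the magic comment in place. It assumes the file itself is saved as UTF-8; the extra bytesize line and the literal_encoding variable are additions for illustration.

```ruby
# encoding: utf-8
# The magic comment above has to be on the first line of the file
# (or on the second line when the first one is a shebang).
str = "É"
literal_encoding = str.encoding

puts literal_encoding   # encoding taken from the magic comment
puts str.length         # length in characters
puts str.bytesize       # length in bytes
```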
on 2010-02-11 16:33
Simon Arnaud wrote:
> 2010/2/11 Rg Rg <ruby-forum-incoming@andreas-s.net>:
>> Sorry for not giving enough information about my environment:
>> ---
>>  ruby 1.9.1p376 (2009-12-07 revision 26041) [i686-linux]
>>  LANG=es_ES.UTF-8
>>  LC_COLLATE=es_ES.UTF-8
>> ---
>>
>
> I don't have this behavior.
>
> #$ ruby -v
> ruby 1.9.1p376 (2009-12-07 revision 26041) [x86_64-linux]
> #$ cat encoding.rb
> str = "É"
> puts str.encoding
> puts str.size
> puts str.length
> #$ ruby encoding.rb
> encoding.rb:1: invalid multibyte char (US-ASCII)
> encoding.rb:1: invalid multibyte char (US-ASCII)
> #$ ruby -Ku encoding.rb
> UTF-8
> 1
> 1
> #$
>
> If you don't want to specify -Ku, you can add an "# encoding: utf-8"
> comment on one of the first two lines, see
> http://blog.nuclearsquid.com/writings/ruby-1-9-encodings
>
> What you describe is the behavior of 1.8, not 1.9. Double check that.

I think I didn't explain myself well: this behavior only happens to me
when reading the string from GTK. If I run the same example as you, I
get the same behavior as you.

Taking "./Gtk-demo/clipboard.rb" as an example, in the following section:

  button.signal_connect('clicked', entry) do |w, e|
    clipboard = e.get_clipboard(Gdk::Selection::CLIPBOARD)
    clipboard.request_text do |board, text, data|
      e.text = text # The UTF-8 text is displayed correctly
    end
  end

If we change the above code, replacing "e.text = text" with
"e.text = text.encoding.to_s", the result is ASCII-8BIT, and if we
show the length of the string the result is incorrect because the
string is not treated as UTF-8 (its bytes are counted instead).
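Until the binding tags the string itself, a possible workaround on the Ruby side is to relabel the text inside the callback. The helper below is a hypothetical sketch, not part of Ruby-GNOME2: the name as_utf8 is made up, and it assumes the bytes handed over by GTK really are UTF-8.

```ruby
# Hypothetical helper (not a Ruby-GNOME2 API): return a copy of the text
# labelled as UTF-8 when its bytes form valid UTF-8, else the original.
def as_utf8(text)
  return text if text.nil?
  copy = text.dup.force_encoding(Encoding::UTF_8)
  copy.valid_encoding? ? copy : text
end

# Stand-in for the string received in the request_text callback.
pasted = [0xC2, 0xBF].pack("C*")
puts as_utf8(pasted).length
```

In the clipboard.rb callback this would be used as, for example, "e.text = as_utf8(text).length.to_s". Working on a copy leaves the string owned by the binding untouched.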
on 2010-02-12 10:46
2010/2/11 Rg Rg <ruby-forum-incoming@andreas-s.net>:
> I think I didn't explain myself well: this behavior only happens to me
> when reading the string from GTK. If I run the same example as you, I
> get the same behavior as you.

Ah, yes, I tested a little with a Gtk::Entry and it gives back a
string considered ASCII-8BIT.

Definitely a bug somewhere, but I don't know if it is in Ruby or in ruby-gnome2.

I tried to look at the sources, but didn't understand where the getter
was defined.

Let's hope Kou can have a look at it, or give directions where to look.

Simon
on 2010-02-12 13:51
Hi,

In <78f5e3ec1002120140r2c429a93sdb65298fef67aea5@mail.gmail.com>
  "Re: [ruby-gnome2-devel-en] Clipboard problem with utf-8 and ascii-8bit" on Fri, 12 Feb 2010 10:40:10 +0100,
  Simon Arnaud <mazwak@gmail.com> wrote:

> I tried to look at the sources, but didn't understand where the getter
> was defined.
>
> Let's hope Kou can have a look at it, or give directions where to look.

Gtk::Clipboard#request_text passes UTF-8 encoded text to the
callback in trunk. We need more work around
encoding, e.g. we should set the UTF-8 encoding on the text
returned by Gtk::Entry#text.

I've added the RBG_STRING_SET_UTF8_ENCODING() macro. Could someone
do the UTF-8 encoding work in trunk? If someone sends a patch,
could someone please review it and commit it into trunk?

Thanks,
--
kou
on 2010-02-15 12:53
Kouhei Sutou wrote:
> Hi,
>
> In <78f5e3ec1002120140r2c429a93sdb65298fef67aea5@mail.gmail.com>
>   "Re: [ruby-gnome2-devel-en] Clipboard problem with utf-8 and
> ascii-8bit" on Fri, 12 Feb 2010 10:40:10 +0100,
>   Simon Arnaud <mazwak@gmail.com> wrote:
>
>> I tried to look at the sources, but didn't understand where the getter
>> was defined.
>>
>> Let's hope Kou can have a look at it, or give directions where to look.
>
> Gtk::Clipboard#request_text passes UTF-8 encoded text to the
> callback in trunk. We need more work around
> encoding, e.g. we should set the UTF-8 encoding on the text
> returned by Gtk::Entry#text.
>
> I've added the RBG_STRING_SET_UTF8_ENCODING() macro. Could someone
> do the UTF-8 encoding work in trunk? If someone sends a patch,
> could someone please review it and commit it into trunk?
>
> Thanks,
> --
> kou

As I need "Gtk::Entry#text" to work, I have patched the file
rbgtkentry.c to use your macro.
I haven't read much of the ruby-gnome code, so the patch is probably not
quite correct, but at least it works for me.

-------------------------------------
--- gtk/src/rbgtkentry.c.old    2010-02-15 12:32:01.000000000 +0100
+++ gtk/src/rbgtkentry.c        2010-02-15 12:31:29.000000000 +0100
@@ -135,6 +135,18 @@
 }
 #endif
 
+static VALUE
+entry_request_text(self)
+    VALUE self;
+{
+    VALUE vtext = Qnil;
+    const gchar *text;
+    text = gtk_entry_get_text(_SELF(self));
+    vtext = CSTR2RVAL(text);
+    RBG_STRING_SET_UTF8_ENCODING(vtext) ;
+    return vtext;
+}
+
 void
 Init_gtk_entry()
 {
@@ -149,6 +161,7 @@
 #endif
     rb_define_method(gEntry, "layout_index_to_text_index", entry_layout_index_to_text_index, 1);
     rb_define_method(gEntry, "text_index_to_layout_index", entry_text_index_to_layout_index, 1);
+    rb_define_method(gEntry, "text", entry_request_text, 0);
 
 #if GTK_CHECK_VERSION(2, 12, 0)
     rb_define_method(gEntry, "cursor_hadjustment",
-------------------------------------

Thanks