Wcswidth, Ruby 1.9, and string encodings

#1

Hello all,

I need some advice about how to use wcswidth on Ruby 1.9 strings.
Wcswidth returns the column width of a string, i.e. how much of the
horizontal space on the screen a string takes up when printed to a
terminal. Being able to calculate column width is crucial for any
console-based program that deals with non-ASCII characters. For example,
Chinese characters typically take up two columns.
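
To make the distinction concrete, here is a small Ruby illustration (not part of the original code) of how character count, byte count, and column width differ for the same string. The column figure in the comment is an assumption about a typical terminal; Ruby itself does not compute it.

```ruby
# "\u4e2d\u6587" is the two-character string 中文, UTF-8 encoded.
s = "\u4e2d\u6587"

chars = s.length     # => 2 (characters)
bytes = s.bytesize   # => 6 (three UTF-8 bytes per character)

# Neither number is the column width: a terminal typically draws each of
# these characters two columns wide, so the string would occupy 4 columns.
puts "chars=#{chars} bytes=#{bytes}"
```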

So, I’d like to write a gem that provides wcswidth to Ruby. I have a
simple version that works:

https://gist.github.com/729d17559ac523fabf39

Unfortunately, this code requires that the string’s encoding match your
LC_CTYPE, because mbstowcs relies on LC_CTYPE to produce wchar_t’s. If
those don’t match, you’ll either get nothing, or, possibly, a wrong
answer.

So, is there a mapping between Ruby string encodings and LC_CTYPE
values? If so, I could at least check that the conversion is done
correctly (or possibly even skip the mbstowcs step, if I know what
wchar_t is and what encoding corresponds to it; of course, what wchar_t
means is system-dependent).

Or is there a better way to do this than using mbstowcs?

Or heck, is there already a way of getting string column width in Ruby?
It seems like there should be, but I don’t see one.

Thanks!

#2

2010/5/10 William M.:

I need some advice about how to use wcswidth on Ruby 1.9 strings.
Wcswidth returns the column width of a string, i.e. how much of the
horizontal space on the screen a string takes up when printed to a
terminal. Being able to calculate column width is crucial for any
console-based program that deals with non-ASCII characters. For example,
Chinese characters typically take up two columns.

I wrote such code in my terminfo binding.
http://github.com/akr/ruby-terminfo

Unfortunately, this code requires that the string’s encoding match your
LC_CTYPE, because mbstowcs relies on LC_CTYPE to produce wchar_t’s. If
those don’t match, you’ll either get nothing, or, possibly, a wrong
answer.

I used rb_locale_encoding() to convert to LC_CTYPE as follows:

str = rb_str_encode(str, rb_enc_from_encoding(rb_locale_encoding()), 0, Qnil);
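
For comparison, roughly the same conversion can be sketched at the Ruby level. This is only an illustration, not code from ruby-terminfo: `to_locale_encoding` is a hypothetical helper name, and it relies on `Encoding.find("locale")`, Ruby's alias for the encoding derived from the current locale.

```ruby
# Hypothetical helper: re-encode a string into the locale's character
# encoding, which is what mbstowcs() expects under the current LC_CTYPE.
# Returns nil if the string cannot be represented in that encoding.
def to_locale_encoding(str)
  locale_enc = Encoding.find("locale")  # encoding implied by the locale
  return str if str.encoding == locale_enc
  str.encode(locale_enc)
rescue EncodingError
  # Covers undefined conversions and invalid byte sequences.
  nil
end

converted = to_locale_encoding("hello")
puts converted.encoding
```

Returning nil here is a design choice for the sketch: it marks the case where the string has no representation in the locale's charset, which is the same situation in which mbstowcs could not give a meaningful result.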

#3

Reformatted excerpts from Tanaka A.'s message of 2010-05-09:

I wrote such code in my terminfo binding.
http://github.com/akr/ruby-terminfo

Very nice. It works when I use your git version. But the latest gem
(v0.1.1) doesn’t seem to have wcswidth.

I used rb_locale_encoding() to convert to LC_CTYPE as follows:

str = rb_str_encode(str, rb_enc_from_encoding(rb_locale_encoding()), 0, Qnil);

Perfect, that’s what I was missing.

I need one other bit of functionality too: the ability to get a
substring of a specific display width. I assume that is beyond the scope
of your terminfo package, so I am probably going to wrap the two methods
up into a gem. But if you’re planning on adding such a thing to
ruby-terminfo, let me know, and I will just use that instead.
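
A minimal pure-Ruby sketch of the width-limited substring idea, under stated assumptions: the input is UTF-8, and `WIDE_RANGES`, `char_columns`, and `take_columns` are hypothetical names. The width table is a simplified approximation of the East Asian wide ranges; it is not what wcswidth uses, and it ignores combining and control characters.

```ruby
# Simplified, hypothetical table of code point ranges treated as
# double-width. Everything else is counted as one column.
WIDE_RANGES = [
  0x1100..0x115F, 0x2E80..0xA4CF, 0xAC00..0xD7A3,
  0xF900..0xFAFF, 0xFE30..0xFE6F, 0xFF00..0xFF60, 0xFFE0..0xFFE6
].freeze

# Column width of a single character (assumes a Unicode string).
def char_columns(ch)
  cp = ch.ord
  WIDE_RANGES.any? { |r| r.cover?(cp) } ? 2 : 1
end

# Longest prefix of str that fits in max_columns terminal columns.
# A wide character that would straddle the limit is dropped entirely.
def take_columns(str, max_columns)
  used = 0
  kept = []
  str.each_char do |ch|
    w = char_columns(ch)
    break if used + w > max_columns
    kept << ch
    used += w
  end
  kept.join
end

puts take_columns("\u4e2d\u6587abc", 5)  # two wide characters plus "a"
```

A real implementation would take the per-character width from wcwidth after converting to the locale encoding, rather than from a hand-written table.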

Thanks for your help!

#4

Reformatted excerpts from William M.'s message of 2010-05-10:

I need one other bit of functionality too: the ability to get a
substring of a specific display width. I assume that is beyond the
scope of your terminfo package, so I am probably going to wrap the two
methods up into a gem.

I’ve finally pulled these together into a gem:

http://rubygems.org/gems/console

If anyone else is interested in writing i18n-capable console
applications, this should be pretty useful for you. Unfortunately I
haven’t quite gotten it to work with Ruby 1.8. Any help on that front
would be appreciated.

Blog post with a little more background:
http://all-thing.net/string-width