Hi, when I create a Ruby String from a C extension by using rb_str_new(s, len), I get a String with US-ASCII encoding.
I don't want to call String#force_encoding(:"UTF-8") afterwards, but instead use the rb_enc_str_new() function in string.c:
VALUE
rb_enc_str_new(const char *ptr, long len, rb_encoding *enc)
{
    VALUE str = rb_str_new(ptr, len);
    rb_enc_associate(str, enc);
    return str;
}
But I have no idea how to set the 'enc' parameter to "UTF-8".
How should I fill the third 'enc' argument?
Thanks a lot.
Iñaki Baz C. wrote:
VALUE
rb_enc_str_new(const char *ptr, long len, rb_encoding *enc)
{
    VALUE str = rb_str_new(ptr, len);
    rb_enc_associate(str, enc);
    return str;
}
But I have no idea how to set the 'enc' parameter to "UTF-8".
How should I fill the third 'enc' argument?
I'd say give it a pointer to an rb_encoding object.
Have a look in encoding.c. This particular function might be useful:
rb_encoding *
rb_enc_find(const char *name)
{
    int idx = rb_enc_find_index(name);
    if (idx < 0) idx = 0;
    return rb_enc_from_index(idx);
}
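As a minimal sketch of how the two quoted functions fit together in an extension: the helper below wraps a C buffer in a String tagged as UTF-8. The names my_utf8_str_new, my_ext_greeting, MyExt and Init_my_ext are hypothetical and only for illustration; rb_enc_str_new(), rb_enc_find() and rb_utf8_encoding() are declared in ruby/encoding.h. It assumes Ruby 1.9 or later and has to be built as an extension (for example with mkmf), not as a standalone program.

```c
#include <ruby.h>
#include <ruby/encoding.h>

/* Hypothetical helper: wrap a C buffer in a Ruby String tagged as UTF-8. */
static VALUE
my_utf8_str_new(const char *ptr, long len)
{
    /* rb_utf8_encoding() returns a pointer to the built-in UTF-8 encoding.
     * rb_enc_find("UTF-8") looks up the same encoding by name. */
    return rb_enc_str_new(ptr, len, rb_utf8_encoding());
}

/* Hypothetical method body: returns a String whose bytes are valid UTF-8. */
static VALUE
my_ext_greeting(VALUE self)
{
    /* The literal is split so that "\xB1" is not read together with "a". */
    static const char bytes[] = "I\xC3\xB1" "aki";
    return my_utf8_str_new(bytes, (long)(sizeof(bytes) - 1));
}

void
Init_my_ext(void)
{
    VALUE mod = rb_define_module("MyExt");   /* hypothetical module name */
    rb_define_module_function(mod, "greeting", my_ext_greeting, 0);
}
```

With this in place no later String#force_encoding call should be needed, since the encoding is associated when the String is created.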
On Wednesday, 2 December 2009, Brian C. wrote:
How should I fill the third 'enc' argument?
    return rb_enc_from_index(idx);
}
Hmm, it involves allocating memory for the rb_encoding object and so on... not as trivial as I had hoped.
But that's the way. Thanks a lot.
Iñaki Baz C. wrote:
On Wednesday, 2 December 2009, Brian C. wrote:
How should I fill the third 'enc' argument?
    return rb_enc_from_index(idx);
}
Hmm, it involves allocating memory for the rb_encoding object
Why? AFAICS, you can just pass a pointer to an existing encoding object.
They are not mutated.
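To illustrate why no allocation is involved, here is a toy model of a name lookup that hands back a pointer into a table that already exists. It is not Ruby's implementation: toy_encoding, toy_enc_table, toy_enc_find_index and toy_enc_find are hypothetical names, and the only things borrowed from the quoted rb_enc_find are its shape and the fallback to index 0 for unknown names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for rb_encoding; the real struct holds much more. */
typedef struct {
    const char *name;
} toy_encoding;

/* The registry exists for the whole lifetime of the program,
 * so a lookup never has to allocate anything. */
static const toy_encoding toy_enc_table[] = {
    { "ASCII-8BIT" },   /* index 0: used as the fallback below */
    { "UTF-8" },
    { "US-ASCII" },
};

#define TOY_ENC_COUNT (sizeof(toy_enc_table) / sizeof(toy_enc_table[0]))

/* Returns the table index for `name`, or -1 if the name is unknown. */
static int
toy_enc_find_index(const char *name)
{
    for (size_t i = 0; i < TOY_ENC_COUNT; i++) {
        if (strcmp(toy_enc_table[i].name, name) == 0) return (int)i;
    }
    return -1;
}

/* Same shape as the quoted rb_enc_find: unknown names fall back to index 0,
 * and the caller receives a pointer to an existing, read-only entry. */
static const toy_encoding *
toy_enc_find(const char *name)
{
    int idx = toy_enc_find_index(name);
    if (idx < 0) idx = 0;
    return &toy_enc_table[idx];
}
```

Two lookups of the same name return the same pointer, which is why the caller can pass it around freely without owning or freeing anything.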
There are other examples, e.g. from io.c:
#ifdef _WIN32
    if (utf16 == (rb_encoding *)-1) {
        utf16 = rb_enc_find("UTF-16LE");
        if (utf16 == rb_ascii8bit_encoding())
            utf16 = NULL;
    }
    if (utf16) {
        VALUE wfname = rb_str_encode(fname, rb_enc_from_encoding(utf16), 0, Qnil);
        rb_enc_str_buf_cat(wfname, "", 1, utf16); /* workaround */
        data.fname = RSTRING_PTR(wfname);
        data.wchar = 1;
    }
    else {
        data.wchar = 0;
    }
#endif
It looks like rb_enc_from_encoding() takes a pointer to the rb_encoding object returned from rb_enc_find and turns it into a VALUE (the corresponding Ruby Encoding object).
On Wednesday, 2 December 2009, Brian C. wrote:
They are not mutated.
VALUE wfname = rb_str_encode(fname, rb_enc_from_encoding(utf16),
It looks like rb_enc_from_encoding() takes a pointer to the rb_encoding object returned from rb_enc_find, and turns it into a VALUE
OK, so the rb_encoding objects already exist and I just have to use a pointer to one.
Thanks a lot.