How to use rb_enc_str_new() to create a String with UTF-8 encoding?

Hi, when I create a Ruby String from a C extension using "rb_str_new(s, len)", I get a String with US-ASCII encoding.

I don't want to call String#force_encoding(:"UTF-8") afterwards; instead I'd like to use the rb_enc_str_new() function from string.c:

VALUE
rb_enc_str_new(const char *ptr, long len, rb_encoding *enc)
{
    VALUE str = rb_str_new(ptr, len);
    rb_enc_associate(str, enc);
    return str;
}

But I have no idea how to set the ‘enc’ parameter to “UTF-8”.
How should I fill in the third ‘enc’ argument?

Thanks a lot.

Iñaki Baz C. wrote:

But I have no idea how to set the ‘enc’ parameter to “UTF-8”.
How should I fill in the third ‘enc’ argument?

I’d say give it a pointer to an rb_encoding object.

Have a look in encoding.c; this particular function might be useful:

rb_encoding *
rb_enc_find(const char *name)
{
    int idx = rb_enc_find_index(name);
    if (idx < 0) idx = 0;
    return rb_enc_from_index(idx);
}

On Wednesday, 2 December 2009, Brian C. wrote:

How should I fill the third ‘enc’ argument?
return rb_enc_from_index(idx);
}

Hmm, it involves allocating memory for the rb_encoding object, so it's not as trivial as I'd hoped :)
But that's the way. Thanks a lot.

Iñaki Baz C. wrote:

Hmm, it involves allocating memory for the rb_encoding object

Why? AFAICS, you can just pass a pointer to an existing encoding object.
They are not mutated.

There are other examples, e.g. this one from io.c:

#ifdef _WIN32
    /* (rb_encoding *)-1 marks "not looked up yet" */
    if (utf16 == (rb_encoding *)-1) {
        utf16 = rb_enc_find("UTF-16LE");
        /* rb_enc_find() falls back to index 0 (ASCII-8BIT) for unknown names */
        if (utf16 == rb_ascii8bit_encoding())
            utf16 = NULL;
    }
    if (utf16) {
        VALUE wfname = rb_str_encode(fname, rb_enc_from_encoding(utf16), 0,
                                     Qnil);
        rb_enc_str_buf_cat(wfname, "", 1, utf16); /* workaround */
        data.fname = RSTRING_PTR(wfname);
        data.wchar = 1;
    }
    else {
        data.wchar = 0;
    }
#endif

It looks like rb_enc_from_encoding() takes the rb_encoding pointer returned by rb_enc_find() and turns it into a VALUE (the corresponding Ruby Encoding object).

On Wednesday, 2 December 2009, Brian C. wrote:

They are not mutated.
VALUE wfname = rb_str_encode(fname, rb_enc_from_encoding(utf16),

It looks like rb_enc_from_encoding() takes a pointer to the rb_encoding
object returned from rb_enc_find, and turns it into a VALUE

OK, so the rb_encoding objects already exist and I just need to use a pointer to one.
Thanks a lot.