Reading unicode characters?

martin_mercy2001 · March 10, 2008, 4:37am

Hi friends,

Could anyone help me write a method that reads all the Unicode
characters supported in Ruby, or do the same using regular expressions?

Thanks in advance,

Regards,
Jose Martin

martin_mercy2001 · March 10, 2008, 4:00pm

dare ruby wrote:

Hi friends,

Could anyone help me write a method that reads all the Unicode
characters supported in Ruby, or do the same using regular expressions?

Thanks in advance,

Regards,
Jose Martin

Ruby does not support unicode.

martin_mercy2001 · March 10, 2008, 4:14pm

On Mar 10, 2008, at 10:00 AM, 7stud – wrote:

Jose Martin

Ruby does not support unicode.

Really?

$ ruby -KU -r jcode -e 'p "Résumé".jsize'
6

James Edward G. II

martin_mercy2001 · March 11, 2008, 4:15am

Are there any possibilities using regular expressions, or writing my own
methods, for unicode characters?

Ruby does not support unicode.

Really?

$ ruby -KU -r jcode -e 'p "Résumé".jsize'
6

James Edward G. II

martin_mercy2001 · March 11, 2008, 4:30am

James G. wrote:

On Mar 10, 2008, at 10:00 AM, 7stud – wrote:

Jose Martin

Ruby does not support unicode.

Really?

$ ruby -KU -r jcode -e 'p "Résumé".jsize'
6

James Edward G. II

How does that prove that ruby supports unicode? Where are there any
unicode characters in your string?

martin_mercy2001 · March 11, 2008, 11:06am

7stud – wrote:

James G. wrote:

[…]
$ ruby -KU -r jcode -e 'p "Résumé".jsize'
6

James Edward G. II

How does that prove that ruby supports unicode? Where are there any
unicode characters in your string?

1/ There’s a difference between codepoints and characters; speaking of
unicode “characters” is confusing at best.

2/ “Supporting unicode” is probably meaningless (which unicode encoding,
by the way?). Building UTF-8 applications in Ruby is perfectly doable
thanks to jcode, regex UTF-8 support, … I know: among other things,
it’s what I built my company on.

The example above obviously assumes a UTF-8 locale in the terminal where
you type it.
For more data, just try size instead of jsize in the same example and
read jcode’s rdoc.

Lionel
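
To make the codepoint/character distinction above concrete, here is a minimal sketch. It assumes UTF-8 strings and uses only pack/unpack with the "U" directive; the constant and helper names are made up for illustration. The same visible "é" can be stored as one codepoint or as two.

```ruby
# -*- coding: utf-8 -*-
# Build the same visible character two ways (illustrative values).
COMPOSED   = [0xE9].pack("U*")          # U+00E9, precomposed e-acute
DECOMPOSED = [0x65, 0x0301].pack("U*")  # "e" followed by U+0301 combining acute

# Hypothetical helper: number of Unicode codepoints in a UTF-8 string.
def codepoint_count(str)
  str.unpack("U*").size
end

p codepoint_count(COMPOSED)    # 1
p codepoint_count(DECOMPOSED)  # 2
p COMPOSED == DECOMPOSED       # false: different codepoint sequences
```

Both strings display as a single accented letter, which is why counting "characters" needs a definition before it needs code.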

martin_mercy2001 · March 11, 2008, 1:50pm

On Mar 10, 2008, at 10:29 PM, 7stud – wrote:

6

James Edward G. II

How does that prove that ruby supports unicode?

If the code was not character aware, it would have returned a count of
the bytes in the String (more than six). String#size, for example.

Where are there any unicode characters in your string?

I entered the accented e characters in UTF-8; that’s why you see the
-KU switch to tell Ruby the encoding.

James Edward G. II
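
The byte count versus character count point can be checked without jcode. This is only a sketch, assuming the string holds UTF-8 bytes; the helper names are hypothetical, and it deliberately avoids String#size so the result does not depend on how a given Ruby version defines it.

```ruby
# -*- coding: utf-8 -*-
# "Résumé" spelled with explicit UTF-8 byte escapes, so the example
# survives mail clients and terminals that mangle accented letters.
RESUME = "R\xC3\xA9sum\xC3\xA9"

# Hypothetical helper: number of raw bytes in the string.
def byte_count(str)
  str.unpack("C*").size
end

# Hypothetical helper: number of UTF-8 codepoints in the string.
def char_count(str)
  str.unpack("U*").size
end

p byte_count(RESUME)  # 8 (each "é" takes two bytes in UTF-8)
p char_count(RESUME)  # 6
```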

martin_mercy2001 · March 11, 2008, 5:36pm

On Tue, Mar 11, 2008 at 7:49 AM, James G. wrote:

James Edward G. II
I think this may have been discussed before, but -KU doesn’t work for
me on Windows XP. I get an unterminated string error with the
“Résumé” UTF-8 encoded string. I can only assume that the parser is
still interpreting the string as one byte per character. Anyone have
any ideas?

Todd

martin_mercy2001 · March 11, 2008, 6:13am

Hi,

In message “Re: Reading unicode characters?”
on Tue, 11 Mar 2008 12:29:58 +0900, 7stud – writes:

|How does that prove that ruby supports unicode? Where are there any
|unicode characters in your string?

Then, tell me what makes you think it’s proven.

          matz.

martin_mercy2001 · March 11, 2008, 5:50pm

Todd B. wrote:
On Tue, Mar 11, 2008 at 7:49 AM, James G. wrote:

James Edward G. II
I think this may have been discussed before, but -KU doesn’t work for
me on Windows XP. I get an unterminated string error with the
“Résumé” UTF-8 encoded string. I can only assume that the parser is
still interpreting the string as one byte per character. Anyone have
any ideas?

Todd

Maybe try a regex-based UTF-8 hack (Ruby 1.8.6) like here:
http://snippets.dzone.com/posts/show/4527

Cheers,
jk
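
For readers who cannot reach the link, a regex-based approach might look roughly like the sketch below. This is not the linked snippet, whose contents are not reproduced here; the method name is made up, and it assumes the input is valid UTF-8. The /u flag makes "." match a whole UTF-8 character rather than a single byte.

```ruby
# -*- coding: utf-8 -*-
# Hypothetical helper (not the linked snippet): yields each character of a
# UTF-8 encoded string. /u selects UTF-8 matching, /m lets "." match newlines.
def each_utf8_char(str)
  str.scan(/./mu) { |ch| yield ch }
end

chars = []
each_utf8_char("R\xC3\xA9sum\xC3\xA9") { |ch| chars << ch }

p chars.size             # 6
p chars[1].unpack("U*")  # [233], i.e. U+00E9
```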

martin_mercy2001 · March 11, 2008, 6:21pm

On Tue, Mar 11, 2008 at 11:49 AM, Jimmy K. wrote:

Maybe try a regex-based UTF-8 hack (Ruby 1.8.6) like here:
http://snippets.dzone.com/posts/show/4527

Cheers,
jk

Thanks for the pointer!

Todd

martin_mercy2001 · March 12, 2008, 12:44am

James G. wrote:

On Mar 10, 2008, at 10:29 PM, 7stud – wrote:

6

James Edward G. II

How does that prove that ruby supports unicode?

If the code was not character aware, it would have returned a count of
the bytes in the String (more than six). String#size, for example.

Where are there any unicode characters in your string?

I entered the accented e characters in UTF-8; that’s why you see the
-KU switch to tell Ruby the encoding.

James Edward G. II

Ahh, I see. You think UTF-8 is unicode. And apparently you think that
when you enter a UTF-8 character in a post that everyone will see the
character you entered.

martin_mercy2001 · March 12, 2008, 2:42am

On Mar 11, 2008, at 6:46 PM, 7stud – wrote:

Ahh, I see. You think UTF-8 is unicode.

I think UTF-8 is an encoding of Unicode.

And apparently you think that when you enter a UTF-8 character in a
post that everyone will see the character you entered.

I think I included the -KU switch to show you exactly what was going on.

I also think it was pointless for you to be rude about this, so I
guess you succeeded in proving that what I think doesn’t always matter.

James Edward G. II