Reading Unicode characters?

Hi friends,

Could anyone help me write a method that reads all the Unicode
characters supported in Ruby, or do the same using regular expressions?

Thanks in advance,

Regards,
Jose Martin

dare ruby wrote:

Hi friends,

Could anyone help me write a method that reads all the Unicode
characters supported in Ruby, or do the same using regular expressions?

Thanks in advance,

Regards,
Jose Martin

Ruby does not support unicode.

On Mar 10, 2008, at 10:00 AM, 7stud – wrote:

Jose Martin

Ruby does not support unicode.

Really?

$ ruby -KU -r jcode -e 'p "Résumé".jsize'
6

James Edward G. II

Is there any possibility of handling Unicode characters with regular
expressions, or by writing my own methods?

Ruby does not support unicode.

Really?

$ ruby -KU -r jcode -e 'p "Résumé".jsize'
6

James Edward G. II

James G. wrote:

On Mar 10, 2008, at 10:00 AM, 7stud – wrote:

Jose Martin

Ruby does not support unicode.

Really?

$ ruby -KU -r jcode -e 'p "Résumé".jsize'
6

James Edward G. II

How does that prove the ruby supports unicode? Where are there any
unicode characters in your string?

7stud – wrote:

James G. wrote:

[…]
$ ruby -KU -r jcode -e 'p "Résumé".jsize'
6

James Edward G. II

How does that prove the ruby supports unicode? Where are there any
unicode characters in your string?

1/ There’s a difference between codepoints and characters; speaking of
Unicode “characters” is confusing at best.

2/ “Supporting Unicode” is probably meaningless (which Unicode encoding,
by the way?). Building UTF-8 applications in Ruby is perfectly doable
thanks to jcode, regex UTF-8 support, … I know: among other things,
it’s what I built my company on.

The example above obviously assumes a UTF-8 locale in the terminal you
type it into.
For more data, just try size instead of jsize in the same example and
read jcode’s rdoc.
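
To illustrate point 1/, here is a small sketch of how one visible character can correspond to different numbers of codepoints. It uses only Array#pack and String#unpack with the "U" directive; the variable names are made up for this example and nothing here comes from jcode.

```ruby
# "é" can be written as a single precomposed codepoint...
precomposed = [0xE9].pack("U*")          # U+00E9 LATIN SMALL LETTER E WITH ACUTE
# ...or as a plain "e" followed by a combining accent.
decomposed  = [0x65, 0x301].pack("U*")   # U+0065 + U+0301 COMBINING ACUTE ACCENT

# unpack("U*") decodes the UTF-8 bytes back into an array of codepoints.
precomposed_points = precomposed.unpack("U*")
decomposed_points  = decomposed.unpack("U*")

puts "precomposed: #{precomposed_points.size} codepoint(s)"
puts "decomposed:  #{decomposed_points.size} codepoint(s)"
```

Both strings display as the same accented letter in a UTF-8 terminal, yet a codepoint count gives 1 for the first and 2 for the second, which is why "character count" needs a definition before it can be measured.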

Lionel

On Mar 10, 2008, at 10:29 PM, 7stud – wrote:

6

James Edward G. II

How does that prove the ruby supports unicode?

If the code was not character aware, it would have returned a count of
the bytes in the String (more than six). String#size, for example.
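
A minimal sketch of that difference, for anyone who wants to see both numbers side by side. It does not rely on jcode or the -KU switch; the string is built from codepoints with Array#pack so the result does not depend on the terminal or source-file encoding. The variable names are only illustrative.

```ruby
# Build "Résumé" from its codepoints (R, é, s, u, m, é).
resume = [0x52, 0xE9, 0x73, 0x75, 0x6D, 0xE9].pack("U*")

# "C*" unpacks one integer per byte of the UTF-8 data.
byte_count = resume.unpack("C*").size
# "U*" decodes UTF-8 and yields one integer per codepoint.
char_count = resume.unpack("U*").size

puts "bytes: #{byte_count}, characters: #{char_count}"
# prints "bytes: 8, characters: 6"
```

Each é takes two bytes in UTF-8, so a byte-oriented count gives 8 while a character-aware count gives 6.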

Where are there any unicode characters in your string?

I entered the accented e characters in UTF-8, that’s why you see the
-KU switch to tell Ruby the encoding.

James Edward G. II

On Tue, Mar 11, 2008 at 7:49 AM, James G. wrote:

James Edward G. II
I think this may have been discussed before, but -KU doesn’t work for
me on Windows XP. I get an unterminated string error with the
“Résumé” UTF-8 encoded string. I can only assume that the parser is
still interpreting the string as one byte per character. Anyone have
any ideas?

Todd

Hi,

In message “Re: Reading Unicode characters?”
on Tue, 11 Mar 2008 12:29:58 +0900, 7stud – writes:

|How does that prove the ruby supports unicode? Where are there any
|unicode characters in your string?

Then, tell me what makes you think it’s proven.

          matz.

Todd B. wrote:
On Tue, Mar 11, 2008 at 7:49 AM, James G. wrote:

James Edward G. II
I think this may have been discussed before, but -KU doesn’t work for
me on Windows XP. I get an unterminated string error with the
“Résumé” UTF-8 encoded string. I can only assume that the parser is
still interpreting the string as one byte per character. Anyone have
any ideas?

Todd

Maybe try a regex-based UTF-8 hack (Ruby 1.8.6) like here:
http://snippets.dzone.com/posts/show/4527
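
The linked snippet is not reproduced here, so the following is only a rough sketch of what a regex-based approach can look like in general, not that snippet's code. It assumes nothing beyond the /u regex flag, which makes the pattern treat the string as UTF-8 so that "." matches a whole character rather than a single byte. The string is built with Array#pack to sidestep the parser problem with UTF-8 literals described above; the variable names are hypothetical.

```ruby
# "Résumé" built from codepoints, so no UTF-8 literal appears in the source.
resume = [0x52, 0xE9, 0x73, 0x75, 0x6D, 0xE9].pack("U*")

# /u: treat the subject as UTF-8; /m: let "." match newlines as well.
chars = resume.scan(/./mu)

puts chars.size       # number of UTF-8 characters matched
puts chars.join("|")  # the individual characters, separated for display
```

Here scan returns six one-character strings, and the second one holds the two bytes of é as a single element.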

Cheers,
jk

On Tue, Mar 11, 2008 at 11:49 AM, Jimmy K. wrote:

Maybe try a regex-based UTF-8 hack (Ruby 1.8.6) like here:
http://snippets.dzone.com/posts/show/4527

Cheers,
jk

Thanks for the pointer!

Todd

James G. wrote:

On Mar 10, 2008, at 10:29 PM, 7stud – wrote:

6

James Edward G. II

How does that prove the ruby supports unicode?

If the code was not character aware, it would have returned a count of
the bytes in the String (more than six). String#size, for example.

Where are there any unicode characters in your string?

I entered the accented e characters in UTF-8, that’s why you see the
-KU switch to tell Ruby the encoding.

James Edward G. II

Ahh, I see. You think UTF-8 is unicode. And apparently you think that
when you enter a UTF-8 character in a post that everyone will see the
character you entered.

On Mar 11, 2008, at 6:46 PM, 7stud – wrote:

Ahh, I see. You think UTF-8 is unicode.

I think UTF-8 is an encoding of Unicode.

And apparently you think that when you enter a UTF-8 character in a
post that everyone will see the character you entered.

I think I included the -KU switch to show you exactly what was going on.

I also think it was pointless for you to be rude about this, so I
guess you succeeded in proving that what I think doesn’t always matter.

James Edward G. II