Splitting a string into characters - not bytes

okkezSS · November 4, 2010, 11:54am

I realise that this is probably a known thing but my google-fu is not
working today.

I am learning Japanese and I want to write an application (probably
a web app) that I can paste some Japanese into and have it highlight
the characters that I should know. To do this I need to take a string
of Japanese, such as “目が覚めてしまった。眠いのに…。”, and break it
down into characters. That is, this string contains 16 characters and
a space (despite being 49 bytes long).

Of course some strings also contain latin characters mixed in, such as
“俺の嫁と会話２ | あねこ”.

My home-brew solutions always seem to screw up, but then I thought:
this is Ruby! Why the heck am I reinventing the wheel?

So can someone clue me in on how to take a string of Japanese and
break it up into characters?

Please

tekhne · November 4, 2010, 12:00pm

2010/11/4 Peter H.:

So can someone clue me in on how to take a string of Japanese and
break it up into characters?

"俺の嫁と会話２ | あねこ".scan(/./u)
=> ["俺", "の", "嫁", "と", "会", "話", "２", " ", "|", " ", "あ", "ね", "こ"]
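
For anyone wondering why this works: the u flag makes the regexp treat the string as UTF-8, so "." matches one whole character rather than one byte. Here is a minimal sketch, assuming Ruby 1.9 or later and a UTF-8 source file. The helper name split_chars is made up for illustration, and the extra m flag is my own addition so that newlines are not dropped.

```ruby
# encoding: utf-8

# Hypothetical helper (not from the thread): split a UTF-8 string into
# characters with a Unicode-aware regexp. The "m" flag lets "." match
# newlines too; plain /./u would skip them.
def split_chars(str)
  str.scan(/./mu)
end

text = "俺の嫁と会話２ | あねこ"
p split_chars(text).length   # number of characters
p text.bytesize              # number of bytes, larger for Japanese text
p split_chars("あ\nい")       # the newline is kept as its own element
```

Comparing the first two printed numbers shows the character/byte difference the original question is about.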

Regards,
Ammar

tekhne · November 4, 2010, 12:03pm

Thank you thank you thank you

Damn that was a simple fix, not to mention a very fast response

Thank you again

tekhne · November 4, 2010, 12:07pm

On Thu, Nov 4, 2010 at 1:02 PM, Peter H. wrote:

Thank you thank you thank you

Damn that was a simple fix, not to mention a very fast response

Thank you again

I was just reminded of that simple solution myself yesterday on this
very list.

Cheers,
Ammar

tekhne · November 4, 2010, 5:43pm

If you want the array you can do str.chars.to_a.

tekhne · November 4, 2010, 5:58pm

On Thu, Nov 4, 2010 at 6:42 PM, Adam P. wrote:

If you want the array you can do str.chars.to_a.

It’s worth noting that String#chars is not available in all versions of
ruby:

RUBY_VERSION
=> "1.9.2"
"str".chars
=> #<Enumerator: "str":chars>

RUBY_VERSION
=> "1.8.7"
"str".chars
=> #<Enumerable::Enumerator:0x3e3a0>

RUBY_VERSION
=> "1.8.6"
"str".chars
NoMethodError: undefined method `chars' for "str":String
from (irb):2
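
If the code has to run on more than one of these versions, one option is to check for the method first and fall back to the regexp approach. This is only a sketch: the helper name chars_array is hypothetical, the fallback assumes the string is UTF-8, and on 1.8 you would presumably also need $KCODE = 'u' for multibyte text, which I have not verified here.

```ruby
# encoding: utf-8

# Hypothetical helper (the name is mine): always return an Array of
# characters, whichever String API the running Ruby provides.
def chars_array(str)
  if str.respond_to?(:chars)
    # Enumerator on 1.8.7 and 1.9, already an Array on 2.0 and later;
    # to_a gives an Array in both cases.
    str.chars.to_a
  else
    # Fallback for 1.8.6, where String#chars does not exist.
    # Assumes the string is UTF-8.
    str.scan(/./mu)
  end
end

p chars_array("str")
p chars_array("あねこ")
```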

Regards,
Ammar

tekhne · November 4, 2010, 12:04pm

On Thursday 04 November 2010, Peter H. wrote:

|Of course some strings also contain latin characters mixed in, such as
|“俺の嫁と会話２ | あねこ”.
|
|My home-brew solutions always seem to screw up, but then I thought.
|This is Ruby! Why the heck am I reinventing the wheel?
|
|So can someone clue me in on how to take a string of Japanese and
|break it up into characters?
|
|Please

What version of ruby are you using? In ruby 1.9, you can simply use
String#each_char:

"目が覚めてしまった。眠いのに…。".each_char{|c| puts c}
=>
目
が
覚
め
て
し
ま
っ
た
。
眠
い
の
に
…
。

You may need to set the encoding appropriately, however.

I don’t know how you’d do it in ruby 1.8.
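
On setting the encoding: in 1.9 every string carries an encoding tag, and each_char splits according to that tag, so bytes with the wrong tag come out split wrongly. Below is a minimal sketch, assuming Ruby 1.9 or later. EUC-JP is used purely as an example of a non-UTF-8 Japanese encoding, and the method name chars_from_bytes is hypothetical. For 1.8, the scan(/./u) approach given elsewhere in this thread is probably the way to go.

```ruby
# encoding: utf-8

# Hypothetical helper: tag raw bytes with the encoding they are really
# in, convert to UTF-8, then split into characters.
def chars_from_bytes(raw, source_encoding)
  # dup so the caller's string is not re-tagged in place
  raw.dup.force_encoding(source_encoding).encode("UTF-8").each_char.to_a
end

# Simulate input that arrives as untagged EUC-JP bytes.
euc_bytes = "あねこ".encode("EUC-JP").force_encoding("BINARY")

p euc_bytes.each_char.to_a.length        # wrong tag: each byte is a "character"
p chars_from_bytes(euc_bytes, "EUC-JP")  # right tag: real characters
```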

I hope this helps

Stefano