How can I get the first letter of this string?

luislavena · March 3, 2011, 4:41am

Hello, I’m a newbie. How can I get the first letter of this string:
“á is the first letter”?
My solution:

string = "á is the first letter"
str = string[0..1] # [0..1] because á is 2 bytes

But in many cases I don’t know the byte size of the first letter.
Any help would be much appreciated.

nghien_rbc · March 3, 2011, 4:52am

You can use the /u flag on a regex to signal UTF-8 character matching
rather than single-byte matching, but you didn’t mention what encoding
your strings are in. The regex would look like this:

# encoding: utf-8

"á is the first letter" =~ /^(.)/u
puts $1

--output:--
á

The first line lets you include UTF-8 characters in your source code,
if that is required.

You also didn’t mention which version of Ruby you are using: 1.8 or
1.9.
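
For readers on Ruby 1.9 or later, where every string carries its own encoding, the same regex idea can be wrapped in a small helper. This is only a sketch: `first_char` is a hypothetical name that does not appear in the thread, and it assumes the string is valid UTF-8. On those versions the regex follows the string's encoding, so the /u flag is not needed.

```ruby
# encoding: utf-8

# Hypothetical helper (not from the thread): return the first character
# of a string using a regex, or nil for an empty string.
# \A anchors at the start of the string; /m lets "." match a newline too.
def first_char(str)
  str[/\A./m]
end

puts first_char("á is the first letter")
```

Returning nil for an empty string mirrors what `String#[]` does when a pattern fails to match, so callers can test the result directly.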

nghien_rbc · March 3, 2011, 6:43am

irb(main):023:0> bacon="chunky"
=> "chunky"
irb(main):024:0> bacon[0]
=> 99
irb(main):025:0> bacon[0].chr
=> "c"

99 is the ASCII code for the letter ‘c’.

I think that explains what you want.

nghien_rbc · March 3, 2011, 6:54am

That won’t work on Ruby 1.9, as bacon[0] will return the first actual
character. This technique works on both:

bacon = "chunky"
=> "chunky"
bacon[0,1]
=> "c"

Michael E.
http://carboni.ca/
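
To make the byte-versus-character distinction concrete, here is a minimal sketch for Ruby 1.9.3 or later (not for the 1.8 interpreter discussed in this thread). `leading_views` is a hypothetical helper name; the string methods it calls are standard.

```ruby
# encoding: utf-8

# Hypothetical helper: show how the start of a string looks
# as bytes and as characters on Ruby 1.9.3+.
def leading_views(s)
  {
    first_byte: s.bytes.first,      # an Integer, like bacon[0] on Ruby 1.8
    first_char: s[0],               # a one-character String on Ruby 1.9+
    slice:      s[0, 1],            # start and length, counted in characters
    raw_byte:   s.byteslice(0, 1)   # a single byte, cut regardless of encoding
  }
end

views = leading_views("á is the first letter")
puts views[:first_byte]               # 195, the first of the two bytes of "á"
puts views[:first_char]
puts views[:first_char].bytesize      # 2
puts views[:raw_byte].valid_encoding? # false: half of a two-byte character
```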

nghien_rbc · March 3, 2011, 8:59am

Hi, there might be no easy way to work around this. Take a look at the
ASCII table and the Unicode table, and try to write a mapping function yourself.
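
A hand-written mapping function of that kind could look like the sketch below. It assumes Ruby 1.9 or later, and both `ACCENT_MAP` and `strip_accents` are hypothetical names; the table is deliberately incomplete and only covers a few illustrative characters.

```ruby
# encoding: utf-8

# Assumed, deliberately incomplete table; extend it with the characters you need.
ACCENT_MAP = {
  "á" => "a", "â" => "a", "ê" => "e", "ơ" => "o", "í" => "i", "ũ" => "u"
}.freeze

# Hypothetical helper: replace every character that has an entry in
# ACCENT_MAP and leave all other characters untouched.
def strip_accents(str)
  str.each_char.map { |ch| ACCENT_MAP.fetch(ch, ch) }.join
end

puts strip_accents("âêơíũ")
puts strip_accents("á is the first letter")
```

Using `fetch` with a default keeps unmapped characters as they are instead of turning them into nil.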

nghien_rbc · March 3, 2011, 8:32am

Thanks for your help!
My strings are in UTF-8 encoding. Ruby version 1.8.7.

I followed your instructions and got the first letter of that string,
but I still can’t get the remainder of the string after the first letter.

My problem now is converting the UTF-8 string to an ANSI string.
For example, the goal is to change “âêơíũ” to “aeoiu”.
I’m really confused and don’t see any way to do that.

Regards,

nghien_rbc · March 4, 2011, 1:17am

duc nguyen wrote in post #985120:

Thanks for your help!
My strings are in UTF-8 encoding. Ruby version 1.8.7.

I followed your instructions and got the first letter of that string,
but I still can’t get the remainder of the string after the first letter.

# encoding: utf-8

"á is the first letter" =~ /^(.)(.*)/u

puts "-->#{$1}<--"
puts "-->#{$2}<--"

--output:--
-->á<--
--> is the first letter<--
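
On Ruby 1.9 or later the same split can be done with plain indexing, because indices there count characters instead of bytes. A minimal sketch, with `split_first` as a hypothetical helper name:

```ruby
# encoding: utf-8

# Hypothetical helper: split a string into its first character and the rest.
# Returns ["", ""] for an empty string. Assumes Ruby 1.9+ character indexing.
def split_first(str)
  return ["", ""] if str.empty?
  [str[0], str[1..-1]]
end

head, rest = split_first("á is the first letter")
puts "-->#{head}<--"
puts "-->#{rest}<--"
```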

My problem now is converting the UTF-8 string to an ANSI string.
For example, the goal is to change “âêơíũ” to “aeoiu”.
I’m really confused and don’t see any way to do that.

# encoding: utf-8

"â is the first letter" =~ /^(.)(.*)/u

conversion_hash = {
  'â' => 'a',
  'ê' => 'e',
  'ơ' => 'o',
  'í' => 'i',
  'ũ' => 'u'
}

puts "#{conversion_hash[$1]}#{$2}"

--output:--
a is the first letter
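
A lookup table has to list every accented letter by hand. On much newer interpreters (Ruby 2.2 or later, so not the 1.8.7 used in this thread) a different technique is available: decompose each letter with Unicode normalization and drop the combining marks. The sketch below uses that swapped-in approach; `remove_diacritics` is a hypothetical name.

```ruby
# encoding: utf-8

# Sketch, assuming Ruby 2.2 or later (String#unicode_normalize).
# NFD splits a letter such as "â" into "a" plus a combining mark,
# and \p{Mn} matches those nonspacing marks so they can be removed.
def remove_diacritics(str)
  str.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
end

puts remove_diacritics("âêơíũ")
puts remove_diacritics("â is the first letter")
```

Letters that have no canonical decomposition, such as “đ”, pass through unchanged, so a small lookup table may still be needed alongside this.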

nghien_rbc · March 4, 2011, 3:16am

Thanks, all!
I tried all the instructions above and they work fine when I run them from
the command line. But when I put them in a .rb file in my project and
debug it, an error happens, and I don’t know why.

"á is the first letter" =~ /^(.)(.*)/u
puts "-->#{$1}<--"
puts "-->#{$2}<--"

--output in command prompt window:--
-->á<--
--> is the first letter<--

But in the log file of my project:
I 03/04/2011 09:07:39:169 b0353000 APP| -->√<--
I 03/04/2011 09:07:39:169 b0353000 APP| -->° is the first letter<--

Then I changed the code:

"á is the first letter" =~ /^(..)(.*)/u
puts "-->#{$1}<--"
puts "-->#{$2}<--"

And now it works:
I 03/04/2011 09:11:03:414 b03d5000 APP| -->á<--
I 03/04/2011 09:11:03:414 b03d5000 APP| --> is the first letter<--

My project is a Rhodes project: Rhodes version 2.2.6, Ruby 1.8.7.

nghien_rbc · March 4, 2011, 4:11am

Wow... that’s strange: when I run my example, no log file is created. I
wonder why that is???

Somewhere you are running some extra code that writes to a file,
and you want us to troubleshoot that code without seeing it? O...kay.
My guess: the logging code gets sent the string and the regex, but it
ignores the /u flag, so it matches single bytes
rather than characters.
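
The `√` and `°` in the log are what the two UTF-8 bytes of `á` look like when each byte is displayed as a separate Mac Roman character, which is consistent with something matching bytes instead of characters. The sketch below reproduces that on Ruby 1.9.3 or later. That the log viewer used Mac Roman is an assumption, not something stated in the thread, and `misread_as_mac_roman` is a hypothetical helper name.

```ruby
# encoding: utf-8

# Hypothetical helper: reinterpret the raw bytes of a string as Mac Roman
# text (an assumed viewer encoding), then convert the result to UTF-8
# so that it can be printed and compared.
def misread_as_mac_roman(str)
  str.dup.force_encoding("macRoman").encode("UTF-8")
end

first_byte = "á".byteslice(0, 1)       # what a byte-wise (.) would capture
puts misread_as_mac_roman(first_byte)
puts misread_as_mac_roman("á")
```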

nghien_rbc · March 3, 2011, 11:29am

On Thu, Mar 3, 2011 at 5:32 AM, Thiago M. wrote:

On Thu, Mar 3, 2011 at 5:54 AM, Michael E. wrote:

That won’t work on Ruby 1.9, as bacon[0] will return the first actual
character. This technique works on both:

bacon = "chunky"
=> "chunky"
bacon[0,1]
=> "c"

This will not work on Ruby 1.8, because the character involved is a
two-byte character, and before 1.9 there’s nothing but bytes.

s = "á"
s.inspect #=> "\303\241"
s[0] #=> 195
s[0, 1] #=> "\303"
s[0].chr #=> "\303"

Since you’re on 1.8.7, you could use

$KCODE = "U" # UTF-8
s.chars.first #=> "á"

Note that if you do not set $KCODE, you will just get "\303".
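
On current Ruby versions $KCODE is no longer needed, and `chars.first` returns the first code point. One caveat is that an accented letter can also be stored as a base letter plus a combining mark, in which case the first code point is only the base letter. A minimal sketch, assuming Ruby 2.5 or later for `each_grapheme_cluster`; `first_grapheme` is a hypothetical helper name.

```ruby
# encoding: utf-8

# Hypothetical helper: first user-perceived character of a string,
# or nil for an empty string. Assumes Ruby 2.5+.
def first_grapheme(str)
  str.each_grapheme_cluster.first
end

precomposed = "\u00E1"   # "á" stored as a single code point
decomposed  = "a\u0301"  # "a" followed by a combining acute accent

puts precomposed.chars.first == precomposed    # true
puts decomposed.chars.first == "a"             # true: the accent is a separate code point
puts first_grapheme(decomposed) == decomposed  # true
```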