Hello, I'm a newbie. My question is: how can I get the first
letter of this string: “á is the first letter”?
My solution:
string = "á is the first letter"
str = string[0..1] # [0..1] because á is 2 bytes
But in many cases I don't know the byte size of the first letter.
Any help would be much appreciated.
You can use the /u flag on a regex to request UTF-8 character matching
rather than single-byte matching, but you didn't mention what encoding
your strings are in. The regex would look like this:
# encoding: utf-8
"á is the first letter" =~ /^(.)/u
puts $1
--output:--
á
The first line is a magic comment so that you can include UTF-8 characters
in your source code on Ruby 1.9, if that is required.
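For comparison, a minimal sketch of the same regex on a modern Ruby (1.9 or later, where strings carry their own encoding, and 2.0+ where source files default to UTF-8). The variable names are only illustrative; the point is that `.` captures the whole two-byte character:

```ruby
text = "á is the first letter"

# On a UTF-8 string, "." matches one whole character, not one byte.
match = text.match(/^(.)/u)
first = match[1]

puts first           # á
puts first.bytesize  # 2: "á" takes two bytes in UTF-8
puts first.length    # 1: but it is a single character
```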
You also didn't mention which version of Ruby you are using: 1.8 or
1.9.
irb(main):023:0> bacon="chunky"
=> "chunky"
irb(main):024:0> bacon[0]
=> 99
irb(main):025:0> bacon[0].chr
=> "c"
99 is the ASCII code for the letter 'c'.
I think that explains what you want.
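On Ruby 1.9 and later, `bacon[0]` already returns "c", so the 1.8-style number has to be asked for explicitly. A small sketch using the standard `String#getbyte`, `String#ord` and `Integer#chr` methods (1.9+):

```ruby
bacon = "chunky"

# Ruby 1.9+: integer indexing returns a one-character String.
first_char = bacon[0]

# The number Ruby 1.8 returned for bacon[0] is now available explicitly.
first_byte = bacon.getbyte(0)  # value of the first byte
first_ord  = bacon.ord         # code point of the first character

# Integer#chr turns the number back into a one-character String.
back = first_byte.chr

puts first_char  # c
puts first_byte  # 99
puts first_ord   # 99
puts back        # c
```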
That won’t work on Ruby 1.9, as bacon[0] will return the first actual
character. This technique
works on both:
bacon = "chunky"
=> "chunky"
bacon[0,1]
=> "c"
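Note that `[0, 1]` counts characters only on 1.9 and later. A sketch of the difference for the two-byte "á", using `String#byteslice` (1.9.3+) to reproduce the byte-based behaviour of 1.8:

```ruby
s = "á"

by_char = s[0, 1]            # first character (1.9+ counts characters)
by_byte = s.byteslice(0, 1)  # first byte only, like s[0, 1] on 1.8

p s.bytes                    # [195, 161]
p by_char                    # "á"
p by_byte.bytes              # [195]
p by_byte.valid_encoding?    # false: half a character is not valid UTF-8
```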
Michael E.
http://carboni.ca/
Hi, there might be no easy way around this. Take a look at the ASCII
table and the Unicode table, and try to write a mapping function yourself.
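One way to start on such a map is to look up each character's Unicode code point. A sketch for Ruby 1.9+ (`codepoints`); `unpack("U*")` also exists on 1.8.7 and decodes UTF-8 bytes into the same numbers. The sample word is just an example of accented vowels:

```ruby
word = "âêơíũ"

# One Unicode code point per character.
points = word.codepoints

# unpack("U*") decodes UTF-8 bytes into the same code points.
same = word.unpack("U*")

# Print each character next to its code point in U+XXXX notation.
word.each_char do |ch|
  puts format("%s U+%04X", ch, ch.ord)
end
```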
Thanks for your help!
My strings are in UTF-8 encoding, and I'm using Ruby 1.8.7.
I followed your instructions and got the first letter of the string,
but I still can't get the remainder of the string after the first letter.
My problem now is converting the UTF-8 string to an ANSI string.
For example, I want to change “âêơíũ” to “aeoiu”.
I'm really confused and don't have any way to do that.
Regards,
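A different technique, not one proposed in this thread, avoids listing every accented character by hand: decompose the string with Unicode normalization (NFD, via `String#unicode_normalize`, Ruby 2.2+) and delete the combining marks. A sketch only; the helper name `strip_accents` is hypothetical, and letters with no decomposition (for example "đ" or "ø") pass through unchanged:

```ruby
# Hypothetical helper: remove diacritics by decomposing, then dropping marks.
def strip_accents(str)
  # NFD splits "â" into "a" followed by U+0302 COMBINING CIRCUMFLEX ACCENT.
  decomposed = str.unicode_normalize(:nfd)
  # \p{Mn} matches nonspacing combining marks; deleting them leaves base letters.
  decomposed.gsub(/\p{Mn}/, "")
end

puts strip_accents("âêơíũ")  # aeoiu
```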
duc nguyen wrote in post #985120:
Thanks for your help!
My strings are in UTF-8 encoding, and I'm using Ruby 1.8.7.
I followed your instructions and got the first letter of the string,
but I still can't get the remainder of the string after the first letter.
# encoding: utf-8
"á is the first letter" =~ /^(.)(.*)/u
puts "-->#{$1}<--"
puts "-->#{$2}<--"
--output:--
-->á<--
--> is the first letter<--
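On 1.9 and later, string indexing is character-based, so the same split also works without a regex. A minimal sketch comparing the two (variable names are illustrative):

```ruby
s = "á is the first letter"

# Regex version from the reply above: first character, then the rest.
s =~ /^(.)(.*)/u
first_re, rest_re = $1, $2

# Indexing version: s[0] is the first character, s[1..-1] the remainder.
first_ix = s[0]
rest_ix  = s[1..-1]

puts "-->#{first_ix}<--"
puts "-->#{rest_ix}<--"
```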
My problem now is converting the UTF-8 string to an ANSI string.
For example, I want to change “âêơíũ” to “aeoiu”.
I'm really confused and don't have any way to do that.
# encoding: utf-8
"â is the first letter" =~ /^(.)(.*)/u
conversion_hash = {
  'â' => 'a',
  'ê' => 'e',
  'ơ' => 'o',
  'í' => 'i',
  'ũ' => 'u'
}
puts "#{conversion_hash[$1]}#{$2}"
--output:--
a is the first letter
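The hash above converts only the captured first letter. On Ruby 1.9+, `String#gsub` also accepts a hash as its replacement, so a sketch that applies the same table to every matching character in a string could look like this (characters not in the hash are left alone):

```ruby
conversion_hash = {
  'â' => 'a',
  'ê' => 'e',
  'ơ' => 'o',
  'í' => 'i',
  'ũ' => 'u'
}

# Regexp.union builds an alternation of the keys: /â|ê|ơ|í|ũ/
pattern = Regexp.union(conversion_hash.keys)

# Each match is looked up in the hash and replaced with its value.
converted = "âêơíũ".gsub(pattern, conversion_hash)
puts converted  # aeoiu
```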
Thanks, everyone!
I tried all of the instructions above, and they work fine when I run them
from the command line. But when I put them into a .rb file in my project
and debug it, something goes wrong, and I don't know why:
"á is the first letter" =~ /^(.)(.*)/u
puts "-->#{$1}<--"
puts "-->#{$2}<--"
--output in command prompt window:--
-->á<--
--> is the first letter<--
But in my project's log file I get:
I 03/04/2011 09:07:39:169 b0353000 APP| -->√<--
I 03/04/2011 09:07:39:169 b0353000 APP| -->° is the first letter<--
Then I changed the code:
"á is the first letter" =~ /^(..)(.*)/u
puts "-->#{$1}<--"
puts "-->#{$2}<--"
And now it works:
I 03/04/2011 09:11:03:414 b03d5000 APP| -->á<--
I 03/04/2011 09:11:03:414 b03d5000 APP| --> is the first letter<--
My project is a Rhodes project (Rhodes 2.2.6, Ruby 1.8.7).
Wow…that’s strange: when I run my example, no log file is created. I
wonder why that is???
Somewhere you are running some extra code that writes to a file,
and you want us to troubleshoot that code without seeing it? O…kay.
The logging code gets sent the string and the regex, but it ignores
the /u flag, so it matches single bytes rather than characters.
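A sketch of the byte-versus-character difference described above. It uses Ruby 1.9+'s `String#b` to view the same text as raw bytes, which simulates code that ignores the /u flag; it is not the Rhodes logging code itself:

```ruby
text = "á is the first letter"
raw  = text.b  # same bytes, but ASCII-8BIT: each byte counts as one character

# Byte-wise, "." captures only the first byte of the two-byte "á".
raw =~ /^(.)(.*)/
one_byte, rest_bytes = $1, $2
p one_byte.bytes          # [195]
p rest_bytes.bytes.first  # 161, the second half of "á"

# Two dots capture both bytes, which is why /^(..)(.*)/ appeared to work.
raw =~ /^(..)(.*)/
first = $1.dup.force_encoding("UTF-8")
puts first                # á
```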
On Thu, Mar 3, 2011 at 5:32 AM, Thiago M. wrote:
On Thu, Mar 3, 2011 at 5:54 AM, Michael E. wrote:
That won’t work on Ruby 1.9, as bacon[0] will return the first actual
character. This technique
works on both:
bacon = "chunky"
=> "chunky"
bacon[0,1]
=> "c"
This will not work on Ruby 1.8, because the character involved is a
two-byte
character, and before 1.9 there’s nothing but bytes.
s = "á"
s.inspect #=> "\303\241"
s[0]      #=> 195
s[0, 1]   #=> "\303"
s[0].chr  #=> "\303"
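The `\303\241` above is octal notation for the two UTF-8 bytes of "á". A quick check on Ruby 1.9+, assuming the default UTF-8 source encoding of Ruby 2.0+:

```ruby
s = "á"

# Show each byte as an octal escape, the way Ruby 1.8 inspects it.
octal = s.bytes.map { |b| format("\\%o", b) }.join
puts octal             # \303\241

# Writing those escapes in a literal produces the same UTF-8 string.
puts "\303\241" == s   # true

# What s[0] returned on 1.8 is s.getbyte(0) on 1.9+.
puts s.getbyte(0)      # 195
```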
Since you’re on 1.8.7, you could use
$KCODE = "U" # UTF-8
s.chars.first #=> "á"
Note that if you do not set $KCODE, you will just get “\303”.
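On 1.9 and later, `chars` works without `$KCODE`. One caveat: if the "á" arrives decomposed (an "a" followed by a combining acute accent), `chars.first` returns only the "a". A sketch using `String#grapheme_clusters` (Ruby 2.5+), which keeps such sequences together:

```ruby
composed   = "\u00E1 is the first letter"  # precomposed á (one code point)
decomposed = "a\u0301 is the first letter" # "a" + U+0301 COMBINING ACUTE ACCENT

p composed.chars.first                # "á"
p decomposed.chars.first              # "a" (the accent is a separate char)
p decomposed.grapheme_clusters.first  # "á" (letter and accent kept together)
```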