On 05.03.2011 18:53, Thomas L. wrote:
various French accented characters), then have a simple I/O where I test
myself on a conjugation. So I need to be able to read, write, and
pattern match the accented characters.
What do I have to do to make this work?
TPL
Encode your string as UTF-8 and match it against a UTF-8 regexp.
The simplest way to do this is something like this:
==============================
#Encoding: UTF-8
variable = "exagérer"
puts "The verb was #{variable}" if variable =~ /érer/
Ensure that your editor saves the file in UTF-8 (some don’t do this by
default, notably Windows Notepad and SciTE).
If you have the verbs in an external file (which I suppose you do), and
that file is encoded in UTF-8, you can do the following (assuming that
there is one verb per line):
=================================
#Encoding: UTF-8
# Name the encoding explicitly so the result does not depend on the locale.
verbs = File.readlines("verbs.txt", encoding: "UTF-8")
puts "The verb was #{verbs.first}" if verbs.first =~ /érer/
If the file is in another encoding, e.g. Windows-1252, do
==================================
#Encoding: UTF-8
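# "r:Windows-1252:UTF-8" reads Windows-1252 and transcodes to UTF-8, so the
# strings are compatible with the UTF-8 regexp below.
verbs = File.open("verbs.txt", "r:Windows-1252:UTF-8"){|f| f.readlines}
puts "The verb was #{verbs.first}" if verbs.first =~ /érer/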
The line saying “#Encoding: UTF-8” is a so-called magic comment that
tells Ruby to treat the content of this file as UTF-8-encoded text. If
you leave it out, Ruby 1.9 assumes your file is encoded in US-ASCII,
which causes errors as soon as you use characters not defined in ASCII.
As an alternative, you may start Ruby with the -U (capital U) switch,
but I didn’t try this.
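If you want to check what the magic comment actually did, here is a small sketch. The variable names are mine and only for illustration; __ENCODING__, String#encoding and Regexp#encoding are the built-in ways to inspect what Ruby assumed.

```ruby
# encoding: UTF-8

# __ENCODING__ reports the encoding Ruby assumes for this source file.
source_encoding = __ENCODING__
puts "Source file encoding: #{source_encoding}"

# String and regexp literals with non-ASCII characters inherit that encoding.
word    = "exagérer"
pattern = /érer/
puts "String literal encoding: #{word.encoding}"
puts "Regexp literal encoding: #{pattern.encoding}"

# Both sides are UTF-8, so the match works without any conversion.
puts "The verb was #{word}" if word =~ pattern
```

If one of the printed encodings is not UTF-8, the editor or the magic comment is the first place to look.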
Read up on String#encode and String#force_encoding if you want to
convert between encodings or change the encoding tag of a string without
actually touching the data in it.
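To make the difference between the two concrete, here is a minimal sketch. The variable names are just for illustration; the only methods used are String#encode, String#force_encoding, String#bytesize and String#valid_encoding?.

```ruby
# encoding: UTF-8
utf8 = "exagérer"

# String#encode transcodes: the bytes change, the characters stay the same.
latin = utf8.encode("Windows-1252")
puts "UTF-8 bytes:        #{utf8.bytesize}"   # é takes two bytes in UTF-8
puts "Windows-1252 bytes: #{latin.bytesize}"  # é takes one byte here

# String#force_encoding only changes the tag: the bytes stay as they are.
# Tagging Windows-1252 bytes as UTF-8 gives a string Ruby considers invalid.
relabeled = latin.dup.force_encoding("UTF-8")
puts "Valid after wrong tag? #{relabeled.valid_encoding?}"

# Put the correct tag back, then transcode properly.
fixed = relabeled.force_encoding("Windows-1252").encode("UTF-8")
puts "Round trip equals original? #{fixed == utf8}"
```

Rule of thumb: use encode when the tag is right and you want different bytes, and force_encoding when the bytes are right but the tag is wrong.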
Since Ruby 1.9, Ruby has quite good support for encodings other than
ASCII.
Just a thought: Is there anything such as Regexp#encode?
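One way to explore that question, as a rough sketch only: as far as I can tell Regexp has no encode method, but Regexp#encoding and Regexp#source exist, so a pattern can be rebuilt from a transcoded source string. The names legacy_pattern and legacy_verb are made up for this example.

```ruby
# encoding: UTF-8
pattern = /érer/
puts "Pattern encoding: #{pattern.encoding}"
puts "Regexp responds to encode? #{pattern.respond_to?(:encode)}"

# Workaround sketch: transcode the pattern source and build a new regexp.
legacy_pattern = Regexp.new(pattern.source.encode("Windows-1252"))
puts "Rebuilt pattern encoding: #{legacy_pattern.encoding}"

# A Windows-1252 string now matches the Windows-1252 regexp.
legacy_verb = "exagérer".encode("Windows-1252")
puts "Matched in Windows-1252" if legacy_verb =~ legacy_pattern
```

In most cases it is simpler to transcode the strings to UTF-8 once when reading them and keep all patterns in UTF-8.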
Vale,
Marvin