Searching for Latin/umlaut characters in a string

Hi,

I've read up a bit about Unicode strings and the issues with Rails after I
tried to save some data which contained an umlaut character. I got the
error below when searching for a name that contained the
character é.

Unclosed quotation mark after the character string ')

Is there any way to search a string for these types of characters? Then
I'll decide what to do with them later.

Some sort of string method like scan or include?

JB

On Thu, May 1, 2008 at 10:45 PM, John B. wrote:

I've read up a bit about Unicode strings and the issues with Rails after I
tried to save some data which contained an umlaut character. I got the
error below when searching for a name that contained the
character é.

Unclosed quotation mark after the character string ')

Sounds like an encoding issue with the source code itself. Are you
writing that “é” in the program? If yes, is your editor configured to
write UTF8?

# \xcc\x81 is the UTF-8 encoding of U+0301, the combining acute accent
str = "My cafe\xcc\x81 is good."
puts str

if str.include?("cafe\xcc\x81")
  puts "yes"
else
  puts "no"
end
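
For the original question (finding such characters without knowing the word in advance), here is a minimal sketch. It assumes Ruby 1.9 or later, where strings carry an encoding, and that the input is valid UTF-8. `non_ascii_chars` is a hypothetical helper name, not part of any library.

```ruby
# Hypothetical helper: return every non-ASCII character in a UTF-8 string.
def non_ascii_chars(str)
  str.scan(/[^[:ascii:]]/)
end

name = "M\u00FCller"               # "Müller" with a precomposed u-umlaut
puts name.ascii_only?              # false: at least one non-ASCII character
p non_ascii_chars(name)

# The example above spells e-acute as "e" + combining accent (U+0301).
# Normalizing to NFC (Ruby 2.2+) makes it comparable with the one-character form.
decomposed = "cafe\u0301"
puts decomposed.unicode_normalize(:nfc) == "caf\u00E9"   # true
```

`String#ascii_only?` is enough when you only need a yes/no answer; `scan` is useful when you want to see which characters were found before deciding what to do with them.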

On Fri, May 2, 2008 at 10:17 AM, John B. wrote:

No, the string is coming in via a CSV file.

Is the CSV file in UTF8?

On Fri, May 2, 2008 at 2:52 PM, John B. wrote:

It's ANSI as far as I can tell. It's actually coming from an Excel
spreadsheet, and the user then "saves as" comma-separated CSV. Then it's
imported into the application, where I am running into the problem.

Xavier N. wrote:

If the CSV is ANSI, and the Ruby source code is ANSI, where is the
“é” coming from? :)

On Fri, May 2, 2008 at 4:38 PM, John B. wrote:

Here is an example from the CSV file: the word below, and how it is
displayed. They are identical. It looks like the file is UTF-8 after
all.

Müller

No, it is ISO-8859-1. In that CSV, “ü” is a single byte instead of two:

fxn@feynman:~/tmp$ od -a testcsv.csv
0000000 M ? l l e r cr nl
0000010
fxn@feynman:~/tmp$ od -a testcsv_utf8.csv
0000000 M ? ? l l e r
0000007
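
The same byte-count difference can be checked from Ruby itself. This is a small sketch, assuming Ruby 1.9 or later; the variable names are only illustrative.

```ruby
u_umlaut = "\u00FC"                                # "ü" as a UTF-8 string
latin1   = u_umlaut.encode(Encoding::ISO_8859_1)   # same character, ISO-8859-1 bytes

p latin1.bytes      # [252]       one byte, 0xFC
p u_umlaut.bytes    # [195, 188]  two bytes, 0xC3 0xBC
```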

You’d need to fetch data from the CSV, change its encoding in memory
with iconv, and work with the result.
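
As a sketch of that in-memory conversion: the `Iconv` library was removed from the standard library in Ruby 2.0, so the example below uses `String#encode` instead, which does the same job on Ruby 1.9 and later. `latin1_to_utf8` is a hypothetical helper name, and the sample bytes are the ones from the `od` dump above.

```ruby
# Bytes as read from the ISO-8859-1 file: "M", 0xFC, "ller".
raw = "M\xFCller".b

# Hypothetical helper: label the bytes as ISO-8859-1, then transcode to UTF-8.
def latin1_to_utf8(bytes)
  bytes.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

name = latin1_to_utf8(raw)
puts name.valid_encoding?     # true
puts name == "M\u00FCller"    # true
```

Note that `force_encoding` only relabels the bytes, while `encode` actually rewrites them; both steps are needed here.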

In general, when doing I/O like that you need to control every
involved encoding to ensure everything matches.
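
One way to control the encoding at the I/O boundary, sketched for Ruby 1.9 or later: declare the file's encoding when reading and let Ruby transcode to UTF-8. The temporary file stands in for the exported CSV, and its contents are made up to match the dump above. The assumption that the export is ISO-8859-1 has to come from knowing the source; Ruby cannot reliably detect it.

```ruby
require "tempfile"

# Stand-in for the exported file: ISO-8859-1 bytes with a CRLF line ending.
file = Tempfile.new(["testcsv", ".csv"])
file.binmode
file.write("M\xFCller\r\n".b)
file.close

# A quick sanity check: these bytes are not valid UTF-8.
raw = File.binread(file.path)
puts raw.dup.force_encoding(Encoding::UTF_8).valid_encoding?   # false

# Declare the external encoding and transcode to UTF-8 while reading.
text = File.read(file.path, encoding: "ISO-8859-1:UTF-8")
name = text.chomp

puts text.encoding             # UTF-8
puts name == "M\u00FCller"     # true
file.unlink
```

The `valid_encoding?` check only rules UTF-8 out; passing it does not prove a file is UTF-8, since plain ASCII passes too.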