French sentences appearing weird in Rails Website

I have a Rails app. One of my clients is importing French text that
displays incorrectly. See the example below:

1. str = "--- \nFrench: \"3. Combien de r\\xC3\\xA9gions y a-t-il au Cameroon?\"\nEnglish: 3. How many regions are there in Cameroon?\n"

Can someone assist please?

I am thinking along the following lines:

2. str = str.gsub('"', '')

3. **Need to add a line that replaces "\\" in the str above with just "\"**

4. str = str.force_encoding("iso-8859-1")

5. str = str.encode('UTF-8')

In step 3, I was thinking of something like

str = str.gsub(/\\\\/, "\\")

Or, if possible, somehow assign the output of puts (or a similar
function) back to str. For example:

> puts str

---

French: 3. Combien de r\xC3\xA9gions y a-t-il au Cameroon?

English: 3. How many regions are there in Cameroon?

Even that would work. Can someone please assist?

On Wed, May 15, 2013 at 6:30 AM, UA [email protected] wrote:

Wow, this took a while to suss out. I really hate character encodings
and translations, but here we are.

So, the problem basically lies in the fact that the encoded character
is doubly escaped:

irb(main):159:0> '\xC3\xA9'
=> "\\xC3\\xA9"

whereas the other characters are escaped just once:

irb(main):160:0> "\n"
=> "\n"
irb(main):161:0> "\""
=> "\""
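To make the difference concrete, here is a small sketch (not from the thread) comparing the two kinds of literal. It assumes a UTF-8 source file, which is the default in Ruby 2.0 and later; the variable names are made up for illustration.

```ruby
# Single quotes keep the backslashes: eight literal characters, no "é" among them.
literal_escape = '\xC3\xA9'

# Double quotes interpret the escapes: two bytes that form one UTF-8 character.
real_bytes = "\xC3\xA9"

puts literal_escape.length            # 8
puts real_bytes.bytesize              # 2
puts real_bytes.length                # 1
puts literal_escape == "\\xC3\\xA9"   # true
```

The imported string contains the first form, which is why the page shows the escape text instead of the accented letter.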

What I came up with seems sort of kludgy:

  1. Double escape the singly-escaped characters:

irb(main):166:0> new_str = str.gsub(/"/,'\"').gsub(/\n/,'\n')
=> "--- \\nFrench: \\\"3. Combien de r\\xC3\\xA9gions y a-t-il au Cameroon?\\\"\\nEnglish: 3. How many regions are there in Cameroon?\\n"

  2. Run it through an eval:

irb(main):167:0> eval "new_str = \"#{new_str}\""
=> "--- \nFrench: \"3. Combien de régions y a-t-il au Cameroon?\"\nEnglish: 3. How many regions are there in Cameroon?\n"
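Since eval runs whatever the imported text happens to contain, an eval-free variant may be safer. The sketch below is one possible approach, not the poster's method: `decode_hex_escapes` is a hypothetical helper that replaces each literal \xNN with the byte it names and then labels the result as UTF-8. It assumes the escapes really do spell out UTF-8 bytes, as they do in the sample string.

```ruby
# The imported text as it arrives: \xC3 and \xA9 are literal
# backslash-x sequences (four characters each), not real bytes.
str = "--- \nFrench: \"3. Combien de r\\xC3\\xA9gions y a-t-il au Cameroon?\"\nEnglish: 3. How many regions are there in Cameroon?\n"

# Hypothetical helper: turn each literal \xNN into the byte it names,
# then tell Ruby the resulting bytes are UTF-8.
def decode_hex_escapes(text)
  # Work on a binary copy so single bytes can be substituted freely.
  bytes = text.dup.force_encoding(Encoding::BINARY)
  decoded = bytes.gsub(/\\x([0-9A-Fa-f]{2})/) { $1.hex.chr }
  decoded.force_encoding(Encoding::UTF_8)
end

fixed = decode_hex_escapes(str)
puts fixed
puts fixed.valid_encoding?   # true for the sample string
```

If the escapes in real data do not form valid UTF-8, `valid_encoding?` returns false, which is a cheap way to flag records that need a closer look.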

On Wednesday, 15 May 2013 07:30:14 UTC-4, UA wrote:

Where is this text coming from? Because that string looks like YAML,
complete with the opening “---”. \xC3\xA9 is the UTF-8 encoding of
codepoint U+00E9, “small letter e with acute”, something you’d expect in
French text.

If you do YAML.load(str) in 1.9 or higher, this is what appears:

irb: YAML.load(str)
===> {"French"=>"3. Combien de régions y a-t-il au Cameroon?",
"English"=>"3. How many regions are there in Cameroon?"}

–Matt J.

On May 16, 2013 8:23 AM, “Matt J.” [email protected] wrote:

That’s what I thought originally, too.

When I copied the OP’s string as written and fed it to YAML.load, it
flubbed the translation, reversing the byte order. As far as I can tell,
I have UTF-8 set everywhere.

So I’m not sure why it works for you but not for me…
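One possible explanation, offered as an assumption rather than a diagnosis: the YAML specification defines \xNN inside a double-quoted scalar as an 8-bit Unicode code point, not a raw byte, so a parser that follows the spec may read \xC3\xA9 as the two characters U+00C3 and U+00A9 ("Ã©") instead of "é". The sketch below inspects what YAML.load returns and converts it back only if that two-character pattern is present; the `repaired` name is illustrative.

```ruby
require 'yaml'

str = "--- \nFrench: \"3. Combien de r\\xC3\\xA9gions y a-t-il au Cameroon?\"\nEnglish: 3. How many regions are there in Cameroon?\n"

french = YAML.load(str)["French"]

# Look at what the parser actually produced before deciding on a fix.
puts french.encoding
puts french.inspect

# If the parser treated \xC3 and \xA9 as code points, the text contains
# U+00C3 U+00A9. Mapping each character back to a single Latin-1 byte and
# relabelling those bytes as UTF-8 recovers the intended character.
# Assumption: the text has no characters outside Latin-1; otherwise
# encode raises Encoding::UndefinedConversionError.
repaired =
  if french.include?("\u00C3\u00A9")
    french.encode(Encoding::ISO_8859_1).force_encoding(Encoding::UTF_8)
  else
    french
  end

puts repaired
```

Comparing `french.inspect` across machines would show whether the two results in this thread come from different parsers or from different input.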