Encoding issue (I think): All apostrophes have changed to these odd characters after a database swit

I’m wondering if anyone can give any insight into how I could resolve
the problem on this website:

Basically, all the â€™ are supposed to be apostrophes ( ’ ), and
quotes are messed up too…

Is it possible to run some command in the “rails console production”
to fix this?

Really appreciate any help!!

So… the old database used what type of encoding? latin1? And the
new one uses utf8?

Is it a problem if some articles were already cleaned by doing a
search and replace, e.g. swapping all â€™ for its corresponding proper
symbol?

Really appreciate the help!

On May 29, 2:19pm, Frederick C. [email protected]

On May 31, 1:01pm, daze [email protected] wrote:

So… the old database used what type of encoding? latin1? And the
new one uses utf8?

It’s your database, you tell me!

Is it a problem if some articles were already cleaned by doing a
search and replace, e.g. swapping all â€™ for its corresponding proper
symbol?

Depends. If you have replaced with pure ascii then it’s not a problem.
If not (i.e. for a given column and table you have a mix of encodings)
then you will have made things worse.

Fred
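
When some rows were hand-cleaned and others were not, a row-by-row repair that leaves clean text alone may be safer than a blanket conversion. Here is a minimal Ruby sketch, assuming the damage is exactly one round of UTF-8 bytes misread as Windows-1252. The helper name `repair_mojibake` is hypothetical and not from this thread, and text that legitimately contains sequences such as â€™ would be changed too, so try it on a copy of the data first.

```ruby
# Hypothetical helper: try to undo ONE round of "UTF-8 bytes read as
# Windows-1252". If the result is not valid UTF-8, or a character has no
# Windows-1252 byte, the input is assumed to be clean and is returned as-is.
def repair_mojibake(text)
  candidate = text.encode("Windows-1252").force_encoding("UTF-8")
  candidate.valid_encoding? ? candidate : text
rescue EncodingError
  text
end

garbled       = "It\u00E2\u20AC\u2122s"  # displays as: Itâ€™s
already_fixed = "It's"                    # cleaned by hand with pure ascii
already_curly = "It\u2019s"               # a correct curly apostrophe

puts repair_mojibake(garbled) == "It\u2019s"
puts repair_mojibake(already_fixed) == already_fixed
puts repair_mojibake(already_curly) == already_curly
```

The pure-ascii and already-correct strings pass through unchanged because re-encoding them does not produce a different valid UTF-8 string.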

On May 31, 9:41am, Frederick C. [email protected]
wrote:

Is it a problem if some articles were already cleaned by doing a
search and replace, e.g. swapping all â€™ for its corresponding proper
symbol?

Depends. If you have replaced with pure ascii then it’s not a problem.
If not (i.e. for a given column and table you have a mix of encodings)
then you will have made things worse.

Oh… thank you for answering me! Basically, some people ran search-
and-replaces for these:

"| replaced with :
â€“ replaced with -
â€™ replaced with '
â€œ replaced with "
â€ replaced with "

I apologize for my lack of expertise, but have things been replaced
“with pure ascii”? (What exactly is that…?)

One other thing:
Does the encoding or interpretation of encoding vary from browser to
browser? See, I went to school and checked out the site - only to
find this odd symbol located after double-quotes… it looked like two
squares on top of each other…
Yet, at home, or outside of school, I do not see this symbol anywhere.
Why might this be?

(Thank you again!)

If the browser is using a typeface (font) that doesn’t include the
precise character that your page encoding and HTML require, then you
won’t see that character. The glyph you describe sounds like the
“missing glyph” character, and that’s why I’m guessing you’re seeing it.

Another layer to this cake.

Walter

On May 31, 4:02pm, Walter D. [email protected] wrote:

If the browser is using a typeface (font) that doesn’t include the
precise character that your page encoding and HTML require, then you
won’t see that character. The glyph you describe sounds like the
“missing glyph” character, and that’s why I’m guessing you’re seeing it.

But the missing glyph character should be replaced with nothing. In
fact, if I could do so easily, I would just delete all these “missing
glyph” characters…
Is there anything you recommend I do about the missing glyph? I mean
I don’t even see it on my home computer… only on the computers at
school.

The missing glyph character is a feature of many different fonts – it
means literally, “I don’t have any glyph by that name in my table”.
The way you “get rid of it” is by providing an encoding and
substitution escapes that convert the wide, wild world of Unicode
typography into something that the more limited browser/OS
combinations can handle.

There are fonts that specialize in having suitably large collections
of characters to print nearly anything besides Klingon. These will
often have the word Unicode in their name. Many, if not most, core Mac
fonts are Unicode-aware, and if you are writing out a CSS font-family
that you mean to cover the most possible characters, you will add the
Microsoft variants of those to your list:

font-family: "Lucida Grande", "Lucida Sans Unicode", "Lucida Sans",
Lucida, Geneva, Verdana, sans-serif;

In order for even this font family to work, the user will have to
install a modern version of their operating system, and maybe a modern
browser, and you can’t know or control that at all. But you can and
should declare a character encoding through your DOCTYPE and meta
tags, and your server should send a content-type header that includes
a charset attribute. All of these should match the encoding within
your database, and within the other content served by your Web server.
One charset to rule them all!

If your data is stuck in a particular charset, and you can’t figure
out how to convert it into UTF-8, then you need to modify everything
– starting with Rails – to recognize that the content is in that
encoding, and to treat it as such. Then you also need to specify the
encoding in the generated HTML, so your /layouts/application.html.erb
or local equivalent should have this line in it somewhere:

Walter

On May 29, 6:45pm, daze [email protected] wrote:

That does indeed look like an encoding issue. I assume that your â€™
were in fact curly quotes. This kind of thing can happen when there is
a mismatch between the encoding the database is using and what Rails
is using.

For example, if Rails is using utf8 but the database connection is set
to CP1252, then in order to save the curly quote character, your Ruby
script would send the bytes E2 80 99, which is the utf8 sequence for
the Unicode right single quotation mark (U+2019).
If your db connection is set to be latin1 (or any similar single-byte
encoding) then it will happily store that byte sequence as it is.

If your app now starts doing the right thing and asks the db for
utf8, the database converts what it thinks is latin1 (but is actually
already utf8) into utf8 a second time, and so you get garbage (in
cp1252, E2 80 99 is â€™, which is what I see on your website). In
order to fix this you typically want to tell the database to
reinterpret the contents of text columns as utf8. How exactly depends
on your database, but in mysql something like

alter table foos MODIFY some_column BLOB;
alter table foos MODIFY some_column TEXT CHARACTER SET utf8;

will reinterpret whatever is in some_column as utf8. This might not be
exactly what you need - experiment with your data to see exactly
what has happened (I once had a case where text was going through
this double encoding process twice, so I had to repeat the above
commands twice to straighten out the data). Once you’ve sorted things
out, make sure you don’t fall into this hole again by making sure that
all your databases and tables have their default encoding set to utf8.
Fred
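
For anyone following along, the double-encoding mechanism described above can be reproduced in plain Ruby without a database. This is only an illustrative sketch: it assumes the misreading is Windows-1252 (cp1252), and the variable names are made up for the example.

```ruby
# The right single quotation mark, U+2019, as a UTF-8 string.
curly = "\u2019"
utf8_bytes = curly.bytes.map { |b| format("%02X", b) }.join(" ")

# Pretend those same bytes are Windows-1252 text, then convert that
# "text" to UTF-8. This is the second, unwanted encoding step.
garbled = curly.dup.force_encoding("Windows-1252").encode("UTF-8")

# Reverse it: turn the garbled characters back into single bytes and
# relabel those bytes as the UTF-8 they were all along.
restored = garbled.encode("Windows-1252").force_encoding("UTF-8")

puts utf8_bytes           # the three bytes sent to the database
puts garbled              # the three odd characters seen on the site
puts restored == curly    # whether the round trip recovered the quote
```

Relabelling the bytes rather than converting them is the same idea as passing the column through BLOB in the MySQL commands above.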

On May 31, 8:48pm, daze [email protected] wrote:

“with pure ascii”? (What exactly is that…?)
Those are pure ascii characters (i.e. each can be represented by a
7-bit integer). Latin1, UTF8, etc. all represent these characters in
the same way, so you shouldn’t get a problem.
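
A quick way to check the "pure ascii" claim yourself is a small Ruby sketch like the one below. It assumes the replacement characters were the plain apostrophe, double quote and hyphen; the variable names are only for illustration.

```ruby
# Plain replacement characters, assumed to be what the search-and-replace used.
replacements = ["'", "\"", "-"]

# Each one is ascii-only and has identical bytes in Latin1 and UTF-8.
all_ascii  = replacements.all?(&:ascii_only?)
same_bytes = replacements.all? do |ch|
  ch.encode("ISO-8859-1").bytes == ch.encode("UTF-8").bytes
end

# A curly apostrophe, by contrast, is a different byte sequence in
# Windows-1252 and in UTF-8, which is where the trouble starts.
curly = "\u2019"
curly_differs = curly.encode("Windows-1252").bytes != curly.encode("UTF-8").bytes

puts all_ascii
puts same_bytes
puts curly_differs
```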

One other thing:
Does the encoding or interpretation of encoding vary from browser to
browser? See, I went to school and checked out the site - only to
find this odd symbol located after double-quotes… it looked like two
squares on top of each other…
Yet, at home, or outside of school, I do not see this symbol anywhere.
Why might this be?

As Walter says, that sounds like the missing glyph character, i.e.
“you’ve asked me to display a character that I can’t display”.

Fred
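
To find out which character a browser is failing to draw, it can help to list the non-ascii code points in the stored text. This is a minimal Ruby sketch; `non_ascii_codepoints` is a hypothetical helper, and the sample string is invented. One possible culprit (a guess, not confirmed in this thread) is a control character such as U+009D left behind after the double-quote replacement.

```ruby
# Hypothetical helper: list each distinct non-ascii character in a string
# as a U+XXXX label, in order of first appearance.
def non_ascii_codepoints(text)
  text.each_char
      .reject(&:ascii_only?)
      .map { |ch| format("U+%04X", ch.ord) }
      .uniq
end

# Invented sample: a straight double quote followed by a stray U+009D.
sample = "He said \"hello\"\u009D and left"

puts non_ascii_codepoints(sample).inspect
puts non_ascii_codepoints("plain text").inspect
```

Any code point reported this way can then be looked up to see whether it is a printable character the font lacks or leftover debris that can simply be deleted.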