Weird characters, like –, displaying


#1

I’m getting weird characters displaying on my pages, e.g.: –.

I’m using MySQL and my database encoding is set to utf8, and I set the
character set for my pages to render as utf8 with a meta tag.

Does anyone know how to get rid of these characters?

Thanks,
Ben


#2

On 4/30/07, Ben W. removed_email_address@domain.invalid wrote:

I’m getting weird characters displaying on my pages, e.g.: â€".

I’m using MySQL and my database encoding is set to utf8, and I set the
character set for my pages to render as utf8 with a meta tag.

Are you sure your browser is using the right encoding? If not, try
setting the HTTP Content-Type header. That has been more reliable for
me than a meta tag.
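
As a rough illustration of the header approach, here is a minimal sketch
in Python using a bare WSGI callable. Python and WSGI are assumptions
made for the example, not the stack discussed in this thread, and `app`
is a hypothetical name. The point is only that the charset travels in
the Content-Type response header itself rather than in the page markup.

```python
def app(environ, start_response):
    """Hypothetical WSGI app that declares UTF-8 in the HTTP header."""
    # Body text uses escapes for curly quotes and an en dash, then is
    # encoded to bytes with the same charset the header announces.
    body = "<p>\u201cquoted\u201d \u2013 dash</p>".encode("utf-8")
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

Whatever framework is actually in use, the header value and the bytes
sent have to agree; declaring utf-8 while sending bytes in another
encoding produces the same kind of garbage.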

Isak


#3

Are you sure your browser is using the right encoding? If not, try
setting the HTTP Content-Type header. That has been more reliable for
me than a meta tag.

Yes, the page seems to be set up fine. I’ve also tried installing the
BrowserFilters plugin, but that did not work either.

I have a feeling it may be the case that when these characters were
originally put into the database, they were somehow entered incorrectly.
I seem to be able to enter the correct characters (i.e., curly quotes)
myself, and they save and display fine. Is there any way to do some
sort of search and replace on these guys?
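
One common way sequences like this arise is UTF-8 bytes being read back
as Windows-1252 (or Latin-1) somewhere before they are saved. That is an
assumption here, since nothing in the thread confirms how the rows were
originally written. A minimal Python sketch of the effect, using an en
dash as the sample character:

```python
# An en dash (U+2013), written as an escape so this file's own encoding
# cannot interfere with the demonstration.
original = "\u2013"

# Correct UTF-8 bytes for the en dash: E2 80 93.
utf8_bytes = original.encode("utf-8")

# Misreading those three bytes as Windows-1252 yields three characters:
# U+00E2 (a with circumflex), U+20AC (euro sign), U+201C (left curly quote).
garbled = utf8_bytes.decode("cp1252")

# Reversing the mistake recovers the original character.
recovered = garbled.encode("cp1252").decode("utf-8")

print([hex(ord(ch)) for ch in garbled])
print(recovered == original)
```

If the bad rows follow this pattern, the damage is reversible, which is
what makes a scripted search and replace plausible at all.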

Thanks for the help.


#4

You should also make sure that the browser is submitting the form data
as UTF-8. If I recall correctly, it’s sufficient to declare the page
containing the form as UTF-8, but you might want to double-check that.

Search and replace is going to be tough unless there’s an existing
script around for doing that.

One thing you could do is try writing a script in Java (or Ruby) that
reads a known bad row from the database and converts it various ways
and prints it out until you know exactly what conversion you’re going
to need.
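
A minimal sketch of that kind of trial-and-error script, in Python
rather than Java or Ruby purely for illustration. The names
`CANDIDATES`, `try_conversions`, and `bad_value` are hypothetical, the
list of encodings is a guess at likely culprits, and fetching the row
from the database is left out; `bad_value` stands in for a string read
from a known bad row.

```python
# (encode_with, decode_with) pairs to try. Each pair undoes one possible
# mistake: text whose bytes were decoded with the first encoding when
# they should have been decoded with the second.
CANDIDATES = [
    ("cp1252", "utf-8"),
    ("latin-1", "utf-8"),
    ("utf-8", "cp1252"),
    ("utf-8", "latin-1"),
]

def try_conversions(bad):
    """Return {(enc, dec): converted string, or None if it fails}."""
    results = {}
    for enc, dec in CANDIDATES:
        try:
            results[(enc, dec)] = bad.encode(enc).decode(dec)
        except (UnicodeEncodeError, UnicodeDecodeError):
            results[(enc, dec)] = None
    return results

# Stand-in for a value read from a known bad row (assumed sample data).
bad_value = "\u00e2\u20ac\u201c"

for pair, converted in try_conversions(bad_value).items():
    # repr() keeps the output unambiguous whatever the terminal encoding.
    print(pair, repr(converted))
```

The pair whose output reads correctly for several bad rows is the
conversion to apply; a pair that only shortens the string without making
it readable is not the right one.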

Programmatically detecting the messed-up strings seems like it would
be more difficult, though, unless there are some clear constraints on
what those strings should contain (i.e., something you can run a regex
on), which might be the case if it’s, say, a validated form field.
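
One possible heuristic, sketched in Python under the assumption that the
damage is UTF-8 read as Windows-1252: use a regex as a cheap first
filter, then accept a repair only if the reverse conversion succeeds
cleanly. `SUSPECT` and `repair_if_garbled` are hypothetical names, and
the pattern is a guess at typical lead characters, not an exhaustive
rule.

```python
import re

# Characters that UTF-8 lead bytes tend to become when misread as
# Windows-1252: U+00E2 followed by U+20AC, or U+00C2 / U+00C3.
SUSPECT = re.compile("\u00e2\u20ac|[\u00c2\u00c3]")

def repair_if_garbled(text):
    """Return a repaired string, or the input unchanged if unsure."""
    if not SUSPECT.search(text):
        return text
    try:
        # Re-create the original bytes, then decode them as UTF-8.
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not a clean round trip, so leave the value alone.
        return text
```

Values that mix correctly stored characters with garbled ones fail the
round trip and are left untouched by this sketch, so they would still
need to be handled by hand or in smaller pieces.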

Jun-Dai