I use MySQL and have made sure it is set to UTF-8, and the character
set in my view is also UTF-8. But when I display, in a textarea, text
that came from either antiword.exe or WIN32OLE output of an MS Word
document, the text fails to show immediately after a strange character
that shows up in the Rails console as \267. I went back to Word to see
what this is (I looked it up by its position), and it is a dot sort of
floating in the middle of the line, like the one used between chapter
and verse (or whatever they call it) in the Bible, e.g. 12-7[dot]Matthew.
doc = "This is a pipe, but \267 this is not a pipe"
This is a pipe, but
It just sort of STOPS rendering the rest of the text.
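For reference, \267 is Ruby's octal escape for the single byte 0xB7,
which is the middle dot in Latin-1 and Windows-1252 and matches the
floating dot seen in Word. On its own, that byte is not valid UTF-8.
The following is only a minimal sketch for inspecting the string; it
assumes Ruby 1.9 or later, and high_bytes is a hypothetical helper
name, not something from Rails or antiword.

```ruby
# Hypothetical helper: return [byte, offset] pairs for every non-ASCII byte.
def high_bytes(str)
  str.bytes.each_with_index.select { |byte, _offset| byte > 127 }
end

# Treat the converter output as raw bytes, as it arrives from antiword/WIN32OLE.
doc = "This is a pipe, but \267 this is not a pipe".b

high_bytes(doc).each do |byte, offset|
  puts format("offset %d: 0x%02X", offset, byte)
end

# A lone 0xB7 is a UTF-8 continuation byte, so this check reports false.
puts doc.dup.force_encoding("UTF-8").valid_encoding?
```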
I can’t possibly ask my clients to remove that character just for my
convenience. I have been on a 38-hour hunt for a solution.
Some say to remove all [^[:print:]] matches, which I can do, and find
a way to at least preserve the \n and \r characters. But then again, I
also want to preserve as much of the original document as possible. I
mean, what if they use umlauts, like an o with two dots on top (ö)?
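
To make the trade-off concrete, here is a rough sketch comparing the
two options: stripping everything outside printable ASCII, versus
transcoding the bytes to UTF-8. It assumes Ruby 1.9 or later (for
String#encode) and, more importantly, assumes the converter output is
Windows-1252, which nothing above confirms. to_utf8 and
strip_non_ascii are hypothetical helper names.

```ruby
# ASSUMPTION: the raw bytes are Windows-1252; pass another name if they are not.
def to_utf8(raw, source_encoding = "Windows-1252")
  raw.dup.force_encoding(source_encoding)
     .encode("UTF-8", invalid: :replace, undef: :replace, replace: "?")
end

# The "strip it" option: keep printable ASCII plus tab, CR and LF only.
def strip_non_ascii(raw)
  raw.b.gsub(/[^\x20-\x7E\r\n\t]/n, "")
end

# \267 is the middle dot and \366 is o-umlaut in Windows-1252.
raw = "but \267 this, and an o-umlaut: \366\r\n".b

puts to_utf8(raw)          # keeps the dot and the umlaut as UTF-8 characters
puts strip_non_ascii(raw)  # drops both, keeps the line ending
```

Stripping loses the umlaut along with the dot, while transcoding keeps
both, but only if the guessed source encoding is the right one.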