Hi,
Trying to learn Ruby, I am writing a script to migrate from a pyblosxom
blog to WordPress. As you may know, pyblosxom stores all entries and
comments in text files under a directory hierarchy. My idea is to read
all those files (the subdirectories store the categories) and insert
them into the MySQL database that WordPress uses.
So far, I have been able to read all the posts and comments, but I am
having some problems inserting them into MySQL (BTW, I am using the
mysql module). The problem, I guess, has to do with the encoding of the
text.
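If the encoding really is the culprit, here is a minimal sketch of what I imagine the conversion step would look like. It assumes (I have not verified this) that the pyblosxom files are stored as ISO-8859-1 and that the WordPress tables expect UTF-8; `to_utf8` is just a name I made up for illustration.

```ruby
# Re-tag raw bytes read from a file with their real encoding, then
# transcode them to UTF-8. The ISO-8859-1 default is an assumption about
# how the pyblosxom files were saved.
def to_utf8(raw, source_encoding = "ISO-8859-1")
  raw.dup.force_encoding(source_encoding).encode("UTF-8")
end

latin1_bytes = "\xED".b          # "í" as a single ISO-8859-1 byte
converted = to_utf8(latin1_bytes)
puts converted.encoding          # UTF-8
puts converted.bytes.inspect     # the two-byte UTF-8 form of "í"
```

I suppose the MySQL connection has to use the same character set as well (for instance by sending a `SET NAMES utf8` statement right after connecting), otherwise the server could still misread the bytes.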
Basically I have two problems:
- Accented characters. For example, if I have an accented vowel like “í”,
  it is not properly inserted into the MySQL table and shows up as weird
  characters. I guess that writing a function that substitutes every one
  of these characters with its HTML entity (i.e. &iacute;) would work,
  but I guess there must be a more appropriate way to do it, right? Does
  it have anything to do with the encoding?
- Also, WordPress seems to interpret \n characters (I guess). For
  example, if I have a post like the following:

      This is an example of an <img
      src="image.jpg"> image.

  it would turn into:

      This is an example of an <img<br />
      src="image.jpg"> image.

  It interprets the \n character right after <img and inserts a br tag,
  which breaks the HTML. I thought that if I deleted all the \n
  characters it would be fine, but there are some posts with pre tags
  where the \n characters are required.
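For the second problem, one approach I am considering is to remove only the newlines that fall inside a tag (between `<` and `>`), so the line breaks in the text of a pre block are left alone. This is only a rough sketch: `join_newlines_inside_tags` is a name I made up, and it assumes that any literal `<` or `>` in the post text is already escaped as an entity.

```ruby
# Replace newlines (and the whitespace around them) that occur inside an
# HTML tag with a single space. Text between tags, including the content
# of <pre> blocks, is not touched.
# Assumption: literal "<" and ">" in the text are escaped as &lt; / &gt;.
def join_newlines_inside_tags(html)
  html.gsub(/<[^<>]*>/) { |tag| tag.gsub(/\s*\n\s*/, " ") }
end

post = "This is an example of an <img\nsrc=\"image.jpg\"> image.\n" \
       "<pre>\nline 1\nline 2\n</pre>"
puts join_newlines_inside_tags(post)
```

With the sample post above, the img tag ends up on a single line while the two lines inside the pre block keep their \n characters.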
Any idea on this?
Anyway, thanks in advance!