String Encoding / Importing Feeds

Howdy,

I have a Rails app that grabs info from RSS feeds and then tries to
insert it into a database.

The problem I’m having is that some feeds appear to use funky
characters and my INSERTs are failing. The actual error is below.

Any idea how I can make this work reliably, not really knowing if a
feed will have these or not?

I’m using PostgreSQL and the DB was initialized as UTF-8.

Any help is very much appreciated.

feeds#update_feeds (ActiveRecord::StatementInvalid) “PGError: ERROR:
invalid byte sequence for encoding "UTF8": 0xb9\n: INSERT INTO
feed_items ("item_id", "updated_at", "title", "item_updated",
"description", "feed_id", "item_link", "created_at") VALUES
(NULL, ‘2007-05-27 20:40:32.433422’, ‘Quote du jour’, ‘2007-05-25
15:57:55.000000’, ‘

<a href="http://www.blah.lhah.com">My Name</a>:
There\271s really only one rule for community as far as I\271m
concerned, and it\271s this - in order to call some gathering of
people a "community", it is a requirement that if you\271re a
member of the community, and one day you stop showing up, people will
come looking for you to see where you went.

’, 6, ‘feed_url’,
‘2007-05-27 20:40:32.433422’)”

You need to encode your string rather than using it raw. Then reverse
the process on read.

Probably the easiest and safest option is to use Base64 encoding, or to
store the string in a blob column (bytea in PostgreSQL) rather than a
string field.
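
A rough sketch of the Base64 round trip, in case it helps. The helper names
encode_for_storage and decode_from_storage are made up for illustration and
are not part of Rails, and the sample string is just a fragment of the feed
text with the offending 0xB9 byte in it:

```ruby
# Hypothetical helpers (not a Rails API): wrap arbitrary bytes in
# ASCII-only Base64 before saving, and unwrap them again on read.
# Array#pack with "m0" is Base64 without line breaks, the same
# operation Base64.strict_encode64 performs.
def encode_for_storage(raw)
  [raw].pack("m0")
end

def decode_from_storage(stored)
  stored.unpack1("m0")
end

# "\xB9" is the byte PostgreSQL rejected in the error above.
raw      = "There\xB9s really only one rule".b
stored   = encode_for_storage(raw)      # plain ASCII, safe for a UTF-8 column
restored = decode_from_storage(stored)  # the original bytes, unchanged
```

The price is that the stored value is about a third larger and can no
longer be searched or sorted as text in SQL.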

Michael

Hunter,
Did you ever find a solution to this problem?
I am having very similar issues:

RAW RESPONSE TEXT:
[Salut Alex, écoute c’est Alex je teste un peu et puis bonjour français
françaises. ]
UTF-8 Response text:
[Salut Alex, écoute c’est Alex je teste un peu et puis bonjour français
françaises. ]
Unique Id: 1212337033-59
SQL (0.000081) BEGIN
GlobalInbox Update (0.000000) PGError: ERROR: invalid byte sequence
for encoding “UTF8”: 0xe9636f
HINT: This error can also happen if the byte sequence does not match the
encoding expected by the server, which is controlled by
“client_encoding”.
: UPDATE global_inboxes SET “created_at” = ‘2008-06-01 12:17:13.416834’,
“voicemail_status_id” = 4, “deleted_at” = NULL, “voicemail_folder_id” =
1, “deleted” = ‘f’, “sender_cid” = ‘953794484’, “conversion_to_text” =
'Salut Alex, écoute c’est Alex je teste un peu et puis bonjour français
françaises. ', “notes” = NULL, “voicemail_id” = 491, “updated_at” =
‘2008-06-01 12:18:09.957716’, “user_id” = 28 WHERE “id” = 491

The output above is from my production.log file.
The RAW response text is what I get by simply printing the string that
holds the data to the console.
The UTF-8 response text is the result of calling the .chars method on
that string.

I am unable to insert the data into the DB using either the plain string
or the result of .chars.

Did you find a solution?
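
For what it's worth, the rejected bytes 0xe9 0x63 0x6f are exactly how "éco"
(the start of "écoute") is written in ISO-8859-1, so the string may be
reaching the database in that encoding rather than in UTF-8. Here is a small
sketch of that reading. It assumes a Ruby with per-string encodings (1.9 or
later), the to_utf8 helper is a made-up name, and ISO-8859-1 as the source
encoding is only a guess based on the error message:

```ruby
# The three bytes from the error message (0xe9636f), as raw binary data.
raw = "\xE9\x63\x6F".b

# Read as UTF-8 they are malformed: 0xE9 opens a three-byte sequence,
# but 0x63 and 0x6F are not continuation bytes.
as_utf8 = raw.dup.force_encoding(Encoding::UTF_8)
utf8_ok = as_utf8.valid_encoding?

# Assumption: the data is really ISO-8859-1. Label it as such, then
# transcode to UTF-8 ("éco", bytes C3 A9 63 6F).
fixed = raw.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)

# Hypothetical helper: keep strings that are already valid UTF-8 and
# transcode everything else from an assumed source encoding.
def to_utf8(bytes, assumed_source = Encoding::ISO_8859_1)
  text = bytes.dup.force_encoding(Encoding::UTF_8)
  return text if text.valid_encoding?

  bytes.dup.force_encoding(assumed_source).encode(Encoding::UTF_8)
end
```

If the guess about the source encoding is wrong, the result will still be
valid UTF-8 and will insert cleanly, but the accented characters will come
out as the wrong letters.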
