Forum: Ruby on Rails String Encoding / Importing Feeds

Hunter Hillegas (Guest)
on 2007-05-27 22:56
(Received via mailing list)
Howdy,

I have a Rails app that grabs info from RSS feeds and then tries to
insert it into a database.

The problem I'm having is that some feeds appear to use funky
characters and my INSERTs are failing. The actual error is below.

Any idea how I can make this work reliably, not really knowing if a
feed will have these or not?

I'm using PostgreSQL and the DB was initialized as UTF-8.

Any help is very much appreciated.

feeds#update_feeds (ActiveRecord::StatementInvalid) "PGError: ERROR:
invalid byte sequence for encoding \"UTF8\": 0xb9\n: INSERT INTO
feed_items (\"item_id\", \"updated_at\", \"title\", \"item_updated\",
\"description\", \"feed_id\", \"item_link\", \"created_at\") VALUES
(NULL, '2007-05-27 20:40:32.433422', 'Quote du jour', '2007-05-25
15:57:55.000000', '<p><a href=\"http://www.blah.lhah.com\">My Name</
a>: <i>There\271s really only one rule for community as far as I\271m
concerned, and it\271s this - in order to call some gathering of
people a \"community\", it is a requirement that if you\271re a
member of the community, and one day you stop showing up, people will
come looking for you to see where you went.</i></p>', 6, 'feed_url',
'2007-05-27 20:40:32.433422')"
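
One way to handle feeds whose encoding is not known in advance is to check whether the bytes are already valid UTF-8 and transcode them only when they are not. The sketch below assumes Ruby 1.9 or later (String#force_encoding and String#encode, which may be newer than the Ruby used in this thread). The helper name to_utf8 and the Windows-1252 fallback are hypothetical choices for illustration, not something the feed or Rails provides.

```ruby
# Hypothetical helper, not part of Rails or the original app.
# Assumes Ruby 1.9+ and that Windows-1252 is a reasonable guess
# for feeds that are not valid UTF-8.
def to_utf8(raw, fallback = Encoding::WINDOWS_1252)
  # First try: treat the bytes as UTF-8 and keep them if they are valid.
  candidate = raw.dup.force_encoding(Encoding::UTF_8)
  return candidate if candidate.valid_encoding?

  # Otherwise reinterpret the same bytes in the fallback encoding and
  # transcode, replacing anything that has no mapping with '?'.
  raw.dup.force_encoding(fallback)
     .encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: '?')
end

description = to_utf8("There\xB9s really only one rule".b)
puts description.valid_encoding?
```

The result is always valid UTF-8, so PostgreSQL should accept it, but the characters are only correct if the fallback guess matches the feed. If the feed declares its encoding (in the XML prolog or the HTTP headers), that value is a better choice than a fixed fallback.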
MichaelLatta (Guest)
on 2007-05-27 23:27
(Received via mailing list)
You need to encode your string rather than using it raw.  Then reverse
the process on read.

Probably the easiest and safest option is to use base64 encoding, or to
store the string in a blob rather than a string field.

Michael
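
A minimal sketch of the base64 round trip described above, using only core Ruby (Array#pack and String#unpack with the "m0" directive). The method names encode_for_storage and decode_from_storage are hypothetical.

```ruby
# Hypothetical helpers illustrating the base64 suggestion.
# "m0" is base64 with no line breaks.
def encode_for_storage(raw)
  [raw].pack('m0')
end

def decode_from_storage(stored)
  stored.unpack('m0').first
end

raw      = "There\xB9s".b              # bytes as received from the feed
stored   = encode_for_storage(raw)     # plain ASCII, safe for a UTF-8 column
restored = decode_from_storage(stored) # the original bytes again

puts stored.ascii_only?
puts restored == raw
```

The stored value is plain ASCII, so the INSERT no longer fails, but the column can no longer be searched or sorted as text, and the decoded bytes still have to be interpreted in the right encoding when they are displayed.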
Carlos Lenz (lewdsilver)
on 2008-06-01 20:28
Hunter,
Did you ever find a solution to this problem?
I am having very similar issues:

RAW RESPONSE TEXT:
[Salut Alex, écoute c'est Alex je teste un peu et puis bonjour français
françaises. ]
UTF-8 Response text:
[Salut Alex, écoute c'est Alex je teste un peu et puis bonjour français
françaises. ]
Unique Id: 1212337033-59
  SQL (0.000081)   BEGIN
  GlobalInbox Update (0.000000)   PGError: ERROR: invalid byte sequence
for encoding "UTF8": 0xe9636f
HINT: This error can also happen if the byte sequence does not match the
encoding expected by the server, which is controlled by
"client_encoding".
: UPDATE global_inboxes SET "created_at" = '2008-06-01 12:17:13.416834',
"voicemail_status_id" = 4, "deleted_at" = NULL, "voicemail_folder_id" =
1, "deleted" = 'f', "sender_cid" = '953794484', "conversion_to_text" =
'Salut Alex, écoute c'est Alex je teste un peu et puis bonjour français
françaises. ', "notes" = NULL, "voicemail_id" = 491, "updated_at" =
'2008-06-01 12:18:09.957716', "user_id" = 28 WHERE "id" = 491

The output above is from my production.log file.
The RAW response is the text that is output to the console by simply
printing the string with the data.
The UTF-8 response is the result of calling the .chars method on that
string.

I am unable to insert the data into the DB using a string type or .chars
method.

Did you find a solution?
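
The byte sequence in the error, 0xe9636f, is what "éco" looks like in ISO-8859-1: 0xE9 is é there, but in UTF-8 it starts a multi-byte sequence that 0x63 cannot continue. That suggests the text is reaching the database as Latin-1 bytes. A minimal sketch of transcoding such a string before saving, assuming Ruby 1.9 or later and assuming the source really is ISO-8859-1 (the thread does not confirm either):

```ruby
# Assumption: the incoming bytes are ISO-8859-1, as the 0xe9636f
# sequence in the error message suggests.
latin1_bytes = "Salut Alex, \xE9coute".b

# The three bytes PostgreSQL complained about, shown as hex.
puts "\xE9co".b.unpack('H*').first   # => "e9636f"

# Label the bytes with their real encoding, then transcode to UTF-8.
utf8 = latin1_bytes.force_encoding(Encoding::ISO_8859_1)
                   .encode(Encoding::UTF_8)

puts utf8.valid_encoding?
```

The hint in the error message about client_encoding points at an alternative: if the connection's client_encoding matched the encoding of the data actually being sent, the server would do this conversion itself.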


