Forum: Ruby on Rails dump and import MySQL table w/ accents

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- und Ruby-related community platforms.
96146b7a23174e2e024c06a49f845bb8?d=identicon&s=25 Jeffrey L. Taylor (Guest)
on 2009-05-26 19:51
(Received via mailing list)
I am switching to a composite primary key (string and user ID) from the
Rails
conventional auto-incrementing integer primary ID.  The table is large
(2.5
million records) and I'd rather not discard the contents.  The
composite_primary_key gem doesn't appear to support altering the table
with a
migration to do its magic, only creating a table from scratch.  So I
dumped
the table with mysqldump, ran the migration (table looks good), and am
trying
to repopulate the table.  It has accented characters and is complaining
about
duplicates, apparently around words with and without accents, e.g.,
'jose' and
'josé'.  I've been deleting one by hand from the dump, but it is tedious
and
very slow.  Emacs crawls when dealing with very large files with very
long
lines.

I just don't understand why the accents are causing problems.  The
string
column is utf8_general_ci collation, just like other fields in the
database
with strings with accents.  What do I need to specify so it will import
the
dump?  Is there a problem with strings with accents in composite
indexes?

The table is created with a Rails migration, but everything else is pure
MySQL
utilities.

TIA,
  Jeffrey
B0f6756b62559d43213636dde11bb785?d=identicon&s=25 BenH (Guest)
on 2009-05-26 20:12
(Received via mailing list)
Ensure your database.yml file has a line like:

encoding: utf8

On May 26, 10:50 am, "Jeffrey L. Taylor" <r...@abluz.dyndns.org>
96146b7a23174e2e024c06a49f845bb8?d=identicon&s=25 Jeffrey L. Taylor (Guest)
on 2009-05-26 20:55
(Received via mailing list)
Quoting BenH <benhami@gmail.com>:
>
> Ensure your database.yml file has a line like:
>
> encoding: utf8
>

Thank you.  It has for over a year.  The data dumped was UTF-8, but
something
is preventing it from being re-imported.

Sigh,
  Jeffrey
96146b7a23174e2e024c06a49f845bb8?d=identicon&s=25 Jeffrey L. Taylor (Guest)
on 2009-05-29 00:36
(Received via mailing list)
Quoting Jeffrey L. Taylor <ror@abluz.dyndns.org>:
> very slow.  Emacs crawls when dealing with very large files with very long
> lines.
>
> I just don't understand why the accents are causing problems.  The string
> column is utf8_general_ci collation, just like other fields in the database
> with strings with accents.  What do I need to specify so it will import the
> dump?  Is there a problem with strings with accents in composite indexes?
>

The answer is mixed and bits of it don't make sense.  If I work with the
MySQL
client (mysql), I can create records with primary keys identical, except
for
an accent.  Doing the same thing in the Rails console throws an
DuplicateKey
exception.  Using mysqldump to dump the contents of a table, changing
the
table so the primary key is a integer-string composite and trying to
repopulate the table from the mysqldump output also barfs on a duplicate
key
exception.

It is non-optimal, but I can live for the moment with a table with two
non-unique indexes.

Jeffrey
67ecd92df094d1e1e571dd199cdd1aac?d=identicon&s=25 Wisccal Wisccal (wisccal)
on 2009-05-30 09:42
Jeffrey L. Taylor wrote:

>> The string column is utf8_general_ci collation, ...

As far as I understand, utf8_general_ci is case-insensitive. Maybe, you
want to give utf8_bin a shot.

mysql> create table t(i int, v varchar(10) collate utf8_general_ci,
constraint t_pk primary key(i, v));
Query OK, 0 rows affected (0.00 sec)

mysql> insert into t select 1, 'jose' union all select 1, 'josé';
ERROR 1062 (23000): Duplicate entry '1-josé' for key 1
mysql> drop table t;
Query OK, 0 rows affected (0.00 sec)

mysql> create table t(i int, v varchar(10) collate utf8_bin, constraint
t_pk primary key(i, v));
Query OK, 0 rows affected (0.00 sec)

mysql> insert into t select 1, 'jose' union all select 1, 'josé';
Query OK, 2 rows affected (0.01 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> select * from t;
+---+-------+
| i | v     |
+---+-------+
| 1 | jose  |
| 1 | josé  |
+---+-------+
2 rows in set (0.00 sec)
67ecd92df094d1e1e571dd199cdd1aac?d=identicon&s=25 Wisccal Wisccal (wisccal)
on 2009-05-30 09:44
Wisccal Wisccal wrote:
> Jeffrey L. Taylor wrote:
>
> As far as I understand, utf8_general_ci is case-insensitive.

I meant to say "accent-insensitive"...
96146b7a23174e2e024c06a49f845bb8?d=identicon&s=25 Jeffrey L. Taylor (Guest)
on 2009-06-01 09:09
(Received via mailing list)
Quoting Wisccal Wisccal <rails-mailing-list@andreas-s.net>:
>
> Wisccal Wisccal wrote:
> > Jeffrey L. Taylor wrote:
> >
> > As far as I understand, utf8_general_ci is case-insensitive.
>
> I meant to say "accent-insensitive"...

Thank you.  I didn't think MySQL would messup that badly.  I'll try this
out
this week.  I just checked and my development and production machines
have
different collation, which explains the inconsistent results I was
seeing in
my tests.

Jeffrey
This topic is locked and can not be replied to.