SOT MySQL index question

How should I index the terms table for maximum speed? It doesn’t have
to be
Rails migration doable. E.g. a primary key of (user_id, article_id,
fnv) is
okay. fnv is a 63 bit Fowler-Noll-Vo hash.

def self.neighbors(user_id, article_id)
sql = "SELECT t1.article_id, SUM(t1.freq * t2.freq) AS cosim FROM "

"tokens AS t1 JOIN tokens AS t2 "
"ON t1.fnv = t2.fnv WHERE t1.user_id = #{user_id} AND t2.user_id =
#{user_id} AND "
"t1.scoring = 1 AND t2.scoring = 0 AND t2.article_id = #{article_id}
GROUP BY t1.article_id "
“ORDER BY cosim DESC LIMIT 3”
connection.select_rows(sql)
end

TIA,
Jeffrey

Jeffrey L. Taylor wrote in post #958953:

How should I index the terms table for maximum speed?

How can we tell you? You neglected to say how you’re using that
table…or is the query below the only one you’re interested in?

It doesn’t have
to be
Rails migration doable.

But it will be, since adding indices generally is.

E.g. a primary key of (user_id, article_id,
fnv) is
okay. fnv is a 63 bit Fowler-Noll-Vo hash.

def self.neighbors(user_id, article_id)
sql = "SELECT t1.article_id, SUM(t1.freq * t2.freq) AS cosim FROM "

"tokens AS t1 JOIN tokens AS t2 "
"ON t1.fnv = t2.fnv WHERE t1.user_id = #{user_id} AND t2.user_id =
#{user_id} AND "
"t1.scoring = 1 AND t2.scoring = 0 AND t2.article_id = #{article_id}
GROUP BY t1.article_id "
“ORDER BY cosim DESC LIMIT 3”
connection.select_rows(sql)
end

Run EXPLAIN SELECT on this query (or whatever your DB’s equivalent is).
See where it’s doing full table scans and add indices as appropriate.

TIA,
Jeffrey

Best,

Marnen Laibow-Koser
http://www.marnen.org
[email protected]

Quoting Marnen Laibow-Koser [email protected]:

Jeffrey L. Taylor wrote in post #958953:

How should I index the terms table for maximum speed?

How can we tell you? You neglected to say how you’re using that
table…or is the query below the only one you’re interested in?

Correct, this is the one I care about. All others are trivial in terms
of
resources compared to this one.

It doesn’t have
to be
Rails migration doable.

But it will be, since adding indices generally is.

Composite keys are not supported in stock Rails. And I will not switch
DB
servers next week, so having the index creation in portable form is not
a necessity.

#{user_id} AND "
"t1.scoring = 1 AND t2.scoring = 0 AND t2.article_id = #{article_id}
GROUP BY t1.article_id "
“ORDER BY cosim DESC LIMIT 3”
connection.select_rows(sql)
end

Run EXPLAIN SELECT on this query (or whatever your DB’s equivalent is).
See where it’s doing full table scans and add indices as appropriate.

mysql> explain extended SELECT t1.article_id, SUM(t1.freq * t2.freq)
FROM tokens AS t1 JOIN tokens AS t2 ON t1.token = t2.token AND
t1.user_id = 1 AND t2.user_id = 1 AND t1.scoring = 1 AND t2.scoring = 0
GROUP BY article_id;
±—±------------±------±-----±--------------±-----±--------±-----±------±---------±---------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows
| filtered | Extra |
±—±------------±------±-----±--------------±-----±--------±-----±------±---------±---------------------------------------------+
| 1 | SIMPLE | t1 | ALL | user_id | NULL | NULL | NULL | 34773
| 100.00 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | t2 | ALL | user_id | NULL | NULL | NULL | 34773
| 100.00 | Using where; Using join buffer |
±—±------------±------±-----±--------------±-----±--------±-----±------±---------±---------------------------------------------+
2 rows in set, 1 warning (0.07 sec)

mysql> show warnings;
| Level | Code | Message

| Note | 1003 | select tv2sql_development.t1.article_id AS
article_id,sum((tv2sql_development.t1.freq *
tv2sql_development.t2.freq)) AS SUM(t1.freq * t2.freq) from
tv2sql_development.tokens t1 join tv2sql_development.tokens t2 where
((tv2sql_development.t2.scoring = 0) and
(tv2sql_development.t1.scoring = 1) and (tv2sql_development.t2.user_id
= 1) and (tv2sql_development.t1.user_id = 1) and
(tv2sql_development.t2.token = tv2sql_development.t1.token)) group by
tv2sql_development.t1.article_id |

TIA,
Jeffrey

Jeffrey L. Taylor wrote in post #959191:

Quoting Marnen Laibow-Koser [email protected]:

Jeffrey L. Taylor wrote in post #958953:

How should I index the terms table for maximum speed?

How can we tell you? You neglected to say how you’re using that
table…or is the query below the only one you’re interested in?

Correct, this is the one I care about. All others are trivial in terms
of
resources compared to this one.

It doesn’t have
to be
Rails migration doable.

But it will be, since adding indices generally is.

Composite keys are not supported in stock Rails.

Composite primary keys are not (but there’s a plugin). Composite
indices are. You can specify multiple columns in add_index.

And I will not switch
DB
servers next week, so having the index creation in portable form is not
a necessity.

Never write SQL in Rails unless there’s no other way.

Best,

Marnen Laibow-Koser
http://www.marnen.org
[email protected]

Sent from my iPhone

Quoting Marnen Laibow-Koser [email protected]:

of

Composite primary keys are not (but there’s a plugin). Composite
indices are. You can specify multiple columns in add_index.

Sorry I left out a constraint that turned out to be critical. This is
the
primary and only index on this table. I’ve tried the plug-in. It was
not
reliable in my use scenario.

And I will not switch
DB
servers next week, so having the index creation in portable form is not
a necessity.

Never write SQL in Rails unless there’s no other way.

If it is feasible in Rails, I do it that way. However, I was explicitly
lifting the constraint of doable in Rails for this usecase. If the
usecase
included dozens of servers and multiple DB server software, then staying
in
Rails would make sense. I have one production server (with test
database) and
one or two development setups (with test databases). Setup in the
database
client program is very doable. Fighting to shoehorn a usecase that
Rails does
not handle into Rails is not productive.

Jeffrey

On Nov 4, 12:56am, “Jeffrey L. Taylor” [email protected] wrote:

"ON t1.fnv = t2.fnv WHERE t1.user_id = #{user_id} AND t2.user_id =
mysql> explain extended SELECT t1.article_id, SUM(t1.freq * t2.freq) FROM tokens
AS t1 JOIN tokens AS t2 ON t1.token = t2.token AND t1.user_id = 1 AND t2.user_id =
1 AND t1.scoring = 1 AND t2.scoring = 0 GROUP BY article_id;
±—±------------±------±-----±--------------±-----±--------±-----+
-------±---------±---------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows |
filtered | Extra |
±—±------------±------±-----±--------------±-----±--------±-----+
-------±---------±---------------------------------------------+
| 1 | SIMPLE | t1 | ALL | user_id | NULL | NULL | NULL | 34773 | 100.00
| Using where; Using temporary; Using filesort |
| 1 | SIMPLE | t2 | ALL | user_id | NULL | NULL | NULL | 34773 | 100.00
| Using where; Using join buffer |
±—±------------±------±-----±--------------±-----±--------±-----+
-------±---------±---------------------------------------------+
2 rows in set, 1 warning (0.07 sec)

you could try an index on user_id,scoring, and you definitely want one
on token (any time you do a join you want an index on the columns you
join on). You might try adding article_id to that first index to see
if you can get the DB to use that to help by the grouping

Fred

Jeffrey L. Taylor wrote in post #960038:

Quoting Marnen Laibow-Koser [email protected]:

of

Composite primary keys are not (but there’s a plugin). Composite
indices are. You can specify multiple columns in add_index.

Sorry I left out a constraint that turned out to be critical. This is
the
primary and only index on this table. I’ve tried the plug-in. It was
not
reliable in my use scenario.

Then here’s what I would suggest. Keep the composite index, but do not
make it the primary key. Use a surrogate key (that is, a Rails-style id
field) instead. You’ll be happier. Rails will be happier. Composite
primary keys tend to be annoying (yes, I’ve worked with them
successfully, but I’m glad I no longer have to), and anyway surrogate
keys are nicer because you can guarantee that they’ll never ever change
over the life of the record.

And I will not switch
DB
servers next week, so having the index creation in portable form is not
a necessity.

Never write SQL in Rails unless there’s no other way.

If it is feasible in Rails, I do it that way. However, I was explicitly
lifting the constraint of doable in Rails for this usecase. If the
usecase
included dozens of servers and multiple DB server software, then staying
in
Rails would make sense. I have one production server (with test
database) and
one or two development setups (with test databases). Setup in the
database
client program is very doable.

But it’s a bad idea since it defeats automation. The right way to do
this – if you must – is to put raw SQL in the migration and set the
schema dumper to use SQL. That way you can still use migrations and the
schema file.

Fighting to shoehorn a usecase that
Rails does
not handle into Rails is not productive.

On the contrary, it’s very productive. That’s how many of the best
Rails plugins got developed.

But in this case, all you need to do is add a surrogate key.

Jeffrey

Best,

Marnen Laibow-Koser
http://www.marnen.org
[email protected]