On the cluster mailing list (cluster@lists.mysql.com ,
thread "slow selects") there was recently a very good example of how to optimize queries on Cluster. Thanks to Nick Keefen for raising this problem and Jeff Sturm for the answer how to solve it!
In short the problem is that the Optimizer in the MySQL server does not get adequate statistics from the data nodes about table sizes, indexes etc. This makes the Optimizer clueless in some cases how to order tables in joins, and also in some cases which is the best indexes to use.
So here is the problem that Nick highlighted:
When the tables are stored in MyISAM:
mysql> SELECT v.vid_id, v.vid_title, u.user FROM phpfox_user AS u JOIN phpfox_videos AS v ON(v.vid_userid = u.id) limit 3;
3 rows in set (0.32 sec)
phpfox_user is about 100000 rows and phpfox_videos is 170000 rows large
Trying to run the same query on my cluster machine, i get
3 rows in set (20.47 sec)
Why is this? Let's look at the EXPLAIN of the queries:
MyISAM explain:
mysql> explain SELECT v.vid_id, v.vid_title, u.user FROM phpfox_user AS u JOIN phpfox_videos AS v ON(v.vid_userid = u.id) ORDER BY v.vid_time DESC LIMIT 3;
+----+-------------+-------+--------+---------------+---------
+---------+---------------------+--------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+---------
+---------+---------------------+--------+----------------+
| 1 | SIMPLE | v | ALL | vid_userid | NULL | NULL | NULL | 135025 | Using filesort |
| 1 | SIMPLE | u | eq_ref | PRIMARY | PRIMARY | 4 | nktest.v.vid_userid | 1 | |
+----+-------------+-------+--------+---------------+---------
+---------+---------------------+--------+----------------+
NDB explain:
mysql> explain SELECT v.vid_id, v.vid_title, u.user FROM phpfox_user AS u JOIN phpfox_videos AS v ON(v.vid_userid = u.id) ORDER BY v.vid_time DESC LIMIT 3;
+----+-------------+-------+------+---------------+-----------
-+---------+-----------------+-------+------------------------
---------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+-----------
-+---------+-----------------+-------+------------------------
---------+
| 1 | SIMPLE | u | ALL | PRIMARY | NULL | NULL | NULL | 82124 | Using temporary; Using filesort |
| 1 | SIMPLE | v | ref | vid_userid | vid_userid | 4 | bb2_phpfox.u.id | 1 | |
+----+-------------+-------+------+---------------+-----------
-+---------+-----------------+-------+------------------------
---------+
Note the additional 'Using temporary' clause for NDB. Nearly all of the 20 seconds taken by the query fall at 'copying to temporary table'.
The problem is that the Optimizer has changed the order (in case of NDB) how tables should be joined!
Jeff Sturm replied to Nick with a remedy:
Looks like the query optimizer is putting the tables in the wrong order.
(This isn't uncommon in my experience, as NDB seems to have less
information than other engines to optimize queries at runtime.)
Try modifying your query with STRAIGHT_JOIN syntax, i.e.:
SELECT v.vid_id, v.vid_title, u.user FROM phpfox_videos AS v
STRAIGHT_JOIN phpfox_user AS u ON v.vid_userid = u.id ORDER BY
v.vid_time DESC LIMIT 3;
and the problem was solved. The query now runs in split second.
Moreover, in some cases you also want to use
USE INDEX
or
FORCE INDEX
.
There are situations where the Optimizer wants to use e.g index
a_idx
to resolve the query, but really index
b_idx
is the one it should use!
Thus you can use:
SELECT * FROM t1 FORCE INDEX(b_idx) , t2 WHERE ...
Good luck! And yes, one day the Optimizer will get the good statistics!