Thursday, October 30, 2008

MySQL Cluster Sandbox

There is also a sandbox for MySQL Cluster!

If you have Linux installed (I have not tested on Solaris) and want to get familiar with MySQL Cluster, or set up a development environment all on your localhost, then MySQL Cluster Sandbox is for you!

Here is the README!

The whole installation is in a single directory and easy to remove.

Optimizing Queries on Cluster

On the cluster mailing list (thread "slow selects") there was recently a very good example of how to optimize queries on Cluster. Thanks to Nick Keefen for raising the problem and to Jeff Sturm for the answer that solves it!

In short, the problem is that the Optimizer in the MySQL server does not get adequate statistics from the data nodes about table sizes, indexes, etc. This leaves the Optimizer clueless, in some cases about how to order tables in joins, and in some cases about which index is best to use.

So here is the problem that Nick highlighted:

When the tables are stored in MyISAM:
mysql> SELECT v.vid_id, v.vid_title, u.user FROM phpfox_user AS u JOIN phpfox_videos AS v ON(v.vid_userid = limit 3;
3 rows in set (0.32 sec)

phpfox_user has about 100000 rows and phpfox_videos about 170000 rows.

Running the same query on my cluster machine, I get
3 rows in set (20.47 sec)

Why is this? Let's look at the EXPLAIN of the queries:

MyISAM explain:

mysql> explain SELECT v.vid_id, v.vid_title, u.user FROM phpfox_user AS u JOIN phpfox_videos AS v ON(v.vid_userid = ORDER BY v.vid_time DESC LIMIT 3;
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | v | ALL | vid_userid | NULL | NULL | NULL | 135025 | Using filesort |
| 1 | SIMPLE | u | eq_ref | PRIMARY | PRIMARY | 4 | nktest.v.vid_userid | 1 | |

NDB explain:

mysql> explain SELECT v.vid_id, v.vid_title, u.user FROM phpfox_user AS u JOIN phpfox_videos AS v ON(v.vid_userid = ORDER BY v.vid_time DESC LIMIT 3;
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | u | ALL | PRIMARY | NULL | NULL | NULL | 82124 | Using temporary; Using filesort |
| 1 | SIMPLE | v | ref | vid_userid | vid_userid | 4 | | 1 | |

Note the additional 'Using temporary' in the NDB plan. Nearly all of the 20 seconds taken by the query are spent on 'copying to temporary table'.

The problem is that, in the NDB case, the Optimizer has changed the order in which the tables are joined!

Jeff Sturm replied to Nick with a remedy:

Looks like the query optimizer is putting the tables in the wrong order.
(This isn't uncommon in my experience, as NDB seems to have less
information than other engines to optimize queries at runtime.)

Try modifying your query with STRAIGHT_JOIN syntax, i.e.:

SELECT v.vid_id, v.vid_title, u.user FROM phpfox_videos AS v
STRAIGHT_JOIN phpfox_user AS u ON v.vid_userid = ORDER BY
v.vid_time DESC LIMIT 3;

and the problem was solved. The query now runs in a split second.
Moreover, in some cases you may also want to use USE INDEX or FORCE INDEX.
There are situations where the Optimizer wants to use, e.g., index a_idx to resolve the query, when really index b_idx is the one it should use!
Thus you can use:
SELECT * FROM t1 FORCE INDEX (b_idx), t2 WHERE ...

Good luck! And yes, one day the Optimizer will get good statistics!

Tuesday, October 21, 2008


Setting DiskPageBufferMemory

I get a few questions about how to set DiskPageBufferMemory.

Before discussing that let's recap a few things:
  • Non-indexed attributes (or attributes explicitly declared with STORAGE DISK in CREATE TABLE ...) are stored on disk in table spaces.
  • Indexed attributes are stored in DataMemory.
  • The hash of the primary key is stored in IndexMemory.
To determine how much IndexMemory/DataMemory/table space you need, you can use the dimensioning toolkit.
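To make the bullets above concrete, here is a minimal sketch of setting up disk-based storage for an NDB table. The names (lg_1, ts_1, t1) and the sizes are just example values, not recommendations:

```sql
-- A logfile group for undo logging, then a tablespace on top of it.
CREATE LOGFILE GROUP lg_1
    ADD UNDOFILE 'undo_1.log'
    INITIAL_SIZE 128M
    UNDO_BUFFER_SIZE 8M
    ENGINE NDBCLUSTER;

CREATE TABLESPACE ts_1
    ADD DATAFILE 'data_1.dat'
    USE LOGFILE GROUP lg_1
    INITIAL_SIZE 256M
    ENGINE NDBCLUSTER;

-- id is indexed: it lives in DataMemory (the hash of the primary key
-- in IndexMemory). The non-indexed columns b and c go to the
-- tablespace on disk.
CREATE TABLE t1 (
    id INT NOT NULL PRIMARY KEY,
    b INT,
    c VARCHAR(100)
) TABLESPACE ts_1 STORAGE DISK
ENGINE NDBCLUSTER;
```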

Now you can set up the undo buffers etc. according to my post on disk data a while back.

But what about the DiskPageBufferMemory?! If you have used InnoDB, then you know the InnoDB buffer pool. This is basically the same thing. You need to set it large enough to cache as much as possible of the active (most frequently used) part of the disk-based data set.

We need to know what the data node is using memory for so that we can set the DiskPageBufferMemory appropriately:
  • Internal buffers (RedoBuffer, send buffers): 900MB (if you use the config files from the config tool)
  • OS will use ~1GB (this is really too much and can be trimmed)
  • DataMemory: X MB (derived from the dimensioning tool), say 2GB (for the example below)
  • IndexMemory: Y MB (derived from e.g the dimensioning tool), say 256MB (for the example below)
If I have 8GB of RAM in my computer, I would set the DiskPageBufferMemory to:

DiskPageBufferMemory = 0.8 x [ 8GB - 1GB (OS) - 900MB (internal buffers) - 2GB (DataMemory) - 256MB (IndexMemory) ] = ~3170MB

I take 0.8x just to leave some slack in case I need to reconfigure some other buffer later. After a while, when the system is tuned in, I can then choose to use those extra MB to extend the DiskPageBufferMemory.
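The arithmetic above is simple enough to script. A small sketch (all sizes in MB; the numbers are the example values from this post, not recommendations):

```shell
# Rule of thumb from above: DiskPageBufferMemory =
# 0.8 x (total RAM - OS - internal buffers - DataMemory - IndexMemory)
TOTAL_RAM=8192        # 8GB host
OS_RESERVED=1024      # ~1GB for the OS
INTERNAL_BUFFERS=900  # RedoBuffer, send buffers etc.
DATA_MEMORY=2048      # from the dimensioning tool
INDEX_MEMORY=256      # from the dimensioning tool

FREE=$((TOTAL_RAM - OS_RESERVED - INTERNAL_BUFFERS - DATA_MEMORY - INDEX_MEMORY))
DISK_PAGE_BUFFER_MEMORY=$((FREE * 8 / 10))   # take 80%, integer MB

echo "DiskPageBufferMemory=${DISK_PAGE_BUFFER_MEMORY}M"
```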

And yes, the statistics to monitor the DiskPageBufferMemory (and all other params as well) are still missing. But behold, they will come!

And let me know if you have tested disk data with SSD devices!

Tuesday, October 07, 2008

CMON binaries for Solaris 10 (sparc 64-bit)

I have built binaries for Solaris 10 (sparc 64-bit). You can download them at
Moreover, version 1.2.1 of cmon was also released, as it fixes some nuisances (fixed so that it is actually possible to build it on Solaris :)).

Read the release notes for more info.

Upgrade from MySQL Cluster 6.2 to 6.3

Is it possible to upgrade from MySQL Cluster 6.2 to MySQL Cluster 6.3? Of course!
Can it be done online? Sure!

Here is how!

I started off with the cluster shown in the picture below:

All you need to do is as follows:
  • Copy out the new binaries (MySQL Cluster 6.3) on to all hosts
  • Install the new version (please note that you have not restarted anything yet - we are about to). The install phase just copies out the binaries of the new version to all hosts and makes sure that you are referencing the new version of the Cluster binaries in scripts, PATHs, etc.
After that you need to:
  1. restart both (all) management servers at the same time!
    (TRICK! If each management server uses a connect-string that only references itself, then you can restart them one at a time. Otherwise they risk picking up the other one's configuration! So if you have two hosts where the management servers run, A and B, then the connect-strings should be:
    • For the management server on host A: ndb_mgmd -c "A"
    • For the management server on host B: ndb_mgmd -c "B"
    • And NOT ndb_mgmd -c "A;B" - with that, you risk that they pick up the wrong configuration data (unless you stop them both at the same time and then restart them), and that would be really bad!)
  2. ndb_mgm -e "show" --> verify that all of them have version 6.3.x. The data nodes and mysql servers will still show they are version 6.2.x
  3. restart the data nodes (one at a time) until you have restarted all of them
  4. verify with ndb_mgm -e "show" that the data nodes all have the correct version (6.3.x)
  5. restart the mysql servers (one at a time) until you have restarted all of them.
  6. verify with ndb_mgm -e "show" that the mysql servers all have the correct version (6.3.x)
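The restart order above can be sketched as a script. This is a dry-run sketch under assumptions, not something to paste into production: the node ids (1 and 2 for the management servers, 3 and 4 for the data nodes) and the restart-mysqld.sh helper are hypothetical - check ndb_mgm -e "show" for your real ids and use your own mysqld start/stop scripts.

```shell
#!/bin/sh
# Dry-run sketch of the rolling upgrade order described above.
DRY_RUN=1   # set to 0 to really execute the commands

run() {
  if [ "$DRY_RUN" = 1 ]; then echo "would run: $*"; else "$@"; fi
}

# 1. restart the management servers (both at once, or one at a time if
#    each one's connect-string references only itself)
for id in 1 2; do
  run ndb_mgm -e "$id stop"
  run ndb_mgmd          # start the new 6.3 binary on that host
done

# 2. restart the data nodes one at a time; wait for each to rejoin
for id in 3 4; do
  run ndb_mgm -e "$id restart"
done

# 3. restart each mysqld one at a time (hypothetical helper script)
run sh ./restart-mysqld.sh

# 4. verify that every node now reports version 6.3.x
run ndb_mgm -e show
```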
While I am restarting the cluster (I am using the configuration tool, monit, and cmon) you will see the following in cmon:

And when the rolling restart was complete I have:

Actually, the web browser will show a blank page for the short period when the management servers are stopped and restarting!

If you had used the configuration tool to generate scripts and a configuration for MySQL Cluster 6.2, then you can generate a new configuration and scripts for MySQL Cluster 6.3 (the two configurations must be identical in terms of data memory and number of nodes (data nodes, management servers, mysql servers)) and then use those scripts:
  1. sh ./ (downloads and compiles the latest MySQL Cluster 6.3 - probably works only on Linux)
  2. sh ./
or if you can find the correct version at then you can do
  1. sh ./ <name of binary dist> (rpms do not work)
  2. sh ./ (sh ./ if you are using monit to guard your processes).
I used the approach above together with cmon.

Friday, October 03, 2008

cmon 1.2 - released

You can now download cmon 1.2.2 at
Thank you everyone for your feedback.
Bugs fixed in cmon 1.2.2:
Bugs fixed in cmon 1.2.1:
  • Fixed a number of issues with building cmon on Solaris.
  • Updated the compile instructions, adding a note on how to build it on Solaris.

Bugs fixed in cmon 1.2:
  • --logfile - specifying a custom logfile was not working earlier and cmon exited. Now this is fixed!
  • new configure option (--with-wwwroot): ./configure --with-wwwroot=</path/to/wwwroot/>
  • make install-extra: install cmon/www/ files into wwwroot directory and also install initd/cmon into /etc/init.d/cmon. This requires that you are root.
  • jpgraph-1.26 is bundled (in www/cmon) and is installed with make install-extra.