Thursday, July 30, 2009

ndb_restore - best practice

A really simple best practice:

Make sure there are NO mysql servers connected to the Cluster while you do the restore!
It can cause the restore to fail badly and potentially even your cluster.

Also, make sure (if you use disk data) to remove all data files in the datadirs of the data nodes as they are not removed when doing an --initial ... I have written a bug report about that...

(i will update the severalnines/config scipts to include this best practice!)

Wednesday, July 01, 2009

Problems with .FRM files, auto-discovery and MySQL Cluster

There are some bug reports on the auto-discovery protocol in MySQL Cluster.
The idea of the auto-discovery protocol is to fetch the .frm files for the NDB tables stored in the data dictionary of the data nodes, and put them in the data directory of the mysql server.

However, sometimes (not always, which makes it more difficult to reproduce and hence fix), the auto-discovery seems to make strange things (from this bug report):
After shuting down and restoring my cluster I get the following error.

090211 9:59:26 [Note] NDB: mismatch in frm for panel.gatewayquestions, discovering...
090211 9:59:26 [Note] NDB Binlog: DISCOVER TABLE Event: REPL$panel/gatewayquestions
090211 9:59:26 [Note] NDB Binlog: logging ./panel/gatewayquestions (UPDATED,USE_WRITE)

This is due to the files already being in the mysql data directory. After the error the
frm does not match the data in memory this causes the following.
When running select count(*) from tablename;
You will get an accurate count.
When running select * from table name;
You get an error Can't find record in tablename.
I have seen it as well at some customers usually with bigger installations and many tables.

My current recommendation (work around) is to delete the FRM files associated with the NDB tables in the mysql server data directory before you start the mysql server(s).

So this is what i always include in my MySQL server startup scripts (and is included in the Configurator scripts):
files=`find $mysql_datadir  -name "*.ndb"`
for f in $files
do
x=`basename $f .ndb`
#make sure we leave out ndb_binlog_index and ndb_schema since they are myisam tables
if [ "$x" == "ndb_binlog_index" ] || [ "$x" == "ndb_schema" ] ;
then
echo "Ignoring $x"
else
y=`echo $f | sed -e 's#ndb#frm#'`
rm -rf $f
rm -rf $y
fi
done
#start the mysqld here
If I want to restore data I usually:
  1. stop the cluster
  2. start the data nodes with --initial
  3. stop the mysql servers (make sure they are not started)
  4. restore the cluster data
  5. start the mysql servers (clearing out whatever .frm files coming from the ndb tables)