Wednesday, July 01, 2009

Problems with .FRM files, auto-discovery and MySQL Cluster

There are some bug reports on the auto-discovery protocol in MySQL Cluster.
The idea of the auto-discovery protocol is to fetch the .frm files for the NDB tables stored in the data dictionary of the data nodes, and put them in the data directory of the mysql server.

However, sometimes (not always, which makes it more difficult to reproduce and hence fix), the auto-discovery seems to do strange things (from this bug report):
After shuting down and restoring my cluster I get the following error.

090211 9:59:26 [Note] NDB: mismatch in frm for panel.gatewayquestions, discovering...
090211 9:59:26 [Note] NDB Binlog: DISCOVER TABLE Event: REPL$panel/gatewayquestions
090211 9:59:26 [Note] NDB Binlog: logging ./panel/gatewayquestions (UPDATED,USE_WRITE)

This is due to the files already being in the mysql data directory. After the error the
frm does not match the data in memory this causes the following.
When running select count(*) from tablename;
You will get an accurate count.
When running select * from table name;
You get an error Can't find record in tablename.
I have seen this as well at some customers, usually in bigger installations with many tables.

My current recommendation (workaround) is to delete the .frm files associated with the NDB tables in the mysql server data directory before you start the mysql server(s).

So this is what I always include in my MySQL server startup scripts (and it is included in the Configurator scripts):
files=`find "$mysql_datadir" -name "*.ndb"`
for f in $files
do
    x=`basename "$f" .ndb`
    # make sure we leave out ndb_binlog_index and ndb_schema since they are myisam tables
    if [ "$x" = "ndb_binlog_index" ] || [ "$x" = "ndb_schema" ]
    then
        echo "Ignoring $x"
    else
        # replace only the trailing .ndb extension, not the first "ndb" found anywhere in the path
        y=`echo "$f" | sed -e 's#\.ndb$#.frm#'`
        rm -f "$f"
        rm -f "$y"
    fi
done
# start the mysqld here
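One detail worth checking in a script like this is how the .frm name is derived from the .ndb name, because a data directory or database name can itself contain "ndb". The following is a minimal sketch of one way to do it with shell suffix stripping; the helper name `frm_for_ndb` and the example path are made up for illustration and are not part of the Configurator scripts.

```shell
#!/bin/sh
# Hypothetical helper: map an .ndb path to the .frm file next to it.
frm_for_ndb() {
    # ${1%.ndb} strips only a trailing ".ndb", so "ndb" elsewhere in the path is untouched
    printf '%s\n' "${1%.ndb}.frm"
}

# Example path is an assumption, chosen because the directory name contains "ndb".
frm_for_ndb "/var/lib/mysql/ndbtest/t1.ndb"
```

Because only the suffix is removed, the directory part of the path comes back unchanged.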
If I want to restore data I usually:
  1. stop the cluster
  2. start the data nodes with --initial
  3. stop the mysql servers (make sure they are not started)
  4. restore the cluster data
  5. start the mysql servers (clearing out whatever .frm files coming from the ndb tables)
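Step 4 is typically done with ndb_restore, run once per data node that took part in the backup. As a rough sketch (it only prints the commands, it does not run them), the function below shows the usual pattern of restoring the table definitions with -m on the first node only and the data with -r on every node. The function name, backup id, backup path and node ids are placeholders, and in a real cluster the backup path is normally different on each data node host.

```shell
#!/bin/sh
# Print (not run) one ndb_restore command per data node.
# All values passed in are placeholders; adapt them to your own backup.
print_restore_cmds() {
    connect=$1; backup_id=$2; backup_path=$3
    shift 3
    first=1
    for nodeid in "$@"
    do
        if [ "$first" -eq 1 ]
        then
            # -m restores the table definitions (metadata); do this once only
            echo "ndb_restore -c \"$connect\" -n $nodeid -b $backup_id -m -r --backup_path=$backup_path"
            first=0
        else
            echo "ndb_restore -c \"$connect\" -n $nodeid -b $backup_id -r --backup_path=$backup_path"
        fi
    done
}

print_restore_cmds "managementhostA;managementhostB" 1 /backups/BACKUP/BACKUP-1 3 4
```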

Monday, June 29, 2009

How to upgrade from 6.3 to 7.0

In order to upgrade from 6.3 to 7.0 you must follow these rules:
  1. IT IS ONLY POSSIBLE TO UPGRADE FROM 6.3 to at least 7.0.6!
  2. I recommend upgrading from 6.3.x to the latest 6.3 release before doing the upgrade. If the upgrade then does not work and you have a good config.ini, you have probably hit a bug.
  3. You cannot upgrade from 6.3 to the multi-threaded binary of 7.0.6 in one go.
    You have to upgrade from 'ndbd' (6.3) --> 'ndbd' (7.0.6)
    Then you can do 'ndbd' (7.0.6) --> 'ndbmtd' (7.0.6)
So don't try to upgrade to 7.0.5 - it will fail.
As usual when upgrading (version, configuration variables etc.) you must do a rolling restart, restarting the nodes in this order:
  1. ndb_mgmd (management servers)
  2. ndbd (data nodes)
  3. mysqld (mysql servers / direct api applications)

Distribute the 7.0.6 binaries

Copy the binaries to each host (i.e., replace existing 6.3 binaries) in the cluster.

Upgrading the Management Server(s)

In 6.3 you started the management server as (e.g):
ndb_mgmd -f /etc/mysql/config.ini
In 7.0.6 you have to start it slightly differently, but first you have to stop both management servers (if you have two, and you really should) because of this bug:
killall ndb_mgmd
Also, I never use the management client to start/stop nodes: in my experience it only works some of the time, which makes it hard to rely on and difficult to file bugs about.

When you have killed both you can then do:
ndb_mgmd -f /etc/mysql/config.ini --configdir=/etc/mysql --reload
The management server in 7.0.6 writes a binary config file in --configdir, which is a configuration cache. --reload will reload the config.ini and update that cache.

Upgrading the Data Nodes

Restart the data nodes on each machine one at a time:
killall ndbd
ndbd --ndb-nodeid=X --ndb-connectstring="managementhostA;managementhostB"
ndb_waiter --ndb-connectstring="managementhostA;managementhostB"
# when ndb_waiter exits, you can restart the next data node.
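The per-node sequence above can be wrapped in a loop so that the next data node is only touched after ndb_waiter has returned. This is a sketch under assumptions: it reaches the hosts over ssh, the node ids and host names are placeholders, and the `run` and `rolling_restart_ndbd` helpers are made up here. By default it only prints the commands (DRY_RUN=1).

```shell
#!/bin/sh
# Rolling restart sketch: one data node at a time, waiting for the cluster in between.
# DRY_RUN=1 (the default here) only prints the commands instead of running them.
DRY_RUN=${DRY_RUN:-1}
CONNECT="managementhostA;managementhostB"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Arguments are "nodeid:host" pairs; the pairs used below are placeholders.
rolling_restart_ndbd() {
    for pair in "$@"
    do
        nodeid=${pair%%:*}
        host=${pair#*:}
        run ssh "$host" "killall ndbd"
        run ssh "$host" "ndbd --ndb-nodeid=$nodeid --ndb-connectstring='$CONNECT'"
        # ndb_waiter blocks until the data nodes have started (or its timeout expires)
        run ndb_waiter --ndb-connectstring="$CONNECT" || return 1
    done
}

rolling_restart_ndbd 3:datahostA 4:datahostB
```

Stopping the loop as soon as ndb_waiter fails keeps the remaining data nodes running, so the cluster is not taken down further while you investigate.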

Upgrading the MySQL Servers/Direct APIs

Stop and start the mysql servers:
killall mysqld  # (or mysqladmin shutdown; both shut the server down cleanly, killall by sending signal 15)
mysqld_safe --defaults-file=/etc/config/my.cnf

Upgrading to multithreaded 7.0.6 data nodes

When the Cluster has been restarted with 7.0.6 you can upgrade to multithreaded data nodes.
The binary for the multithreaded data node is called 'ndbmtd'.

You need to set in config.ini (on both management servers):
[ndbd default]
..
MaxNoOfExecutionThreads=#number of cores you have (up to 8)
..
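To pick a value on a given data node host, one option is to read the core count from the system and cap it at 8, the upper limit mentioned above. This is a small sketch, not part of any MySQL Cluster tooling; `cap_threads` is a made-up helper, and `getconf _NPROCESSORS_ONLN` is assumed to be available (it is on Linux and most Unix systems).

```shell
#!/bin/sh
# Cap a core count at 8, the upper limit mentioned above for MaxNoOfExecutionThreads.
cap_threads() {
    if [ "$1" -gt 8 ]; then
        echo 8
    else
        echo "$1"
    fi
}

# Fall back to 1 if getconf cannot report the number of online cores.
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
echo "MaxNoOfExecutionThreads=$(cap_threads "$cores")"
```

Run it on the data node hosts rather than on the management server, since the setting describes the machines that run ndbmtd.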
Then stop both management servers:
killall ndb_mgmd
When you have killed both you can then start both by doing:
ndb_mgmd -f /etc/mysql/config.ini --configdir=/etc/mysql --reload
and then restart the data nodes one at a time:
killall ndbd
ndbmtd --ndb-nodeid=X --ndb-connectstring="managementhostA;managementhostB"
ndb_waiter --ndb-connectstring="managementhostA;managementhostB"
# when ndb_waiter exits, you can restart the next data node.

Using Severalnines Configurator

If you have a Severalnines installation of MySQL Cluster 6.3, then you can upgrade in the following way:
  1. Make sure you have the files:
    mysqlcluster-xy/cluster/scripts/install/.s9s/hostnames
    mysqlcluster-xy/cluster/scripts/install/.s9s/ndb_hostnames
    mysqlcluster-xy/cluster/scripts/install/.s9s/mysql_hostnames
    mysqlcluster-xy/cluster/scripts/install/.s9s/mgm_hostnames
    Otherwise your version of the Configurator package is too old and you need to generate a new one.
  2. Generate a new Config using 7.0.6 with the same nodes, data memory, data dirs etc.
    Select "no MT" when asked for "Number of cores:"


  3. Install the package
    Let's assume you have 6.3 in:
    /root/mysqlcluster-63/
    Install mysqlcluster-70.tar.gz
    cd /root/
    tar xvfz mysqlcluster-70.tar.gz
    cd mysqlcluster-70/cluster/scripts/install
    Copy the .s9s directory from 6.3 so you get the hostnames
    cp  -r /root/mysqlcluster-63/cluster/scripts/install/.s9s  .
  4. Download the binary or build the source:
    ./download-binary.sh
    or
    ./download-and-compile.sh
  5. Install and perform a rolling restart
    ./install-cluster.sh
    cd ..
    ./rolling-restart.sh
  6. Change from ndbd --> ndbmtd
    vi ../config/config.ini
    Locate MaxNoOfExecutionThreads, un-comment it if needed, and set it to as many cores as you have:
    [ndbd default]
    ...
    MaxNoOfExecutionThreads=8
    ...
    Then change in the scripts from ndbd -> ndbmtd
    sed -i 's#libexec/ndbd#libexec/ndbmtd#g' *.sh
    Important! Don't leave out 'libexec' above!
    And do a rolling restart:
    ./rolling-restart.sh
  7. Luckily, it is much easier to upgrade from e.g. 7.0.6 to 7.0.x, and I will show how to do that in two lines when 7.0.7 is out.
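
Before running the sed command from step 6 with -i, it can help to preview which lines it would change. The sketch below applies the same substitution without modifying any file and prints only the lines that would be rewritten. `preview_ndbmtd` is a made-up helper, and the sample script content is invented for the demonstration.

```shell
#!/bin/sh
# Preview the ndbd -> ndbmtd substitution from step 6 without modifying any file.
preview_ndbmtd() {
    for script in "$@"
    do
        # -n together with the p flag prints only the lines that were changed
        sed -n 's#libexec/ndbd#libexec/ndbmtd#gp' "$script"
    done
}

# Demonstration on a throwaway file; the two lines are invented sample content.
demo=$(mktemp)
printf '%s\n' '/usr/local/mysql/libexec/ndbd --ndb-nodeid=3' 'killall ndbd' > "$demo"
preview_ndbmtd "$demo"
rm -f "$demo"
```

In the scripts directory you would call it as `preview_ndbmtd *.sh`. In the sample, the line that mentions ndbd without the libexec/ prefix is left alone, which is the effect of keeping 'libexec' in the pattern.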