Johan Andersson's Cluster and HA Blog

Thursday, September 03, 2009

MySQL Cluster on two hosts - options and implications

Suppose you have two hosts, A and B, with MySQL Cluster deployed as follows:

A: ndb_mgmd, ndbd, mysqld
B: ndb_mgmd, ndbd, mysqld

You now have a couple of options for handling the network partitioning (split brain) that can occur if hosts A and B lose contact with each other. Let's look at the implications of each option.

Option 1: ArbitrationRank=1, no NIC bonding

If ndb_mgmd on A is elected as the arbitrator and host A crashes, then the data node on host B will also die (it cannot reach the arbitrator).
If ndb_mgmd on A is elected as the arbitrator and host B loses contact with host A, then the data node on B will shut down since it can't reach the arbitrator. For further discussion, call this situation X.
If you are in situation X and you restart the data node on B without having fixed the link, it will start isolated (after some timeouts have kicked in: StartPartitionedTimeout and StartPartialTimeout). Now you are in situation Y.
If you had an application writing data (and assume this application crashed when the connection from B -> A was lost), then the following can happen if you are in situation Y:
If you do "select count(*) from t1" on the mysqld on host A, the result can differ from a "select count(*) from t1" on the mysqld on host B, because the data node on B never got the message to flush the redo buffer to the redo log.
Now you have ended up in a very bad situation. Kill the data node on host B, fix the network, and restart the data node with --initial.
However, this could be combined with STONITH to kill off one of the data nodes.
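
The recovery from situation Y could be sketched as a small shell helper to run on host B. This is only an illustration under assumptions: hostA:1186 is a placeholder connect string (1186 is the default management port), the function names are made up for this post, and DRY_RUN defaults to 1 so the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch only: recover host B after it has started isolated (situation Y).
# Assumptions: MGM_CONNECT points at ndb_mgmd on host A; helper names are hypothetical.
MGM_CONNECT="${MGM_CONNECT:-hostA:1186}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

recover_isolated_node() {
    # 1. Kill the isolated data node; its data has diverged from host A.
    run pkill -x ndbd
    # 2. Only continue once the management server on host A answers again,
    #    i.e. after the link is fixed (ndb_mgm is assumed to exit non-zero
    #    when it cannot connect).
    run ndb_mgm --ndb-connectstring="$MGM_CONNECT" -e "show" || return 1
    # 3. Restart with --initial so the node discards its local files and
    #    copies all data from the data node on host A.
    run ndbd --initial --ndb-connectstring="$MGM_CONNECT"
}

recover_isolated_node
```

Set DRY_RUN=0 only after checking the printed commands against your own installation.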

Option 2: ArbitrationRank=0, no NIC bonding

No arbitrators are used, so there is no protection from split brain.
If host A loses contact with host B, then the data nodes will be split.
Applications can write to A and B, but e.g. A will _never_ see the changes from B, and vice versa.
The data nodes will drift apart and be out of sync (inconsistent). Very bad.
When the link comes up again, the data nodes on A and B will not reconnect to each other.
The system administrator must kill one of the data nodes and restart it with --initial.
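
One rough way to spot that the two sides have drifted apart is to compare row counts through each mysqld, in the spirit of the "select count(*)" comparison under Option 1. A minimal sketch, assuming the mysql client can reach both SQL nodes; hostA, hostB and the table test.t1 are placeholders, and the function names are invented for this example.

```shell
#!/bin/sh
# Sketch only: detect diverged data nodes by comparing row counts.

# Count the rows of a table through the mysqld on the given host.
# -N (--skip-column-names) and -B (--batch) reduce the output to a bare number.
row_count() {
    mysql -h "$1" -N -B -e "SELECT COUNT(*) FROM $2"
}

# Compare two counts and report the result; returns 0 when they match.
report_drift() {
    if [ "$1" = "$2" ]; then
        echo "in sync: $1 rows on both hosts"
    else
        echo "DRIFT: host A has $1 rows, host B has $2 rows"
        return 1
    fi
}

# Usage (needs a running cluster, so it is left commented out):
# report_drift "$(row_count hostA test.t1)" "$(row_count hostB test.t1)"
```

Equal counts do not prove that the two sides hold the same rows, so treat a match as a hint and a mismatch as proof of drift.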

Option 3: ArbitrationRank=0, NIC bonding + redundant switch.

No arbitrators are used, so there is no protection from split brain.
Same problems as Option 2) if a) both network links are broken or b) both switches crash. However, the likelihood of this happening is very small.

Option 4: Arbitration=WaitExternal (new in MySQL Cluster 7.0.7)

External arbitrator is used: You, a sys admin, a dba, or a process.
When the data nodes lose contact with each other, they are read-only for ArbitrationTimeout before unlocking (and becoming writable).
This means that you have ArbitrationTimeout (configurable) time to decide which data node should live and which should die.
The data node that you killed can then be restarted with "ndbd", i.e. without --initial.
"You" in this case can be an external program, a sysadmin, etc.
If you use an external program, it must run on a third computer.
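
A config.ini fragment for this option might look like the sketch below, written through a shell here-document so it can be dropped into a setup script. The MySQL reference manual names ArbitrationTimeout (in milliseconds) as the data node parameter that sets how long the nodes wait for the external decision. The file name and the 30000 ms value are placeholders, not recommendations.

```shell
#!/bin/sh
# Sketch only: write a config.ini fragment that enables external arbitration.
# WAITEXT_CONFIG and the timeout value are assumptions made for this example.
WAITEXT_CONFIG="${WAITEXT_CONFIG:-./config-waitexternal.ini}"

cat > "$WAITEXT_CONFIG" <<'EOF'
[ndbd default]
NoOfReplicas=2
# Leave arbitration to an external program or person (7.0.7 and later).
Arbitration=WaitExternal
# Milliseconds the data nodes wait for the external decision.
ArbitrationTimeout=30000
EOF

echo "wrote $WAITEXT_CONFIG"
```

Both settings go in the [ndbd default] section so that every data node uses the same values.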

Option 5: ArbitrationRank=1 and have ndb_mgmd on remote site

If you use asynchronous replication between SITE_A and SITE_B, then it may be possible to run the management server with ArbitrationRank=1 at the remote site, and optionally keep local management servers on hosts A and B that only hold configuration data (i.e. ArbitrationRank=0 for those).
This must be tested to see if the link between SITE_A and SITE_B is "good enough". The link should also be reliable.

Option 6: ArbitrationRank=1 and have a third host (called C)

This is the best and recommended approach (even better if you also have a computer D, set up like C, for redundancy).
Run the management server on host C. Setup would look like:
A: ndbd + mysqld
B: ndbd + mysqld
C: ndb_mgmd (ArbitrationRank=1) + mysqld
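
The layout above could translate into a config.ini along these lines. This is a minimal sketch, again written via a here-document: hostA, hostB, hostC and the file name are placeholders, node IDs are left for the cluster to assign, and memory and directory settings are omitted.

```shell
#!/bin/sh
# Sketch only: config.ini for the three-host layout.
# THREE_HOST_CONFIG and the host names are assumptions made for this example.
THREE_HOST_CONFIG="${THREE_HOST_CONFIG:-./config-three-hosts.ini}"

cat > "$THREE_HOST_CONFIG" <<'EOF'
[ndbd default]
NoOfReplicas=2

# Host C: the only management server, and the arbitrator.
[ndb_mgmd]
HostName=hostC
ArbitrationRank=1

# Hosts A and B: one data node each.
[ndbd]
HostName=hostA

[ndbd]
HostName=hostB

# SQL nodes on all three hosts; rank 0 keeps them from ever arbitrating.
[mysqld]
HostName=hostA
ArbitrationRank=0

[mysqld]
HostName=hostB
ArbitrationRank=0

[mysqld]
HostName=hostC
ArbitrationRank=0
EOF

echo "wrote $THREE_HOST_CONFIG"
```

With the arbitrator on C, losing the link between A and B leaves exactly one side able to win arbitration, which is the point of the third host.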

And the winner is...

What is the "best" option:

Option 6) - get a third computer - simple
Option 3), Option 4) - "you do it, but be careful"
Option 5) - must be tested
Option 1) - at least the cluster will crash, avoiding inconsistencies
Option 2)

I am arguing with myself whether Option 1) should be in 2nd or 3rd place; at least it is simple and easy to understand.

Of course, redundant NICs (bonding) and redundant switches can be used with all of the options above, even if this is not stated explicitly for some of them.

Upgrade to 7.0.7 (with the Configurator)

MySQL Cluster 7.0.7 was released as a source distribution on the 1st of September 2009. You should upgrade if you can build from source or use the Configurator.

The Configurator v2.9 has been updated to use this version.

If you already use the Configurator and build from source, you can upgrade from MySQL Cluster 7.0.6 to 7.0.7 in four steps (upgrading is recommended because of the changes and fixes in 7.0.7). Here is how:

1. run the upgrade-706-to-707-src.sh script (put it in install/ and chmod u+x ./upgrade-706-to-707-src.sh):

[cluster01]# pwd
/root/mysqlcluster-70-master/cluster/scripts/install
[cluster01]# chmod u+x ./upgrade-706-to-707-src.sh
[cluster01]# ./upgrade-706-to-707-src.sh
Upgrading scripts
done - now run ./download-and-compile.sh

2. run the script download-and-compile.sh

[cluster01]# ./download-and-compile.sh
After some time it will finish compiling:

mysql-7.0.7-linux-x86_64/sql-bench/test-alter-table
mysql-7.0.7-linux-x86_64.tar.gz created
Removing temporary directory
Downloaded binary distribution to ../../repo

(the 'downloaded binary' should really read 'Copied binary').

3. When you get prompted with the following question answer 'y' (yes):

Do you want to install the binaries on the hosts now
(else run 'install-cluster.sh' later)? (y/n):y

Important! Do not run the bootstrap.sh script!!

4. change to the scripts/ directory and run the ./rolling-restart.sh script

[cluster01]# pwd
/root/mysqlcluster-70-master/cluster/scripts/
[cluster01]# ./rolling-restart.sh

This is an online procedure, so it requires no downtime of the Cluster.
If you are using binary distributions then you have to wait, because they are not ready yet.
Good luck!

Ps - Configurator 2.9 (released 2nd of September 2009) contains some minor fixes:

Added a [mysqld] slot for administration purposes.
./start-backup now backs up the configuration files (config.ini, my.cnf and some internal files) as well.
