Monday, September 24, 2012

garbd - How to avoid network partitioning or split brain in a Galera Cluster

Network partitioning (aka split brain) is something you don’t want to happen, but what is the problem really?

Sometimes, users set up a Galera Cluster (or Percona XtraDB Cluster) containing an even number
of nodes, e.g., four or six nodes, and then place half of the nodes in one data center and the other half in another data center. If the network link between the two data centers goes down, the Galera cluster is partitioned and one half of the nodes cannot communicate with the other half. Galera's quorum algorithm
will not be able to select a 'primary component', since both sets contain the same number of nodes. You then end up with two sets of nodes that are 'non-primary', and effectively none of the nodes is available for database transactions.
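The tie itself is simple arithmetic. Below is a toy sketch (not Galera's actual implementation) of the majority rule: a component stays primary only if it holds a strict majority of the last known membership.

```shell
# Toy majority check: strictly more than half of the known members
# must be reachable for a component to remain primary.
has_quorum() {
  alive=$1
  total=$2
  if [ $(( 2 * alive )) -gt "$total" ]; then
    echo "PRIMARY"
  else
    echo "NON-PRIMARY"
  fi
}

has_quorum 2 4   # 2-2 split of 4 nodes: no majority on either side
has_quorum 3 5   # 5 voters (e.g., 4 nodes + an arbitrator): 3 is a majority
```

With an even split, 2*2 is not greater than 4, so both halves end up non-primary; adding a fifth vote makes such a tie impossible.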

Below is a screenshot of ClusterControl for Galera. I have simulated a network partition by simply killing two of the nodes at the same time.

Network partitioning / split brain
Half of the nodes got disconnected from the other nodes. The nodes staying up cannot determine the state of the unreachable ones, e.g., whether they are really down (dead process), or whether only the network link went down and the processes are still up. Continuing would be unsafe: the two separated halves could potentially drift apart, and you would end up with an inconsistent database.

At this point, our two nodes are pretty useless; a node must be in state 'synced' before it can be accessed. ClusterControl will address this situation by first recovering the two nodes in 'non-Primary' state and setting them to Primary. It will then resync the two crashed nodes when they are back.

In order to avoid this, you can install an arbitrator (garbd). garbd is a stateless daemon which acts as a lightweight group member and takes part in quorum votes without storing any data. In a quorum situation where half of the data nodes are unavailable, garbd's extra vote breaks the tie and a split-brain is avoided.
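For reference, starting the arbitrator by hand might look like this (the addresses and group name are placeholders; `--group` must match the cluster's `wsrep_cluster_name`):

```shell
garbd --address gcomm://192.168.0.1:4567,192.168.0.2:4567 \
      --group my_galera_cluster \
      --daemon
```

Ideally garbd runs on a third site, so that it remains reachable from at least one half when the link between the two data centers fails.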

In ClusterControl, garbd is a managed process (it will be automatically restarted in case of failure) and can easily be installed from the deployment package:


Now that we have installed garbd in our 4-node test cluster, it will consist of 4+1 members. Killing 2 nodes will not affect the whole cluster, as the remaining members still hold the majority; see the picture below. The minority will be excluded from the cluster, and when those nodes are up and ready to rejoin, they will go through a recovery protocol before being accepted back into the cluster.

Majority win - garbd + two survivors > two failed nodes
Hopefully this gives an idea how and why garbd comes in handy.

Tuesday, September 11, 2012

MySQL Cluster to InnoDB Replication Howto

In this blog post I will show you how to set up replication from MySQL Cluster (ndbcluster) to a regular MySQL Server (InnoDB). If you want to understand the concepts, check out part 7 of our free MySQL Cluster training.

First of all, we start with a MySQL Cluster looking like this, and what we want to do is set up replication to the Reporting Server (InnoDB slave).

MySQL Cluster is great at scaling large numbers of write transactions and short key-based read queries, but not so good at long-running reporting or analytical queries. I generally recommend limiting analytical/reporting queries on the MySQL Cluster, in order to avoid slowing down the realtime access to the cluster. A great way of doing that is to replicate the MySQL Cluster data to a standalone MySQL Server.

To achieve that, we will need a replication server. All data written into NDBCLUSTER is sent as events to the replication server. A MySQL Server can be turned into a replication server by specifying --log-bin. The replication server then produces a binary log, which can be replicated to a standalone InnoDB. 

(NOTE: For redundancy, it is possible to have 2 replication servers. We will cover that in a separate blog.)

Replication Layer Configuration
In the my.cnf of the replication server you should have the following:
# server-id must be unique across all mysql servers participating in replication.

You may want to skip binlog-do-db=.. if you want to replicate all databases. But if you replicate only a particular database, make sure you also replicate the mysql database, in order to get some very important data on the slave.
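As an illustration, a minimal my.cnf for the replication server could look like the following sketch (hostnames and ids are example values; NDB changes are logged as row events, so row-based binary logging is required):

```ini
[mysqld]
ndbcluster                        # connect to the cluster as an SQL node
ndb-connectstring=mgmhost:1186    # example management server address
server-id=101                     # example; must be unique in the topology
log-bin=binlog                    # turns this mysqld into a replication server
binlog-format=ROW                 # NDB events are written as row events
# binlog-do-db=mydb               # optional; if used, include the mysql db too
```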

Restart the replication server for the settings to have effect.
Grant access to the Slave:

GRANT REPLICATION SLAVE ON *.* TO 'repl'@'<ip/hostname of the InnoDB slave>' IDENTIFIED BY 'repl';

InnoDB Slave Configuration
The first requirement on the InnoDB slave is that it must use the mysqld binary that comes with the MySQL Cluster package. If you already have a non-clustered MySQL 5.5 installed, you need to upgrade it to the Cluster version, e.g., by doing:

sudo rpm -Uvh MySQL-Cluster-server-gpl-7.2.7-1.el6.x86_64.rpm
sudo rpm -Uvh MySQL-Cluster-client-gpl-7.2.7-1.el6.x86_64.rpm

The InnoDB slave should have the following:
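As a sketch, a minimal my.cnf for the InnoDB slave could contain (example values; only server-id is strictly required):

```ini
[mysqld]
server-id=102                     # example; unique among all servers
default-storage-engine=innodb     # assumption: default new tables to InnoDB
log-slave-updates=0               # 1 only if this slave feeds further slaves
```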


If you want the InnoDB slave to further replicate to a set of slaves, then you should set log-slave-updates=1; otherwise you can set it to 0 (log-slave-updates=0). That is all; restart the slave.

You must also create the following table on the Innodb Slave:

use mysql;
CREATE TABLE `ndb_apply_status` (
  `server_id` int(10) unsigned NOT NULL,
  `epoch` bigint(20) unsigned NOT NULL,
  `log_name` varchar(255) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
  `start_pos` bigint(20) unsigned NOT NULL,
  `end_pos` bigint(20) unsigned NOT NULL,
  PRIMARY KEY (`server_id`)
) ENGINE=INNODB;

Then point the slave at the replication server:

CHANGE MASTER TO MASTER_HOST='ip/hostname of the replication server', MASTER_USER='repl', MASTER_PASSWORD='repl';

Staging the InnoDB Slave with Data
Now you need to stage the InnoDB slave with data. What you need to do is stop traffic to NDBCLUSTER in order to get a consistent snapshot of the data. As there are no cluster-wide table locks in NDBCLUSTER, you have two options:
  1. Block the load balancer from sending any transactions to MySQL Cluster
  2. Make all SQL nodes read-only by locking all tables on ALL MySQL servers with FLUSH TABLES WITH READ LOCK; (if you use NDBAPI applications, then option 1, or shutting down the applications, is the only option)
So on all MySQL Servers in the Access Layer do:

FLUSH TABLES WITH READ LOCK;

Ensure that no more writes are made to NDBCLUSTER by looking at SHOW MASTER STATUS on the replication server:

mysql> SHOW MASTER STATUS;
+---------------+-----------+--------------+------------------+
| File          | Position  | Binlog_Do_DB | Binlog_Ignore_DB |
+---------------+-----------+--------------+------------------+
| binlog.000008 | 859092065 |              |                  |
+---------------+-----------+--------------+------------------+
1 row in set (0.00 sec)

Run SHOW MASTER STATUS; a couple of times until you see the Position not changing any more.

Then RESET the replication server, so you have a good clean slate to start from:

RESET MASTER;

Now use mysqldump two times:
  1. one dump with the schema
  2. another dump with the data

mysqldump --no-data --routines --triggers > schema.sql
mysqldump --no-create-info --master-data=1 > data.sql

Of course you can dump out only the databases you are interested in.
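For example, to dump just a hypothetical database mydb together with the mysql database (as recommended earlier), the commands might look like:

```shell
mysqldump --no-data --routines --triggers --databases mydb mysql > schema.sql
mysqldump --no-create-info --master-data=1 --databases mydb mysql > data.sql
```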

When the dumps have finished, you can enable traffic to NDBCLUSTER again. On ALL SQL nodes do:

UNLOCK TABLES;

If you blocked the load balancer instead, unblock it; the point is that traffic to NDBCLUSTER can now be resumed.

Now, change the ENGINE=ndbcluster to ENGINE=innodb in schema.sql:

sed -i.bak 's#ndbcluster#innodb#g' schema.sql
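To see what the sed command does, here is a quick sanity check on a throwaway sample file (the table name is made up for illustration):

```shell
# Hypothetical sample line, standing in for a real dumped CREATE TABLE
echo 'CREATE TABLE t1 (id INT) ENGINE=ndbcluster;' > /tmp/schema_sample.sql
# Same rewrite as above: swap the storage engine, keeping a .bak backup
sed -i.bak 's#ndbcluster#innodb#g' /tmp/schema_sample.sql
cat /tmp/schema_sample.sql
# CREATE TABLE t1 (id INT) ENGINE=innodb;
```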

Copy schema.sql and data.sql to the slave, and load the dump files into the InnoDB slave (e.g., mysql < schema.sql followed by mysql < data.sql).

Finally you can start replication; on the InnoDB slave you can now do:

START SLAVE;

Check SHOW SLAVE STATUS\G, and hopefully all will be fine :)