Thursday, April 09, 2009

Multi-source replication with MySQL Cluster

With MySQL Cluster it is possible to aggregate data from many MySQL Servers using Replication. Here is how.


E.g., you might have a sensor network where each sensor writes data into a MySQL server. The problem is that you have quite a few of these sensors, and you want to aggregate this data, e.g. in order to do data mining on the combined data set.

The standard MySQL server does not support multi-source replication (i.e., one slave server connected to many masters), but Cluster can be used for this, since you can have many mysql servers connected to the Cluster, each acting as a slave.

It is also possible to extend this - e.g. to replicate the aggregated data from the Cluster onwards to Infobright for data mining.
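This works because any mysql server attached to the Cluster that has binary logging enabled will log the changes made to the NDB tables, including changes that arrived through the other mysql servers, so it can act as a master towards a downstream slave. A minimal my.cnf sketch for such a server - the server-id and log name here are assumptions:

# my.cnf fragment for a mysqld attached to the Cluster, acting as master
# towards a downstream slave (e.g. Infobright)
[mysqld]
ndbcluster
server-id = 10            # must be unique in the whole replication topology
log-bin   = cluster-bin   # enables binary logging of the NDB changes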

I did a small PoC at home to show how to use MySQL Cluster for multi-source replication, and here is what I did:
  • Installed MySQL Cluster on the host computer (called barracuda) - I used the MySQL Cluster 6.3 slave medium sandbox. The Cluster is going to act as a slave collecting data from two master mysql servers. The sandbox I am using comes with two mysql servers that act as the slaves in this setup.
  • Installed VirtualBox and two VMs (Linux, Ubuntu)
    In each VM (server01 and server02) I installed one mysql server.
    These two mysql servers are acting as masters.
Creating the tables

On the Cluster (note that sensor_id is part of the primary key - both masters generate overlapping AUTO_INCREMENT values for id, so the composite key is what keeps the rows from the two masters from colliding):

CREATE TABLE `sensor` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`sensor_id` int(11) NOT NULL DEFAULT '0',
`data` varchar(1024) DEFAULT NULL,
PRIMARY KEY (`id`,`sensor_id`),
KEY `sensor_id` (`sensor_id`)
) ENGINE=ndbcluster partition by key(sensor_id);


and on the masters:

CREATE TABLE `sensor` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`sensor_id` int(11) NOT NULL DEFAULT '0',
`data` varchar(1024) DEFAULT NULL,
PRIMARY KEY (`id`,`sensor_id`),
KEY `sensor_id` (`sensor_id`)
) ENGINE=innodb;

(BTW - why doesn't InnoDB let me create the table if I change the order of 'id' and 'sensor_id'???)
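As far as I can tell, the reason is that InnoDB requires an AUTO_INCREMENT column to be the first column of some index, whereas e.g. MyISAM also accepts it later in a multi-column key. So this variant fails (the table name is just for illustration):

CREATE TABLE `sensor_reordered` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`sensor_id` int(11) NOT NULL DEFAULT '0',
`data` varchar(1024) DEFAULT NULL,
PRIMARY KEY (`sensor_id`,`id`),
KEY `sensor_id` (`sensor_id`)
) ENGINE=innodb;
-- ERROR 1075 (42000): Incorrect table definition; there can be only one
-- auto column and it must be defined as a key

Adding e.g. KEY (`id`) to the definition makes InnoDB accept it again.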
Then it was just a matter of setting up the replication links between each master and its slave:
server01 (master1) ---> barracuda (slave1)
server02 (master2) ---> barracuda (slave2)
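For the links to work, every server in the topology needs a unique server-id, and the masters must have binary logging enabled. A minimal my.cnf sketch for server01 - the actual values are assumptions, and the sandbox mysql servers on barracuda already come with their own ids:

# my.cnf fragment on server01 (master1); use e.g. server-id=2 on server02
[mysqld]
server-id = 1
log-bin   = mysql-bin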


Set up the GRANTs on master1 and master2:

GRANT REPLICATION SLAVE ON *.* TO 'repl'@'barracuda' IDENTIFIED BY 'password';

Resetting master1 and master2

Just to make it simple to have a good position to start from in the binary logs (SHOW MASTER STATUS would have worked too, but it would have been more to write..):
RESET MASTER;
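If you would rather not reset the masters, SHOW MASTER STATUS gives you the current file and position to feed into CHANGE MASTER TO instead:

SHOW MASTER STATUS;
-- pass the reported File and Position as MASTER_LOG_FILE and
-- MASTER_LOG_POS in the CHANGE MASTER TO statement below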

On slave1

CHANGE MASTER TO MASTER_HOST='server01', MASTER_USER='repl', MASTER_PASSWORD='password';
START SLAVE;

On slave2

CHANGE MASTER TO MASTER_HOST='server02', MASTER_USER='repl', MASTER_PASSWORD='password';
START SLAVE;
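On each slave you can then check that both replication threads came up:

SHOW SLAVE STATUS\G
-- the interesting fields, and what you want to see:
--   Slave_IO_Running: Yes
--   Slave_SQL_Running: Yes
--   Seconds_Behind_Master: 0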


Generating load on the masters

On server01 and server02 I then started bencher to insert data into the sensor table:
test@server01: src/bencher -s /tmp/mysql.sock -t 1 -l 10000000 -q "INSERT INTO sensor(sensor_id,data) VALUES (1, <1024B of data>)"
test@server02: src/bencher -s /tmp/mysql.sock -t 1 -l 10000000 -q "INSERT INTO sensor(sensor_id,data) VALUES (2, <1024B of data>)"
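If you do not have bencher at hand, a plain shell loop against the mysql client does the same job, only slower - a sketch, where the socket path and the test schema are assumptions:

# hypothetical stand-in for bencher: insert ~1KB rows via the mysql client
while true; do
  mysql -S /tmp/mysql.sock test -e \
    "INSERT INTO sensor(sensor_id,data) VALUES (1, REPEAT('x',1024))"
done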


And the masters were looking like this:

(please note that I did not try to stress the system that much here, since I only have one computer)

And the slaves on Cluster:
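A simple way to see the aggregation happening on the Cluster is to count the rows per master:

SELECT sensor_id, COUNT(*) FROM sensor GROUP BY sensor_id;
-- sensor_id 1 comes from server01 and sensor_id 2 from server02; the
-- counts grow as the two slaves apply their relay logs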
