Tuesday, April 28, 2009

Online configuration update and a new script

If you are using the scripts from the Configurator (and yes, the broken download link is now fixed, so it is actually possible to get the config package) and want to tweak the configuration of a running Cluster, here is how to do it.

There is also a new script in Configurator 2.2 (released 28th April 2009, 11:53 AM PST, 07:53 GMT) that lets you check the values of different config parameters. Its use is described below.

Pretend we want to increase DataMemory from 1024M to 1536M:
  1. cd mysqlcluster-70/cluster/scripts
  2. vi ../config/config.ini
    This is the master config.ini file and will be copied out to the management server(s). Don't change /etc/mysql/config.ini, as that will have no effect.
  3. Set DataMemory=1536M then save and exit
  4. sh rolling-restart.sh
  5. When it has finished, verify that the Cluster has the new settings by running query-param.sh:
    sh query-param.sh DataMemory
    Parameter=DataMemory
    Data Node 3 : 1610612736
    Data Node 4 : 1610612736
    Please note that the output is in bytes.
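query-param.sh ships with the Configurator package, but if you are curious, roughly the same information is available from the standard ndb_config tool. A minimal sketch of such a check (not the actual script - ndb_config reads the configuration from the management server):

#!/bin/sh
# Usage: sh query-param.sh DataMemory
PARAM="$1"
echo "Parameter=$PARAM"
# Prints one row per data node as "nodeid : value"; memory
# parameters are reported in bytes, as in the output above.
ndb_config --type=ndbd --query=nodeid,"$PARAM" --fields=' : ' --rows='\n'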
The rolling-restart.sh script implements the rolling restart procedure described in the reference manual.
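In essence it does something like the following sketch (the real script does much more error checking; the node ids, host name, and config path below follow this example and are placeholders):

#!/bin/sh
CS="mgmhost:1186"   # management server connectstring (placeholder)

# 1. Restart the management server so it reads the new config.ini
#    (--reload is available from MySQL Cluster 7.0).
ndb_mgm -c "$CS" -e "1 STOP"
ssh mgmhost "ndb_mgmd -f /path/to/config.ini --reload"

# 2. Restart the data nodes one at a time, waiting for each node
#    to come back before touching the next one.
for id in 3 4; do
    ndb_mgm -c "$CS" -e "$id RESTART"
    ndb_waiter -c "$CS" --timeout=600   # blocks until all data nodes are started
done

# 3. Finally restart the mysqld servers, one at a time, so the API
#    nodes pick up the new configuration as well.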
Many other parameters can be changed using the above steps (e.g., MaxNoOfConcurrentOperations). Be careful, though, that you have enough memory available (the script does not check for that yet).
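One way to check what the data nodes are actually using before raising memory parameters is the management client's REPORT command (available in recent 6.3/7.0 versions):

# Prints DataMemory and IndexMemory usage for all data nodes.
ndb_mgm -e "ALL REPORT MemoryUsage"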

Parameters that change the file system one way or another (NoOfFragmentLogFiles, FragmentLogFileSize, DataDir, BackupDataDir, FileSystem*, InitFragmentLogFiles) must be changed with an initial rolling restart (all such parameters are listed in the reference manual):

Pretend we want to change the size of the redo log (NoOfFragmentLogFiles). Each unit of NoOfFragmentLogFiles is a set of four files of FragmentLogFileSize (16M by default), so going from 6 to 7 grows the redo log on each data node from 384M to 448M.
From the beginning we have:
./query-param.sh NoOfFragmentLogFiles
Parameter=NoOfFragmentLogFiles
Data Node 3 : 6
Data Node 4 : 6

Then we modify our "template" config.ini and go ahead:
  1. vi ../config/config.ini
    This is the master config.ini file and will be copied out to the management server(s)
  2. Set NoOfFragmentLogFiles=7 then save and exit
  3. sh rolling-restart-initial.sh
  4. When it has finished, verify that the Cluster has the new settings by running query-param.sh:
    ./query-param.sh NoOfFragmentLogFiles
    Parameter=NoOfFragmentLogFiles
    Data Node 3 : 7
    Data Node 4 : 7
Have fun upgrading, but remember - don't try to restart an already existing Cluster with a smaller configuration (i.e., it is safe to increase parameters, but be very careful if you decrease them).

Monday, April 27, 2009

MC 7.0 New Features, LDAP For MySQL Cluster - UC2009 sessions

The presentations from the sessions MySQL Cluster 7.0 - New Features and LDAP For MySQL Cluster - back-ndb, and all other sessions on MySQL Cluster, are available for download from www.mysqlconf.com.

Saturday, April 25, 2009

Easiest way to try out new MySQL Cluster 7.0.5

Here is how to install and start MySQL Cluster 7.0.5 in eight steps:
  1. Go to the configurator and set up the configuration you wish to have. Make sure you select "MySQL Cluster 7.0.x" in the first drop-down.
  2. Copy the package you either download or receive by email to one of the computers that will run one of the management servers.
  3. tar xvfz mysqlcluster-70.tgz
  4. cd mysqlcluster-70/scripts/install
  5. Download MySQL Cluster 7.0 from the MySQL website and put it in mysqlcluster-70/scripts/install
  6. sh dist-tgz.sh
  7. sh bootstrap.sh
  8. cd .. && sh start-cluster-initial.sh
Okay, step 1 requires that you set up shared ssh keys and do a bit of typing and clicking, but it gives you a production-class configuration and scripts to manage the cluster from a single point - including backups and restores of MySQL Cluster!
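Setting up the shared ssh keys amounts to something like this (user and host names are placeholders; repeat ssh-copy-id for every host in the cluster):

# Generate a key pair on the machine you manage the cluster from,
# then push the public key to each host so the scripts can log in
# without a password.
ssh-keygen -t rsa
ssh-copy-id user@host1
ssh-copy-id user@host2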

I have mostly tested the scripts on Linux - please let me know about issues on other platforms!

Configurator - new version 2.1

The Configurator has undergone some serious fixing as 2.0 had some issues.
Here is a brief list of the enhancements:
  • error handling and progress reports - I have rewritten almost all scripts with better error handling and progress reports. E.g., if a node fails to start during a rolling restart, the rolling restart script is aborted and you get a suggestion on how to recover. If you run start-cluster-initial.sh on an already started cluster, it will refuse to run. There are many more changes like this that check the status of the nodes (both data nodes and sql nodes) during startup.
  • download link to package - when you have entered your email address, you will also get a link where you can download the package, in case you have problems receiving the email (I have raised an issue with the hosting company about this).
  • restructure - I have also moved the install scripts to scripts/install and the monit scripts to scripts/extra. install_initd_linux.sh uses chkconfig, which does not work on all Linux distros; I am going to fix that.
  • bugs - fixed a number of small bugs and annoyances.

Thursday, April 16, 2009

max_rows in cluster

If you are going to load a lot of records into Cluster, don't forget to set max_rows!
My colleague Yves at BigDBAhead has also blogged about this, and I ran into the same problem recently.

I tried to populate 100M records into a 4-node cluster, and the data nodes went down with the following error message in the error logs:

"2304 Array index out of range"

So the error message is crap - and my opinion is that there should be a proper error message propagated up to the mysql server. There is a bug report on this.

Simplified, the error message means that you have run out of "index slots" in the hash table storing the hashes of the primary keys. Each table is divided into a number of partitions, and each partition can only store a limited number of records in its hash table (about 100M).

The workaround when creating huge tables is to set max_rows=<2 x the number of records that you plan to store in the table>; 2x is just to be safe. Internally this creates more partitions, and the memory cost of the extra partitions is negligible.

Example:
CREATE TABLE t1 (
a INTEGER PRIMARY KEY
) ENGINE=ndbcluster MAX_ROWS=1000000000
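To see the effect, you can check how many partitions the table actually got. One way is the ndb_desc tool (the 'test' database name and the management host are assumptions):

# -p prints per-partition information, so the output shows how many
# partitions MAX_ROWS resulted in.
ndb_desc -c mgmhost -d test t1 -p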

Thursday, April 09, 2009

Mac OS X support for Cluster Sandbox & Configurator

Now you can use the Sandboxes and the Configurator on Mac OS X.

You need to download the macos.sh script.
  • Sandbox users - copy the script into the mysqlcluster-XXX/scripts/ directory
  • Configurator users - copy the script into the mysqlcluster-XXX/cluster/scripts/ directory
Then run it:

sh macos.sh

before you run any other script.

This will convert all occurrences of LD_LIBRARY_PATH to DYLD_LIBRARY_PATH and also use 'tar xvfz' instead of 'zcat', which does not work on Mac OS X.
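The conversion itself is simple; a sketch of the kind of rewrite macos.sh does (illustration only - the real script may differ):

#!/bin/sh
# BSD sed on Mac OS X needs -i '' (an explicit, empty backup suffix).
for f in *.sh; do
    sed -i '' 's/LD_LIBRARY_PATH/DYLD_LIBRARY_PATH/g' "$f"
done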

You also need to download Apple Xcode and wget.

Multi-source replication with MySQL Cluster

With MySQL Cluster it is possible to aggregate data from many MySQL Servers using Replication. Here is how.


E.g., you might have a sensor network where each sensor writes data into a mysql server. The problem is that you have quite a few of these sensors, and you want to aggregate this data, e.g., in order to do data mining on the combined data set.

The standard MySQL server does not support multi-source replication (i.e., one slave server connected to many masters), but Cluster can be used for this, since you can have many mysql servers connected to the Cluster, each acting as a slave.

It is also possible to extend this - e.g., to replicate the aggregated data from Cluster to, say, Infobright for data mining.

I did a small PoC at home to show how to use MySQL Cluster for multi-source replication, and here is what I did:
  • Installed MySQL Cluster on the host computer (called barracuda) - I used the MySQL Cluster 6.3 slave medium sandbox. The Cluster is going to act as a slave collecting data from two master mysql servers. The sandbox I am using comes with two mysql servers, which act as the slaves in this setup.
  • Installed VirtualBox and two VMs (Linux, Ubuntu)
    In each VM (server01 and server02) I installed one mysql server.
    These two mysql servers are acting as masters.
Creating the tables

on the Cluster:

CREATE TABLE `sensor` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`sensor_id` int(11) NOT NULL DEFAULT '0',
`data` varchar(1024) DEFAULT NULL,
PRIMARY KEY (`id`,`sensor_id`),
KEY `sensor_id` (`sensor_id`)
) ENGINE=ndbcluster partition by key(sensor_id);


and on the masters:

CREATE TABLE `sensor` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`sensor_id` int(11) NOT NULL DEFAULT '0',
`data` varchar(1024) DEFAULT NULL,
PRIMARY KEY (`id`,`sensor_id`),
KEY `sensor_id` (`sensor_id`)
) ENGINE=innodb;

(btw - why doesn't InnoDB let me create the table if I change the order of 'id' and 'sensor_id'? Presumably because InnoDB requires the auto_increment column to be the first column of some index.)
Then it was just a matter of setting up the replication links between each master and slave:
server01 (master1) ---> barracuda (slave1)
server02 (master2) ---> barracuda (slave2)


Set up the GRANTs on master1 and master2:

GRANT REPLICATION SLAVE ON *.* TO 'repl'@'barracuda' IDENTIFIED BY 'password';
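For the replication links to work, each master also needs binary logging enabled and a unique server-id. A minimal sketch of the relevant my.cnf lines (the config path and the ids are assumptions):

# On server01 (use server-id=2 on server02; the slave mysqlds on
# barracuda need their own unique ids as well):
cat >> /etc/my.cnf <<'EOF'
[mysqld]
server-id = 1
log-bin   = mysql-bin
EOF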

Resetting master1 and master2

Just to make it simple and get a known position to start from in the binary logs (SHOW MASTER STATUS would have worked too, but it would have been more to write):
RESET MASTER

On slave1

CHANGE MASTER TO MASTER_HOST='server01', MASTER_USER='repl', MASTER_PASSWORD='password';
START SLAVE;

On slave2

CHANGE MASTER TO MASTER_HOST='server02', MASTER_USER='repl', MASTER_PASSWORD='password';
START SLAVE;


Generating load on the masters

On server01 and server02 I then started bencher to insert data into the sensor table:
test@server01: src/bencher -s /tmp/mysql.sock -t 1 -l 10000000 -q "INSERT INTO sensor(sensor_id,data) VALUES (1, <1024B of data>)"
test@server02: src/bencher -s /tmp/mysql.sock -t 1 -l 10000000 -q "INSERT INTO sensor(sensor_id,data) VALUES (2, <1024B of data>)"


And the masters were looking like this:

(please note that I did not try to stress the system that much here, since I only have one computer)

And the slaves on Cluster:
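A quick way to verify the whole setup is to check that both slave threads are running and that rows from both masters show up in the same Cluster table (the host name, ports, and 'test' schema below are from my setup; adjust to yours):

# Both replication channels should report Yes for the IO and SQL threads.
mysql -h barracuda -P 3306 -e "SHOW SLAVE STATUS\G" | grep 'Running'
mysql -h barracuda -P 3307 -e "SHOW SLAVE STATUS\G" | grep 'Running'

# One count per sensor_id: sensor_id=1 comes from server01 and
# sensor_id=2 from server02.
mysql -h barracuda -P 3306 -e "SELECT sensor_id, COUNT(*) FROM test.sensor GROUP BY sensor_id"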