Friday, April 27, 2012

MySQL Cluster: Disk Data Config



If you are using the Disk Data feature of MySQL Cluster you may find this useful. As you may know, the default way of storing data in MySQL Cluster is in-memory. However, if you have tables that don't require real-time access and/or are growing larger than your available memory, then you can use disk data tables. I have written a bunch of posts about disk data tables before.

However, it is quite hard to configure Disk Data tables correctly. There are some parameters to keep an eye on:
  • UNDO LOG - MySQL Cluster undo logs every write access to the disk data tables. You can grow this resource online (ALTER .. ADD ..)
  • REDO LOG - as for in-memory tables, every write is REDO logged. You can grow the REDO log with a rolling restart (NoOfFragmentLogFiles). The REDO LOG is shared between the disk data tables and the in-memory tables.
  • REDO BUFFER - specified as RedoBuffer=32M or higher in config.ini. The REDO BUFFER is shared between the disk data tables and the in-memory tables.
  • UNDO BUFFER - created from the SharedGlobalMemory resource and set at LOGFILE GROUP creation (see the sketch after this list). WARNING!! Cannot be changed at runtime! The LOGFILE GROUP must be recreated, which requires that the tablespace (and all its data files) be dropped first.
  • TABLESPACE - can be extended at runtime.  ALTER .. ADD..
  • DiskPageBufferMemory - like the innodb_buffer_pool. The bigger the better. Can be tuned with a Rolling Restart.
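
To make this concrete, here is a minimal sketch of how these objects are created (the names lg_1/ts_1, the file names and the sizes are just illustrative, not recommendations):

CREATE LOGFILE GROUP lg_1
    ADD UNDOFILE 'undo_1.dat'
    INITIAL_SIZE=1024M
    UNDO_BUFFER_SIZE=32M      -- the UNDO BUFFER, fixed at creation time
    ENGINE=NDBCLUSTER;

CREATE TABLESPACE ts_1
    ADD DATAFILE 'data_1.dat'
    USE LOGFILE GROUP lg_1
    INITIAL_SIZE=2048M
    ENGINE=NDBCLUSTER;

-- Both the UNDO LOG and the TABLESPACE can be grown online:
ALTER LOGFILE GROUP lg_1 ADD UNDOFILE 'undo_2.dat' INITIAL_SIZE=1024M ENGINE=NDBCLUSTER;
ALTER TABLESPACE ts_1 ADD DATAFILE 'data_2.dat' INITIAL_SIZE=2048M ENGINE=NDBCLUSTER;

A disk data table is then created with CREATE TABLE ... TABLESPACE ts_1 STORAGE DISK ENGINE=NDBCLUSTER.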

The most common stability-related mistakes are to set the UNDO LOG and UNDO BUFFER too small.
What can happen then?

Stability and Recovery

If the UNDO BUFFER is too small, the symptoms are:
  • Failed writes to the Disk Data tables. 
  • In worst case you can hit crashing bugs.

If the UNDO LOG is too small, the symptoms are:
  • Failed writes to the Disk Data tables. In worst case you can hit crashing bugs.
  •  Node recovery may not be possible - the recovering node will crash.
    Severalnines has assisted a number of clients with such recoveries when this has been the case.
  • If there is a Cluster crash due to a too small UNDO LOG you will most likely have to restore from backup.

How can it be possible for MySQL Cluster to fail to recover the node, when it was possible to load it with data?

The problem is that your write pattern when writing data at runtime into MySQL Cluster is different from the write pattern at recovery.
At recovery the recovering data node will copy data (over the network) from the other node in the node group. It will do this as fast as it can, table by table.

Thus, there will be a lot of data coming into the recovering data node (like a bulk write), and in this case the UNDO LOG may be too small to handle the write load (you can argue about why UNDO and REDO logging is needed during recovery at all, but that is how it works). If the UNDO LOG is overrun, the node will crash.

Configuration

As usual this depends a lot on your access patterns, so it is hard to give exact numbers, but if you rely on disk data tables you should have:
  • [NDBD DEFAULT] SharedGlobalMemory=1024M
    Disk operation records and other metadata are allocated from this buffer. The more records you have in the disk data tables, the bigger this buffer must be. 1024M is a good start.
  • [NDBD DEFAULT] DiskPageBufferMemory=4096M
    As large as you can afford (remember that in-memory tables, and for disk data tables the indexed attributes, reside in DataMemory, so you also need enough DataMemory).
    The important thing when tuning DiskPageBufferMemory is to get a good hit ratio. If you use ClusterControl you will see the hit ratio (picture coming soon).
    4096M is a good start.
  • UNDO_BUFFER_SIZE=32M  or 64M (same size as you have for the RedoBuffer)
    This is specified in the CREATE LOGFILE GROUP ... 
  • UNDO LOG space should be 0.5 x NoOfFragmentLogFiles x 4 x FragmentLogFileSize, i.e. half the size of the REDO log (this may take it to the extreme, but better safe than sorry); see the worked example after this list.
  • [NDBD DEFAULT]  DiskIoThreadPool=8    (2 is default)   
  • UndoIndexBuffer/UndoDataBuffer - these parameters are deprecated but still in the manual. I have grepped the 7.2.4 source and found no trace of them; it was a long time ago they were used, back when LCPs were done differently in Cluster. So there is no need to set these.
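
Putting these numbers together, a starting [NDBD DEFAULT] section could look like the excerpt below. The REDO settings are just example values; with them the REDO log is 16 x 4 x 256M = 16384M, so by the rule of thumb above the UNDO LOG files should total about 0.5 x 16384M = 8192M:

[NDBD DEFAULT]
SharedGlobalMemory=1024M
DiskPageBufferMemory=4096M
DiskIoThreadPool=8
RedoBuffer=32M
NoOfFragmentLogFiles=16     # example value
FragmentLogFileSize=256M    # example value, REDO log = 16 x 4 x 256M = 16384M
# UNDO LOG (the sum of all UNDOFILE sizes) sized to ~8192M per the formula above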

Please note that you can always add more UNDO LOG files online by issuing:
 ALTER LOGFILE GROUP lg_1 ADD UNDOFILE 'undo_X.dat' INITIAL_SIZE=1024M ENGINE=NDBCLUSTER;
However, if the Cluster crashes, then it is too late.
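
One way to spot a shrinking UNDO LOG margin before it bites is the ndbinfo database, if your Cluster version ships it; the logspaces table reports total and used bytes per log (the column names below are from the 7.2 ndbinfo schema, so verify them against your version):

SELECT node_id, log_type, log_part, total, used
FROM ndbinfo.logspaces
WHERE log_type = 'DD-UNDO';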

Disk Layout
  • Put the REDO log + LCP on separate disk spindles (FileSystemPath) (A)
  • Put the UNDO log on separate disk spindles (FileSystemPathUndoFiles) (B)
  • Put the data files (table space) on separate disk spindles (FileSystemPathDataFiles) (C)

In many cases you can combine B and C on the same disks, and if you have a good disk array, you can of course combine A, B and C.

Bottom line is: watch your disks and how utilized they are. That will determine whether you have to reconfigure the disk layout. Also, be prepared for some trial and error before getting DiskPageBufferMemory and SharedGlobalMemory right.
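
If you do not have ClusterControl graphing it for you, the DiskPageBufferMemory hit ratio can also be approximated from the ndbinfo.diskpagebuffer table, where your version provides it (treat the exact column names as an assumption to verify):

SELECT node_id,
       100 * SUM(page_requests_direct_return) /
             SUM(page_requests_direct_return + page_requests_wait_queue + page_requests_wait_io)
       AS hit_ratio_pct
FROM ndbinfo.diskpagebuffer
GROUP BY node_id;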

Wednesday, April 25, 2012

MySQL Cluster: How to load it with data


After you have set up and properly configured MySQL Cluster you typically want to load data into it. There are a couple of ways to load data into MySQL Cluster, e.g., from dump files or from CSV files.

If you expect that loading data into MySQL Cluster will be as fast as loading into MyISAM, then you have the wrong expectations, unless you parallelize your data loading.

Also, MYISAM/INNODB stores the data in local buffers/disk and in one copy only (asynchronous replication does not count, as it is asynchronous), whereas MySQL Cluster (the NDBCLUSTER engine) stores two copies of the data. Synchronous replication between the two copies adds ~2x overhead, and there is a network hop between the MySQL servers and the data nodes.

So loading a dump file into MySQL Cluster is bound to be slower than into MYISAM/INNODB. Loading data files can also be error prone.

Here is what we at Severalnines usually do to avoid trial and terror with configuration options, and to make sure we don't have to make X tries to load a Y GB dump file. We want to do it once and get it right, simply because multiple iterations take time:
  • Load the schema into MYISAM/INNODB on one of the MySQL servers connected to the data nodes. Please note that your data is NOT yet stored in the NDBCLUSTER storage engine and at this point is only accessible from the mysql server where you imported the MYISAM/INNODB tables.
  • Check for redundant indexes and remove them to reduce the storage footprint. Some ORMs have a tendency to add redundant indexes. Personally I think it is faster to make these transformations while the tables are in MYISAM/INNODB than in NDBCLUSTER, especially if there is data in the tables.
    You can use Percona's pt-duplicate-key-checker (wget percona.com/get/pt-duplicate-key-checker); see the example at the end of this post. It makes it dead easy to find and drop the redundant indexes. Redundant indexes are also good to remove because writes will be faster (fewer indexes to update) and because it reduces the risk that the optimizer picks the wrong index.
  • Check data types - do you have BLOB/TEXT where you could really use VARBINARY/VARCHAR? BLOBs/TEXTs can slow down performance by a factor of 2-3x (at least), and they are overused in many applications; see the example below.
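
For example, a TEXT column that never holds more than a few hundred characters can be converted while the table is still in MYISAM/INNODB (the table and column names below are made up for illustration):

ALTER TABLE customer_notes MODIFY note VARCHAR(512);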
When you have washed your schema then you can simply do:

ALTER TABLE tablename ENGINE=ndbcluster;

If you get an error message such as:

mysql> alter table events  engine=ndbcluster;
ERROR 1297 (HY000): Got temporary error 1217 'Out of operation records in local data manager (increase MaxNoOfLocalOperations)' from NDBCLUSTER

Then you can change ndb_batch_size (a lower batch size means the ALTER from MYISAM/INNODB to NDBCLUSTER will take longer, because more round trips are needed between the MySQL server and the data nodes):

mysql> set  ndb_batch_size=8*1024*1024;
Query OK, 0 rows affected (0.00 sec)

mysql> alter table events  engine=ndbcluster;
Query OK, 67748 rows affected (4.00 sec)
Records: 67748  Duplicates: 0  Warnings: 0

32MB is the max value for ndb_batch_size.

Now, to further increase the speed of the ALTER you can perform two or more ALTERs in parallel from two or more connections:
mysql conn 1 > alter table events     engine=ndbcluster;
mysql conn 2 > alter table employees  engine=ndbcluster;
mysql conn 3 > alter table randomdata engine=ndbcluster;

Finally do:
ANALYZE TABLE tablename;

on all tables to build index statistics (MySQL Cluster 7.2 will love you for this and you will hate MySQL Cluster 7.2 if you don't).
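
One convenient way to generate those ANALYZE statements for every NDBCLUSTER table is a query against information_schema (just a helper, adapt as needed):

SELECT CONCAT('ANALYZE TABLE `', table_schema, '`.`', table_name, '`;')
FROM information_schema.tables
WHERE engine = 'ndbcluster';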

However, if you do too many in parallel you will most likely run into problems like:
  • REDO log files are overloaded (your REDO log is too small and/or your disks are too slow to handle the write load generated by the ALTER).
  • Out of REDO buffer (your disks are too slow to handle the write load generated by the ALTER) - increasing the RedoBuffer will probably not help, since the disk subsystem is too slow and cannot "consume" what you "produce".
  • Out of operation records (increase MaxNoOfConcurrentOperations or MaxNoOfLocalOperations). At Severalnines we set this to 100000 by default; you can of course increase it, but it costs memory. If you increase it, change MaxNoOfConcurrentOperations only; MaxNoOfLocalOperations will then automatically be set to 1.1 x the value of MaxNoOfConcurrentOperations. With MaxNoOfConcurrentOperations=100000 this takes 100MB of RAM on each data node; with MaxNoOfConcurrentOperations=200000 it takes 200MB of RAM on each data node (see the example below).
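
As an illustration (using the same numbers as above), the setting lives in the [NDBD DEFAULT] section of config.ini and is applied with a rolling restart:

[NDBD DEFAULT]
MaxNoOfConcurrentOperations=200000
# MaxNoOfLocalOperations left unset, so it defaults to 1.1 x MaxNoOfConcurrentOperations
# memory cost: roughly 200MB of RAM per data node at this setting
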
If you have a CSV file you can do the same: Load the data first into MYISAM/INNODB tables and then ALTER. 
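
A minimal sketch of that route, with a made-up file name and table (the table is still MYISAM/INNODB at this point and is ALTERed to NDBCLUSTER afterwards as above):

LOAD DATA INFILE '/tmp/events.csv'
INTO TABLE events
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';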

Why first load it into MYISAM/INNODB?
Well, I think it makes it easier and gives better control. If you have a 30GB dump file and many tables to load, it is easier to track errors and to make use of the ndb_batch_size parameter so that the load actually completes without errors. That is, you will save time overall, even though you spend some extra time first loading the data into MYISAM/INNODB.

Example run of ./pt-duplicate-key-checker:

./pt-duplicate-key-checker  --host=ubuntu02 --user=cmon --password=xxxxxxx --databases=test2 --tables=t1
# ##################################################################
# test2.t1                                                                
# ##################################################################
# a is a duplicate of PRIMARY
# Key definitions:
#   KEY `a` (`a`)
#   PRIMARY KEY (`a`),
# Column types:
#  `a` int(11) not null
# To remove this duplicate index, execute:
ALTER TABLE `test2`.`t1` DROP INDEX `a`;
# ##################################################################
# Summary of indexes                                                      
# ##################################################################

# Size Duplicate Indexes   0
# Total Duplicate Indexes  1
# Total Indexes            2