Monday, May 21, 2007

Good configuration

I get the opportunity see a lot of different MySQL Cluster configurations and most of them does not configure the disk subsystem in a good way. E.g, the redo log is way to often not correctly dimensioned. Wrongly dimensioned buffers affects system stability!!

There are also some buffers related to communication and disk, that is important to setup. At the end here you will find a good configuration template that you can use, but first we will have a brief discussion about the disk


Configuring the disk

In MySQL Cluster, it is very important to dimension the redo log properly in order for System restart and Node restarts to work. In MySQL Cluster, the redo log must be large enough to accomodate changes that happens over three Local Checkpoints (LCP). When the third LCP has been written, then the redo log can be recycled. The LCP is the image of the DataMemory which is written to disk, so if you have filled up 8GB of DataMemory, then one LCP will take approx. 8GB of disk space. Writing the LCP takes time, and in 5.1 it is written with a speed determined by DiskCheckpointSpeed. DiskCheckpointSpeed is 10MB/s by default writing an LCP of 8GB will then take 8192MB/10MB/s = ~820seconds. MySQL Cluster wiill write three LCPs, and clearly we need to have a REDO log big enough to handle changes that happens over three LCPs, which is then 2460 seconds. However, we should have some head room in the redo log, so we pretend that we should save the redo log for four LCPs. That is 3280 seconds of changes!

Pretend that you write 1MB/s to a two node, two replica cluster (remember, that provisioning or restore data is often the heaviest write phase in a database). Then the size of the REDO log should be atleast:

Size of redo log = 1MB/s * 3280seconds = 3300MB

Moreover, the redo log file is segmented over a number of 64MB large segments on disk. How many segments you have is set by NoOfFragmentLogFiles! So the NoOfFragmentLogFiles determines the size of the REDO log.

NoOfFragmentLogFiles=3300MB/64MB=52

But, way to often the NoOfFragmentLogFiles is left to the default. Not good, and system stability is at stake.

A heuristic can also be used to determine the size of the redo log, if you don't have all data, or if you have ample of free disk space:

NoOfFragmentLogFiles= DataMemory (in MB) * 6 / 64 MB

If you have DataMemory=8G and want to use the heuristic, then:
NoOfFragmentLogFiles= 8192 * 6 / 64 MB = 768
are needed!

RedoBuffer, which sits in front of the disk should also be increased if you have the resources (highly recommended) from its default value to:
RedoBuffer=32M

Configuration template
Most things can be left with default values, but these things are worth considering:


[tcp default]
SendBufferMemory=2M

[ndbd default]
##avoid swapping, not good at all
LockPagesInMainMemory=1

## Very recommended to increase RedoBuffer:
RedoBuffer=32M


## Set the correct size of the redo log (heuristic):
NoOfFragmentLogFiles= 6 * / 64MB



In the my.cnf file, set:

[mysqld]
...
ndb-force-send=1
ndb-use-exact-count=0
engine-condition-pushdown=1
...

What's next?

If you have any questions, are interested in more info, please contact me at <> and I will try to address your inquiries.

6 comments:

nava said...

I'm interesting your solution.
and I have any question ;-)

1.In heuristic solution (= * 6 / 64 MB) "6" is ?

miru said...

Hi,
I am getting this errror. what does this mean? I am using MySQL5.0.45. Is there any variables that can be tuned to get this solved? This happens when i try to load huge data into the server.

ERROR 1114 (HY000) at line 2055: The table 'tablename' is ful l

My id is chandru.dba at gmail dot com

Regards,
Chandru.

Johan Andersson said...

Hi Chandru,
This indicates that you have either too little DataMemory or IndexMemory available to store the data set. When you get the error please do:

ndb_mgm -e "all dump 1000"

DataMemory and IndexMemory usage will be printed in the cluster log (in the datadir of the management server, often called ndb_1_cluster.log , but this depends on your setup).

You are very welcome to send me the numbers you get.

In my next post (thanks for highlighting this issue) I will explain the details of this.

Best regards,
johan

Hector said...

Hello Johan,

When I execute du -h -s ndb_3_fs on one of my datanodes the ndb_3_fs subdirectory has 61GB out of 63GB used. The disk space for this node is currently at 94%. There are two nodes in this cluster and running Mysql v. 5.045. NoOfFragmentLogFile is set to 1500. I have a feeling that the old REDO log files are not being overwritten by the new ones. Can you shed any light on
the disk space issue? Any suggestions for freeing up disk space in the ndb_3_fs directory would be greatly appreciated. Thanks

Johan Andersson said...

Hector,

How much DataMemory do you have?

br
johan

Hector said...

Data usage is 21%(35203 32K pages of total 163840)
Index usage is 2%(5799 8K pages of total 262176)
__________________________________________
Config.ini from management node
[NDBD DEFAULT]
NoOfReplicas=2
DataMemory=5120M
IndexMemory=2048M
RedoBuffer=768M
NoOfFragmentLogFiles=1500
_______________________________________
free -m (on datanode)
total used free shared buffers cached
Mem: 7920 7697 222 0 102 2888
-/+ buffers/cache: 4707 3213
Swap: 1983 217 1766