Tuesday, April 20, 2010

Tuning your Cluster with ndbinfo (7.1) part 1 of X

The new ndbinfo interface in 7.1 is really useful for tuning MySQL Cluster. Here is an example (more will follow):

I started with a test where I inserted two BLOBs (1KB + 1KB) into one table.
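The table and load looked roughly like this (a sketch with made-up names; the actual table used in the test is not shown in this post):

-- illustration only: a table taking two ~1KB BLOBs per row
create table blob_test (
  id int unsigned not null primary key,
  b1 blob,
  b2 blob
) engine=ndbcluster;

insert into blob_test values (1, repeat('x', 1024), repeat('x', 1024));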
The load was driven by one application with 16 threads (colocated with one mysqld; the two data nodes ran on separate computers). With that I reached about 6960 TPS, and the utilization of the redo buffers (controlled by the RedoBuffer parameter in config.ini) looked like this:

mysql> select * from ndbinfo.logbuffers;
+---------+----------+--------+----------+----------+--------+
| node_id | log_type | log_id | log_part | total    | used   |
+---------+----------+--------+----------+----------+--------+
|       3 | REDO     |      0 |        1 | 50331648 | 196608 |
|       3 | REDO     |      0 |        2 | 50331648 | 294912 |
|       3 | REDO     |      0 |        3 | 50331648 | 131072 |
|       3 | REDO     |      0 |        4 | 50331648 | 229376 |
|       4 | REDO     |      0 |        1 | 50331648 | 229376 |
|       4 | REDO     |      0 |        2 | 50331648 | 262144 |
|       4 | REDO     |      0 |        3 | 50331648 | 163840 |
|       4 | REDO     |      0 |        4 | 50331648 | 229376 |
+---------+----------+--------+----------+----------+--------+
8 rows in set (0.01 sec)
Which is basically nothing.
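If you want the utilization as a percentage rather than raw bytes, you can let SQL do the arithmetic (a convenience query, using only the columns shown above):

mysql> select node_id, log_part, total, used,
    ->        round(100 * used / total, 1) as used_pct
    -> from ndbinfo.logbuffers
    -> order by node_id, log_part;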

I then increased the load, inserting 2 x 5120B BLOBs (again from 16 threads through one MySQL server), and ran at an insert speed of 4320 TPS:
mysql> select * from ndbinfo.logbuffers;
+---------+----------+--------+----------+----------+----------+
| node_id | log_type | log_id | log_part | total    | used     |
+---------+----------+--------+----------+----------+----------+
|       3 | REDO     |      0 |        1 | 50331648 | 11468800 |
|       3 | REDO     |      0 |        2 | 50331648 | 31522816 |
|       3 | REDO     |      0 |        3 | 50331648 | 42008576 |
|       3 | REDO     |      0 |        4 | 50331648 | 43057152 |
|       4 | REDO     |      0 |        1 | 50331648 | 14090240 |
|       4 | REDO     |      0 |        2 | 50331648 | 17432576 |
|       4 | REDO     |      0 |        3 | 50331648 | 10321920 |
|       4 | REDO     |      0 |        4 | 50331648 | 12615680 |
+---------+----------+--------+----------+----------+----------+

Above you can see that the redo buffers are filling up (the load is spread around, so it is hard to catch a moment when it is even across all the buffers), and at this point the application started to throw the error "Got temporary error 1221 'REDO buffers overloaded (increase RedoBuffer)' from NDBCLUSTER (1297)".

I could now follow the instruction and increase the REDO buffer, but would it help in this case?
No, no and no.
The disks are too slow to keep up: they cannot write the redo log out to disk at the same rate as the application is writing data.
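For reference, the buffer size is set per data node in config.ini; the sketch below assumes the 50331648-byte totals above correspond to RedoBuffer=48M. A bigger buffer only absorbs short bursts - it cannot fix disks that are too slow to sustain the write rate:

# config.ini (sketch) - increasing this would only delay error 1221 here
[ndbd default]
RedoBuffer=48M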

'iostat' gives:
$ iostat -kx 1

Device:      rrqm/s   wrqm/s    r/s      w/s   rkB/s     wkB/s  avgrq-sz avgqu-sz  await  svctm  %util
cciss/c0d0     0.00     0.00   0.00     0.00    0.00      0.00      0.00     0.00   0.00   0.00   0.00
cciss/c0d1     0.00 27796.00   0.00  1454.00    0.00 115196.00    158.45    12.03   8.25   0.66  95.30
dm-0           0.00     0.00   0.00 29270.00    0.00 117080.00      8.00   274.79   9.33   0.03  95.20
dm-1           0.00     0.00   0.00     0.00    0.00      0.00      0.00     0.00   0.00   0.00   0.00
dm-2           0.00     0.00   0.00     0.00    0.00      0.00      0.00     0.00   0.00   0.00   0.00
dm-3           0.00     0.00   0.00     0.00    0.00      0.00      0.00     0.00   0.00   0.00   0.00
dm-4           0.00     0.00   0.00     0.00    0.00      0.00      0.00     0.00   0.00   0.00   0.00
dm-5           0.00     0.00   0.00     0.00    0.00      0.00      0.00     0.00   0.00   0.00   0.00


And here you can see that the disks are quite heavily utilized. This means I have two options if I want to sustain the 4320 TPS insert load:
  • Increase the number of data nodes (computers): instead of two computers I should have four, so that the load is spread across more hardware (see the config sketch after this list).
  • Improve my disk subsystem: add better disks (e.g., 2-4 disk spindles to spread the load on), or put the REDO log on device cciss/c0d1 and the LCP on device cciss/c0d0.
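A minimal config.ini sketch of the first option (the hostnames are made up for illustration; with NoOfReplicas=2, four data nodes form two node groups, roughly doubling the write capacity):

# config.ini (sketch) - scaling out from two to four data nodes
[ndbd default]
NoOfReplicas=2

[ndbd]
hostname=datanode1   # existing
[ndbd]
hostname=datanode2   # existing
[ndbd]
hostname=datanode3   # new
[ndbd]
hostname=datanode4   # new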
Could the CPU also have been a bottleneck in this case? No, it was not the issue. The CMVMI thread (one of the data node threads) spent 44.4% of its time polling data from the other nodes, and since it reads quite large packets it was the heaviest CPU user among the data node threads.
5453 root      20   0 6594m 4.1g 6956 R 44.4 51.9   4:05.64 ndbmtd
5471 root      20   0 6594m 4.1g 6956 S 32.5 51.9   3:39.07 ndbmtd
5474 root      20   0 6594m 4.1g 6956 R 26.6 51.9   2:25.55 ndbmtd
5475 root      20   0 6594m 4.1g 6956 S 23.7 51.9   2:25.01 ndbmtd
5476 root      20   0 6594m 4.1g 6956 R 23.7 51.9   2:20.83 ndbmtd
5473 root      20   0 6594m 4.1g 6956 R 21.7 51.9   2:26.57 ndbmtd
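The per-thread view above comes from top's thread mode; to reproduce it, run something like this (pressing 'H' inside an interactive top session toggles the same view):

$ top -H    # each ndbmtd row is then an individual data node thread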

4 comments:

Stewart Smith said...

Very glad to see NDBINFO being useful in easily determining what the real issue is in a system.

Nishant Mittal said...

Hi Johan,
I was wondering if you could guide me with the following issue...

We have a MySQL Cluster with 1 data node (64-bit, 16 cores, and 48GB of RAM). It is operating in diskless mode (config file below).

We are trying to insert about 30K records/sec. The table has the following schema:

CREATE TABLE `TickStore`.`trades` (
  `id` int(10) unsigned NOT NULL,
  `bsid` bigint(20) unsigned NOT NULL,
  `date` char(8) NOT NULL,
  `time` char(12) NOT NULL,
  `price` float unsigned NOT NULL,
  `size` mediumint(8) unsigned NOT NULL,
  `code` char(1) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `Index_2` (`bsid`,`date`,`time`)
) ENGINE=ndbcluster DEFAULT CHARSET=latin1;

After running for about an hour, I get the following error message:

Error in ndbapi.cpp, line: 137, code: 410, msg: REDO log files overloaded, consult online manual (decrease TimeBetweenLocalCheckpoints, and|or increase NoOfFragmentLogFiles).

I tried decreasing TimeBetweenLocalCheckpoints and increasing NoOfFragmentLogFiles (restarting the cluster with --initial each time), but got the same result.

---Config file---
# Options affecting ndbd processes on all data nodes:
[ndbd default]
NoOfReplicas=1 # Number of replicas (This must match the number of data nodes!!!)
DataMemory=38000M # How much memory to allocate for data storage
IndexMemory=9000M # How much memory to allocate for index storage
# For DataMemory and IndexMemory, we have used the
# default values. Since the "world" database takes up
# only about 500KB, this should be more than enough for
# this example Cluster setup.
Diskless=1
TimeBetweenLocalCheckpoints=5
NoOfFragmentLogFiles=100

[tcp default]
#portnumber=2202 # This is the default; however, you can use any port that is free
# for all the hosts in the cluster
# Note: It is recommended that you do not specify the port
# number at all and allow the default value to be used instead

# Management process options
[ndb_mgmd]
hostname=192.168.42.91 # Hostname or IP address of management node
datadir=/var/lib/mysql-cluster # Directory for management node log files

# First Data Node (ID=2 because the management node automatically obtains ID=1)
[ndbd]
Id=2
hostname=192.168.42.91
datadir=/usr/local/mysql/data

# MySQLD Processing Nodes (workhorses) - Defining 4 will allow 4 to join the cluster
# from any location on the network. We use 4 because our data nodes also have
# mysqld running, so we are going to use them to process as well.
[mysqld]
[mysqld]
[mysqld]
[mysqld]

========================================

I need help to see whether this high insert rate is something the cluster cannot support, or whether I am doing something wrong.

Any pointers would be appreciated.

Thanks,
Nishant

Johan Andersson said...

Hi,
Sorry for my late reply...

I recommend that you get a config from http://www.severalnines.com/config

It will set the redo log and redo buffer sizes correctly.

If the problem still persists, get faster disks or increase the number of data nodes to spread the load!

Good luck!

Johan Andersson said...

I was a bit quick - I see you run Diskless=1, interesting...


Please try to create the table(s) as:

set ndb_table_no_logging=1;
-- create your tables here
set ndb_table_no_logging=0;

With Diskless=1, the data nodes will still do all the LCP + REDO work except the very last step, which is the actual write to disk. I need to check how you could get the error without touching the disk.

The disk is better avoided with ndb_table_no_logging=1 than with Diskless=1, since then the LCP + REDO code is not executed at all.
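Applied to the trades table from the comment above, it would look like this (same schema, just wrapped in the no-logging switch):

set ndb_table_no_logging=1;

CREATE TABLE `TickStore`.`trades` (
  `id` int(10) unsigned NOT NULL,
  `bsid` bigint(20) unsigned NOT NULL,
  `date` char(8) NOT NULL,
  `time` char(12) NOT NULL,
  `price` float unsigned NOT NULL,
  `size` mediumint(8) unsigned NOT NULL,
  `code` char(1) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `Index_2` (`bsid`,`date`,`time`)
) ENGINE=ndbcluster DEFAULT CHARSET=latin1;  -- created without LCP + REDO logging

set ndb_table_no_logging=0;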