Wednesday, November 03, 2010

DiskPageBufferMemory tuning and disk data statistics in MySQL Cluster 7.1.9

From MySQL Cluster 7.1.9 (not yet released) it is possible to get better stats on disk data tables. In fact, the statistics makes it possible to tune the DiskPageBufferMemory parameter (similar to innodb_bufferpool), in order to avoid disk seeks. It is much (understatement) faster to fetch data from the DiskPageBufferMemory than disk.

Here is an example/tutorial how to use this information and how to check the hit ratio of the DiskPageBufferMemory. Next time, I will explain about other counters you can get from ndbinfo.diskpagebuffer.

Finally, no more educated guesswork is needed.

Let's take an example.

I have a table t1 with 650000 record
CREATE TABLE `t1` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`data1` varchar(512) DEFAULT NULL,
`data2` varchar(512) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=ndbcluster TABLESPACE ts_1 STORAGE DISK ENGINE=ndbcluster
data1 and data2 is non-indexed columns so they will be placed on disk.
Then doing Random reads on this table with bencher:
 src/bencher -r 100000 -t 1 -q \
"SET @x=FLOOR(1 + (RAND() * 650000)); SELECT * FROM t1 WHERE id=@x"
DiskPageBufferMemory=64M
The initial performance after 340 seconds into the run:
Thread 0 - 274 tps (average tps measured after 335 secs)
Thread 0 - 273 tps (average tps measured after 340 secs)
and the hit ratio:
mysql> SELECT
node_id,
100*(sum(page_requests_direct_return)/
(sum(page_requests_direct_return)+sum(page_requests_wait_io))) as hit_ratio_pct
FROM diskpagebuffer GROUP BY node_id;
+---------+---------------+
| node_id | hit_ratio_pct |
+---------+---------------+
| 3 | 93.3577 |
| 4 | 93.6565 |
+---------+---------------+

2 rows in set (0.01 sec
A hit ratio of ~93% - not so great.

So let's increase the DiskPageBufferMemory ( i am using the configurator scripts for this)
# change DiskPageBufferMemory to 512M in config.ini
vi ../config/config.ini
# do a rolling restart of the data nodes
./rolling-restart.sh --layer=storage
## the sit back and wait for the data nodes to be rolled..
hostx: (324s elapsed) Node 3: starting (Last completed phase 100) (mysql-5.1.51 ndb-7.1.9)
hostx: Data node 3 is STARTED
## then wait for node 4..
hosty: (98s elapsed) Node 4: starting (Last completed phase 4) (mysql-5.1.51 ndb-7.1.9)

Increasing DiskPageBufferMemory=512M
Of course, while doing the rolling restart, the same traffic is still happening on the cluster. Let's check what the hit_ratio is now (while restarting node 3).

695 seconds into the run we have:
mysql>  "see above"
+---------+---------------+
| node_id | hit_ratio_pct |
+---------+---------------+
| 4 | 91.2639 |
+---------+---------------+
1 row in set (0.00 sec)
and perf is...
Thread 0 - 200 tps (average tps measured after 690 secs)
Thread 0 - 200 tps (average tps measured after 695 secs)
not good at all.

After increase to DiskPageBufferMemory=512M

When the rolling restart has completed the ./rolling-restart.sh prints out:
Cluster: Cluster has performed a rolling restart  
The DiskPageBufferMemory has now been changed to 512M.

Let's check what hit ratio we get:
mysql>  "see above"
+---------+---------------+
| node_id | hit_ratio_pct |
+---------+---------------+
| 3 | 98.2487 |
| 4 | 98.6386 |
+---------+---------------+
2 rows in set (0.01 sec)
Hit ratio is picking up and performance as well:
Thread 0 - 1141 tps (average tps measured after 1310 secs)
Thread 0 - 1149 tps (average tps measured after 1315 secs)
Now, we just have to wait for a while and let the system enter a steady state.
Then we can check back on the hit ratio and perhaps increase the DiskPageBufferMemory even more.

If you have 99.5 -- 99.9% hit ratio i would say you are good.

After 1780 seconds into the run we have:
+---------+---------------+
| node_id | hit_ratio_pct |
+---------+---------------+
| 3 | 99.2035 |
| 4 | 99.5251 |
+---------+---------------+
2 rows in set (0.00 sec)

and
Thread 0 - 1669 tps (average tps measured after 1775 secs)
Thread 0 - 1674 tps (average tps measured after 1780 secs)
... wow .. tuning the DiskPageBufferMemory really helped. And 3300 secs into the run, perf and hit ratio is still climbing:
Thread 0 - 2329 tps (average tps measured after 3305 secs)
Thread 0 - 2330 tps (average tps measured after 3310 secs

Good luck!