Wednesday, September 29, 2010

MySQL Cluster - Performance (SELECT on PK)

In this post I want to show three things:
  1. How many single-row SELECTs per second (on the PRIMARY KEY, no batching) you can do on a Cluster with two data nodes
  2. How MySQL Cluster scales with threads and mysql servers
  3. How ndb_cluster_connection_pool affects performance
The next post will cover INSERTs, and after that UPDATEs.

Setup
  • two data nodes
  • one to four mysql servers
  • interconnected with Gig-E (single NIC)
deployed on six computers (of varying quality, see below). www.severalnines.com/bencher was co-located with each mysql server to drive the load. The reads were single-row PK SELECTs like:

SELECT data1,data2 FROM t1 WHERE id=[random];
data1 and data2 are each 256B, so in total 512B was read per query. There were 1M records in table t1, but this does not matter since the SELECT is on the PK (hash index); it could just as well have been 100B rows (if I had had the storage for it).
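As an illustration of what each benchmark thread does per request, here is a minimal Python sketch. pk_select is a hypothetical helper, not part of bencher itself; actually executing the statement of course requires a connection to one of the mysql servers.

```python
import random

# Sketch of the per-request work: pick a random primary key in [1, max_id]
# and build the single-row SELECT used in the benchmark.
def pk_select(max_id=1000000):
    return "SELECT data1,data2 FROM t1 WHERE id=%d" % random.randint(1, max_id)
```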

Table looks like:
CREATE TABLE `t1` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`data1` varchar(512) DEFAULT NULL,
`data2` varchar(512) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=ndbcluster
but only 256B of each 512B varchar was filled with data.

The configuration was generated using www.severalnines.com/config with the following options:
  • MySQL Cluster 7.1.7 (src)
  • Cluster Usage: Option 3) High read/High write
  • Cores: 8
  • Multi-connection: 16 (so that I could later switch easily between 1, 8, and 16 connections in the pool)
  • DataMemory=2500M
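For reference, a sketch of the relevant config.ini pieces such a setup implies (hostnames are placeholders, and only a few of the [mysqld]/[api] slots implied by "Multi-connection: 16" are shown):

```ini
[ndbd default]
NoOfReplicas=2
DataMemory=2500M

[ndbd]
HostName=datanode-1

[ndbd]
HostName=datanode-2

# one [mysqld] slot per NDB API connection; with 16 connections
# reserved per mysql server the pool can later be raised to 16
# without touching config.ini again
[mysqld]
[mysqld]
```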
Hardware
  • Data nodes deployed on: Dual CPU Quad core Intel(R) Xeon(R) CPU E5345 @ 2.33GHz
  • 2 mysql servers deployed on: Dual CPU Dual core with Intel(R) Xeon(R) CPU 5160 @ 3.00GHz
  • 2 mysql servers deployed on: Dual CPU Quad core Intel(R) Xeon(R) CPU E5345 @ 2.33GHz
Results


A few notes:
  • conn= 8 means ndb_cluster_connection_pool=8
  • 1 app means one pair of bencher + mysqld co-located on one server
  • See below why 4 apps do not give >200K tx/s
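ndb_cluster_connection_pool is set per mysql server in my.cnf, e.g. (a sketch; the connectstring host is a placeholder, and the pool size must not exceed the number of [mysqld]/[api] slots reserved in config.ini):

```ini
[mysqld]
ndbcluster
ndb-connectstring=mgm-1:1186
ndb-cluster-connection-pool=8
```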

Conclusions
  • Going from one mysql server to two gives a 99.9% throughput increase (32 threads).
    At that point both mysql servers were completely CPU bound, so I wanted to try with four.

  • Going from two mysql servers to four mysql servers gives a 66% throughput increase.

    However, two of the mysql servers were on dual CPU/dual core machines instead of dual CPU/quad core. The dual CPU/dual core servers were already saturated at 16 threads connected to MySQL.

    With similar hardware throughout, I expect to get well above 200000 tx/s. The mysql servers on the dual CPU/quad core machines could do 52000 tx/s each, but the dual CPU/dual core machines could only handle ~38000 tx/s when executing the SELECTs from 32 threads.

    Hence, more than 200000 tx/s should be possible, which would be a 92% increase in throughput compared to two mysql servers.

  • ndb_cluster_connection_pool=8 is a good value

  • ndb_cluster_connection_pool=1 is really bad

  • ndb_cluster_connection_pool=16 does not give any real benefit. I imagine it may actually get worse with more data nodes and more mysql servers, because each data node has to read from more sockets. OTOH, epoll might help. Must be tested more :)

  • On my dual CPU/quad core machines the mysql server could scale up to 32 threads before hitting the roof (100% CPU utilization, including the benchmark program)

  • On my dual CPU/dual core machines the mysql server could scale up to 16 threads before hitting the roof (100% CPU utilization, including the benchmark program)

  • CPU-bound -- the mysql servers got completely CPU bound with 32 threads

  • It would be fun to test with 16 or 24 core machines, but I don't have those. It would also be fun to test with faster CPUs.

  • The data nodes were not saturated at the peak results, but pay attention to the TC thread, as in this case it accounts for most of the CPU utilization within the data node.

  • Latency numbers: I have them if someone is interested.
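The 200000 tx/s extrapolation in the conclusions can be checked with simple arithmetic, using the per-server peaks quoted above:

```python
# Peak throughput per mysql server, from the measurements above.
fast = 52000   # dual CPU / quad core machine
slow = 38000   # dual CPU / dual core machine

measured_four = 2 * fast + 2 * slow   # the actual mixed 4-server setup
two_servers = measured_four / 1.66    # 4 servers was a 66% gain over 2
all_fast = 4 * fast                   # 4 servers on uniform fast hardware

print(all_fast)                              # 208000
print(round(all_fast / two_servers - 1, 2))  # ~0.92, i.e. a 92% increase
```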
How to improve this more:
  • More cores, faster cores
  • more mysql servers (to scale out more)
  • eventually more data nodes
  • InfiniBand
  • use the NDB API (NoSQL)
I think that, despite MySQL Cluster being a relational database and despite the fact that we are using SQL, it is possible to get decent numbers and good scalability.

Thursday, September 02, 2010

Cluster - spins/contentions and thread stuck in..

I get a number of questions about contentions/"stuck in...". So here is some explanation of:
  • Contention
  • Thread Stuck in
  • What you can do about it
In 99% of cases, the contentions written to the data nodes' out files (ndb_X_out.log) are nothing to pay attention to.

sendbufferpool waiting for lock, contentions: 6000 spins: 489200
sendbufferpool waiting for lock, contentions: 6200 spins: 494721


Each spin is a read from the L1 cache (4 cycles on a 3.2GHz Nehalem, so about a nanosecond):
1 spin = 1.25E-09 seconds (1.25ns)

In the above we have:
(494721-489200)/(6200-6000) ≈ 27.6 spins/contention
Time spent on a contention ≈ 27.6 x 1.25E-09 ≈ 3.45E-08 seconds (0.0345us)

So we don't have a problem..

Another example: here the lock guards a job buffer (JBA = JobBuffer A; in short, it handles signals for heartbeats and some other small things, while all normal traffic goes over JobBuffer B).

jbalock thr: 1 waiting for lock, contentions: 145000 spins: 3280892543
jbalock thr: 1 waiting for lock, contentions: 150000 spins: 3403539479


(3403539479-3280892543)/(150000-145000) ≈ 24529 spins/contention
Time spent on a contention ≈ 24529 x 1.25E-09 ≈ 3.07E-05 seconds (30.7us)

This is a bit higher than I would have expected, and I think more analysis is needed. However, I tend not to get these contentions on a busy system.
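The arithmetic from both examples can be wrapped in a small helper (a sketch, using the 1.25ns-per-spin estimate from above):

```python
# Average cost of a contention between two out-file samples:
# spins per contention, and the wall-clock time that represents.
def contention_cost(spins0, cont0, spins1, cont1, spin_seconds=1.25e-9):
    spins_per_contention = (spins1 - spins0) / (cont1 - cont0)
    return spins_per_contention, spins_per_contention * spin_seconds

# sendbufferpool: ~27.6 spins, ~35ns per contention -- harmless
print(contention_cost(489200, 6000, 494721, 6200))

# jbalock: ~24529 spins, ~31us per contention -- worth a closer look
print(contention_cost(3280892543, 145000, 3403539479, 150000))
```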

Ndb kernel thread X is stuck in ..

Ndb kernel thread 4 is stuck in: Job Handling elapsed=100
Watchdog: User time: 82 System time: 667
Ndb kernel thread 4 is stuck in: Job Handling elapsed=200
Watchdog: User time: 82 System time: 668
Ndb kernel thread 4 is stuck in: Job Handling elapsed=300
Watchdog: User time: 82 System time: 669
Ndb kernel thread 4 is stuck in: Job Handling elapsed=400
Watchdog: User time: 82 System time: 670

The important thing here is how User time and System time behave.
If User time is constant (as it is here, 82) but System time is growing (667, 668, ...), the OS kernel is busy.
Slow network? Sub-optimal kernel version? NIC drivers? Swapping? Some kernel process using too much CPU?

If User time is growing it is probably because the ndb kernel is overloaded.
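This diagnostic rule can be sketched as a small parser of the Watchdog lines (the line format is assumed from the log excerpt above; this is illustrative, not a real ndb tool):

```python
import re

# Growing User time points at an overloaded ndb kernel; otherwise,
# growing System time points at a busy OS kernel.
def diagnose(watchdog_lines):
    pairs = []
    for line in watchdog_lines:
        m = re.search(r"User time: (\d+) System time: (\d+)", line)
        pairs.append((int(m.group(1)), int(m.group(2))))
    user_growing = pairs[-1][0] > pairs[0][0]
    system_growing = pairs[-1][1] > pairs[0][1]
    if user_growing:
        return "ndb kernel overloaded"
    if system_growing:
        return "OS kernel busy"
    return "inconclusive"

print(diagnose([
    "Watchdog: User time: 82 System time: 667",
    "Watchdog: User time: 82 System time: 668",
    "Watchdog: User time: 82 System time: 670",
]))  # OS kernel busy
```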

What can you do about this?
  • In config.ini:
    RealtimeScheduler=1

    LockExecuteThreadToCPU=[cpuids]
  • check that cpuspeed is not running ( yum remove cpuspeed )
  • .. and finally ask us to optimize more!
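Put together in config.ini, that could look like this (the CPU ids are placeholders; pick cores that are not used by interrupt handling or a co-located mysqld):

```ini
[ndbd default]
RealtimeScheduler=1
LockExecuteThreadToCPU=1,2,3,4
```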
Also, pay attention if you get the contentions on an idle Cluster or a busy Cluster.