- Contention
- Thread Stuck in
- What you can do about it
sendbufferpool waiting for lock, contentions: 6000 spins: 489200
sendbufferpool waiting for lock, contentions: 6200 spins: 494721
Each spin is read from the L1 cache (4 cycles on a Nehalem (3.2GHz), so about a nanosecond).
1 spin = 1.25E-09 seconds (1.25ns)
In the above we have:
(494721-489200)/(6200-6000)= 27 spins/contention
Time spent on a contention=27 x 1.25E-09=3.375E-08 seconds (0.03375 us)
So we don't have a problem..
Another example (here is a lock guarding a job buffer (JBA = JobBuffer A, in short it handles signals for heartbeats and some other small things, all traffic goes over JobBuffer B).
jbalock thr: 1 waiting for lock, contentions: 145000 spins: 3280892543
jbalock thr: 1 waiting for lock, contentions: 150000 spins: 3403539479
(3403539479-3280892543)/(150000-145000)=24529 spins/contention
Time spent on a contention: 3.06613E-05 seconds (30.66us )
This is a bit higher than I would have expected and I think more analysis is needed. However, i tend not to get these contentions on a busy system.
Ndb kernel thread X is stuck in ..
Ndb kernel thread 4 is stuck in: Job Handling elapsed=100 Watchdog: User time: 82 System time: 667
Ndb kernel thread 4 is stuck in: Job Handling elapsed=200
Watchdog: User time: 82 System time: 668
Ndb kernel thread 4 is stuck in: Job Handling elapsed=300 Watchdog: User time: 82 System time: 669
Ndb kernel thread 4 is stuck in: Job Handling elapsed=400 Watchdog: User time: 82 System time: 670
Here the important is to look at how User time and System time behaves.
If User time is constant (as it is here - 82ms), but the System time is growing (667, 668 etc) which indicates that the OS kernel is busy.
Slow network? Sub-optimal kernel version? NIC drivers? swapping? some kernel process using too much cpu?
If User time is growing it is probably because the ndb kernel is overloaded.
What can you do about this?
- In config.ini:
RealtimeScheduler=1
LockExecThreadToCPU=[cpuids] - check that cpuspeed is not running ( yum remove cpuspeed )
- .. and finally ask us to optimize more!
2 comments:
Hello,
This is a very interesting post, thanks a lot.
I was wondering what's going on on my ndbmtd cluster. I have now some answers. I've made some changes on my cluster configuration as you suggest but I still have many errors like :
send lock node 17 waiting for lock, contentions: 200 spins: 166653
send lock node 19 waiting for lock, contentions: 200 spins: 172652
sendbufferpool waiting for lock, contentions: 20000 spins: 149623
recvlock waiting for lock, contentions: 1 spins: 46
recvlock waiting for lock, contentions: 2 spins: 1652
node 17,19 are my mysqld servers.
I still don't understand why I'm getting this type of errors.
Do you have some information about this ?
My Cluster is really really slow when I'm using ndbmtd. When I'm using ndbd, it's really not the same performance.
Thanks a lot for your blog!
It's hard to tell from the information here why your cluster is slow.
I'd recommend a thorough check of your environment (swapping, numa) , queries, schema, where the time is spent (mysqld, data nodes, network).
BR
johan
Post a Comment