E.g., to abort transactions in the PREPARE phase (Cluster uses a variant of the Two-Phase Commit protocol) that were started on the data node that failed.
Various protocols must also fail over (if the failed data node is the master).
These things are carried out quite swiftly: the time is configurable and is a couple of seconds by default, but it also depends on the size of the transactions in the prepare phase that need to be aborted.
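
To make the abort step concrete, here is a toy sketch in Python. It is not how the data nodes implement it internally; the `Txn` class, its fields, and the function name are all made up for illustration. It only shows the idea that transactions still in PREPARE that were started on the failed node get aborted, while everything else is left alone.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    # Hypothetical record; real data nodes track far more state.
    txn_id: int
    started_on_node: int  # data node on which the transaction was started
    state: str            # e.g. "PREPARE", "COMMITTED", "ABORTED"


def abort_prepared_on_failed_node(txns, failed_node):
    """Abort every transaction still in PREPARE that was started on failed_node.

    Returns the ids of the transactions that were aborted.
    """
    aborted = []
    for txn in txns:
        if txn.started_on_node == failed_node and txn.state == "PREPARE":
            txn.state = "ABORTED"
            aborted.append(txn.txn_id)
    return aborted


txns = [
    Txn(1, 3, "PREPARE"),    # started on node 3, still preparing
    Txn(2, 4, "PREPARE"),    # started on the surviving node
    Txn(3, 3, "COMMITTED"),  # started on node 3, but no longer in PREPARE
]
aborted_ids = abort_prepared_on_failed_node(txns, failed_node=3)
print(aborted_ids)
```

The more (and larger) transactions that are in PREPARE when the node dies, the more work this step has to do, which is why the time depends on the transactions and not only on the configured value.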
I wanted to check how long node failure handling takes with many tables (16000+) in the cluster, because in a "normal" database with a couple of hundred tables, the time spent on node failure handling is not a factor.
In this case I have two data nodes:
- Node 3 (master)
- Node 4
Node 3 was killed, and Node 4 then has to do node failure handling:
2009-09-16 14:23:41 [MgmSrvr] WARNING -- Node 4: Failure handling of node 3 has not completed in 15 min. - state = 6
2009-09-16 14:24:41 [MgmSrvr] WARNING -- Node 4: Failure handling of node 3 has not completed in 16 min. - state = 6
2009-09-16 14:25:41 [MgmSrvr] WARNING -- Node 4: Failure handling of node 3 has not completed in 17 min. - state = 6
Eventually, we get the message:
2009-09-16 14:26:13 [MgmSrvr] INFO -- Node 4: Communication to Node 3 opened
We cannot start to recover Node 3 until this message has been written out in the cluster log!
If you try to connect the data node before this, the management server will say something like "No free slot found for node 3", which in this case basically means that node failure handling is still ongoing.
Now that we have got the message, we can start data node 3 again and it will recover.
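
If you want to script the wait instead of watching the cluster log by hand, something like the following Python sketch could work. It only relies on the log line format shown above; the function name and the sample lines are mine, and how you read the cluster log file (path, tailing, etc.) is left out because it depends on your setup.

```python
import re

# Matches e.g. "Node 4: Communication to Node 3 opened"
OPENED = re.compile(r"Node (\d+): Communication to Node (\d+) opened")


def failure_handling_done(log_lines, failed_node):
    """Return True once some node has reported communication to failed_node as opened."""
    for line in log_lines:
        match = OPENED.search(line)
        if match and int(match.group(2)) == failed_node:
            return True
    return False


still_busy = [
    "2009-09-16 14:25:41 [MgmSrvr] WARNING -- Node 4: Failure handling of node 3 "
    "has not completed in 17 min. - state = 6",
]
finished = still_busy + [
    "2009-09-16 14:26:13 [MgmSrvr] INFO -- Node 4: Communication to Node 3 opened",
]

print(failure_handling_done(still_busy, 3))
print(failure_handling_done(finished, 3))
```

With only the WARNING lines the check is false, and it turns true once the "Communication to Node 3 opened" line appears, which is the point where starting node 3 makes sense.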
Why is it taking such a long time:
- The surviving data nodes that handle the failure must write, for each table, that node 3 has crashed. This is written to disk, and that is what takes time when you have many tables (about 17 minutes with >16000 tables, so about 1000 tables per minute). In most databases it therefore does not take much time. Thanks Jonas for your explanation.
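
The numbers above can be turned into a rough back-of-the-envelope estimate. This Python sketch assumes the time scales linearly with the number of tables, which I have only observed at one data point (roughly 16000 tables in 17 minutes), so treat the result as a ballpark figure and not a guarantee. The function names are just for illustration.

```python
def tables_per_minute(tables, minutes):
    """Observed rate at which per-table failure records were written."""
    return tables / minutes


def estimated_minutes(tables, rate):
    """Estimated failure handling time, assuming linear scaling with table count."""
    return tables / rate


# The one measurement from this test: about 16000 tables in about 17 minutes.
rate = tables_per_minute(16000, 17)
print(round(rate))  # 941, i.e. roughly 1000 tables per minute

# A "normal" database with a couple of hundred tables.
small = estimated_minutes(200, rate)
print(round(small, 2))  # 0.21 minutes, i.e. about 13 seconds
```

So for a couple of hundred tables the per-table writes add only seconds, which matches the observation that node failure handling is normally not a factor.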