Tuesday, October 16, 2007

Memory allocation and deallocation in MySQL Cluster

In this post I will try to explain how memory management works inside MySQL Cluster.
Memory management differs between different versions of MySQL Cluster.

The general idea is as follows:
  • Data (and ordered indexes) is stored in the DataMemory section. The DataMemory is organized in pages where each page is 32KB.
  • You can view the DataMemory as a pool of pages consisting of X number of 32 KB pages.
  • A table is associated with one or more pages from the DataMemory (pool of pages) depending on how much data is stored in the table.
That is the general stuff, now let's see how this is used.


MySQL Cluster 6.2/6.3/7.0
Except a number of optimizations and new features the memory allocator has also been revamped and now allocates pages on a per page basis.

Allocation
  1. If the table is empty and there is an insert - one page will be allocated from the page pool (DataMemory) and associated with that table.
  2. Subsequent inserts will fill up the data page until it is full.
  3. The allocator allocates one page at a time!
Deallocation
  • Currently, a delete resulting in a free page, will not return the free page to DataMemory page pool! It will still be associated with the table it was allocated to. Thus, a "DELETE FROM table1" will not free up any pages. The pages are still bound to the table. From MySQL Cluster 6.3 there is per page de-allocation.
  • truncate/drop table will release the pages to the DataMemory page pool.
  • Rolling restarts of the data nodes will not defragment empty pages, except for pages that store VAR* attributes (VAR* attrs are stored separate from fixed size attributes).
MySQL 5.0 (don't use)

Allocation
  1. If the table is empty and there is an insert - one page will be allocated from the page pool (DataMemory) and associated with that table.
  2. Subsequent inserts will fill up the data page until it is full.
  3. When the page is full the memory allocator allocating pages to a table will grow the number of pages associated with the table with 18.75%! This means that if a table is using 100 pages and all those pages are full, then the next insert will force the allocator to allocate another 18.75 (19 rounded up) pages from the DataMemory page pool to the table, resulting in that the table now has 119 pages associated with it.
Moreover, you can have free DataMemory (if you look using ndb_mgm -e "all dump 1000" , then check the cluster log what it prints out) , but still fail to do one insert with the error message "Table full", because the allocator cannot grow the table with 18.75%, because there are not enough free pages in the DataMemory page pool!!

E.g, the table is currenty uses 1000 pages, next allocation will force the table to allocate 200 additional pages, but there are only 199 pages available in the DataMemory page pool! Then you will get the "Table full" error message. However, another table may of course be able to do the allocation.

Best practice is to never fill up your database with more than 80% of the DataMemory because of this, and the fact that during System Restart the data nodes may recover the data in another order than you inserted it (thus the alllocator may have succeeded to insert the data into the database, but during recovery the allocation pattern may, and is likely to, be different), thus it may be impossible to recover your data in case of a cluster failure.

Deallocation
  • A delete resulting in a free page, will not return the free page to DataMemory page pool! It will still be associated with the table it was allocated to. Thus, a "DELETE FROM table1" will not free up any pages. The pages are still bound to table1, and subsequent inserts will be made on the pages already associated with the table.
  • truncate/drop table will release the pages to the DataMemory page pool.
  • In 5.0 - rolling restarts of the data nodes will reclaim and defragment empty pages.
  • In 5.1 - rolling restarts of the data nodes will not defragment empty pages. This has to do with a new faster node recovery protocol, called Optimized Node Recovery.
Please note that you can use ndb_mgm -e "all report mem" to view the memory utilization.

Thanks Jonas and Tomas at MySQL for your input on this.

2 comments:

unique said...

How to recover if the datamemory is full?..

Johan Andersson said...

Not sure what your cluster is in right now (started, stopped..)..

Started:
The easiest is to increase the DataMemory in config.ini and restart the management servers and then the data nodes one by one.

Stopped:
Increase the DataMemory a little bit and restart the management server and the data nodes (just by typing 'ndbd').