
Tuesday, August 04, 2009

OPTIMIZE TABLE on Cluster (revisited)

Jonas just wrote a patch for this bug in OPTIMIZE TABLE, an issue that was also discussed in this blog post. While he was at it, Jonas also fixed this bug.

Before the fix, OPTIMIZE TABLE hardly freed up any pages; to defragment you had to do a rolling restart of the data nodes.

Now there is only a 2% discrepancy between OPTIMIZE TABLE and doing a rolling restart. This is great stuff.
This fix will make it into 6.3.26 and 7.0.7.

See below for details:

Creating two tables, t5 and t6:

CREATE TABLE `t5` (
`id` varchar(32) CHARACTER SET utf8 NOT NULL,
`name` varchar(32) NOT NULL DEFAULT '',
`unit` int(11) DEFAULT NULL,
PRIMARY KEY (`id`,`name`)
) ENGINE=ndbcluster DEFAULT CHARSET=latin1

And then the following table:

CREATE TABLE `t6` (
`id` varchar(32) CHARACTER SET utf8 NOT NULL,
`name` varchar(32) NOT NULL DEFAULT '',
`c0` int unsigned NOT NULL,
`c1` int unsigned NOT NULL,
`data2` varchar(255),
`unit` int(11) DEFAULT NULL,
PRIMARY KEY (`id`,`name`)
) ENGINE=ndbcluster DEFAULT CHARSET=latin1;


Then we use hugoLoad (a test program that isn't built by default in the source distribution of MySQL Cluster) to load each of them with 1M records.

./hugoLoad -r 1000000 -d test t5
./hugoLoad -r 1000000 -d test t6


Memory usage after population:

Node 2: Data usage is 45%(14890 32K pages of total 32768)
Node 2: Index usage is 22%(3722 8K pages of total 16416)
Node 3: Data usage is 45%(14890 32K pages of total 32768)
Node 3: Index usage is 22%(3722 8K pages of total 16416)


Delete 500K records from t5.


mysql> set ndb_use_transactions=0;
Query OK, 0 rows affected (0.00 sec)

mysql> delete from t5 limit 100000;
Query OK, 100000 rows affected (2.73 sec)

mysql> delete from t5 limit 100000;
Query OK, 100000 rows affected (3.13 sec)

mysql> delete from t5 limit 100000;
Query OK, 100000 rows affected (3.11 sec)

mysql> delete from t5 limit 100000;
Query OK, 100000 rows affected (3.13 sec)

mysql> delete from t5 limit 100000;
Query OK, 100000 rows affected (2.97 sec)


Memory usage after deletion:


Node 2: Data usage is 45%(14890 32K pages of total 32768)
Node 2: Index usage is 19%(3120 8K pages of total 16416)
Node 3: Data usage is 45%(14890 32K pages of total 32768)
Node 3: Index usage is 19%(3121 8K pages of total 16416)


Run OPTIMIZE TABLE:


mysql> optimize table t5;
+---------+----------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+---------+----------+----------+----------+
| test.t5 | optimize | status | OK |
+---------+----------+----------+----------+
1 row in set (1 min 41.94 sec)



CPU UTIL during OPTIMIZE TABLE:
User:23.8% Sys: 4.9% ndbd

Memory usage after OPTIMIZE TABLE:


Node 2: Data usage is 39%(13090 32K pages of total 32768)
Node 2: Index usage is 19%(3120 8K pages of total 16416)
Node 3: Data usage is 39%(13090 32K pages of total 32768)
Node 3: Index usage is 19%(3121 8K pages of total 16416)
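A quick back-of-the-envelope check of how much DataMemory that freed, using the page counts from the ndb_mgm output above (a sketch, not part of the original test run):

```python
PAGE_SIZE_KB = 32  # DataMemory page size shown in the ndb_mgm output

pages_before = 14890  # data pages per node before OPTIMIZE TABLE
pages_after = 13090   # data pages per node after OPTIMIZE TABLE

freed_pages = pages_before - pages_after
freed_mb = freed_pages * PAGE_SIZE_KB / 1024

print(freed_pages)  # 1800 pages returned to the page pool
print(freed_mb)     # 56.25 MB of DataMemory per node
```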


Delete 500K (50%) of the records from t6:

mysql> set ndb_use_transactions=0;
Query OK, 0 rows affected (0.00 sec)

mysql> delete from t6 limit 100000;
Query OK, 100000 rows affected (2.77 sec)

mysql> delete from t6 limit 100000;
Query OK, 100000 rows affected (3.05 sec)

mysql> delete from t6 limit 100000;
Query OK, 100000 rows affected (3.12 sec)

mysql> delete from t6 limit 100000;
Query OK, 100000 rows affected (3.19 sec)

mysql> delete from t6 limit 100000;
Query OK, 100000 rows affected (3.24 sec)


Memory usage after deletion:


Node 2: Data usage is 39%(13090 32K pages of total 32768)
Node 2: Index usage is 15%(2521 8K pages of total 16416)
Node 3: Data usage is 39%(13090 32K pages of total 32768)
Node 3: Index usage is 15%(2522 8K pages of total 16416)


Run OPTIMIZE TABLE:


mysql> optimize table t6;
+---------+----------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+---------+----------+----------+----------+
| test.t6 | optimize | status | OK |
+---------+----------+----------+----------+
1 row in set (1 min 45.83 sec)


Memory usage after OPTIMIZE TABLE:


Node 2: Data usage is 28%(9249 32K pages of total 32768)
Node 2: Index usage is 15%(2521 8K pages of total 16416)
Node 3: Data usage is 28%(9249 32K pages of total 32768)
Node 3: Index usage is 15%(2522 8K pages of total 16416)
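Comparing the two tables with the same arithmetic (my reading of the numbers: t6 carries more VAR* data per row, so OPTIMIZE TABLE reclaims considerably more from it):

```python
PAGE_SIZE_KB = 32  # DataMemory page size shown in the ndb_mgm output

def freed_mb(before_pages, after_pages):
    """DataMemory released per node, from the ndb_mgm page counts."""
    return (before_pages - after_pages) * PAGE_SIZE_KB / 1024

print(round(freed_mb(14890, 13090), 2))  # t5: 56.25 MB per node
print(round(freed_mb(13090, 9249), 2))   # t6: 120.03 MB per node
```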


After restart of node 3:


Node 2: Data usage is 28%(9249 32K pages of total 32768)
Node 2: Index usage is 15%(2521 8K pages of total 16416)
Node 3: Data usage is 26%(8568 32K pages of total 32768)
Node 3: Index usage is 13%(2236 8K pages of total 16416)
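The two-percent figure can be verified straight from the page counts above:

```python
TOTAL_PAGES = 32768  # total 32K DataMemory pages per node

after_optimize = 9249  # data pages in use after OPTIMIZE TABLE
after_restart = 8568   # data pages in use on node 3 after its restart

diff_pct = (after_optimize - after_restart) / TOTAL_PAGES * 100
print(round(diff_pct, 2))  # 2.08 -- the "two percent" discrepancy
```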


Only a two percent difference between a rolling restart and OPTIMIZE TABLE. Excellent. Thank you, Jonas, for fixing this.

Monday, March 16, 2009

Memory (de)allocation,(de)fragmentation, and chkfrag!

(this blog post applies to MySQL Cluster 6.3 and MySQL Cluster 6.4/7.0)

A while ago I discussed memory allocation in MySQL Cluster. Here it comes again, but this time we also discuss fragmentation, and a tool that helps you analyze whether there is fragmentation.

The main problem I want to address here is: "I filled up 100% of my DataMemory, then deleted 40%, and I cannot create a new table!" This is not such a common problem, but it can happen in systems with a lot of inserts/deletes (especially on VAR* columns), and it can be an issue if you need to create a new table. If the allocation patterns always touch the same set of tables, then this is less of an issue.

Allocation
  1. If the table is empty and there is an insert - one page will be allocated from the page pool (DataMemory) and associated with that table.
  2. Subsequent inserts will fill up the data page until it is full.
  3. NEW: Attributes in a table are divided into a fixed-size part and a variable-size part (if the table has VAR* or DYNAMIC attributes). The fixed-size part is stored together with the record header and holds a reference to the VAR*/DYNAMIC part. The VAR*/DYNAMIC part and the FIXED part are stored on different pages and are never mixed!
  4. The allocator allocates one page at a time!
  5. If there is a hole on an already allocated page, and this hole fits the data that should be inserted, then this hole will be used.
  6. Point 5 means that space within a page that is associated with a table will be reused.
Deallocation
  • From MySQL Cluster 6.3 there is per-page deallocation, so a page left empty after a DELETE will be released back to the DataMemory page pool.
  • TRUNCATE/DROP TABLE will release the pages to the DataMemory page pool.
Fragmentation/Defragmentation
  • The pages become fragmented as you insert/delete records.
  • OPTIMIZE TABLE can help to defragment pages within a table, and free (empty) pages will be put on the page pool (see Deallocation above).
  • In MySQL Cluster - rolling restarts of the data nodes will not defragment the FIXED size pages, but VAR*/DYNAMIC pages will be defragmented.
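The allocation and deallocation rules above can be sketched as a toy model (purely illustrative: the class names, page capacity, and pool size are all made up, and real ndbd page handling is far more involved):

```python
class PagePool:
    """Toy model of DataMemory: a fixed set of pages shared by all tables."""
    def __init__(self, total_pages):
        self.free = total_pages

class Table:
    PAGE_CAPACITY = 4  # record slots per page, for illustration only

    def __init__(self, pool):
        self.pool = pool
        self.pages = []  # each page is a list of slots; True = occupied

    def insert(self):
        # Rules 5/6: first try to reuse a hole on a page this table owns.
        for page in self.pages:
            for i, used in enumerate(page):
                if not used:
                    page[i] = True
                    return
        # Rules 1/4: otherwise allocate exactly one page from the pool.
        if self.pool.free == 0:
            raise MemoryError("DataMemory exhausted")
        self.pool.free -= 1
        self.pages.append([True] + [False] * (self.PAGE_CAPACITY - 1))

    def delete_one(self):
        for page in self.pages:
            for i, used in enumerate(page):
                if used:
                    page[i] = False
                    # 6.3 behaviour: a fully empty page goes back to the pool.
                    if not any(page):
                        self.pages.remove(page)
                        self.pool.free += 1
                    return

pool = PagePool(total_pages=4)
t1 = Table(pool)
for _ in range(16):
    t1.insert()      # fills all four pages; the pool is now empty
for _ in range(8):
    t1.delete_one()  # completely empties two pages

print(pool.free)     # 2 -- only the *empty* pages were released
```

Note that if the same eight deletes had been spread across all four pages, no page would have emptied and nothing would have gone back to the pool; that is exactly the fragmentation problem described next.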
But how do you know if you have fragmentation or not? A new tool, chkfrag (beta) can help you.

Back to the problem: "I filled up 100% of my DataMemory, then deleted 40% and I cannot create a new table!!". Why is that?

Pretend that we have a MySQL Cluster up and running. We have only four pages, and all of them are being used (figure 1):


We cannot create a new table at this stage, because we don't have any free pages that can be used to store records.

Now we delete 40% of the records. This frees up space on the pages (shown in figure 2). We still cannot create a new table, but we can of course insert new records into t1, provided they fit. OPTIMIZE TABLE can help to reduce fragmentation.
In its current form, and for the foreseeable future, OPTIMIZE TABLE will only compact the VAR* pages; fixed-size pages will not be compacted. And frankly, it is not great at defragmentation either. A number of ifs and buts about record sizes etc. makes it rather inefficient (in most cases). It would be good to have the option to run it more aggressively. Figure 3 shows what it could look like after OPTIMIZE TABLE.
Currently, the most efficient defragmentation method is simply to perform a rolling restart (figure 4), and chkfrag lets you know if you need one!
Example
Below follows an example of fragmentation and how chkfrag can help:

CREATE TABLE `t5` (
`id` varchar(32) CHARACTER SET utf8 NOT NULL,
`name` varchar(32) NOT NULL DEFAULT '',
`unit` int(11) DEFAULT NULL,
PRIMARY KEY (`id`,`name`)
) ENGINE=ndbcluster DEFAULT CHARSET=latin1

Then we use hugoLoad (a test program that isn't built by default in the source distribution of MySQL Cluster) to load it with 1M records.

./hugoLoad -r 1000000 -d test t5

Checking fragmentation:
johan@homecomputer:~/dev/chkfrag$ src/chkfrag
Gathering information how much DataMemory is being used
Used DataMemory calculated: 338MB 10816 pages (32KB)
Used DataMemory reported: 324MB 10368 pages (32KB)
Total DataMemory reported: 512MB 16384 pages (32KB)
Percentage of VAR* in database: 42.01 percent
There is no fragmentation!

In this case there is a discrepancy between the calculated DataMemory and the reported DataMemory.
That is OK: when the difference is multiplied by the percentage of VAR* in the database, the resulting fragmentation is too small to matter, i.e., less than 10MB. (By the way, that threshold is picked out of the blue, so there is no science behind it.)
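The numbers are consistent with chkfrag multiplying the calculated/reported difference by the VAR* percentage. The helper below is my reconstruction from the output shown here, not taken from chkfrag's source:

```python
def possibly_lost_mb(calculated_mb, reported_mb, var_pct):
    # Reconstruction: difference between reported and calculated usage,
    # weighted by the VAR* share of the database.
    return abs(reported_mb - calculated_mb) * var_pct / 100

# First run: 338MB calculated vs 324MB reported, 42.01% VAR*
print(round(possibly_lost_mb(338, 324, 42.01), 2))  # 5.88 -> under the 10MB
                                                    # threshold: "no fragmentation"

# After the deletes: 270MB calculated vs 324MB reported, 41.85% VAR*
print(round(possibly_lost_mb(270, 324, 41.85), 2))  # 22.6 -> matches the
                                                    # "Possibly lost memory" line
```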

Deleting 200,000 random records and checking fragmentation again:

Gathering information how much DataMemory is being used
Used DataMemory calculated: 270MB 8640 pages (32KB)
Used DataMemory reported: 324MB 10368 pages (32KB)
Total DataMemory reported: 512MB 16384 pages (32KB)
Percentage of VAR* in database: 41.85 percent
Fragmentation of VAR* in database: 16.67 percent
Possibly lost memory (locked in by tables): 22.60MB

Recommendation:
1. Run OPTIMIZE TABLE on tables with a lot of VARCHAR/VARBINARY.
If you are not happy with the result of this then do step 2)!
2. Perform a rolling restart.

Running the OPTIMIZE TABLE:

mysql> optimize table t5;
+---------+----------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+---------+----------+----------+----------+
| test.t5 | optimize | status | OK |
+---------+----------+----------+----------+
1 row in set (3 min 42.44 sec)

Checking the fragmentation:

Gathering information how much DataMemory is being used
Used DataMemory calculated: 270MB 8640 pages (32KB)
Used DataMemory reported: 324MB 10368 pages (32KB)
Total DataMemory reported: 512MB 16384 pages (32KB)
Percentage of VAR* in database: 41.85 percent
Fragmentation of VAR* in database: 16.67 percent
Possibly lost memory (locked in by tables): 22.60MB

Recommendation:
1. Run OPTIMIZE TABLE on tables with a lot of VARCHAR/VARBINARY.
If you are not happy with the result of this then do step 2)!
2. Perform a rolling restart.


Hmm... OPTIMIZE TABLE does not work so well here (I have, however, seen it reclaim some pages at times).
And after a rolling restart:

Gathering information how much DataMemory is being used
Used DataMemory calculated: 270MB 8640 pages (32KB)
Used DataMemory reported: 272MB 8704 pages (32KB)
Total DataMemory reported: 512MB 16384 pages (32KB)
Percentage of VAR* in database: 41.48 percent
Fragmentation of VAR* in database: 0.74 percent
Possibly lost memory (locked in by tables): 0.83MB

We have reclaimed 324MB - 272MB = 52MB!

Conclusion:
A DBA should never have to perform a rolling restart in order to defragment (and arguably not in any other case either). OPTIMIZE TABLE should be improved, perhaps with a "FULL" flag indicating how thorough it should be.