Thursday, March 19, 2009

online add node - preliminary numbers

I want to measure the time it takes to do various ONLINE ADD NODE operations in MySQL Cluster 7.0, and especially how long it takes to repartition the data onto the newly added nodes. So here comes the first post on the subject.

First off: go from two data nodes to four using ONLINE ADD NODE and REORGANIZE PARTITION of a 2GB table, with no traffic on the system.

Setup for each data node:
  • Dual CPU, quad core, running at 2.33GHz.
  • 8GB of RAM
  • DataMemory=4096MB
  • MaxNoOfExecutionThreads=8
  • Configuration generated by Configurator
  • Bound execution threads to cores (LockExec/Maint....Thread..; see the config sketch after this list)
  • Completely distributed setup.
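For reference, a hedged sketch of what the matching [ndbd default] section in config.ini could look like. The parameter names are from the 7.0 documentation; the CPU ids are illustrative, not the values actually used here:

[ndbd default]
NoOfReplicas=2
DataMemory=4096M
MaxNoOfExecutionThreads=8
# bind the ndbmtd execution threads to cores 0-7 (illustrative ids)
LockExecuteThreadToCPU=0,1,2,3,4,5,6,7
# bind the maintenance threads to a core of their own (illustrative)
LockMaintThreadsToCPU=0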
Created the following table and filled it with 3M rows, which gives a table of about 2GB in size:

CREATE TABLE `t1` (
`a` bigint(20) unsigned NOT NULL,
`b` int(10) unsigned DEFAULT NULL,
`c` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`d` varbinary(1024) DEFAULT NULL,
PRIMARY KEY (`a`),
KEY `c` (`c`)
) ENGINE=ndbcluster DEFAULT CHARSET=latin1
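The post does not show how the table was filled; below is a minimal, hypothetical loader (the load_t1 procedure and the 650-byte payload are my assumptions, sized so that 3M rows land in the neighborhood of 2GB):

DELIMITER //
CREATE PROCEDURE load_t1(IN n BIGINT UNSIGNED)
BEGIN
  DECLARE i BIGINT UNSIGNED DEFAULT 0;
  WHILE i < n DO
    -- c gets CURRENT_TIMESTAMP by default
    INSERT INTO t1 (a, b, d) VALUES (i, i MOD 1000, REPEAT('x', 650));
    SET i = i + 1;
  END WHILE;
END//
DELIMITER ;
CALL load_t1(3000000);

In practice you would batch the inserts (multi-row VALUES or LOAD DATA INFILE) instead of running 3M single-row transactions against NDB.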

Running add_node.sh to add two more data nodes (5 and 6):

ndb_mgm> all status
Connected to Management Server at: localhost:1186
Node 3: started (mysql-5.1.32 ndb-7.0.4)
Node 4: started (mysql-5.1.32 ndb-7.0.4)
Node 5: not connected
Node 6: not connected

Running the add_node.sh script (no traffic to the db) took 14m48.535s. The bulk of the time was spent in the rolling restart of data nodes 3 and 4. There is another way of doing this part without a rolling restart, but it requires initial planning; I will revisit that soon.
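For context, this is roughly the manual sequence that a script like add_node.sh automates, following the documented 7.0 procedure (paths and the exact restart handling are illustrative):

# 1. Add [ndbd] sections for node ids 5 and 6 to config.ini, then
#    restart ndb_mgmd so it picks up the new configuration:
ndb_mgmd --reload -f /path/to/config.ini
# 2. Rolling restart of the existing data nodes, one at a time:
ndb_mgm -e "3 RESTART"
ndb_mgm -e "4 RESTART"
# 3. On the new hosts, start the new (multi-threaded) data nodes:
ndbmtd --initial
# 4. Put nodes 5 and 6 into a new node group so they can store data:
ndb_mgm -e "CREATE NODEGROUP 5,6"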

After the add_node.sh script finished:
ndb_mgm> all status
Connected to Management Server at: localhost:1186
Node 3: started (mysql-5.1.32 ndb-7.0.4)
Node 4: started (mysql-5.1.32 ndb-7.0.4)
Node 5: started (mysql-5.1.32 ndb-7.0.4)
Node 6: started (mysql-5.1.32 ndb-7.0.4)


Nice! But nodes 5 and 6 do not yet have any data. We need to run alter table t1 reorganize partition; to redistribute the data onto them.

mysql> alter table t1 reorganize partition;
Query OK, 0 rows affected (4 min 7.97 sec)
Records: 0 Duplicates: 0 Warnings: 0

Moreover, alter table .. reorganize partition; is quite resource intensive (see the pictures at the end), so adding nodes online is only recommended at off-peak times.
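To keep an eye on the data nodes while the reorg runs, the management client can report memory usage. A hedged example (the reorg holds copies of the rows being moved until it commits, so DataMemory usage climbs temporarily):

watch -n 10 'ndb_mgm -e "ALL REPORT MemoryUsage"'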

Next I will look into how much reads and writes are affected by the reorg.


Verification
To verify that the data has been reorganized correctly, we can use ndb_desc.
Before adding nodes and ALTER TABLE ... REORGANIZE PARTITION;

[root@computer1 scripts]# ndb_desc -p -d test t1
-- t1 --
...
Version: 2
...
TableStatus: Retrieved
-- Attributes --
a Bigunsigned PRIMARY KEY DISTRIBUTION KEY AT=FIXED ST=MEMORY AUTO_INCR
b Unsigned NULL AT=FIXED ST=MEMORY
c Timestamp NOT NULL AT=FIXED ST=MEMORY
d Longvarbinary(1024) NULL AT=MEDIUM_VAR ST=MEMORY

-- Indexes --
PRIMARY KEY(a) - UniqueHashIndex
c(c) - OrderedIndex
PRIMARY(a) - OrderedIndex

-- Per partition info --
Partition   Row count   Commit count   Frag fixed memory   Frag varsized memory
0           374952      374952         16613376            261521408
2           375146      375146         16613376            261685248
4           375351      375351         16613376            261816320
6           374979      374979         16613376            261357568
3           374368      374368         16580608            261685248
5           374641      374641         16580608            261816320
1           375378      375378         16613376            261816320
7           375185      375185         16613376            261685248

NDBT_ProgramExit: 0 - OK

Each partition has about 375,000 rows (3M rows over eight partitions; with MaxNoOfExecutionThreads=8 each ndbmtd node runs four LDM threads, so two data nodes give 2 × 4 = 8 partitions).

After ALTER TABLE ... REORGANIZE PARTITION;
Verifying that the data is distributed as expected (check column 2, "row count"):

[root@computer1 scripts]# ndb_desc -p -d test t1
-- t1 --
...
Version: 16777217
...
-- Attributes --
a Bigunsigned PRIMARY KEY DISTRIBUTION KEY AT=FIXED ST=MEMORY
b Unsigned NULL AT=FIXED ST=MEMORY
c Timestamp NOT NULL AT=FIXED ST=MEMORY
d Longvarbinary(1024) NULL AT=MEDIUM_VAR ST=MEMORY

-- Indexes --
PRIMARY KEY(a) - UniqueHashIndex
c(c) - OrderedIndex
PRIMARY(a) - OrderedIndex

-- Per partition info --
Partition   Row count   Commit count   Frag fixed memory   Frag varsized memory
0           187296      750264         16613376            261521408
2           187426      750586         16613376            261685248
4           188058      749937         16613376            261816320
6           187238      750461         16613376            261357568
1           187847      750440         16613376            261816320
3           187602      747900         16580608            261685248
5           187392      749139         16580608            261816320
7           187610      750335         16613376            261652480
8           187656      187656         8323072             131006464
12          187293      187293         8290304             130613248
10          187720      187720         8323072             130842624
14          187741      187741         8323072             130973696
9           187531      187531         8323072             130809856
13          187249      187249         8290304             130875392
11          186766      186766         8290304             130678784
15          187575      187575         8323072             130875392

Each partition now has about 187,000 rows, i.e. half of what it was before.
As you can see above, the newly added partitions (8-15) and the previous partitions (0-7) have more or less the same row count.
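Note also that the old partitions (0-7) report the same fragment memory as before the reorg, even though they now hold only half the rows. As I read the 7.0 documentation, REORGANIZE PARTITION does not give that space back by itself; running OPTIMIZE TABLE afterwards reclaims the variable-sized memory:

mysql> optimize table t1;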

CPU UTIL

Below is a picture showing the CPU utilization during the reorg:

[image: total CPU utilization during the reorg]

And for each thread:

[image: per-thread CPU utilization during the reorg]

3 comments:

Cyril Scetbon said...

Do you think using Google Perftools (thread cache malloc) with MySQL Cluster would give a performance gain?

Johan Andersson said...

Cyril,
Does not matter for the data nodes (they prealloc everything they need). For the MySQL server, yes, it could be interesting.

Anonymous said...

Does alter table .. reorganize partition put an exclusive lock on the table? I was executing this statement on a table with 1.5 million rows and ran a count(*) on the table from another session. The count(*) did not return until the table reorg was completed.

Thanks.