Thursday, March 19, 2009

online add node - preliminary numbers

I want to measure the time it takes to do various ONLINE ADD NODE operations in MySQL Cluster 7.0, and especially how long it takes to repartition the data onto the newly added nodes. So here comes the first post on the subject.

First off: go from two data nodes to four using ONLINE ADD NODE and REORGANIZE PARTITION of a 2GB table, with no traffic on the system.

Setup for each data node:
  • Dual CPU, quad core, running at 2.33GHz.
  • 8GB of RAM
  • DataMemory=4096MB
  • MaxNoOfExecutionThreads=8
  • Configuration generated by Configurator
  • Bound execution threads to cores (LockExec/Maint....Thread..; see the config sketch after this list)
  • Completely distributed setup.
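For reference, a hedged sketch of what the matching [ndbd default] section in config.ini could look like. The parameter names are from the 7.0 documentation; the CPU ids are illustrative, not the values actually used here:

[ndbd default]
NoOfReplicas=2
DataMemory=4096M
MaxNoOfExecutionThreads=8
# bind the ndbmtd execution threads to cores 0-7 (illustrative ids)
LockExecuteThreadToCPU=0,1,2,3,4,5,6,7
# bind the maintenance threads to a core of their own (illustrative)
LockMaintThreadsToCPU=0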
Created the following table and filled it with 3M rows, which gives a table of about 2GB in size:

CREATE TABLE `t1` (
`a` bigint(20) unsigned NOT NULL,
`b` int(10) unsigned DEFAULT NULL,
`c` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`d` varbinary(1024) DEFAULT NULL,
PRIMARY KEY (`a`),
KEY `c` (`c`)
) ENGINE=ndbcluster DEFAULT CHARSET=latin1
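The post does not show how the table was filled; below is a minimal, hypothetical loader (the load_t1 procedure and the 650-byte payload are my assumptions, sized so that 3M rows land in the neighborhood of 2GB):

DELIMITER //
CREATE PROCEDURE load_t1(IN n BIGINT UNSIGNED)
BEGIN
  DECLARE i BIGINT UNSIGNED DEFAULT 0;
  WHILE i < n DO
    -- c gets CURRENT_TIMESTAMP by default
    INSERT INTO t1 (a, b, d) VALUES (i, i MOD 1000, REPEAT('x', 650));
    SET i = i + 1;
  END WHILE;
END//
DELIMITER ;
CALL load_t1(3000000);

In practice you would batch the inserts (multi-row VALUES or LOAD DATA INFILE) instead of running 3M single-row transactions against NDB.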

Running add_node.sh to add two more data nodes (5 and 6):

ndb_mgm> all status
Connected to Management Server at: localhost:1186
Node 3: started (mysql-5.1.32 ndb-7.0.4)
Node 4: started (mysql-5.1.32 ndb-7.0.4)
Node 5: not connected
Node 6: not connected

Running the add_node.sh script (no traffic to the db) took 14m48.535s. The bulk of the time was spent in the rolling restart of data nodes 3 and 4. There is another way of doing this part without a rolling restart, but it requires initial planning; I will revisit that soon.
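For context, this is roughly the manual sequence that a script like add_node.sh automates, following the documented 7.0 procedure (paths and the exact restart handling are illustrative):

# 1. Add [ndbd] sections for node ids 5 and 6 to config.ini, then
#    restart ndb_mgmd so it picks up the new configuration:
ndb_mgmd --reload -f /path/to/config.ini
# 2. Rolling restart of the existing data nodes, one at a time:
ndb_mgm -e "3 RESTART"
ndb_mgm -e "4 RESTART"
# 3. On the new hosts, start the new (multi-threaded) data nodes:
ndbmtd --initial
# 4. Put nodes 5 and 6 into a new node group so they can store data:
ndb_mgm -e "CREATE NODEGROUP 5,6"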

After the add_node.sh script finished:
ndb_mgm> all status
Connected to Management Server at: localhost:1186
Node 3: started (mysql-5.1.32 ndb-7.0.4)
Node 4: started (mysql-5.1.32 ndb-7.0.4)
Node 5: started (mysql-5.1.32 ndb-7.0.4)
Node 6: started (mysql-5.1.32 ndb-7.0.4)


Nice! But nodes 5 and 6 do not yet have any data. We need to run alter table t1 reorganize partition; to redistribute the data onto them.

mysql> alter table t1 reorganize partition;
Query OK, 0 rows affected (4 min 7.97 sec)
Records: 0 Duplicates: 0 Warnings: 0

Moreover, alter table .. reorganize partition; is quite resource intensive (see the pictures at the end), so adding nodes online is only recommended at off-peak times.
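To keep an eye on the data nodes while the reorg runs, the management client can report memory usage. A hedged example (the reorg holds copies of the rows being moved until it commits, so DataMemory usage climbs temporarily):

watch -n 10 'ndb_mgm -e "ALL REPORT MemoryUsage"'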

Next I will look into how much reads and writes are affected by the reorg.


Verification
To verify that the data has been reorganized correctly, we can use ndb_desc.
Before adding nodes and ALTER TABLE ... REORGANIZE PARTITION;

[root@computer1 scripts]# ndb_desc -p -d test t1
-- t1 --
...
Version: 2
...
TableStatus: Retrieved
-- Attributes --
a Bigunsigned PRIMARY KEY DISTRIBUTION KEY AT=FIXED ST=MEMORY AUTO_INCR
b Unsigned NULL AT=FIXED ST=MEMORY
c Timestamp NOT NULL AT=FIXED ST=MEMORY
d Longvarbinary(1024) NULL AT=MEDIUM_VAR ST=MEMORY

-- Indexes --
PRIMARY KEY(a) - UniqueHashIndex
c(c) - OrderedIndex
PRIMARY(a) - OrderedIndex

-- Per partition info --
Partition   Row count   Commit count   Frag fixed memory   Frag varsized memory
0           374952      374952         16613376            261521408
2           375146      375146         16613376            261685248
4           375351      375351         16613376            261816320
6           374979      374979         16613376            261357568
3           374368      374368         16580608            261685248
5           374641      374641         16580608            261816320
1           375378      375378         16613376            261816320
7           375185      375185         16613376            261685248

NDBT_ProgramExit: 0 - OK

Each partition has about 375,000 rows (3M rows over eight partitions; with MaxNoOfExecutionThreads=8 each ndbmtd node runs four LDM threads, so two data nodes give 2 × 4 = 8 partitions).

After ALTER TABLE ... REORGANIZE PARTITION;
Verifying that the data is distributed as expected (check column 2, "row count"):

[root@computer1 scripts]# ndb_desc -p -d test t1
-- t1 --
...
Version: 16777217
...
-- Attributes --
a Bigunsigned PRIMARY KEY DISTRIBUTION KEY AT=FIXED ST=MEMORY
b Unsigned NULL AT=FIXED ST=MEMORY
c Timestamp NOT NULL AT=FIXED ST=MEMORY
d Longvarbinary(1024) NULL AT=MEDIUM_VAR ST=MEMORY

-- Indexes --
PRIMARY KEY(a) - UniqueHashIndex
c(c) - OrderedIndex
PRIMARY(a) - OrderedIndex

-- Per partition info --
Partition   Row count   Commit count   Frag fixed memory   Frag varsized memory
0           187296      750264         16613376            261521408
2           187426      750586         16613376            261685248
4           188058      749937         16613376            261816320
6           187238      750461         16613376            261357568
1           187847      750440         16613376            261816320
3           187602      747900         16580608            261685248
5           187392      749139         16580608            261816320
7           187610      750335         16613376            261652480
8           187656      187656         8323072             131006464
12          187293      187293         8290304             130613248
10          187720      187720         8323072             130842624
14          187741      187741         8323072             130973696
9           187531      187531         8323072             130809856
13          187249      187249         8290304             130875392
11          186766      186766         8290304             130678784
15          187575      187575         8323072             130875392

Each partition now has about 187,000 rows, i.e. half of what it was before.
As you can see above, the newly added partitions (8-15) and the previous partitions (0-7) have more or less the same row count.
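Note also that the old partitions (0-7) report the same fragment memory as before the reorg, even though they now hold only half the rows. As I read the 7.0 documentation, REORGANIZE PARTITION does not give that space back by itself; running OPTIMIZE TABLE afterwards reclaims the variable-sized memory:

mysql> optimize table t1;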

CPU UTIL

Below is a picture showing the CPU utilization during the reorg:

[image: total CPU utilization during the reorg]

And for each thread:

[image: per-thread CPU utilization during the reorg]

3 comments:

Cyril Scetbon said...

Do you think using Google Perftools (thread cache malloc) with MySQL Cluster would give a performance gain?

Johan Andersson said...

Cyril,
Does not matter for the data nodes (they prealloc everything they need). For the MySQL server, yes, it could be interesting.

Anonymous said...

Does alter table .. reorganize partition put an exclusive lock on the table? I was executing this statement on a table with 1.5 million rows and ran a count(*) on the table from another session. The count(*) did not return until the table reorg was completed.

Thanks.