Monday, August 31, 2009

Data node stuck in phase 101 - what to do?

From time to time I have seen, and heard reports of, a data node getting stuck in start phase 101.

Unfortunately this is difficult to reproduce (I have found no reliable way yet), so no bug fix is in the pipeline.

What happens is that in start phase 101, the starting data node (actually a block called SUMA) should reconnect to the MySQL servers and take over the event handling (sending events to the MySQL servers), but the reconnect to the MySQL server(s) never happens. A better explanation is here :)

If you see your data node stuck here, then try the following:
1) Restart the mysql servers (one by one); the data node should then finish starting
or
2) Restart the mysql servers and restart the data node.

Only do 2) if 1) does not work.
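The rolling restart in 1) can be scripted along these lines (the hostnames, the SSH access, and the "mysql" service name are assumptions here — adapt them to your environment):

```shell
# Hypothetical sketch: restart each mysqld one by one, so the cluster keeps
# serving traffic while SUMA gets the reconnect it is waiting for.
for host in sql1 sql2; do          # your MySQL server hosts
    ssh "$host" "service mysql restart"
    sleep 30                       # give the cluster connection time to re-establish
done

# Then check whether the data node has left start phase 101:
ndb_mgm -e "all status"
```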

And if you know how to reproduce - let us know!

Tuesday, August 04, 2009

OPTIMIZE TABLE on Cluster (revisited)

Jonas just wrote a patch for this bug in OPTIMIZE TABLE, an issue that was also discussed in this blog post. While he was at it, Jonas also fixed this bug.

Before this, OPTIMIZE TABLE hardly freed up any pages, and to defragment you had to do a rolling restart of the data nodes.

Now there is only a 2% discrepancy between OPTIMIZE TABLE and a rolling restart. This is great stuff.
This fix will make it into 6.3.26 and 7.0.7.

See below for details:

Creating two tables, t5 and t6:

CREATE TABLE `t5` (
`id` varchar(32) CHARACTER SET utf8 NOT NULL,
`name` varchar(32) NOT NULL DEFAULT '',
`unit` int(11) DEFAULT NULL,
PRIMARY KEY (`id`,`name`)
) ENGINE=ndbcluster DEFAULT CHARSET=latin1

And then the following table:

CREATE TABLE `t6` (
`id` varchar(32) CHARACTER SET utf8 NOT NULL,
`name` varchar(32) NOT NULL DEFAULT '',
`c0` int unsigned NOT NULL,
`c1` int unsigned NOT NULL,
`data2` varchar(255),
`unit` int(11) DEFAULT NULL,
PRIMARY KEY (`id`,`name`)
) ENGINE=ndbcluster DEFAULT CHARSET=latin1


Then we use hugoLoad (a test program that is not built by default in the source distribution of Cluster) to load each table with 1M records.

./hugoLoad -r 1000000 -d test t5
./hugoLoad -r 1000000 -d test t6


Memory usage after population:

Node 2: Data usage is 45%(14890 32K pages of total 32768)
Node 2: Index usage is 22%(3722 8K pages of total 16416)
Node 3: Data usage is 45%(14890 32K pages of total 32768)
Node 3: Index usage is 22%(3722 8K pages of total 16416)


Delete 500K records from t5.


mysql> set ndb_use_transactions=0;
Query OK, 0 rows affected (0.00 sec)

mysql> delete from t5 limit 100000;
Query OK, 100000 rows affected (2.73 sec)

mysql> delete from t5 limit 100000;
Query OK, 100000 rows affected (3.13 sec)

mysql> delete from t5 limit 100000;
Query OK, 100000 rows affected (3.11 sec)

mysql> delete from t5 limit 100000;
Query OK, 100000 rows affected (3.13 sec)

mysql> delete from t5 limit 100000;
Query OK, 100000 rows affected (2.97 sec)


Memory usage after deletion:


Node 2: Data usage is 45%(14890 32K pages of total 32768)
Node 2: Index usage is 19%(3120 8K pages of total 16416)
Node 3: Data usage is 45%(14890 32K pages of total 32768)
Node 3: Index usage is 19%(3121 8K pages of total 16416)


Run OPTIMIZE TABLE:


mysql> optimize table t5;
+---------+----------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+---------+----------+----------+----------+
| test.t5 | optimize | status | OK |
+---------+----------+----------+----------+
1 row in set (1 min 41.94 sec)



CPU UTIL during OPTIMIZE TABLE:
User:23.8% Sys: 4.9% ndbd

Memory usage after OPTIMIZE TABLE:


Node 2: Data usage is 39%(13090 32K pages of total 32768)
Node 2: Index usage is 19%(3120 8K pages of total 16416)
Node 3: Data usage is 39%(13090 32K pages of total 32768)
Node 3: Index usage is 19%(3121 8K pages of total 16416)


Delete 500K (50%) of the records from t6:

mysql> set ndb_use_transactions=0;
Query OK, 0 rows affected (0.00 sec)

mysql> delete from t6 limit 100000;
Query OK, 100000 rows affected (2.77 sec)

mysql> delete from t6 limit 100000;
Query OK, 100000 rows affected (3.05 sec)

mysql> delete from t6 limit 100000;
Query OK, 100000 rows affected (3.12 sec)

mysql> delete from t6 limit 100000;
Query OK, 100000 rows affected (3.19 sec)

mysql> delete from t6 limit 100000;
Query OK, 100000 rows affected (3.24 sec)


Memory usage after deletion:


Node 2: Data usage is 39%(13090 32K pages of total 32768)
Node 2: Index usage is 15%(2521 8K pages of total 16416)
Node 3: Data usage is 39%(13090 32K pages of total 32768)
Node 3: Index usage is 15%(2522 8K pages of total 16416)


Run OPTIMIZE TABLE:


mysql> optimize table t6;
+---------+----------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+---------+----------+----------+----------+
| test.t6 | optimize | status | OK |
+---------+----------+----------+----------+
1 row in set (1 min 45.83 sec)


Memory usage after OPTIMIZE TABLE:


Node 2: Data usage is 28%(9249 32K pages of total 32768)
Node 2: Index usage is 15%(2521 8K pages of total 16416)
Node 3: Data usage is 28%(9249 32K pages of total 32768)
Node 3: Index usage is 15%(2522 8K pages of total 16416)


After restart of node 3:


Node 2: Data usage is 28%(9249 32K pages of total 32768)
Node 2: Index usage is 15%(2521 8K pages of total 16416)
Node 3: Data usage is 26%(8568 32K pages of total 32768)
Node 3: Index usage is 13%(2236 8K pages of total 16416)


Only a two-percent difference between a rolling restart and OPTIMIZE TABLE. Excellent. Thank you Jonas for fixing this.
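The two-percent figure can be checked from the node 3 page counts above (this is just arithmetic on the numbers already reported):

```shell
# 9249 DataMemory pages in use after OPTIMIZE TABLE, 8568 after a rolling
# restart, out of 32768 pages in total.
awk -v o=9249 -v r=8568 -v t=32768 \
    'BEGIN { printf "diff: %d pages = %.1f%% of DataMemory\n", o - r, (o - r) / t * 100 }'
# prints: diff: 681 pages = 2.1% of DataMemory
```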

Upgrade 6.3.x to 7.0.6

Currently a number of users have reported upgrade issues going from 6.3.x to 7.0.6 (below is an example of what might happen to you).

Thus, we do not yet recommend upgrading from 6.3.x to 7.0.6 in a production system.
However, we are very interested in bug reports and help with testing.
We are of course working on fixing these issues and will keep you posted when things have improved. Sorry for the inconvenience.

Currently the "best practice" for upgrading is to:
1) backup / mysqldump
2) start cluster 7.0.6 with initial so it is completely blank
3) restore backup / load dump
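As a sketch, the three steps could look like this (the node ids, backup id, and backup path are assumptions — check them against your own setup):

```shell
# 1) Take a native backup on the running 6.3.x cluster (or use mysqldump)
ndb_mgm -e "START BACKUP"     # note the backup id it reports, e.g. 1

# 2) Start the 7.0.6 data nodes with --initial so the cluster is completely blank
# (run on each data node host)
ndbd --initial

# 3) Restore the metadata once, then the data from each data node's backup
ndb_restore -n 2 -b 1 -m -r --backup_path=/var/lib/mysql-cluster/BACKUP/BACKUP-1
ndb_restore -n 3 -b 1 -r    --backup_path=/var/lib/mysql-cluster/BACKUP/BACKUP-1
```

If you went the mysqldump route instead, step 3 is simply replaying the dump through one of the new MySQL servers.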


ERROR 1005 (HY000): Can't create table 'test.#sql-3d25_aa905' (Errno: 140)

mysql> show warnings;
+-------+------+--------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+--------------------------------------------------------------------------+
| Error | 1296 | Error 755 'Invalid tablespace' from NDB |
| Error | 1005 | Can't create table 'test.#sql-3d25_aa905' (Errno: 140) |
+-------+------+--------------------------------------------------------------------------+

Monday, August 03, 2009

Cluster + Dolphin DX = Sweet!

At a customer I have had the opportunity to work with some really state of the art equipment.

The customer is in the online gaming industry and we are building a platform based on MySQL Cluster where we are consolidating a number of applications. Workload is mainly inserts, updates, pk reads, and index scans (quite frequent), and a few joins. The cluster handles the transient data (current working set), and is replicating to a long term data store (following this approach), and we are also feeding other back-end support systems from this.

The cluster (>8 computers) is interconnected with Dolphin DX adapters. The computers are Sun Nehalem-based servers (Intel Xeon E5540). The cluster (7.0.6) consists of 4 data nodes, each with 32GB of RAM and plenty of RAID 1+0 storage, and >4 MySQL servers, all running on the Nehalems. It is so sweet :)

How much did things improve with the DX adapters over Gig-E? Just by plugging them in we got an overall 2.5-3x performance improvement (throughput up, response times down). I love it. Now we will continue optimizing the setup, schemas, queries, etc. to really find out.

I can really recommend the investment in DX adapters (remote memory access is faster than local disk) if you are after top-notch performance, and the company behind them, Dolphin ICS, is really fantastic to work with. If the DX switch fails, the cluster automatically falls back on the Gig-E adapters, transparently and with no service interruption (you and the applications don't notice a thing, except that performance drops).

Note: Dolphin ICS' adapters offer a socket interface, so any TCP/IP application can benefit from the low latency and high bandwidth.