Thursday, February 23, 2012

Deploy MySQL Cluster 7.2 GA in 288 seconds

It seems that our friends at Oracle have been pretty busy with the GA release of MySQL Cluster 7.2.

This is not just old wine in new bottles.

While it may be a dot release, it does appear to be a more significant step forward than a dot release would imply.

First off, we are very excited to announce that the Severalnines Cluster Configurator now includes support for 7.2 GA.

As the title of this blog suggests, we were able to deploy 7.2 GA in 288 seconds, i.e. in just a few minutes, on m1.large instances on Amazon. We have published a quick how-to deployment guide for Amazon here: http://support.severalnines.com/entries/20916823-installing-on-amazon-ec2.

This being said, let’s have a look at some of the new features in 7.2 and why they matter.

MySQL Cluster has always suffered from poor JOIN performance, mainly because the nested loop algorithm did not work very well in a parallel environment. That meant the join was executed entirely in the MySQL Server. This resulted in a lot of network round-trips between the MySQL Server and the data nodes.

Now with 7.2, this has changed. With Adaptive Query Localization, a JOIN query is now shipped down to the data nodes and executed in parallel on local copies of data.
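As a quick illustration (table and column names here are hypothetical), you can check whether a given join is pushed down with EXPLAIN; pushdown is governed by the ndb_join_pushdown server variable, which is on by default in 7.2:

```sql
-- Hypothetical schema; assumes t1 and t2 are ENGINE=NDBCLUSTER tables.
SET ndb_join_pushdown = ON;  -- the 7.2 default

EXPLAIN
SELECT t1.a, t2.b
  FROM t1
  JOIN t2 ON t2.pk = t1.ref
 WHERE t1.a < 10;
-- When the join qualifies, the Extra column reports something like
-- "Child of 't1' in pushed join", i.e. t2 is joined on the data nodes
-- instead of via round-trips to the MySQL Server.
```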


Results look very promising, with up to a 70x improvement on some queries, as described in this excellent post.

By getting better index statistics, the SQL optimizer can also build a better execution plan for each query. This in turn reduces the need for manual query tuning; prior to 7.2, a lot of index hinting was needed. However, a 70x speed-up does not mean that a single query on MySQL Cluster will perform as fast as on InnoDB, since there will always be network latency between the SQL and Data nodes. But it does mean that many queries that ran very slowly on 7.1 might now run fast enough.
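One practical note: as far as we have seen, the improved index statistics for NDB tables are collected with ANALYZE TABLE, so it is worth running it after loading data (table names here are just examples):

```sql
-- Build/refresh index statistics so the optimizer can pick a sensible
-- join order without manual USE INDEX / STRAIGHT_JOIN hints.
ANALYZE TABLE t1, t2;
```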

The Memcached API is a new addition to an already extensive list of API connectors, aimed at existing Memcached users and the NoSQL community. For a product trying to be attractive to the NoSQL community, perhaps the addition of distributed parallel JOIN algorithms may be a bit confusing. NoSQL programmers typically avoid JOIN operations, for reasons including scalability and schema flexibility.
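For the curious, the Memcached API ships as a memcached engine plugin (ndb_engine.so). Exact options can differ between builds, so treat this as a sketch, but starting it looks roughly like this, after which any standard memcached client can talk to the cluster:

```shell
# Start the bundled memcached with the NDB engine, pointing it at the
# cluster's management node (host name is an example).
memcached -E ndb_engine.so -e "connectstring=mgmhost:1186;role=db-only"

# Then plain memcached protocol works against cluster data:
#   set mykey 0 0 5
#   hello
#   get mykey
```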

Is someone trying to have their cake and eat it too? :-)

Among the other features described here, the separation of Data Nodes across data centers (and especially Figure 3) is misleading. With a 2-phase commit protocol to guarantee synchronous replication within a Node Group, NDB would have to defy the laws of physics to support such a diagram: every commit needs at least a couple of inter-data-center round trips, so with e.g. a 40 ms round-trip time between sites, every transaction pays a latency floor of roughly 80 ms. Unless the data centers are physically close to each other, with a low-latency redundant link in between them, splitting data nodes across data centers is not a good idea.

It is also worth noting that 7.2 GA is based on MySQL 5.5, whereas 7.1 was based on MySQL 5.1 – probably more relevant for users with mixed NDB and InnoDB tables.

So: is MySQL Cluster 7.2 for you?

If yes, then try it out using the Severalnines Cluster Configurator: http://www.severalnines.com/cluster-configurator/

It will take just a few minutes to generate a deployment package, which automates the installation of the cluster. Or it might take 288 seconds ... let us know ;-)

9 comments:

Frazer Clement said...

Hi Johan,

Pretty cool deploying in < 5 minutes, I think that must be a record. Fast work supporting 7.2 with Cluster Configurator too!

Regarding the splitting of data nodes across data centres, I don't think any laws of physics need be broken. Physics dictates that latency will be higher and achievable bandwidth will likely be lower, which will slow down operation execution + commit, node recovery etc. relative to single data centre operation. This is inescapable, but it doesn't mean that a slower system is not viable for some use cases, especially if it offers geo distribution with synchronous replication.

A related concern is system stability. As you know, Cluster applies the fail-fast policy to communication failures, immediately isolating unreachable nodes. Isolated nodes terminate themselves and then attempt to restart and rejoin the cluster. While this is occurring, the system is vulnerable to further failures. For this reason it is important that communication failures (e.g. heartbeat failures) are avoided where possible, by using reliably low latency links, and by de-tuning heartbeat timeouts as far as possible within the application failure handling latency requirements.

In general, most time-related parameters must be relaxed to accommodate the higher latency and lower bandwidth available.
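As a rough sketch of what that de-tuning looks like (the values below are illustrative only, not recommendations), these parameters live in the [ndbd default] section of config.ini:

```ini
[ndbd default]
# Defaults are tuned for LAN latency; on higher-latency links they
# typically need to be raised (illustrative values only):
HeartbeatIntervalDbDb = 5000                 # default 1500 ms
HeartbeatIntervalDbApi = 5000                # default 1500 ms
TransactionDeadlockDetectionTimeout = 5000   # default 1200 ms
```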

One specific improvement in 7.2 is a new heartbeat failure handling phase known as 'ConnectivityCheck'. When configured, this step performs an all-to-all connectivity check before isolating an unreachable node. Effects of this include:
- Finding all currently unreachable nodes at once - faster failure handling in 'split brain' scenarios than the normal node-at-a-time detection + handling.
- Armouring against STONITH requests sourced by unreachable or high latency nodes. This avoids the danger of a STONITH race condition when transient latency abates during split brain handling.

The ConnectivityCheck step improves system stability when inter-node link latency can vary within the range of the heartbeat timeout period. Further improvements are of course possible, and any suggestions are welcome!
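If memory serves, the check is enabled with a single data node parameter (it is off by default), something like:

```ini
[ndbd default]
# Enables the all-to-all ConnectivityCheck phase before a node is
# isolated; 0 (the default) disables it. The value is the delay in ms
# between connectivity checks (illustrative value only).
ConnectCheckIntervalDelay = 1500
```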

So in summary, no physical laws need to be bent, but some reconfiguration, and of course, lots of testing are required before deploying a geographically distributed cluster.

I'm interested to hear more of your thoughts on this stuff,

Frazer

Art said...

I just read your post (after I wrote a blog article on the new GA release). I wish I'd read it a week ago. I would have used it as a source.

Anyways...very informative. Thank you. I am curious about optimization for it, and I would love to implement it. I am spec'ing it out to my customers right now.

Thanks,
Art

Anonymous said...

Does 7.2 still need lots of RAM for database/table storage in RAM, or have they finally moved all database storage to .MYD/.MYI/.FRM files like the regular MySQL server?

Mike said...

As far as I know, MySQL Cluster is not supported in a virtualization environment because of issues with timing and disk subsystems:

http://dev.mysql.com/doc/mysql-cluster-excerpt/5.1/en/faqs-mysql-cluster.html#qandaitem-A-1-13

Do you have other experiences with MySQL Cluster virtualization?

Johan Andersson said...

Hi,

Yes, we do. Quite a lot, actually.

A number of customers have deployed on e.g. EC2 using our Configurator. If you have a very write-heavy load then further tuning may be needed, but for most use cases our configurations work very well. Why it is not supported I don't really know, but sure, if you run on VirtualBox on your laptop you will have problems. If you use decently sized, good-capacity VMs it is generally not a problem (only, as I wrote, if it is a very write-heavy app). Even then, disk writes can be spread out by adding more nodes, and disk speed can also be traded against disk space.

The secret sauce is to get the timing and configuration right. Specify "web" workload and "ec2" when you configure your cluster.

Link to Cluster Configurator.

Good luck, hope this helps a bit.

Johan Andersson said...

Hi anonymous,
sorry, I missed replying to this (there were a lot of spam posts and I missed this one).

Yes, it is still the same. Data is stored in RAM by default, but you can use Disk Data tables (indexed columns are stored in RAM, non-indexed columns on disk).
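For reference, a minimal Disk Data setup looks like this (names and sizes are just examples):

```sql
-- Undo log and tablespace for on-disk columns (example names/sizes).
CREATE LOGFILE GROUP lg_1
    ADD UNDOFILE 'undo_1.log'
    INITIAL_SIZE 64M
    ENGINE NDBCLUSTER;

CREATE TABLESPACE ts_1
    ADD DATAFILE 'data_1.dat'
    USE LOGFILE GROUP lg_1
    INITIAL_SIZE 256M
    ENGINE NDBCLUSTER;

CREATE TABLE t1 (
    id INT NOT NULL PRIMARY KEY,  -- indexed: kept in memory
    payload VARCHAR(255)          -- non-indexed: stored on disk
) TABLESPACE ts_1 STORAGE DISK ENGINE NDBCLUSTER;
```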

Thank you,
Johan

Anonymous said...

Hi

I would like your opinion on master-to-master replication. Is MySQL Cluster more stable than a master-to-master setup?
