
Thursday, February 23, 2012

Deploy MySQL Cluster 7.2 GA in 288 seconds

It seems that our friends at Oracle have been pretty busy with the GA release of MySQL Cluster 7.2.

This is not just old wine in new bottles.

While it may be a dot release, it is a more significant step forward than the version number would imply.

First off, we are very excited to announce that the Severalnines Cluster Configurator now includes support for 7.2 GA.

As the title of this blog suggests, it is possible, as we have experienced, to deploy 7.2 GA in 288 seconds, i.e. just a few minutes. This was done on m1.large instances on Amazon. We have published a quick how-to deployment guide for Amazon here: http://support.severalnines.com/entries/20916823-installing-on-amazon-ec2.

This being said, let’s have a look at some of the new features in 7.2 and why they matter.

MySQL Cluster has always suffered from poor JOIN performance, mainly because the nested loop algorithm did not work very well in a parallel environment: the join was executed entirely in the MySQL Server, which resulted in a lot of network round-trips between the MySQL Server and the data nodes.

Now with 7.2, this has changed. With Adaptive Query Localization, a JOIN query is now shipped down to the data nodes and executed in parallel on local copies of data.
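
To make this concrete, here is a minimal sketch (not from the original post; t2 is a hypothetical second NDB table) of how you might check that join pushdown is enabled and inspect a join plan. With pushdown active, the Extra column of EXPLAIN indicates which tables are executed as part of a pushed join on the data nodes:

-- check that Adaptive Query Localization (join pushdown) is enabled
SHOW VARIABLES LIKE 'ndb_join_pushdown';

-- inspect the plan for a join between two NDB tables (t2 is hypothetical)
EXPLAIN
SELECT a.id, b.data1
FROM t1 AS a
JOIN t2 AS b ON b.id = a.id
WHERE a.id < 100;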


Results look very promising, with up to a 70x improvement on some queries, as described in this excellent post.

By getting better index statistics, the SQL optimizer can also build a better execution plan for each query. This in turn reduces the need for manual query tuning. Prior to 7.2, a lot of index hinting was needed. However, 70x speed improvement does not mean that a single query on MySQL Cluster will perform as fast as on InnoDB, since there will always be the network latency between SQL and Data nodes. But it does mean that many queries that were running very slowly in 7.1 might now run fast enough.
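
As a hedged example (my own illustration, not from the original post, assuming default index-statistics settings), the statistics for an NDB table can be refreshed from any SQL node so that the optimizer has better information to build its plans from:

-- check that NDB index statistics are enabled
SHOW VARIABLES LIKE 'ndb_index_stat_enable';

-- (re)collect index statistics for the table
ANALYZE TABLE t1;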

The Memcached API is a new addition to an already extensive list of API connectors, aimed at existing Memcached users and the NoSQL community. For a product trying to appeal to the NoSQL community, the addition of distributed parallel JOIN algorithms may be a bit confusing: NoSQL programmers typically avoid JOIN operations, for reasons including scalability and schema flexibility.

Is someone trying to have their cake and eat it too? :-)

Among the other features described here, the separation of data nodes across data centers (and especially Figure 3) is misleading. With a two-phase commit protocol guaranteeing synchronous replication within a node group, NDB would have to defy the laws of physics to support the setup shown in that diagram. Unless the data centers are physically close to each other, with a low-latency redundant link between them, splitting data nodes across data centers is not a good idea.

It is also worth noting that 7.2 GA is based on MySQL 5.5, whereas 7.1 was based on MySQL 5.1 – probably more relevant for users with mixed NDB and InnoDB tables.

So: is MySQL Cluster 7.2 for you?

If yes, then try it out using the Severalnines Cluster Configurator: http://www.severalnines.com/cluster-configurator/

It will take just a few minutes to generate a deployment package, which automates the installation of the cluster. Or it might take 288 seconds ... let us know ;-)

Monday, October 11, 2010

MySQL Cluster - Performance (UPDATE on PK) - >120K tx/sec

This post follows on the previous post on SELECT performance. In this post I want to show three things:
  1. How many single row UPDATEs per second you can do on a Cluster with two data nodes (updating 64B of data by the PRIMARY KEY, no batching)
  2. Show how MySQL Cluster scales with threads and mysql servers
  3. How ndb_cluster_connection_pool affects performance
Next post will be what happens to INSERTs.

Setup
  • two data nodes
  • one to four mysql servers
  • interconnected with Gig-E (single NIC)
deployed on six computers (of varying quality, and not really modern, see below). www.severalnines.com/bencher was co-located with each mysql server to drive the load. The updates are not batched and look like:

UPDATE t1 SET data1='64B of data' WHERE id=[random];
Table looks like:
CREATE TABLE `t1` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`data1` varchar(512) DEFAULT NULL,
`data2` varchar(512) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=ndbcluster

The configuration was generated using www.severalnines.com/config with the following options:
  • MySQL Cluster 7.1.7 (src)
  • Cluster Usage: Option 3) High read/High write
  • Cores: 8
  • Multi-connection: 16 (so that I could later easily switch between 1, 8 and 16 connections in the pool; see the note after this list)
  • DataMemory=2500M
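For reference, here is a minimal sketch (assumed example values, not taken from the generated deployment package) of how the connection pool is configured and verified; note that each connection in the pool needs its own [api]/[mysqld] slot in the cluster configuration:

-- in each mysqld's my.cnf (assumed example value):
--   [mysqld]
--   ndb_cluster_connection_pool=16
-- verify at runtime from the SQL node:
SHOW GLOBAL VARIABLES LIKE 'ndb_cluster_connection_pool';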
Hardware
  • Data nodes deployed on: 2x4 cores Intel(R) Xeon(R) CPU E5345 @ 2.33GHz
  • 2 mysql servers deployed on: 2x2 cores with Intel(R) Xeon(R) CPU 5160 @ 3.00GHz
  • 2 mysql servers deployed on: 2x4 cores with Intel(R) Xeon(R) CPU E5345 @ 2.33GHz
Results

A few notes:
  • conn= 8 means ndb_cluster_connection_pool=8
  • 1 app means one pair of bencher + mysqld co-located on one server
  • With identical HW (2x4 cores) for all the mysql servers, 130K updates per second should be possible.
  • Data nodes are quite loaded at 4 applications (from 64 to 128 threads), and the TC thread peaks at 86% CPU utilization.
  • The network was not saturated (about 200Mb/s each for TX and RX between the data nodes)
  • Latency numbers - I have them if someone is interested.
How to improve this more:
  • More cores, faster cores
  • more mysql servers (to scale out more)
  • eventually more data nodes
  • InfiniBand
  • use the NDB API (NoSQL)

Wednesday, September 29, 2010

MySQL Cluster - Performance (SELECT on PK)

In this post I want to show three things:
  1. How many single row SELECTs per second (on the PRIMARY KEY, no batching) you can do on a Cluster with two data nodes
  2. Show how MySQL Cluster scales with threads and mysql servers
  3. How ndb_cluster_connection_pool affects performance
Next post will be what happens to INSERTs, and then UPDATEs.

Setup
  • two data nodes
  • one to four mysql servers
  • interconnected with Gig-E (single NIC)
deployed on six computers (of varying quality, see below). www.severalnines.com/bencher was co-located with each mysql server to drive the load. The reads were PK SELECTs like:

SELECT data1,data2 FROM t1 WHERE id=[random];
data1 and data2 are each 256B, so in total 512B was read per row. There were 1M records in total in table t1, but this does not matter as the SELECT is on the PK (hash index), so it could just as well have been 100B rows (if I had had the storage for it).

Table looks like:
CREATE TABLE `t1` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`data1` varchar(512) DEFAULT NULL,
`data2` varchar(512) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=ndbcluster
but only 256B of each 512B varchar was filled with data.
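For completeness, here is one way (an assumption of mine, not necessarily how the benchmark data was actually loaded) to fill t1 with roughly 1M rows of 256B values:

-- seed one row with 256B values in each column
INSERT INTO t1 (data1, data2) VALUES (REPEAT('a', 256), REPEAT('b', 256));
-- double the table; repeating the statement below ~20 times gives 2^20 = 1,048,576 rows
INSERT INTO t1 (data1, data2) SELECT data1, data2 FROM t1;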

The configuration was generated using www.severalnines.com/config with the following options:
  • MySQL Cluster 7.1.7 (src)
  • Cluster Usage: Option 3) High read/High write
  • Cores: 8
  • Multi-connection: 16 (so that I could later easily switch between 1, 8 and 16 connections in the pool)
  • DataMemory=2500M
Hardware
  • Data nodes deployed on: Dual CPU Quad core Intel(R) Xeon(R) CPU E5345 @ 2.33GHz
  • 2 mysql servers deployed on: Dual CPU Dual core with Intel(R) Xeon(R) CPU 5160 @ 3.00GHz
  • 2 mysql servers deployed on: Dual CPU Quad core Intel(R) Xeon(R) CPU E5345 @ 2.33GHz
Results


A few notes:
  • conn= 8 means ndb_cluster_connection_pool=8
  • 1 app means one pair of bencher + mysqld co-located on one server
  • See below why 4 apps do not give >200K tx/s

Conclusions
  • Going from one mysql server to two mysql servers gives a 99.9% throughput increase (32 threads).
    At that point I wanted to try with four mysql servers, as the two mysql servers were completely CPU bound.

  • Going from two mysql servers to four mysql servers gives a 66% throughput increase.

    However, two of the mysql servers were running on dual CPU/dual core machines instead of dual CPU/quad core. The dual CPU/dual core servers were already saturated at 16 threads connected to MySQL.

    With similar hardware across the board, I expect to get well above 200,000 tx/s. The MySQL servers on the dual CPU/quad core machines could do 52,000 tx/s, but the dual CPU/dual core ones could only handle ~38,000 tx/s when executing the SELECTs from 32 threads.

    Hence, more than 200,000 tx/s should not be impossible (four servers doing 52,000 tx/s each would give about 208,000 tx/s), which would be a 92% increase in throughput compared to two mysql servers.

  • ndb_cluster_connection_pool=8 is a good value

  • ndb_cluster_connection_pool=1 is really bad

  • ndb_cluster_connection_pool=16 does not give any real benefit; I can imagine it will actually be worse with more data nodes and more mysql servers, because each data node has to read from more sockets. On the other hand, epoll might help. Must be tested more :)

  • On my dual cpu/quad core machines the mysql server could scale up to 32 threads and then it hit the roof (100% CPU util including the benchmark program)

  • On my dual cpu/dual core machines the mysql server could scale up to 16 threads and then it hit the roof (100% CPU util including the benchmark program)

  • CPU-bound -- the mysql servers got completely CPU bound with 32 threads

  • It would be fun to test with 16 or 24 core machines, but I don't have those. It would also be fun to test with faster CPUs.

  • The data nodes were not saturated on the peak results, but pay attention to the TC thread, as in this case it accounts for most of the CPU utilization within the data node.

  • Latency numbers - I have them if someone is interested.
How to improve this more:
  • More cores, faster cores
  • more mysql servers (to scale out more)
  • eventually more data nodes
  • InfiniBand
  • use the NDB API (NoSQL)
I think that, despite MySQL Cluster being a relational database and despite the fact that we are using SQL, it is possible to get some decent numbers and good scalability.

Wednesday, January 20, 2010

Gearman meets MySQL Cluster (NDBAPI)

After a discussion with my colleague Stephane Varoqui, we decided to see how Gearman and the NDBAPI could be used together. The result of the POC was a Gearman worker and a couple of clients (the clients and the worker use Google Protocol Buffers as the protocol). The worker can:
  • set/get/delete records on a single table in MySQL Cluster using the primary key
  • set/get/delete "any" type. It is not possible to dynamically add types but this is done at compile time.
  • supports the following SQL data types: (UNSIGNED) INTEGER, (UNSIGNED) BIGINT, CHAR, VARCHAR/VARBINARY
  • supports the following Google Protocol Buffer scalars: int32, uint32, int64, uint64, string, bytes.
  • does not handle many errors for the time being
and a client that can
  • create a message and send it to the Gearman Job Server
  • clients can be written in either C++, Java, or Python (subject to the languages that Google Protocol Buffers supports)
  • receive (deserialize) the data.
So basically this is a new, albeit simple, connector to MySQL Cluster! Hopefully someone will find it useful.

The code can be downloaded here and some short instructions are here, and if you guys out there think this is usable, then it might make it to Launchpad. Let me know!

Here follows some information on what has been done and how to use it.

First you have to create the relational tables (engine=ndb). For the rest of the examples I will use a 'Person' that I will persist to the database. I have created the following relational table:
CREATE TABLE `Person` (
`id` int(11) NOT NULL,
`name` varchar(128) DEFAULT NULL,
`message` varchar(1024) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=ndbcluster DEFAULT CHARSET=latin1
The relational tables need to be translated to Google Protocol Buffers:
> cat proto/Person.proto
message Person {
required int32 id = 1; //PRIMARY KEY attributes are 'required'
optional string name = 2;
optional string message = 3;
}
All columns in the relational table must exist in the protocol buffer definition.

There is a union proto buffer, NdbMessage.proto, that contains all the proto buffers that can be sent between the client and the worker:
> cat proto/NdbMessage.proto
// import the relevant protos that can be in
// the NdbMessage
import "Person.proto";

message NdbMessage {
enum Type { Person=1;}
// Identifies which field is filled in.
required Type type = 1;
// list of possible protos comes here:
optional Person person = 2;
}

The proto files then need to be run through the protocol buffer compiler:
> /usr/local/bin/protoc --proto_path=proto --cpp_out=`pwd` proto/Person.proto proto/NdbMessage.proto
# this generates .h and .cc files for Proto buffer files.
There are three clients for this, one for each operation (set,get,delete).

In ndbapi_set_client.cpp (the clients are based on reverse_client.cpp in Gearman) we invoke the 'ndbapi_set' function, which will be executed by the worker:

#include "NdbMessage.pb.h"
#include "Person.pb.h"

...
/**instantiate a NdbMessage object and associate a Person to it*/
NdbMessage m;
m.set_type(NdbMessage_Type_Person);
Person * p=m.mutable_person();
/* I must set all fields for now */
p->set_id(1);
p->set_name("Johan Andersson");
p->set_message("hello world, my first insert");

string s;
m.SerializeToString(&s);

...

const char * data = s.data();
result= (char*)gearman_client_do(&client,
"ndbapi_set",
NULL,
(void *)data,
(size_t)s.size(), /* size of the whole serialized NdbMessage, not just the Person */
&result_size,
&ret);

The worker (ndbapi_worker.cpp) registers three functions:
ret=gearman_worker_add_function(&worker, "ndbapi_get", 0, ndbapi_get,NULL);
ret=gearman_worker_add_function(&worker, "ndbapi_set", 0, ndbapi_set,NULL);
ret=gearman_worker_add_function(&worker, "ndbapi_delete", 0, ndbapi_delete,NULL);

And when the worker receives the function call to 'ndbapi_set', it has to deserialize the received message into an NdbMessage:

static void *ndbapi_set(gearman_job_st *job,
void *context,
size_t *result_size,
gearman_return_t *ret_ptr)
{
/** receive message and convert into c++ string
* construct the wanted object (dataObject) from parsing c++ string.
*/
const void * rawmessage;
rawmessage= gearman_job_workload(job);
string s((char*)rawmessage, gearman_job_workload_size(job));

NdbMessage dataObject;
if(! dataObject.ParseFromString(s))
{
*ret_ptr= GEARMAN_WORK_FAIL;
return NULL;
}

The worker then looks at the type of the message and gets the underlying object (in this case Person):

google::protobuf::Message * message;
switch(dataObject.type())
{
case NdbMessage_Type_Person:
{
message= (google::protobuf::Message*)&dataObject.person();
reflection = (google::protobuf::Reflection *)message->GetReflection();
descriptor = (google::protobuf::Descriptor*)message->GetDescriptor();
}
break;
/*
case NdbMessage_Type_MyType:
{
// the myType() .. is the name of the field in MyType.proto:
// MyType myType = ;
message= (google::protobuf::Message*)&dataObject.myType();
reflection = (google::protobuf::Reflection *)message->GetReflection();
descriptor = (google::protobuf::Descriptor*)message->GetDescriptor();
}
break;
*/
default:
cout << "unknown type: " << dataObject.type() << endl;
*ret_ptr= GEARMAN_WORK_FAIL;
return NULL;
}

/* ... here the worker walks the fields of the message and builds and
executes the NDBAPI transaction (code omitted) ... */

/* the insert was successful */
*result_size=0;
*ret_ptr= GEARMAN_SUCCESS;
return NULL;
}

In order to add a new type, you need to add a new 'case' to handle the type and how to get that object from the NdbMessage object (dataObject).

The worker loops over all fields in the received Proto Message, creates a transaction in the NDBAPI and executes it. Thus this part is agnostic to the type you give it, as long as the following is true (a rough sketch of this loop follows the list below):
  • The relational table only uses (UNSIGNED) INTEGER, (UNSIGNED) BIGINT, CHAR, VARCHAR/VARBINARY data types.
  • The .proto definition contains all columns in the relational table
  • The .proto file marks the PRIMARY KEY of the relational table as 'required'
  • For 'ndbapi_set' you need to set all columns in the table (I will fix that as soon as possible)
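To make that loop a bit more concrete, below is a rough, simplified sketch (my own illustration, not the actual worker code) of how the worker can walk the fields of the received message with the Protocol Buffers reflection API; the corresponding NDBAPI calls (NdbOperation::equal() for the primary key, NdbOperation::setValue() for the other columns) are only hinted at in the comments.

#include <iostream>
#include <google/protobuf/message.h>
#include <google/protobuf/descriptor.h>

/* Walk all fields of a protobuf message and print the column name/value
 * pairs that the worker would feed into the NDBAPI. */
void walk_fields(const google::protobuf::Message& msg)
{
  using namespace google::protobuf;
  const Descriptor* desc = msg.GetDescriptor();
  const Reflection* refl = msg.GetReflection();

  for (int i = 0; i < desc->field_count(); i++)
  {
    const FieldDescriptor* field = desc->field(i);
    switch (field->cpp_type())
    {
    case FieldDescriptor::CPPTYPE_INT32:   /* INTEGER columns */
      std::cout << field->name() << " = " << refl->GetInt32(msg, field) << std::endl;
      break;                               /* e.g. op->equal(...) for the PK */
    case FieldDescriptor::CPPTYPE_UINT32:  /* UNSIGNED INTEGER columns */
      std::cout << field->name() << " = " << refl->GetUInt32(msg, field) << std::endl;
      break;
    case FieldDescriptor::CPPTYPE_INT64:   /* BIGINT columns */
      std::cout << field->name() << " = " << refl->GetInt64(msg, field) << std::endl;
      break;
    case FieldDescriptor::CPPTYPE_UINT64:  /* UNSIGNED BIGINT columns */
      std::cout << field->name() << " = " << refl->GetUInt64(msg, field) << std::endl;
      break;
    case FieldDescriptor::CPPTYPE_STRING:  /* CHAR/VARCHAR/VARBINARY columns */
      std::cout << field->name() << " = " << refl->GetString(msg, field) << std::endl;
      break;                               /* e.g. op->setValue(...) */
    default:
      std::cout << "unsupported field type: " << field->name() << std::endl;
      break;
    }
  }
}

In the real worker, the values extracted this way end up in an NDB operation on the table matching the message type, and the transaction is executed once all fields have been processed.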
The data is then persisted in the table (currently the worker expects all tables to be stored in the 'test' database):
mysql> select * from Person;
+----+-----------------+------------------------------+
| id | name            | message                      |
+----+-----------------+------------------------------+
|  1 | Johan Andersson | hello world, my first insert |
+----+-----------------+------------------------------+
1 row in set (0.00 sec)


Now the worker can also handle 'get' requests, using the get_client:
> ./get_client  1
name: Johan Andersson
message: hello world, my first insert
And there is also a client that does deletes (delete_client):
> ./delete_client  1
Delete successful: id=1
Summary/Conclusions
  • What are the performance implications of using Proto Buffer's reflection mechanism?
  • Proto Buffer only works for C++, Java, and Python - currently no support for PHP.
  • Would it be better to use something other than Proto Buffers for this?
  • Gearman was super-easy to install so thanks for that!
  • Google Protocol Buffers was super-easy to install so thanks for that!
  • The worker also needs to be extended to support range searches and to make use of the batching interface, so that it is possible to persist either many types or many instances of a type in a batch.
  • NO-SQL -- YES-NDB !