Wednesday, January 20, 2010

Gearman meets MySQL Cluster (NDBAPI)

After a discussion with my colleague Stephane Varoqui we decided to see how Gearman and the NDBAPI could be used together. The result of the POC was a Gearman worker and a couple of clients (clients and workers use Google Protocol Buffers as the protocol). The worker can:
  • set/get/delete records on a single table in MySQL Cluster using the primary key
  • set/get/delete "any" type. Types cannot be added dynamically; they are added at compile time.
  • supports the following SQL data types: (UNSIGNED) INTEGER, (UNSIGNED) BIGINT, CHAR, VARCHAR/VARBINARY
  • supports the following Google Protocol Buffer scalars: int32, uint32, int64, uint64, string, bytes.
  • does not handle many error cases for the time being
and a client that can
  • create a message and send it to the Gearman Job Server
  • clients can be written in C++, Java, or Python (limited by the languages Google Protocol Buffers supports)
  • receive (deserialize) the data.
So basically this is a new, albeit simple, connector to MySQL Cluster! Hopefully someone will find it useful.

The code can be downloaded here and some short instructions are here. If you think this is usable, it might make it to launchpad - let me know!

Here follows some information about what has been done and how to use it.

First you have to create the relational tables (engine=ndb). For the rest of the examples I will use a 'Person' that I will persist to the database. I have created the following relational table:
CREATE TABLE `Person` (
`id` int(11) NOT NULL,
`name` varchar(128) DEFAULT NULL,
`message` varchar(1024) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=ndbcluster DEFAULT CHARSET=latin1
The relational tables need to be translated to Google Protocol Buffers:
> cat proto/Person.proto
message Person {
required int32 id = 1; //PRIMARY KEY attributes are 'required'
optional string name = 2;
optional string message = 3;
}
All columns in the relational table must exist in the protocol buffer definition.

There is a union proto buffer called NdbMessage.proto that contains all the proto buffers that can be sent between the client and the worker:
> cat proto/NdbMessage.proto
// import the relevant protos that can be in
// the NdbMessage
import "Person.proto";

message NdbMessage {
enum Type { Person=1;}
// Identifies which field is filled in.
required Type type = 1;
// list of possible protos comes here:
optional Person person = 2;
}

The proto files then need to be run through the protocol buffer compiler:
> /usr/local/bin/protoc --proto_path=proto --cpp_out=`pwd` proto/Person.proto proto/NdbMessage.proto
# this generates .h and .cc files for Proto buffer files.
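To build the worker and the clients, the generated .pb.cc files are compiled together with the corresponding .cpp sources and linked against the Gearman, Protocol Buffers, and NDB API libraries. Exact include paths and library names depend on your installation; something along these lines (an assumed example, adjust to your setup):
> g++ -o ndbapi_worker ndbapi_worker.cpp NdbMessage.pb.cc Person.pb.cc \
    -I/usr/local/mysql/include -I/usr/local/mysql/include/storage/ndb \
    -I/usr/local/mysql/include/storage/ndb/ndbapi \
    -L/usr/local/mysql/lib -lndbclient -lgearman -lprotobuf -lpthread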
There are three clients for this, one for each operation (set,get,delete).

In ndbapi_set_client.cpp (the clients are based on the reverse_client.cpp in gearman) we invoke the 'ndbapi_set' function, which will be executed by the worker:

#include "NdbMessage.pb.h"
#include "Person.pb.h"

...
/**instantiate a NdbMessage object and associate a Person to it*/
NdbMessage m;
m.set_type(NdbMessage_Type_Person);
Person * p=m.mutable_person();
/* I must set all fields for now */
p->set_id(1);
p->set_name("Johan Andersson");
p->set_message("hello world, my first insert");

string s;
m.SerializeToString(&s);

...

const char * data = s.data();
result= (char*)gearman_client_do(&client,
                                 "ndbapi_set",
                                 NULL,
                                 (void *)data,
                                 s.size(),   /* size of the serialized NdbMessage */
                                 &result_size,
                                 &ret);
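For completeness, a minimal sketch of how the result of gearman_client_do() can be checked on the client side (variable names as in the snippet above; gearman_client_do() returns memory that the caller must free):

if (ret != GEARMAN_SUCCESS)
{
  cerr << "ndbapi_set failed: " << gearman_client_error(&client) << endl;
}
else
{
  /* for ndbapi_set the worker returns an empty result on success */
  cout << "insert ok, result_size=" << result_size << endl;
  free(result);
}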

The worker (ndbapi_worker.cpp) registers three functions:
ret=gearman_worker_add_function(&worker, "ndbapi_get", 0, ndbapi_get,NULL);
ret=gearman_worker_add_function(&worker, "ndbapi_set", 0, ndbapi_set,NULL);
ret=gearman_worker_add_function(&worker, "ndbapi_delete", 0, ndbapi_delete,NULL);
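These registrations live in the worker's main(), which otherwise is a standard Gearman worker loop. A minimal sketch, assuming gearmand runs on localhost and its default port 4730:

gearman_worker_st worker;
gearman_worker_create(&worker);
gearman_worker_add_server(&worker, "127.0.0.1", 4730);

/* ... the three gearman_worker_add_function() calls above ... */

while (1)
{
  /* blocks until a job arrives and dispatches it to the registered callback */
  ret= gearman_worker_work(&worker);
  if (ret != GEARMAN_SUCCESS)
    break;
}
gearman_worker_free(&worker);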

And when the worker receives the function call to 'ndbapi_set' it has to deserialize the received message into a NdbMessage:

static void *ndbapi_set(gearman_job_st *job,
void *context,
size_t *result_size,
gearman_return_t *ret_ptr)
{
/** receive message and convert into c++ string
* construct the wanted object (dataObject) from parsing c++ string.
*/
const void * rawmessage;
rawmessage= gearman_job_workload(job);
string s((char*)rawmessage, gearman_job_workload_size(job));

NdbMessage dataObject;
if(! dataObject.ParseFromString(s))
{
*ret_ptr= GEARMAN_WORK_FAIL;
return NULL;
}

The worker then looks at the type of the message and gets the underlying object (in this case Person):

google::protobuf::Message * message;
switch(dataObject.type())
{
case NdbMessage_Type_Person:
{
message= (google::protobuf::Message*)&dataObject.person();
reflection = (google::protobuf::Reflection *)message->GetReflection();
descriptor = (google::protobuf::Descriptor*)message->GetDescriptor();
}
break;
/*
case NdbMessage_Type_MyType:
{
// the myType() .. is the name of the field in MyType.proto:
// MyType myType = ;
message= (google::protobuf::Message*)&dataObject.myType();
reflection = (google::protobuf::Reflection *)message->GetReflection();
descriptor = (google::protobuf::Descriptor*)message->GetDescriptor();
}
break;
*/
default:
cout << "unknown type: " << dataObject.type() << endl;
*ret_ptr= GEARMAN_WORK_FAIL;
return NULL;
}

...

/* the insert was successful */
*result_size=0;
*ret_ptr= GEARMAN_SUCCESS;
return NULL;
}

In order to add a new type, you need to add a new 'case' to handle the type and how to get that object from the NdbMessage object (dataObject).

The worker loops over all fields in the received proto message, creates a transaction in the NDBAPI, and executes it. Thus this part is agnostic to the type you give it, as long as the following is true (a sketch of the field loop follows after this list):
  • The relational table only uses (UNSIGNED) INTEGER, (UNSIGNED) BIGINT, CHAR, VARCHAR/VARBINARY data types.
  • The .proto definition contains all columns in the relational table
  • The .proto file marks the PRIMARY KEY of the relational table as 'required'
  • For 'ndbapi_set' you need to set all columns in the table (I will fix that as soon as possible)
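As a sketch of that field loop (not the exact worker code - just the idea, assuming 'op' is an NdbOperation obtained from the started transaction): the worker walks the Descriptor, reads each set field through the Reflection object, and passes the value to the NDBAPI using the field name as the column name.

/* the 'required' primary key field goes through op->equal() instead of setValue() */
for (int i = 0; i < descriptor->field_count(); i++)
{
  const google::protobuf::FieldDescriptor *field= descriptor->field(i);
  if (!reflection->HasField(*message, field))
    continue;                                   /* optional field not set */

  switch (field->cpp_type())
  {
    case google::protobuf::FieldDescriptor::CPPTYPE_INT32:
      /* maps to INTEGER; field->name() matches the column name */
      op->setValue(field->name().c_str(), reflection->GetInt32(*message, field));
      break;
    case google::protobuf::FieldDescriptor::CPPTYPE_STRING:
      /* maps to CHAR/VARCHAR/VARBINARY; the VARCHAR length prefix is omitted here */
      op->setValue(field->name().c_str(),
                   reflection->GetString(*message, field).c_str());
      break;
    /* uint32, int64, uint64 are handled the same way */
    default:
      break;
  }
}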
The data is then persisted in the table (currently the worker expects all tables to be stored in the 'test' database):
mysql> select * from Person;
+----+-----------------+------------------------------+
| id | name            | message                      |
+----+-----------------+------------------------------+
|  1 | Johan Andersson | hello world, my first insert |
+----+-----------------+------------------------------+
1 row in set (0.00 sec)


The worker can also handle 'get' requests, here invoked via the get_client:
> ./get_client  1
name: Johan Andersson
message: hello world, my first insert
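On the client side the get is the mirror image of the set: send an NdbMessage with only the primary key filled in, and parse the returned workload back into an NdbMessage (this sketch assumes the worker replies with a serialized NdbMessage; see ndbapi_worker.cpp and the get_client source for the exact format):

NdbMessage req;
req.set_type(NdbMessage_Type_Person);
req.mutable_person()->set_id(1);          /* only the primary key is needed for a get */

string s;
req.SerializeToString(&s);

size_t result_size;
gearman_return_t ret;
char *result= (char*)gearman_client_do(&client, "ndbapi_get", NULL,
                                       (void *)s.data(), s.size(),
                                       &result_size, &ret);

NdbMessage reply;
if (ret == GEARMAN_SUCCESS && reply.ParseFromArray(result, result_size))
{
  cout << "name: "    << reply.person().name()    << endl;
  cout << "message: " << reply.person().message() << endl;
}
free(result);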
And there is also a client that does deletes (delete_client):
> ./delete_client  1
Delete successful: id=1
Summary/Conclusions
  • What are the performance implications of using Proto Buffer's reflection mechanism?
  • Protocol Buffers only supports C++, Java, and Python - currently no support for PHP.
  • Is it better to use something else than Proto Buffers for this?
  • Gearman was super-easy to install so thanks for that!
  • Google Protocol Buffers was super-easy to install so thanks for that!
  • The worker also needs to be extended to support range searches and to make use of the batching interface, so that it is possible to persist either many types or many instances of a type in one batch.
  • NO-SQL -- YES-NDB !

Thursday, January 14, 2010

CMON - Install Instructions

CMON - the Cluster Monitor - has recently been released, and here is a little how-to covering installation from binary and from source (at the end), deployment, etc.

1. Download the cmon binary package to a monitoring host

Currently there are binaries available for Linux 64-bit and 32-bit (statically linked), and they work with MySQL Cluster 7.0.9 and later.

In this case the monitoring host will be on 'ndb05' (IP address 10.0.1.5) - see 9. Suggested Deployment for a picture describing the setup.

The Monitoring machine should have installed:
  • a mysql server up and running that will store the cmon database (if you build from source you need to have the mysql cluster libraries and include files)
  • apache (if you want to have the web interface) + php_mysql
  • rrdtool (if you want to have graphs in web interface)
The picture at the end of this post shows how you can deploy CMON.
> cd /usr/local
> wget http://www.severalnines.com/downloads/cmon/cmon-1.0.2-64bit-glibc23-mysqlcluster-709.tar.gz
> tar xvfz cmon-1.0.2-64bit-glibc23-mysqlcluster-709.tar.gz
> ln -s cmon-1.0.2-64bit-glibc23-mysqlcluster-709 cmon


2. Verify that you have a free API slot [mysqld] so that cmon can connect to MySQL Cluster

> ndb_mgm -e "show"
Cluster Configuration
---------------------
[ndbd(NDB)] 2 node(s)
id=3 @10.0.1.3 (mysql-5.1.39 ndb-7.0.9, Nodegroup: 0, Master)
id=4 @10.0.1.4 (mysql-5.1.39 ndb-7.0.9, Nodegroup: 0)

[ndb_mgmd(MGM)] 2 node(s)
id=1 @10.0.1.1 (mysql-5.1.39 ndb-7.0.9)
id=2 @10.0.1.2 (mysql-5.1.39 ndb-7.0.9)

[mysqld(API)] 6 node(s)
id=7 @10.0.1.1 (mysql-5.1.39 ndb-7.0.9)
id=8 @10.0.1.2 (mysql-5.1.39 ndb-7.0.9)
id=9 (not connected, accepting connect from any host)
id=10 (not connected, accepting connect from 10.0.1.5)

Yes, I have two slots - if I run cmon from host 10.0.1.5, then it is fine (id=10), but in this case I can also connect cmon from any host (id=9). The point is that you must have a free slot that accepts connections from "any host" or from the explicit host cmon runs on.

You must also have MemReportFrequency=30 (or some other value, in seconds) set in the cluster configuration (config.ini) in order to get the memory usage events sent from MySQL Cluster. Otherwise, CMON will not be able to present memory utilization information!


3. Run the cmon install script

> cd /usr/local/cmon/bin
> ./cmon_install.sh

This script will install cmon, rrd, init.d scripts, and generate SQL scripts for GRANTs
Distribution: fedora

**** MYSQL CONNECTION ****
CMON and the RRD scripts needs a mysql connection to speak mysql server holding the cmon database.
Specify the BASEDIR where mysql is installed (default /usr/local/mysql/): /usr/local/mysql/mysql

Specify the hostname of the mysql server having the cmon database (default 'ndb05'): <return>
No hostname specified - using default 'ndb05'

Specify the port of the mysql server having the cmon database (default 3306): <return>
No port specified - using default 3306

Specify the password for the 'cmon' user (default no password): <return>
No password specified - using default (no password)

Specify the ndb-connectstring to the cluster (e.g, host_A;host_B): ndb01;ndb02
## Comment: ndb01;ndb02 are the hostnames of the two management
## servers in my cluster. YOU MUST SPECIFY THIS!

**** WWW interface ****
The www files will be copied to /var/www/html/cmon/
## Comment: The installation scripts tries to find the default
## location for www used in your distribution
Specify the WWWROOT of your webserver (default /var/www/html): <return>
Copying files to /var/www/html ..

**** RRD ****
## Comment: If you don't have RRD installed then graphs will not be available
## from the web client, but there is no other functional impact!
Specify the full path to 'rrdtool' (default is /usr/bin/): <return>
No path to rrd specified - using default /usr/bin/

The rrdtool stores data files in a data directory.
Specify the full path to the data directory (about 20MB free space will be neeeded).
RRD data directory (default is /data/rrd/): <return>
No RRD data directory specified - using default /data/rrd/
Saving ../etc/cmon.conf

**** INITD SCRIPTS ****
Do you want to install /etc/init.d/cmon (y/n)? : y
Specify the directory where CMON should write its pidfile (default /var/run/): <return>
chkconfig
Saving configuration to ../etc/init.d/cmon
Installing /etc/init.d/cmon
Done - Installed init.d scripts
Now you can start cmon with '/etc/init.d/cmon start'

**** INSTALL CRONTAB ****
cron schedules jobs every 5 minutes to update the rrd database and generarate graphs for the web interface. You are recommended to install the cron jobs
Do you want to install cron jobs for cmon (y/n)? : y
You need to issue the following GRANTs before starting CMON:

GRANT super, replication client ON *.* TO 'cmon'@'ndb05';
GRANT select,update,insert,delete,create ON cmon.* TO 'cmon'@'ndb05';


Configuration now complete - you can edit the /usr/local/cmon/bin/../etc/cmon.conf manually if you wish.

Configuration is now complete, but you need to apply the suggested GRANTs to the cmon database:

## COMMENT: Connect a mysql client to the cmon database and do
## (actual GRANTs are subject to your particular settings):
mysql> GRANT super, replication client ON *.* TO 'cmon'@'ndb05';
mysql> GRANT select,update,insert,delete,create ON cmon.* TO 'cmon'@'ndb05';

4. Start CMON

## COMMENT: Make sure the mysql server holding the cmon database and the Cluster are started (so initialization won't fail)
> /etc/init.d/cmon start
Starting cmon version 1.0.0 with the following parameters:
--mysqlpasswd=
--mysqlhost=ndb05
--mysqlport=3306
--ndb-connectstring=ndb01;ndb02
--savetime-clusterlog=48 (hours)

If that doesn't look correct, kill cmon and restart with -? for help on the parameters, or change the params in /etc/init.d/cmon

You need to GRANT (and specify a password if you wish) the following on mysql on ndb05:

GRANT create,select,update,insert,delete on cmon.* to 'cmon'@'ndb05';
GRANT super on *.* to 'cmon'@'ndb05';

Testing connection to mysqld..
Connection ok..
Please wait while cmon is starting up..
Recreating missing tables
Done
Registering managed cluster with cluster id=1
Managed cluster has been registered - registered cluster id=1
cmon has started successfully.
Going to daemoinze.. - cmon will write a log in syslog from now
start on service executed successfully
5. Access the Cluster status from SQL
## COMMENT: connect a mysql client to the mysql server holding the cmon database.
mysql> use cmon;

Database changed

mysql> show tables;
+-------------------------+
| Tables_in_cmon |
+-------------------------+
| alarm |
| alarm_log |
| backup |
| backup_log |
| cluster |
| cluster_log |
| cluster_state |
| cluster_statistics |
| configurator_nodemap |
| diskdata |
| email_notification |
| mailserver |
| memory_usage |
| mysql_global_statistics |
| mysql_master_status |
| mysql_server |
| mysql_slave_status |
| mysql_statistics |
| mysql_variables |
| node_state |
| node_statistics |
| restore |
| restore_log |
| schema_object |
+-------------------------+
24 rows in set (0.00 sec)

mysql> select status from cluster_state;
+---------+
| status |
+---------+
| STARTED |
+---------+
1 row in set (0.00 sec)

mysql> select * from node_state;
+-----+--------+------------+-----------+-----------+---------------+---------+-------------+-----------------+---------------------+
| cid | nodeid | status | node_type | nodegroup | host | version | disconnects | last_disconnect | report_ts |
+-----+--------+------------+-----------+-----------+---------------+---------+-------------+-----------------+---------------------+
| 1 | 1 | CONNECTED | NDB_MGMD | NULL | 10.0.1.1 | 7.0.9 | 0 | NULL | 2010-01-13 12:00:56 |
| 1 | 2 | CONNECTED | NDB_MGMD | NULL | 10.0.1.2 | 7.0.9 | 0 | NULL | 2010-01-13 12:00:56 |
| 1 | 3 | STARTED | NDBD | 0 | 10.0.1.3 | 7.0.9 | 0 | NULL | 2010-01-13 12:00:56 |
| 1 | 4 | STARTED | NDBD | 0 | 10.0.1.4 | 7.0.9 | 0 | NULL | 2010-01-13 12:00:56 |
| 1 | 7 | CONNECTED | API | NULL | 10.0.1.1 | 7.0.9 | 0 | NULL | 2010-01-13 12:00:56 |
| 1 | 8 | CONNECTED | API | NULL | 10.0.1.2 | 7.0.9 | 0 | NULL | 2010-01-13 12:00:56 |
| 1 | 13 | NO_CONTACT | API | NULL | 0.0.0.0 | NULL | 0 | NULL | 2010-01-13 12:00:56 |
| 1 | 12 | NO_CONTACT | API | NULL | 0.0.0.0 | NULL | 0 | NULL | 2010-01-13 12:00:56 |
| 1 | 11 | NO_CONTACT | API | NULL | 0.0.0.0 | NULL | 0 | NULL | 2010-01-13 12:00:56 |
| 1 | 10 | NO_CONTACT | API | NULL | 0.0.0.0 | NULL | 0 | NULL | 2010-01-13 12:00:56 |
| 1 | 9 | CONNECTED | API | NULL | 10.0.1.5 | 7.0.9 | 0 | NULL | 2010-01-13 12:00:56 |
+-----+--------+------------+-----------+-----------+---------------+---------+-------------+-----------------+---------------------+
6. Web interface

Requires that you have a webserver with php-mysql installed.
Simply connect your web browser to e.g. 127.0.0.1:8080/cmon

7. Add MySQL Servers to be monitored

As MySQL Cluster is not able to distinguish between a plain API node and a MySQL Server, we need to tell CMON where our mysql servers are!
> cd /usr/local/cmon/bin
> ./cmon_add_mysqld.sh -ndb01 -p3306
Using default config file path ../etc/cmon.conf
Added MySQL Server

You need to execute the following GRANTs on mysqld@ndb01 for CMON agent to be able to connect:

GRANT REPLICATION SLAVE, SUPER ON *.* TO 'cmon'@'ndb05';
GRANT SELECT ON mysql.* TO 'cmon'@'ndb05';
In the same way, you need to add every mysql server you want to monitor.

8. Scripts

All information is stored in SQL tables so it is easy to write scripts around it. Here you can find some example scripts.
## COMMENT: Put the scripts in /usr/local/cmon/bin
> cd /usr/local/cmon/bin
> sh get_cluster_status.sh
STARTED

> sh get_node_status.sh -n 4
STARTED
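Since everything is plain SQL tables, you can just as well query them directly. As a hypothetical example (column names as in the node_state output above), listing every node that is neither started nor connected:
mysql> SELECT nodeid, node_type, host, status
    ->   FROM cmon.node_state
    ->  WHERE status NOT IN ('STARTED', 'CONNECTED');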
9. Suggested Deployment
Recommendation: If you have the 'mysqld' on 10.0.1.5, connect it also to the Cluster and you can use it for administration!


10. Building from source

If you prefer to build from source then you have to have:
  • MySQL Cluster binaries/libs/includes installed on the build machine
  • gcc/g++/make
  • mysql_config and ndb_config on the PATH
> which mysql_config
/usr/local/mysql/mysql/bin/mysql_config
> which ndb_config
/usr/local/mysql/mysql/bin/ndb_config

## COMMENT: If 'mysql_config' or 'ndb_config' is not on the PATH,
## then you must make sure it is!

> wget http://www.severalnines.com/downloads/cmon/cmon-1.0.2.tar.gz
> tar xvfz cmon-1.0.2.tar.gz
> cd cmon-1.0.2
> ./configure --prefix=/usr/local/cmon/
## COMMENT: end the prefix with 'cmon' and life will be easier.
> make
> sudo make install
> cd /usr/local/cmon/
Proceed from section 2 above.
11. Bugs and problems

I would be very happy if you file bugs on launchpad! If you don't want to file a bug report, you can send an email to 'support (at) severalnines (dot) com'.
Also, if you have suggestions for improvements, let me know!