Tuesday, December 23, 2008

MySQL Cluster 6.4 - BETA

The Configurator and Sandbox (only the Custom part so far) has been updated to support MySQL Cluster 6.4!
MySQL Cluster 6.4 is a fantastic piece of software with new features such as:
  • Multi-threaded Support. A new configuration parameter, MaxNoOfExecutionThreads=[2,4,8] allows you to set how many cores the data node should utilize.
    MaxNoOfExecutionThreads=2 for two core machines
    MaxNoOfExecutionThreads=4 for quad core machines
    MaxNoOfExecutionThreads=8 for eight core machines
    With 8 you will get 4 query handling threads! Can you beat Jonas' 1M reads on a single box?
  • Online add node - allows you to scale online! I will publish a procedure how to do this very soon. Currently the Configurator and the Sandbox do not support this.
  • Windows Support (I have not tried this and not 100% of the quality, nor how to build it, but it should be there).
  • Snapshot backup - START BACKUP, [SNAPSHOTSTART | SNAPSHOTEND]
    SNAPSHOTEND - the backup contains the database at backup end time (default and same as behavior in previos releases)
    SNAPSHOTSTART - the backup contains the database at backup start time
And remember it is a beta, and be prepared for bugs and other oddities until it has stabilized.

Friday, December 19, 2008

MySQL Cluster Sandbox - test cluster to cluster replication!

If you have max 15 minutes and want to try out MySQL Replication (geo redundancy) between two Clusters on your localhost (only tested on Linux) or on vmware then this Sandbox is for you.

Here is what you have to do:
  1. Go to www.severalnines.com/sandbox
  2. Create a master cluster:
    I have used the ports 1186 (for the management server) and 3306 and 3307 for the mysql servers).
  3. Enter the email address and a set of scripts to install cluster will be sent.
  4. Create a slave cluster:
    Make sure you use different ports for the slave cluster. I have used the ports 1187 (for the management server) and 3310 and 3311 for the mysql servers).

Master Cluster:

tar xvfz mysqlcluster-63-master.tar.gz
cd mysqlcluster-63-master
cd scripts
If you have the gnu tool chain (g++/gcc/make) you can build from source:
sh download-and-compile.sh
sh dist-src-build.sh
or if you prefer pre-built binaries (you must chose the "non-rpm" (i.e. the .tar.gz) and the mysql-...tar.gz must not be in /tmp !!):
sh dist-tgz mysql-cluster-gpl-6.3.17-linux-i686-glibc23.tar.gz
Then we have to setup the mysql servers and start the cluster:
sh bootstrap.sh
sh start-cluster-initial.sh
Connect the mysql client to the mysql server:
sh mysqlclient-1.sh
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'localhost' IDENTIFIED BY 'password';
We don't want the GRANT to be replicated (and clear out old stuff in the binlogs that we might have):
RESET MASTER;
and once more (on both mysql servers since grants done locally on each mysql server)..
sh mysqlclient-2.sh
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'localhost' IDENTIFIED BY 'password';
RESET MASTER;

Slave Cluster:

tar xvfz mysqlcluster-63-slave.tar.gz
cd mysqlcluster-63-slave
cd scripts
If you have the gnu tool chain (g++/gcc/make) you can build from source:
sh download-and-compile.sh
sh dist-src-build.sh
or if you prefer pre-built binaries (you must chose the "non-rpm" (i.e. the .tar.gz) and the mysql-...tar.gz must not be in /tmp !!):
sh dist-tgz mysql-cluster-gpl-6.3.17-linux-i686-glibc23.tar.gz
Then we have to setup the mysql servers and start the cluster:
sh bootstrap.sh
sh start-cluster-initial.sh
Connect the mysql client to the mysql server:
sh mysqlclient-1.sh
CHANGE MASTER TO master_host='localhost', master_port=3306, master_user='repl', master_password='password';
START SLAVE;
SHOW SLAVE STATUS\G

Create a table on the master:
sh mysqlclient-1.sh
use test
CREATE TABLE t1 (a INTEGER PRIMARY KEY) ENGINE=NDB;
INSERT INTO t1(a) VALUES (1);
Go to the slave:
sh mysqlclient-1.sh
use test
SELECT * FROM t1;

Read more on replication and learn how to do link failovers etc.

Friday, December 12, 2008

cmon 0.12 - with diskdata support and a lot of fixes

cmon 0.12 is now released and can be downloaded from www.severalnines.com/cmon.

Release notes:
  • connection handling - cmon will now retry connects forever. cmon will also recover and automatically reconnect to cluster after a cluster failure or if the connection to the mysql server crashes where cmon logs information. This means you will have to terminate the cmon using e.g 'killall cmon' or /etc/init.d/cmon stop
  • disk data - cmon now gathers information about the amount of table space used and this is presented in the web interface
  • init.d scripts - fixed bugs in creating the init.d scripts during make install-extra
  • fixed compiler warnings - now 0 compiler warnings on Linux systems
  • mysql_statistics table - now explicitly using engine=myisam
  • web interface - changed printout if cmon cannot connect to the cmon mysql server (suggests you to change $mysql_hostname in {wwwroot}/cmon/cmon_db_conf.inc.php
  • web interface - changed "Memory" to "Storage" which now contains table space information
  • web interface - print hostname instead of ipaddresses if possible
  • cmon has been tested on SUSE 10 (SLES 10).
  • documentation bug - wwwroot for SUSE is /srv/www/htdocs/
Upgrading from cmon 0.11
  • cmon 0.12 will add a new table called 'diskdata'. This will be added automatically when you start cmon-0.12. If you prefer, you can download the sql script and create it manually.
  • don't forget to do ./configure --with-wwwroot={documentroot of www server} if you want to easily install the www files for cmon using 'make install-extra'
  • make install-extra will also install the init.d scripts.

External integration
cmon was successfully integrated at a customer who was using Solarwinds IpMonitor as the central monitor for all their IT services.
By using SQL to select the cluster state, node state, etc from the cmon database, and raise alarms and notifications when needed, the customer's MySQL Cluster solution could be monitored from one central point.

Todo
  • brush up web interface (make it look nicer)
  • add replication link monitoring
  • add smtp/snmp notifications

Wednesday, December 10, 2008

Disk data - a summary

Here is a summary about Disk Data tables in MySQL Cluster.

Data node configuration:
  • SharedGlobalMemory=384M
    RAM from the shared global memory is used for the UNDO_BUFFER when you create the log file group.
    In the configuration generated by severalnines.com/config then you have to uncomment the SharedGlobalMemory in mysqlcluster-XYZ/cluster/config/config.ini before you start the cluster.
  • DiskPageBufferMemory=3072MB
    If you are relying a lot on the disk data, we recommend to set this to as much as possible.
    In the configuration generated by severalnines.com/config then you have to uncomment the DiskPageBufferMemory in mysqlcluster-63/cluster/config/config.ini before you start the cluster.
    The DiskPageBufferMemory should be set to:
    DiskPageBufferMemory=TOTAL_RAM - 1GB (OS) - 1200MB (approx size used for buffers etc in the data nodes) - DataMemory - IndexMemory
    Expect to do some trial and terror before getting this correct.
Creating the storage:
  • LOGFILE GROUP
    CREATE LOGFILE GROUP lg ADD UNDOFILE 'undo1.dat' INITIAL_SIZE=20G UNDO_BUFFER_SIZE=128M ENGINE=NDB;

    You can only have one LOGFILE GROUP. 20GB undo log is just an example, but don't be shy with it if you have space. Theoretically, you could need as much UNDO as you have TABLE SPACE size. If you have 10GB table space, then start with an UNDO log that is 10% of that. Add more UNDO LOG FILES if you need, i.e, run into problems. UNDO_BUFFER_SIZE=128M. should be sufficient for most applications (but you must have set SharedMemoryGlobal=384M). This buffer cannot grow dynamically or be extended later :(
  • UNDO LOG FILES
    You can add more undo files (online) by doing:
    ALTER LOGFILE GROUP lg ADD UNDOFILE 'undo_2.dat' INITIAL_SIZE 256M ENGINE NDB;
  • TABLESPACE
    CREATE TABLESPACE ts_1 ADD DATAFILE 'data1.dat' USE LOGFILE GROUP lg INITIAL_SIZE=256M ENGINE=NDB;
  • DATAFILES
    In 7.0 you can use DiskIOThreadPool which makes it possible to have X threads operating on one file, so you don't have to create many data files. Of course for some applications it may be good to have the more files if you want to spread out over several physical disks.
    In 6.3 it is recommended to have many but smaller data files. Here is a script that generates SQL (ALTER TABLE ADD ... ) to create number of datafiles. Run it as:
    sh ts-gen.sh 100 > alter_add_ts.sql
    and you will get a file with 99 SQL statements to add table spaces.
    From the mysql client you can then do:
    mysql> source alter_add_ts.sql
    and a lot of datafiles will be added to the ts_1 tablespace.
    Please note that you can change filesize and path if you wish.
  • Having more than 50-60 data files does not really give anything. The reason for having several data files instead of one is that more internal file system threads can operate in "parallel" on the data files, instead of having to be synchronized on one.
  • You grow the table space by adding data files.
  • Adding data files is an online operation if you wish to extend your table space.
  • There is no auto-extend yet of table spaces.
  • You can create many tablespaces, each having many data files.
  • Leave EXTENT_SIZE as it is. We have found no reason yet to change it.
  • Example CREATE TABLE:
    CREATE TABLE t1 ( a INTEGER PRIMARY KEY, b VARBINARY(2048)) engine=ndb TABLESPACE ts_1 STORAGE DISK;
    The column 'b' will be stored on disk. The column 'a' will be stored in DataMemory together with the Ordered Index on 'a' (which comes from the PRIMARY KEY). The PRIMARY KEY hash index on 'a' will use IndexMemory.

Disk space and disk utilization:
  • Indexed attributes are stored in RAM (DataMemory). However, you can specify for the non-indexed if they should be in RAM or DISK when you create the table (see ref manual, about column definition STORAGE DISK or STORAGE MEMORY).
  • In MySQL Cluster 6.2 , 6.3, and 7.0 disk data attributes are always stored as fixed size (e.g a VARCHAR(256) will always use 256B on disk, even if it is NULL - this will change in 7.1, hopefully).
  • BLOB/TEXTs are special and stored in segments. The first 256B will be stored in RAM, the rest will be stored in segments (depending if it is a MEDIUMBLOB/BLOB/LONGBLOB then the segment size varies (from 2048B up to 8192B iirc).
  • NULLs stored in a BLOB/TEXT will occupy 0B on disk, but 2B overhead of DataMemory.
  • The data files are divided into extents. Each extent is 1024K (1MB). There is no reason to change the extent size. Many records may reside in one extent.
  • If you write something to a disk data table, then the data will be written in as many table spaces as you have NoOfReplicas set to. Thus, typically you will have two copies of the data (one on each node in a node group).
  • The information_schema contains information how much of the table space and undo files that are used. E.g
    select free_extents, total_extents from information_schema.files where file_type='datafile'

Short on disk data internals :

Thanks jonas for your input on this!
  • The DiskPageBufferMemory is a LIRS cache (variant of the LRU cache)
  • Dirty pages are flushed to the table space during an LCP (local checkpoint), but also if they have been inactive for some time (i have no info yet on the inactive time)
  • Changes to Disk data is Redo logged (same ways as with in-memory tables).
  • Changes to Disk data is written to a circular UNDO log (using WAL). This is so that we can undo all the way back to the start of the last completed LCP. This way, we can get to a state where memory and disk are in sync (i.e. references between disk data and in-memory storage are correct) before we start running the REDO log.
  • If a record exists in the DiskPageBufferMemory, it will be returned from there to the application.
  • If a record does not exists in the DiskPageBufferMemory, it will have to be fetched from disk. In case there are no empty pages in the DiskPageBufferMemory, the LRU algorithm will sync dirty pages to disk in order to free up empty pages.
Performance

What to use disk data for?:
  • Storing e.g inactive subscribers in a subscriber database (when activated they you can move them into RAM). If an activated user has been inactive for too long it can be moved back to the disk data storage. This is something you have to write in the application logic.
  • Storing e.g SMS/MMS
  • Logging data
  • Large feeds coming in that you need to persist, but not necessarily access in real-time (i.e<3ms).>
  • Probably more..
Suitable access patterns:
  • You should not use data stored in disk data tables to realtime applications. If you do random io, then you applications will be as fast as the disk.
  • Use simple access patterns - avoid joins on disk data. See it as an extension to RAM where you store data that you don't have to have that frequent access to.
  • If you want frequent access to the data you should have a BIG DiskPageBufferMemory (like in innodb, the innodb_buffer_pool is very important to performance). It is the same here.
  • Range searches will be expensive as disk data is not stored according to the order of a particular index. It is more or less random where data is stored. I will get back on numbers of this.
  • If you use SSD for disk data, then the above might be more or less true. Disk data has not been tested with SSDs yet.
Caveats:
  • Check my earlier blog post for caveats regarding deallocation/allocation of extents.
  • If you restart the cluster with --intial you have to manually delete all the data files and undo log files that was created. These are not cleared even if you start the cluster with --initial.

Disk data counters (more)

In a previous blog "disk data counter" I did a couple of measurements how to configure the disk data and what we can expect.

Now I have redone the insert test to put as much load as possible with respect to the disk sub system.

1. Insert 5M records a' 4096B (1 threads, batches of five):
6.3.19: 286 sec (3721.12 QPS)

2. Insert 5M records a' 4096B (5 threads, batches of five):
6.3.19: 286 sec (8315.36 QPS)

At 2) the IO sub-system is quite loaded (98.2%) and I get (periodically,when the DiskPageBufferMemory is checkpointed):

"REDO buffers overloaded, consult online manual (increase RedoBuffer) - code=1221" (this does not matter, as i retry my transactions in the application).
I could increase the RedoBuffer (it is set to 32M), but there is no point as if I increase the load, then the disks would not be able to keep up any way. Hence, to get better perf now I need to increase the number of disks I can write to.

Here are some numbers for Read and Writes of 1024B records:

3. Insert 5M records a' 1024B (5 threads, batches of five):
6.3.19: 375 sec (13335 QPS)

4. Insert 10M records a' 1024B (10 threads, batches of five):
6.3.19: 506 sec (19737.2 QPS)

5. Insert 20M records a' 1024B (5 threads, batches of five):
6.3.19: 1100 sec (17035.8 QPS)

6. Insert 20M records a' 1024B (10 threads, batches of five):
6.3.19: 945 sec (21140.7 QPS)

7. Insert 20M records a' 1024B (20 threads, batches of five):
6.3.19: 675 sec (29578.9 QPS)

8. Read 1M random records (1024B) out of 20M (10 threads):
(Each thread does 100K reads)
6.3.19: 816 sec (1230.01 QPS)

9. Read 2M random records (1024B) out of 20M (20 threads):
(Each thread does 100K reads)
6.3.19: 683 sec (2911.67 QPS)

10. Read 3M random records (1024B) out of 20M (30 threads):
(Each thread does 100K reads)
6.3.19: 1273 sec (2350.62 QPS)

The setup was same as in disk data counters.
Next I am going to do some tests on ordered index scans (i.e range scans on an in-memory index) and read data stored on disk.

Use this numbers only for an indication what you can achieve since it may differ for your application.

Tuesday, December 09, 2008

Cluster - session management tips and tricks

Many users use MySQL Cluster for session management. In MySQL Cluster tables are checkpointed and redo logged to disk.

But for session data, do we really care?
If the Cluster is history then the sessions are most likely to be obsolete when the cluster recovers.

And do we want to use disk space and disk band width for the session data?

If you answer "no" on the questions above, then you can create a table that is not checkpointed nor redo logged, by setting the session variable ndb_table_no_logging:


mysql> use test;
Database changed

#enable no logging
mysql> set ndb_table_no_logging=1;
Query OK, 0 rows affected (0.02 sec)

#a grossly simplified structure:
# this table will not be checkpointed or redo logged, because of
set ndb_table_no_logging=1;
mysql> create table session(id bigint primary key, data varbinary(1024), expiry_time timestamp) engine=ndb;
Query OK, 0 rows affected (1.21 sec)

# disable creation of tables with no logging (i.e, enable logging again, default behavior):
mysql> set ndb_table_no_logging=0;
Query OK, 0 rows affected (0.00 sec)



To verify that the table is not logged (look for Temporary table):

johan@stingray:~$ ndb_desc -d test session
-- session --
Version: 1
Fragment type: 5
K Value: 6
Min load factor: 78
Max load factor: 80
Temporary table: yes
Number of attributes: 3 Number of primary keys: 1
Length of frm data: 289
Row Checksum: 1 Row GCI:
1 SingleUserMode: 0
ForceVarPart: 1
TableStatus: Retrieved
-- Attributes --
id Bigint PRIMARY KEY DISTRIBUTION KEY AT=FIXED ST=MEMORY
data Longvarbinary(1024) NULL AT=MEDIUM_VAR ST=MEMORY
expiry_time Timestamp NOT NULL AT=FIXED ST=MEMORY

-- Indexes --
PRIMARY KEY(id) - UniqueHashIndex
PRIMARY(id) - OrderedIndex

NDBT_ProgramExit: 0 - OK


If the Cluster crashes and later recovers the session table will not contain any data, but will be completely empty.

I don't think this works with Disk data tables, but i haven't tried so I don't know the error you will get.

This is a technique that we have implemented with quite a few customers using Cluster for session management.

Friday, November 28, 2008

Disk data extent (de)allocation caveats

Here is a discussion about some caveats with allocation and deallocation of extents in MySQL Cluster (disk data).

Let's assume we have created a table space with one datafile having 5 extents (i.e the datafile is 5MB in since, since the size of each extent is 1MB).

Using the information_schema we can figure out how many free extents we have by executing the following query:

mysql> select free_extents, total_extents from information_schema.files where file_type='datafile';
+--------------+---------------+
| free_extents | total_extents |
+--------------+---------------+
| 5 | 5 |
| 5 | 5 |
+--------------+---------------+

The figure below shows a number of extents that are all free.
We then create a table using the table space and start populating it with data:

mysql> create table t1(a char(255)) engine=ndb tablespace ts1 storage disk;
mysql> insert into t1(a) values ('a');
mysql> insert into t1(a) select a from t1 limit 4096;



After inserting for while you can the table space is full (all extents have been filled):


mysql> select free_extents, total_extents from information_schema.files where file_type='datafile';
+--------------+---------------+
| free_extents | total_extents |
+--------------+---------------+
| 0 | 5 |
| 0 | 5 |
+--------------+---------------+

If I try to insert more data, then I will get:
ERROR 1114 (HY000): The table 't1' is full and I would have to add another datafile to my table space.

Now I decide to delete records:
mysql> delete from t1 limit 4096;

Eventually extents will become empty:

But there are a couple caveats to be aware of!!
Caveat one is that the extents that are free up by the deletion does not show up in the information schema. I still have zero free extents!!:

mysql> select free_extents, total_extents from information_schema.files where file_type='datafile';
+--------------+---------------+
| free_extents | total_extents |
+--------------+---------------+
| 0 | 5 |
| 0 | 5 |
+--------------+---------------+

However, I can insert more records , since there are holes in the extents that can be used.

Now I delete ALL (or almost all, it doesn't really matter) records in t1, so that table is completely empty!!

Caveat two comes when I try to create a new table using the same tablespace:

mysql> create table t2(a char(255)) engine=ndb tablespace ts1 storage disk;
Query OK, 0 rows affected (1.03 sec)

mysql> insert into t2 (a) values ('aaaa');
ERROR 1114 (HY000): The table 't2' is full

The problem is that:
  1. Extents are allocated/assigned to a table
  2. Empty extents followed by DELETE are not deallocated from the table.
Only DROP TABLE deallocate/deassociate the extents making them available to another table!!!

Thus I cannot do:

mysql> insert into t2 (a) values ('aaa')
ERROR 1114 (HY000): The table 't2' is full

But inserting into t1 is still possible:

mysql> insert into t1 (a) select a from t1 limit 4096;
Query OK, 4096 rows affected (0.17 sec)
Records: 4096 Duplicates: 0 Warnings: 0

mysql> insert into t1 (a) select a from t1 limit 4096;
Query OK, 4096 rows affected (0.16 sec)
Records: 4096 Duplicates: 0 Warnings: 0


I hope this will be fixed one day!

(btw, the cmon trunk on http://launchpad.net/cmon monitors the table spaces from today. I will release source distro shortly, but in the time being you can get the source using bazaar-vcs and do bzr branch lp:cmon).

Friday, November 21, 2008

MySQL Cluster 6.3.19

...has just been released:

You can download the GPL source distribution here. The config tool has been updated to use the new version.

Important things in 6.3.19:
  • MySQL Server 5.1.29
  • Faster --initial
  • Undo logs and data files (disk data) is created faster. Now 512K is written at a time instead of 32K.
  • New config parameter: InitFragmentLogFiles=sparse|full (you can choose if you want sparse redo log files or zero filled (full)).
For a full change log please see the reference manual.

To upgrade using severalnines.com/config:
  • copy cluster/config/config.ini to a safe place (if you have made changes to it)
  • copy cluster/config/my.cnf to a safe place (if you have made changes to
    comment out #skip_thread_priority in my.cnf if you have it set (it is deprecated).
  • generate a new config that mimics your current config so that you get exactly the same scripts (but a new version of them).
  • Unpack the mysqlcluster-63.tar.gz
  • copy the config.ini and the my.cnf from the safe place to mysqlcluster-63cluster/config/
  • cd mysqlcluster-63/cluster/scripts
  • sh download-and-compile.sh
  • sh dist.sh
  • sh rolling-restart.sh (will copy out the config files from cluster/config hence it was important to copy them back there)
CMON
If you are using cmon-0.10 (you should upgrade to cmon-0.11) or cmon-0.11:
  • stop cmon
  • start a mysql client:
  • use cmon
  • drop mysql_statistics;
  • drop mysql_variables;
  • start cmon
cmon will now recreate the mysql_statistics and mysql_variables tables. This is because the MySQL server in MySQL Cluster 6.3.19 (and 6.3.18) has added more status variables. If you don't do the above, then cmon cannot aggregate mysql statistics.

Disk data counters

I have measured the time it takes to do some operations with Cluster (6.3.19, which is about to be released very soon - you should upgrade) with respect to the disk, like starting it, creating table spaces etc.

Below all of this you can find the setup I used.

Initial start of Cluster (40GB of redo log files):
6.3.19: 3min 27sec

Create a 20GB undo file:
6.3.19: 6min 17sec

Create a 128MB data file for the tablespace:
6.3.19: ~3 sec

Insert 1M records a' 4096B (1 thread, batches of five):
6.3.19: 286 sec (3721.12 QPS)
(we can probably provision faster with bigger batches or more threads)

I then provisioned another 4M records (total of 5M records in DB).

Evil test: 100K Random reads (read 4096B) (5M records in DB):
6.3.19: 1290.42QPS (20 threads, io util is ~90% so we are almost completely io bound). So this is result is inline what we would expect when being io bound, especially since i have used four data nodes, each having one disk.

Setup:
  • Total of 4 data nodes on 4 computers.
  • Another computer was used to generate the inserts/reads
  • 8GB of RAM
  • Gig-E
  • 1 * 146GB SAS, 10KRPM
  • 128MB IndexMemory
  • 1024MB DataMemory
  • ODirect=1
  • SharedGlobalMemory=384M
  • NoOfFragmentLogFiles=40
  • FragmentLogFileSize=256M
  • DiskPageBufferMemory=3072M
  • Table space (one ts) with 100 datafiles a' 128MB (best practice is to use many but small data files instead of one big. This will be changed in 6.4 so that you can use one big data file).
    Point here is that you should have a few data files. One data file is bad, more than 128 is overkill since the data node won't keep more than that many data files open at once anyways. This affects how many data files the data node can write to in "parallel".
  • Extent size=1MB (which is quite ok)
  • Logfile group: One 20GB undofile and 128MB Undobuffer
    The undo file was a bit too big (not that it matter, i had the disk space) but I used 5366054928 extents out of 21474836480 (so ~25% was only used).
  • There is also a new configuration option in 6.3.19 which lets you create the data files
  • The disk data table looks like (data column will be stored on disk):
    create table dd (
    id integer primary key,
    ts integer,
    data varbinary(4096),
    index(ts)) engine=ndb TABLESPACE ts_1 storage disk;
  • Fedora core 9 ( uname -r --> 2.6.26.6-79.fc9.x86_64 )
(the config files were generated by severalnines.com/config )

Now, especially for inserts/writes there are quite a few things competing for the single disk:
  • REDO log
  • LCP
  • and the disk data itself (UNDO LOG + DATA FILES)
However, the best setup is to (if you have two disks)
  • Disk 1: REDO + LCP
  • Disk 2: UNDO LOG + DATA FILES
Three disks:
  • Disk 1: REDO + LCP
  • Disk 2: UNDO LOG
  • Disk 3: DATA FILES
If you have more than that then you can put DATA FILES and UNDO LOG on a RAID.

IMPORTANT: When you have done an --initial start, the files for the UNDO LOG and the DATA FILES are NOT removed. You have to remove them by hand. Otherwise you will get an error if you try to CREATE LOGFILE GROUP..

Thursday, November 20, 2008

cmon 0.11 released

cmon 0.11 which you can get from www.severalnines.com/cmon corrects the following bugs:
  • Graphs wrap half ways - https://bugs.launchpad.net/cmon/+bug/300222
  • Memory leaks (missed a couple of mysql_free_resultset) https://bugs.launchpad.net/cmon/+bug/300225
There are also some code cleanups.
To upgrade from earlier version:
  • start a mysql client on the mysql server(s) having the cmon database
  • use cmon
  • drop mysql_statistics table
  • drop mysql_variables table
  • start cmon which will then recreate those tables
The above is particularly important if you move from MySQL Cluster 6.3.17 or earlier to 6.3.18 (or later) because the number of statistic counters and variables has increased between the versions. cmon will build up the mysql_statistics and mysql_variables tables dynamically instead (if you have dropped them earlier).

Tuesday, November 04, 2008

CMON - News Flash! Monitor your mysql servers!

Here is a short summary of CMON news:
Feature update of cmon 0.1 (the same as cmon 1.2.2 plus):
  • monitor your mysql servers (add/remove mysql servers that should be monitored)
  • view status and variables of your mysql servers
  • view graphs of status variables for individual mysql servers as well as a grand total.
  • get recommendations on parameter tuning of my.cnf
  • updated web interface (small fixes made)
  • and some bug fixes..
To do
  • add AJAX support to web interface so that graphs are reloaded in a better way
  • finish plugin interface - so that it will be possible to write SNMP, SMTP, etc plugins
  • improve rules for spotting configuration problems in the monitored mysql servers.
  • improved overview of mysql variables.
  • replication channels
Screen shots

Display graphs (both for individual as well as aggregated over all mysql servers):


View the status of your mysql servers that CMON is monitoring (status, variables, graphs). Sub-optimal config values are highlighted in read!:



Add a mysql server that should be monitored by CMON...



Remove a MySQL Server from CMON..


Thursday, October 30, 2008

MySQL Cluster Sandbox

There is also a sandbox for MySQL Cluster!

If you have Linux installed (I have not tested on Solaris) and want to get familiar with MySQL Cluster, or setup a development environment all on your localhost, then MySQL Cluster Sandbox is for you!

Here is the README!

The whole installation is in a single directory and easy to remove.

Optimizing Queries on Cluster

On the cluster mailing list (cluster@lists.mysql.com , thread "slow selects") there was recently a very good example of how to optimize queries on Cluster. Thanks to Nick Keefen for raising this problem and Jeff Sturm for the answer how to solve it!

In short the problem is that the Optimizer in the MySQL server does not get adequate statistics from the data nodes about table sizes, indexes etc. This makes the Optimizer clueless in some cases how to order tables in joins, and also in some cases which is the best indexes to use.

So here is the problem that Nick highlighted:

When the tables are stored in MyISAM:
mysql> SELECT v.vid_id, v.vid_title, u.user FROM phpfox_user AS u JOIN phpfox_videos AS v ON(v.vid_userid = u.id) limit 3;
3 rows in set (0.32 sec)

phpfox_user is about 100000 rows and phpfox_videos is 170000 rows large

Trying to run the same query on my cluster machine, i get
3 rows in set (20.47 sec)

Why is this? Let's look at the EXPLAIN of the queries:


MyISAM explain:

mysql> explain SELECT v.vid_id, v.vid_title, u.user FROM phpfox_user AS u JOIN phpfox_videos AS v ON(v.vid_userid = u.id) ORDER BY v.vid_time DESC LIMIT 3;
+----+-------------+-------+--------+---------------+---------
+---------+---------------------+--------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+---------
+---------+---------------------+--------+----------------+
| 1 | SIMPLE | v | ALL | vid_userid | NULL | NULL | NULL | 135025 | Using filesort |
| 1 | SIMPLE | u | eq_ref | PRIMARY | PRIMARY | 4 | nktest.v.vid_userid | 1 | |
+----+-------------+-------+--------+---------------+---------
+---------+---------------------+--------+----------------+


NDB explain:

mysql> explain SELECT v.vid_id, v.vid_title, u.user FROM phpfox_user AS u JOIN phpfox_videos AS v ON(v.vid_userid = u.id) ORDER BY v.vid_time DESC LIMIT 3;
+----+-------------+-------+------+---------------+-----------
-+---------+-----------------+-------+------------------------
---------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+-----------
-+---------+-----------------+-------+------------------------
---------+
| 1 | SIMPLE | u | ALL | PRIMARY | NULL | NULL | NULL | 82124 | Using temporary; Using filesort |
| 1 | SIMPLE | v | ref | vid_userid | vid_userid | 4 | bb2_phpfox.u.id | 1 | |
+----+-------------+-------+------+---------------+-----------
-+---------+-----------------+-------+------------------------
---------+


Note the additional 'Using temporary' clause for NDB. Nearly all of the 20 seconds taken by the query fall at 'copying to temporary table'.


The problem is that the Optimizer has changed the order (in case of NDB) how tables should be joined!

Jeff Sturm replied to Nick with a remedy:


Looks like the query optimizer is putting the tables in the wrong order.
(This isn't uncommon in my experience, as NDB seems to have less
information than other engines to optimize queries at runtime.)

Try modifying your query with STRAIGHT_JOIN syntax, i.e.:

SELECT v.vid_id, v.vid_title, u.user FROM phpfox_videos AS v
STRAIGHT_JOIN phpfox_user AS u ON v.vid_userid = u.id ORDER BY
v.vid_time DESC LIMIT 3;


and the problem was solved. The query now runs in split second.
Moreover, in some cases you also want to use USE INDEX or FORCE INDEX.
There are situations where the Optimizer wants to use e.g index a_idx to resolve the query, but really index b_idx is the one it should use!
Thus you can use:
SELECT * FROM t1 FORCE INDEX(b_idx) , t2 WHERE ...

Good luck! And yes, one day the Optimizer will get the good statistics!

Tuesday, October 21, 2008

DiskPageBufferMemory

I get a few questions how to set DiskPageBufferMemory.

Before discussing that let's recap a few things:
  • Non-indexed attributes (or explicitly defined in CREATE TABLE..) are stored on disk in table spaces.
  • Indexed attributes are stored in DataMemory
  • The hash of the Primary Key is stored in the IndexMemory
Determining how much IndexMemory/DataMemory/Tablespaces that are needed, you can use the dimensioning toolkit.

Now you can setup Undobuffers etc according to my post on disk data a while back.

But what about the DiskPageBufferMemory?! If you have used innodb, then you know the innodb_buffer_pool. This is basically the same. You need to set it big so that you cache as much as possible of the active (most frequently used) data set on disk.

We need to know what the data node is using memory for so that we can set the DiskPageBufferMemory appropriately:
  • Internal buffers (RedoBuffer, send buffers): 900MB (if you use the config files from the config tool)
  • OS will use ~1GB (this really too much and can be trimmed)
  • DataMemory: X MB (derived from the dimensioning tool), say 2GB (for the example below)
  • IndexMemory: Y MB (derived from e.g the dimensioning tool), say 256MB (for the example below)
If I have 8GB of RAM in my computer, I would set the DiskPageBufferMemory to:

DiskPageBufferMemory = 0.8 x [ 8GB - 1GB (OS) -900MB (internal buffers) - 2GB (DataMemory) - 256MB (IndexMemory) ] = ~3072MB

I take 0.8x just to leave some slack if I need to reconfig som other buffer later. After a while, when the system is tuned in I can then chose to use those extra MB to extend the DiskPageBufferMemory.

And yes, the statistics to monitor the DiskPageBufferMemory (and all other params as well) are missing. But behold, it will come!

And let me know if you have tested disk data with SSD devices!

Tuesday, October 07, 2008

CMON binaries for Solaris 10 (sparc 64-bit)

I have built binaries for Solaris 10 (sparc 64-bit). You can download them at http://www.severalnines.com/cmon/
Moreover a version 1.2.1 of cmon was also released as it fixed some nuisances (fixed so it is possible to actually build it on Solaris :).

Read the release notes for more info.

Upgrade from MySQL Cluster 6.2 to 6.3

Is it possible to upgrade from MySQL Cluster 6.2 to MySQL Cluster 6.3? Of course!
Can it be done online? Sure!

Here is how!

I started off as with the cluster in the picture below:



All you need to do is as follows:
  • Copy out the new binaries (MySQL Cluster 6.3) on to all hosts
  • Install the new version (please note that you have not restarted anything yet - we are about too). The install phase is just to copy out the binaries of the new version to all hosts and make sure that you are referencing the new version of the Cluster binaries in scripts, PATHs etc.
After that you need to:
  1. restart both (all) management servers at the same time!
    (TRICK! If the management servers are using a connect-string that only references itself, then you can restart them one at a time. Otherwise they risk picking up the other ones configuration! So if you have two hosts where the management servers run, A and B, then the connect strings should be:
    • For the management server on host A: ndb_mgmd -c "A"
    • For the management server on host B: ndb_mgmd -c "B"
    • And NOT ndb_mgmd -c "A;B" , then you risk that they pick up the wrong configuration data (unless you stop them both at the same time and then restart them) and it would be really bad!
  2. ndb_mgm -e "show" --> verify that all of them have version 6.3.x. The data nodes and mysql servers will still show they are version 6.2.x
  3. restart the data nodes (one at a time) until you have restarted all of them
  4. verify with ndb_mgm -e "show" that the data nodes all have the correct version (6.3.x)
  5. restart the mysql servers (one at a time) until you have restarted all of them.
  6. verify with ndb_mgm -e "show" that the data nodes all have the correct version (6.3.x)
While I am restarting the cluster (i am using the configuration tool and monit and cmon) you see the following in cmon:



And when the rolling restart was complete I have:

Actually, for a short period of time, the web browser will show a blank page for the duration when the management servers are stopped and restarting!

If you had used the configuration tool to generate scripts and a configuration for MySQL Cluster 6.2, then you can generate a new configuration and scripts for MySQL Cluster 6.3 (the two configurations must be identical in terms of number of nodes (data memory, data nodes, management servers, mysql servers) and then use those scripts:
  1. sh ./download-and-compile.sh (download and compiles latest MySQL Cluster 6.3 - works probably only for Linux)
  2. sh ./dist.sh
or if you can find the correct version at dev.mysql.com/downloads/cluster then you can do
  1. sh ./dist-tgz.sh <name of binary dist(rpms does not work>
  2. sh ./rolling-restart.sh (sh ./rolling-restart-monit.sh if you are using monit to guard your processes).
I used the approach above together with cmon.

Friday, October 03, 2008

cmon 1.2 - released

You can now download cmon 1.2.2 at www.severalnines.com/cmon
Thank you everyone for your feedback.
Bugs fixed in cmon 1.2.2:
Bugs fixed in cmon 1.2.1:
  • Fixed a number of issues with building cmon on Solaris.
  • Updated the compile instructions adding a note how to build it on Solaris.

Bugs fixed in cmon 1.2:
  • --logfile - specifying a custom logfile was not working earlier and cmon exitted. Now this is fixed!
Features:
  • new configure option (--with-wwwroot): ./configure --with-wwwroot=</path/to/wwwroot/>
  • make install-extra: install cmon/www/ files into wwwroot directory and also install initd/cmon into /etc/init.d/cmon. This requires that you are root.
  • jpgraph-1.26 is bundled (in www/cmon) and is installed with make install-extra.

Tuesday, September 30, 2008

MySQL Cluster 6.3.17 - binaries released

Check out http://dev.mysql.com/downloads/cluster/! Finally, there are binaries for MySQL Cluster 6.3!

Moreover, read my previous post :)

Indestructible Cluster part 2

In part 1 we got the cluster monitoring up and running. Indeed a good step forward into taming Cluster. This time we are going to add process management to our solution - if a node fails (ndbd, ndb_mgmd, mysqld etc) fails, then it should be automatically restarted.

To do this we need:
  • initd scripts for each process so there is a way to start and stop the processes.
  • process management software (i have chosen monit for this).
The initid scripts presented below are templates and you may have to change things such as paths.
For monit we do not need init.d scripts as we will install monit in inittab so that the OS (I live in a Linux world so if you use Solaris or something this may not necessarily apply in that environment).

initd scripts - walkthrough
We need initd scripts for the following processes:
  • ndb_mgmd - (scripts open in new tab)
    In this script you should change:

    host=<hostname where the ndb_mgmd resides>
    configini=<path to config.ini>
    bindir=<path to ndb_mgmd executable>

    If you have two ndb_mgmd, you need to copy this file to the two computers where you have the management server and change the hostname to match each computer's hostname.
  • ndbd
    If you run >1 ndbd on each computer use the ndbdmulti script.
    In this script you should change:

    host=<hostname where the ndbd runs>
    connectstring=<hostname_of_ndb_mgmd_1;hostname_of_ndb_mgmd_2>
    bindir=<path to ndbd executable>
    ##remove --nostart if you are not using cmon!!
    option=<--nostart>
    ##if you use ndbdmulti you need to set the ndbd node id here (must correspond to Id in [ndbd] section of your config.ini.
    nodeid=<nodeid of data node on host X as you have set in [ndbd] in config.ini>


    If you use ndbdmulti, then you need to have one ndbdmulti script for each data node you have on a host, because each ndbdmulti script must have its unique nodeid set. Thus you have e.g /etc/initd/ndbdmulti_1 and /etc/initd/ndbmulti_2.

  • mysqld
    In this script you may want to change the following things to reflect your configuration.

    basedir=<e.g /usr/local/mysql/>
    mycnf=<path to my.cnf>
    bindir=<e.g /usr/local/mysql/bin/>

  • cmon
    In the initd script for cmon you can change the following things.
    However, it is recommended to have a mysql server and a management server on the same host/computer as where cmon is running.

    bindir=<by default cmon is installed in /usr/local/bin>
    ## the following libdir covers the most common places for the mysql libraries, otherwise you must change so that you have libndbclient on the libdir path.
    libdir=/usr/local/mysql//mysql/lib/mysql:/usr/local/mysql//mysql/lib/:/usr/local/mysql/lib/:/usr/local/mysql/lib/mysql
    OPTIONS - there are a bunch of cmon options you might have to change as well. By default cmon will try to connect to a mysql server on localhost, port=3306, socket=/tmp/mysql.sock, user=root, no password, and to a managment server (nbd_mgmd) on localhost
install init.d scripts
You should copy the scripts to /etc/init.d/ on the relevant computers and try them before moving on!

If all scripts are tested and ok, Then you need to add the scripts to the runlevels so that when the computers reboot they are started automatically.

# on the relevant host(s):
sudo cp cmon /etc/init.d/
# on the relevant host(s):
sudo cp ndb_mgmd /etc/init.d/
# on the relevant host(s):
sudo cp ndbd /etc/init.d/
# on the relevant host(s):
sudo cp mysqld /etc/init.d/


On Redhat (and I presume Fedora, OpenSuse, Centos etc):

# on the relevant host(s):
sudo chkconfig --add cmon
# on the relevant host(s):
sudo chkconfig --add ndb_mgmd
etc..

On Ubuntu/Debian and relatives:

# on the relevant host(s):
sudo update-rc.d cmon start 99 2 3 4 5 . stop 20 2 3 4 5 .
# on the relevant host(s):
sudo update-rc.d ndb_mgmd start 80 2 3 4 5 . stop 20 2 3 4 5 .
# on the relevant host(s):
sudo update-rc.d ndbd start 80 2 3 4 5 . stop 20 2 3 4 5 .
etc..

Reboot a computer and make sure the processes start up ok.
If you are going to use cmon then cmon will control the start of the data node. If the data nodes are started with --nostart the data nodes will only connect to the management server and then wait for cmon to restart the data node.

monit - build
The first thing we need to do is to build monit on every computer:
download monit
unpack it
tar xvfz monit-4.10.1.tar.gzcompile
cd monit-4.10.1
./configure
make
sudo make install
I think you have to do this on every computer (I have not tried to copy the binary from one computer to every other computer).

Now when monit is installed on every computer we want to monit it is time to define what monit should monitor. Basically monit checks a pid file. If the process is alive, then great else monit will restart it. There are many more options and I really suggest that you read the monit documentation!

monit - scripts
monit uses a configuration file called monitrc. This file defines what should be monitored. Below are some examples that I am using:
  • monitrc for ndb_mgmd, mysqld, and cmon - (scripts open in new tab)
    You should copy this script to /etc/monitrc:
    sudo cp monitrc_ndb_mgmd /etc/monitrc
    and change the permission:
    sudo chmod 700 /etc/monitrc
    Then you should edit monitrc and change where the pid files are (it is likely you have different data directories than me):
    check process ndb_mgmd with pidfile /data1/mysqlcluster/ndb_1.pid
    etc..

  • monitrc for ndbd
    If you have >1 ndbd on a machine, use monitrc for multiple ndbd on same host
    You should copy this script to /etc/monitrc:
    sudo cp monitrc_ndbd /etc/monitrc
    and change the permission:
    sudo chmod 700 /etc/monitrc
    Then you should edit monitrc and change where the pid files are (it is likely you have different data directories than me):
    check process ndbd with pidfile /data1/mysqlcluster/ndb_3.pid
    etc..

  • monitrc for mysqld - (scripts open in new tab)
    You should copy this script to /etc/monitrc:
    sudo cp monitrc_mysqld /etc/monitrc
    and change the permission:
    sudo chmod 700 /etc/monitrc
    Then you should edit monitrc and change where the pid files are (it is likely you have different data directories than me and hostnames):
    check process mysqld with pidfile /data1/mysqlcluster/ps-ndb01.pid
    etc..

monit - add to inittab
The final thing is to add monit to inittab so that if monit dies, then the OS will restart it!

On every computer you have to do:
echo '#Run monit in standard run-levels' >> /etc/inittab
echo 'mo:2345:respawn:/usr/local/bin/monit -Ic /etc/monitrc' >> /etc/inittab
monit will log to /var/log/messages so tail it check if all is ok:
tail -100f /var/log/messages
If not then you have to change something :)
If it seems ok, then kill a process that is managed:
killall -9 ndbd
and monit will automatically restart it:
Sep 30 13:15:44 ps-ndb05 monit[4771]: 'ndbd_3' process is not running
Sep 30 13:15:44 ps-ndb05 monit[4771]: 'ndbd_3' trying to restart
Sep 30 13:15:44 ps-ndb05 monit[4771]: 'ndbd_3' start: /etc/init.d/ndbd_3
Sep 30 13:15:44 ps-ndb05 monit[4771]: 'ndbd_7' process is not running
Sep 30 13:15:44 ps-ndb05 monit[4771]: 'ndbd_7' trying to restart
Sep 30 13:15:44 ps-ndb05 monit[4771]: 'ndbd_7' start: /etc/init.d/ndbd_7
Sep 30 13:15:49 ps-ndb05 monit[4771]: 'ndbd_3' process is running with pid 5680
Sep 30 13:15:49 ps-ndb05 monit[4771]: 'ndbd_7' process is running with pid 5682

So that should be it. If you have done the things correctly you should now have a MySQL Cluster complete with monitoring and process management. As easy way to accomplish this is the use the configuration tool, which generates the various scripts (initd, build scripts etc)!

Please let me know if you find better ways or know about better ways to do this!

Monday, September 29, 2008

cmon 1.1 - released

New features in cmon 1.1:
  • used-defined log destination (earlier only possible to log to syslog or console)
    cmon --logfile=/tmp/cmon.log
    The logs are rotating and rotate when 1MB size has been reached.
  • init.d script - in initd/cmon is a script that can be used to start and stop cmon. You may have to edit the content of that script (e.g paths).
Bugs fixed in cmon 1.1:
  • various issues in the php scripts
  • fixed queries managing the backup and backuplog tables (the queries periodically removed all entries). Now the backup/backuplog/restore/restorelog tables can max contain 100 records, based on age.
  • Improvements in documentation around how to get graphs working (you need php-gd).
Get cmon at http://www.severalnines.com/cmon/

Friday, September 26, 2008

cmon 1.0.1 released

Thank you all for you feedback and especially to Matt and Oli for spotting some serious issues. Hence a new version cmon 1.0.1 is out.


Bugs fixed in 1.0.1 is:
  • option handling is fixed (--mysqluser and --mysqlpasswd was not handled properly).
  • initd/cmon (the documentation talks about this initd file) is now included
    Please not that you may have to edit and change it a bit (PATHs etc).

Version 1.1 will include:
  • possibility to log to other place than syslog.

Wednesday, September 24, 2008

Indestructible Cluster part 1

Version 1.0 of CMON (gpl license) has now been released.
It is dead easy to use and gives you a lot of helpful information about the state of the cluster. Memory utilization, statistics etc.

You can also get graphical representation using a web server and the php scripts that comes with cmon.

If you use it with the new and improved config tool then you can setup a system with process management (monit) and init.d scripts etc in minutes, and cmon will then automatically coordinate the start procedures of the data nodes in the cluster!

First you have to download cmon and then you can check out the documentation for it.

Then you have to compile it:

tar xvfz cmon-1.0.tar.gz
cd cmon-1.0
sh ./autogen.sh
./configure
make
make install ## -- optional step - will install into /usr/local/bin as default and you need to be root


The easiest and recommended way is to have a mysql server and a management server on the computer where you are installing cmon. This makes things simpler.

Now you can start cmon which will daemonize itself (start with --nodaemon if you don't want this to happen):

/usr/local/bin/cmon ##if you did 'make install' above
or
src/cmon ##if you didn't 'make install' above

This corresponds to starting cmon with the following arguments:

cmon --ndb-connectstring=localhost \
--mysqlhost=localhost \
--mysqluser=root \
--mysqlpasswd="" \
--mysqlsocket="/tmp/mysql.sock"\
--savetime-clusterlog=24 \
--savetime-statistics=24

cmon will:
  • automatically create a database called 'cmon' on the localhost the necessary tables.
  • log to syslog (tail -f /var/log/message )

If you start up cmon and tail /var/log/message you will get:

Sep 16 23:14:09 ps-ndb01 cmon: Starting cmon with the following parameters: --mysqldatabase = cmon --mysqluser = root --mysqlpasswd = --mysqlhost = localhost --mysqlport = 3306 --mysqlsocket = /tmp/mysql.sock --ndb-connectstring = localhost --savetime-clusterlog = 24 hours --savetime-statistics = 24 hours
Sep 16 23:14:09 ps-ndb01 cmon: If that doesn't look correct, kill cmon and restart with -? for help on the parameters, or change the params in /etc/init.d/cmon
Sep 16 23:14:09 ps-ndb01 cmon: Creating pid file /tmp/cmon.pid
Sep 16 23:14:09 ps-ndb01 cmon: Creating database 'cmon' if it does not exist
Sep 16 23:14:09 ps-ndb01 cmon: Table 'backup' did not exist. Schema incomplete. Going to recreate missing tables.
Sep 16 23:14:09 ps-ndb01 cmon: Recreated missing tables
Sep 16 23:14:09 ps-ndb01 cmon: Created table 'backup'
Sep 16 23:14:09 ps-ndb01 cmon: Created table 'backup_log'
Sep 16 23:14:09 ps-ndb01 cmon: Created table 'restore'
Sep 16 23:14:09 ps-ndb01 cmon: Created table 'restore_log'
Sep 16 23:14:09 ps-ndb01 cmon: Created table 'cluster_log'
Sep 16 23:14:09 ps-ndb01 cmon: Created table 'cluster_state'
Sep 16 23:14:09 ps-ndb01 cmon: Created table 'memory_usage'
Sep 16 23:14:09 ps-ndb01 cmon: Created table 'node_state'
Sep 16 23:14:09 ps-ndb01 cmon: Created table 'cluster_statistics'
Sep 16 23:14:09 ps-ndb01 cmon: Created table 'node_statistics'
Sep 16 23:14:09 ps-ndb01 cmon: Done
Sep 16 23:14:09 ps-ndb01 cmon: Setting up threads to handle monitoring
Sep 16 23:14:09 ps-ndb01 cmon: Starting MonitorThread
Sep 16 23:14:09 ps-ndb01 cmon: Starting LogEventThread
Sep 16 23:14:09 ps-ndb01 cmon: connected to mysqld
Sep 16 23:14:09 ps-ndb01 cmon: connected to ndb_mgmd on host localhost, port 1186


Using the PHP scripts that are located in the 'www' directory and a webserver you can get nice stats about Cluster. Such as:

..and per node statistics:

..and memory usage:

... and detailed backup and restore statistics:
You are actually getting every stat you can get from MySQL Cluster at this stage. And if you don't want to use the PHP scripts, you can access the same information using plain SQL, since cmon logs everything to normal SQL tables.

And yes, I have only tested this on Linux deployed on a cluster consisting of HP DL385 systems with x86-64 architecture (in this case dual cpu dual core Intel(R) Xeon(R) CPU 5160 @ 3.00GHz and dual cpu quad core Intel(R) Xeon(R) CPU E5345 @ 2.33GHz ). They are a bit old, but good.

Let me know if you have problems or questions on the above:
johan at severalnines dot com

Monday, September 22, 2008

Sun CMT and Cluster

After joining SUN I have seen and heard more and more people/customers that are wanting or planning to deploy MySQL Cluster on SUN's CMT technology.

For a bit more information on CMT, please read what Allan Packer, SUN, has written more on CMT hardware on his blog.

Looking at MySQL Cluster, and in particular the data nodes, they like:
  1. Fast CPUs (two cores per data node), 2+GHz CPU.
  2. RAM, as fast as possible.
  3. Fast disks (SSD would be great fun to try in combo with Cluster). Fast disks are important especially for write-heavy applications.
CMT does not offer 1). CMT technology is currently quite slow (slightly above the 1GHz line). So running data nodes on CMT is not a good idea, which is also indicated by customer benchmarks. But what the data nodes are good at is handling a great deal of load in parallel.

How to generate a lot of parallel load? Well, use Sun CMT in the application layer (mysqld + app/web servers).

This is also inline with the trends in the industry where you don't perform heavy operations on the precious CPUs used by the storage layer, but rather do computation etc at the application layer. An excellent fit for SUN CMT, which has ample of threads to throw at a problem! Also, SUNs Java technology is optimized for CMT making it an extremely good idea to use Glassfish as the app server in this scenario.

Now, bear in mind that all nodes making up a MySQL Cluster (6.2 and 6.3) requires the same endianess, so you can't combine CMT with AMD/Intel architecture at the storage level yet (this got to get fixed).

This means I would use two of these for the data nodes and the final architecture would look as follows which would be nearly indestructible (with redundant BIGIP Load balancers, switches, data nodes etc).


Tuesday, August 12, 2008

Cluster Monitoring - CMON

Jan 14th 2010 - See this post instead: http://johanandersson.blogspot.com/2010/01/cmon-install-instructions.html ! For installation instrictions.

A system is not a system unless it can be monitored and controlled...I recall something like that from a university course in Control Theory.

However, in a couple of blogs from now on I am going to give some ideas on how to monitor and control MySQL Cluster. To control, we need to have process management software that automatically restarts processes.

To do monitoring we need to have a tool to collect information... and here is a new monitoring tool, called CMON (instructions further down). CMON aggregates information from MySQL Cluster that earlier was only accessible from the cluster log or the management client, such as:
  • cluster state
  • node state
  • backup statistics
  • statistics
  • cluster events (cluster log basically)
.. and let's you access the information using SQL, because CMON logs the information into ordinary MYISAM tables! So, it is really easy to use!
Here are some examples:

-- get memory usage
select nodeid,100*(pg_used_im/pg_tot_im) as im_used_percent from memory_usage order by nodeid;
select nodeid,100*(pg_used_dm/pg_tot_dm) as dm_used_percent from memory_usage order by nodeid;


-- get status of all data nodes
select * from node_state where node_type='NDBD';

-- backup status
select * from backup;

-- query the cluster log e.g,
select * from cluster_log where event="DISCONNECT";

Note: You must have MemReportFrequency=30 (or some other value) set in config.ini to get the Memory usage events!!

There is a README in the tarball that describes how to install it. If you want the long version, please read on.

Deployment (example HA environment)
Here is a recommended setup that we will base the setup and installation CMON on. In an HA environment we typically have a setup like shown in the picture below:


We have a number of processes that are deployed on its own hardware (hostA, hostB etc). Now, let's add CMON to this layer. I would add CMON to the Management Layer, and I would also add a mysqld on that layer so that CMON can log locally. The picture would then look like:


The MySQL server on hostA and hostB does not have to be connected to the Cluster but can be standalone. In later blogs we are going to add process management to this picture as well. As you can see we have a redundant pair of CMONs, and thanks to the SQL interface to cluster state information it is easy to write web applications to hook into this. So we are all good.

Moreover, a solution like the above picture can easily be generated using the config tool.

Installation
In order to install CMON on a computer corresponding to hostA or hostB in the picture above.
  1. Download the code
  2. tar xvfz cmon.tgz
  3. make sure you have mysql_config on the path!
    1. if you have used the config tool (and the default install path):
    2. export PATH=/usr/local/mysql/mysql/bin:$PATH
  4. cd cmon
  5. sh autogen.sh
  6. ./configure
  7. make
  8. make install
  9. export LD_LIBRARY_PATH=/usr/local/mysql/mysql/lib:/usr/local/mysql/mysql/lib/mysql
  10. cmon --help
  11. if you have a mysqld started on localhost
    1. mysql> source sql/cmon.sql
  12. cmon &
    1. this is equivalent to:
      cmon --mysqluser='root' --mysqlpasswd='' \
      --mysqlhost='localhost' --mysqlport=3306 \
      --mysqlsocket='/tmp/mysql.sock' \
      --ndb-connectstring='localhost' &

    2. if you don't have the mysqld on localhost you might have to GRANT and use the options that you get from cmon --help to connect correctly to the mysql server.
  13. You can find more example queries in sql/queries.sql
Ps, if I get more that 100 unique downloads I will maintain this :)

And please let me know what is wrong with.. johan at severalnines dot com!
On the road map:
  • make it as a real daemon
  • add more data to the tables
  • notifications
And I would be very grateful if someone could write a php script that could present the average load (like ops_per_sec etc) in a graph on a web page.
Because I have no idea how to do that.

Known issue: sometimes the cmon crashes on startup. The crash actually happens in libmysqlclient_r. However, it is just to start it again if this happens. I am looking into this.


Aug 13. 2008 - fixed bug in cmon.sql script.

Please note that all this is of MySQL Cluster 6.2 and MySQL Cluster 6.3.

Good luck!

Sunday, August 10, 2008

Evaluation and Development Packages (UPDATE)

I have changed the scripts a bit.

Now everything is "self-contained" within the package so no directories outside mysqlcluster-63 is created.

You can now install this as any user! Before you needed to be root, but this is no longer needed! However, you should avoid installing it on an NFS mounted volume.

There is also a README that describes the directory and file structure and how to get going.

Wednesday, August 06, 2008

Evaluation and Development Packages

Hello Cluster users,

I have put together a set of evaluation and development packages that allows you to install a MySQL Cluster on "localhost" in a matter of seconds (ok, a minute is more realistic). If you want a fully distributed setup, please use the configuration tool.

The packages comes in four sizes (tiny, small, medium, large) and deploys a cluster consisting of:
  • 1 management server
  • 2 data nodes
  • 2 mysql servers
All installed on localhost. Of course, you also get a set of scripts to manage your cluster!

Check it out on http://www.severalnines.com/eval/
Download a package and follow the instructions below (tested on Linux):

To install and start MySQL Cluster 6.2:

tar xvfz mysqlcluster-62.tgz
cd mysqlcluster-62
cd scripts
## Download the latest MySQL Cluster 6.2 or use wget..
## Distribute the binary to "localhost" in this case
bash dist.sh mysql-cluster-gpl-6.2.15-linux-i686-glibc23.tar.gz
## Copy config files, install mysql servers (no processes are started yet)
bash bootstrap.sh
(bash stop-cluster.sh #i recommend this to get rid of any running mysql or cluster nodes)
## Start the cluster
bash start-cluster-initial.sh
## Start a mysql client to mysql server 1
bash mysqlclient1.sh
## In another terminal, start a mysql client to mysql server 2
bash mysqlclient2.sh


To install and start MySQL Cluster 6.3:

tar xvfz mysqlcluster-63.tgz
cd mysqlcluster-63
cd scripts
## Download the latest 6.3 src and compile it.
bash download-and-compile.sh
## Distribute the binary to "localhost" in this case
bash dist.sh
## Copy config files, install mysql servers (no processes are started yet)
bash bootstrap.sh
(bash stop-cluster.sh #i recommend this to get rid of any running mysql or cluster nodes)
## Start the cluster
bash start-cluster-initial.sh
## Start a mysql client to mysql server 1
bash mysqlclient1.sh
## In another terminal, start a mysql client to mysql server 2
bash mysqlclient2.sh

Testing the setup
In one terminal, we call it term1:

bash mysqlclient1.sh


In another terminal, we call it term2:

bash mysqlclient2.sh


In term1:

mysql> use test;
mysql> create table t1 ( a integer primary key, b integer) engine=ndb;
mysql> insert into t1 values(1,1);


In term2:

mysql> use test;
mysql> select * from t1;



Another script to launch the management client:

bash ndb_mgm.sh
ndb_mgm> all report mem

There are also other scripts to take backups, conduct rolling restarts etc.

Good luck. Future examples will be based on the infrastructure provided by severalnines.com.

Friday, June 27, 2008

ndb_restore tricks

With ndb_restore it is actually possible (talking MySQL Cluster 6.2 and 6.3 here, haven't looked at 5.0 for a long time) to restore a single table into a CSV file that can be loaded using LOAD DATA INFILE.

This can be useful if e.g:
  1. You want to extract data from a cluster table and load it into MyISAM/Innodb. This can be useful for reporting/data-mining purposes.
  2. You want to load data back into Cluster but into another table looking the same as the original table (not to spoil the original data) for some reason.
Number 2) could be used in scenarios where you want to upgrade the schema by e.g changing the column length of a char(10) to char(255) (not an online operation) and add a bunch of other stuff. Please note that add column is an online operation, changing the column length is not.

Here is an example on how you can do it generate a CSV file using ndb_restore.


I have the following setup:


Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)] 4 node(s)
id=3 @10.128.22.129 (mysql-5.1.24 ndb-6.3.15, Nodegroup: 0, Master)
id=4 @10.128.22.130 (mysql-5.1.24 ndb-6.3.15, Nodegroup: 0)
id=5 @10.128.22.129 (mysql-5.1.24 ndb-6.3.15, Nodegroup: 1)
id=6 @10.128.22.130 (mysql-5.1.24 ndb-6.3.15, Nodegroup: 1)

[ndb_mgmd(MGM)] 2 node(s)
id=1 @10.128.22.127 (mysql-5.1.24 ndb-6.3.15)
id=2 @10.128.22.128 (mysql-5.1.24 ndb-6.3.15)

[mysqld(API)] 37 node(s)
id=7 @10.128.22.127 (mysql-5.1.24 ndb-6.3.15)
id=8 @10.128.22.128 (mysql-5.1.24 ndb-6.3.15)
id=9 (not connected, accepting connect from any host)



I created two tables, t1, and t2 and filled them with some data. I will not show this step here.

First you need to take a backup:
  • ndb_mgm -e "start backup"
I have used the scripts generated by the Configuration Tool and do:
  • sh start-backup /tmp
The script aggregates the backups files generated by each individual data node (since the backup is distributed) to a central place. In this case /tmp on the host I run the script from.

I have the following backup files in /tmp/

[root@ps-ndb01 BACKUP-2008Jun27-162551]# pwd
/tmp/BACKUP-2008Jun27-162551
[root@ps-ndb01 BACKUP-2008Jun27-162551]# ls -al
total 136
drwxr-xr-x 2 root root 4096 Jun 27 16:43 .
drwxrwxrwx 5 root root 4096 Jun 27 16:25 ..
-rw-r--r-- 1 root root 13116 Jun 27 16:25 BACKUP-2-0.3.Data
-rw-r--r-- 1 root root 13620 Jun 27 16:25 BACKUP-2-0.4.Data
-rw-r--r-- 1 root root 14216 Jun 27 16:25 BACKUP-2-0.5.Data
-rw-r--r-- 1 root root 12312 Jun 27 16:25 BACKUP-2-0.6.Data
-rw-r--r-- 1 root root 11160 Jun 27 16:25 BACKUP-2.3.ctl
-rw-r--r-- 1 root root 52 Jun 27 16:25 BACKUP-2.3.log
-rw-r--r-- 1 root root 11160 Jun 27 16:25 BACKUP-2.4.ctl
-rw-r--r-- 1 root root 52 Jun 27 16:25 BACKUP-2.4.log
-rw-r--r-- 1 root root 11160 Jun 27 16:25 BACKUP-2.5.ctl
-rw-r--r-- 1 root root 52 Jun 27 16:25 BACKUP-2.5.log
-rw-r--r-- 1 root root 11160 Jun 27 16:25 BACKUP-2.6.ctl
-rw-r--r-- 1 root root 52 Jun 27 16:25 BACKUP-2.6.log
[root@ps-ndb01 BACKUP-2008Jun27-162551]#


As you see, there are one data file (.Data) for each data node where the backup came from.
Now I want to generate a CSV file containg only the data from table t1! Since the backup is from four data nodes I have to run ndb_restore four times, once for each backup file (3, 4, 5, 6).


[root@ps-ndb01 BACKUP-2008Jun27-162551]# ndb_restore --verbose=0 -c "localhost" --print_data -b 2 -n 3 --fields-enclosed-by="'" --fields-terminated-by=',' --tab=. . test t1
[root@ps-ndb01 BACKUP-2008Jun27-162551]# ndb_restore --verbose=0 -c "localhost" --print_data -b 2 -n 4 --fields-enclosed-by="'" --fields-terminated-by=',' --append --tab=. . test t1
[root@ps-ndb01 BACKUP-2008Jun27-162551]# ndb_restore --verbose=0 -c "localhost" --print_data -b 2 -n 5 --fields-enclosed-by="'" --fields-terminated-by=',' --append --tab=. . test t1
[root@ps-ndb01 BACKUP-2008Jun27-162551]# ndb_restore --verbose=0 -c "localhost" --print_data -b 2 -n 6 --fields-enclosed-by="'" --fields-terminated-by=',' --append --tab=. . test t1

What has happened now is that a file, t1.txt, has been generated containing:

[root@ps-ndb01 BACKUP-2008Jun27-162551]# cat t1.txt
'9','hello'
'10','hello'
'15','hello'
'3','hello'
'6','hello'
'7','hello'
'16','hello'
'2','hello'

Voila! This file can now be loaded using LOAD DATA INFILE...

About the flags used in the ndb_restore program:
--verbose=0 - restore silently
--print_data - print the data, do not restore to the Cluster
-b - backup-id, in this case backup-id=2
-n - id of the data node where the backup came from (in this case we have data nodes 3,4,5 and 6.
--tab - directory to write the csv file (in this case . which means the cwd).
--append - only used for the last three runs of the ndb_restore. This means that the data will be appended to t1.txt


Good luck,
j

Thursday, June 19, 2008

New version of the Dimensioning Toolkit

2009-07-15: The Dimensioning Toolkit has been replaced with sizer.

I have put up a new version of the Dimensioning Toolkit.
Now, there is a particular version for MySQL Cluster 6.2 and MySQL Cluster 6.3.

The reason for this split is that the libraries and include files are in different locations (include/ and lib/ in 6.2, but lib/mysql and include/mysql in 6.3).
I don't know why the build team changes this between versions (even builds) and so often... very annoying.

Anyways, follow the links to download:

Dimensioning Toolkit for MySQL Cluster 6.2
Dimensioning Toolkit for MySQL Cluster 6.3

Disk data tables

I got a few questions about how to configure the cluster for disk data.
The new version of the Dimensioning Toolkit does a better job calculating:
  • UNDO LOG file size
  • TABLESPACE size
  • UNDO Buffer size
What I write about below is taken into account in the Dimensioning Toolkit. Moreover, the things below only applies to disk data tables (read about the generic stuff about them in the reference manual).

UNDO LOG and UNDO BUFFER
Disk data tables make use of an UNDO log and an UNDO buffer.
The size of the undo log and the undo buffer is specified when you create the logfile group.

I recommend setting:
  • UNDO log size = 4 to 6 times the DataMemory, thus the same size as for the REDO log.
  • UNDO buffer size= 32M
The memory for the UNDO buffer is allocated from the SharedGlobalMemory, which is a parameter in the [ndbd default] section.
In order to have an UNDO buffer that is 32M, you need to set:

[ndbd default]
...
SharedGlobalMemory=256M
...

Current version of the Configuration Tool sets this value for you, and the Dimensioning Toolkit calculates the size of the UNDO log file for you.

In order to create a log file group with the specified values you then do:

CREATE LOGFILE GROUP lg ADD UNDOFILE 'undo1.dat'
INITIAL_SIZE=[4-6]*DataMemory (in MB) ##e.g DataMemory=1024M --> INITIAL_SIZE=6144M
UNDO_BUFFER_SIZE=32M
ENGINE=NDB;

For MySQL Cluster 6.2, it should be 6xDataMemory. For MySQL Cluster 6.3 it should be 4xDataMemory (this because 6.3 writes only two LCPs)



TABLESPACE
Next thing is to create a table space. A table space is a collection of one or more data files.
Benchmarks indicates it is better to have many smaller data files than one giant data file. This has to do with how the data nodes are handling open files (one thread for each open file). More files --> more threads...

Again, the Dimensioning Toolkit will calculate the total size you need for the table space, but you can later on add more data files online if you wish.

If you want to create a table space that is 1GB in size I would do like:

CREATE TABLESPACE ts_1
ADD DATAFILE 'data1.dat'
USE LOGFILE GROUP lg
INITIAL_SIZE=128M ##one data (data1.dat) file with size = 128M
ENGINE=NDB;

Then add more data files:

ALTER TABLESPACE ts_1 ADD DATAFILE 'data2.dat' INITIAL_SIZE=128M ENGINE=NDB;
...
ALTER TABLESPACE ts_1 ADD DATAFILE 'data10.dat' INITIAL_SIZE=128M ENGINE=NDB;


I intentionally over-allocated the size of the table space (10 x 128M).
Moreover, you can have one log file group, but one log file group can contain many undo files, and you can have many table spaces.

Recommendations
My recommendations are:
  • Use the Configuration Tool to generate a good config.ini. Even though you don't want to use the scripts, you can still use the config.ini. The config.ini it generates is state of the art and a great boiler plate!
  • Use the Dimensioning Toolkit to scope out how much RAM, Disk etc you need.
  • Use the resulting DataMemory/IndexMemory etc you got from the Dimensioning Toolkit in the Configuration Tool to nail a configuration that suits you.
Good luck!