This blog presents a new toolkit to assist in dimensioning a MySQL Cluster.
The toolkit consists of two parts:
- a spreadsheet to assist in calculating the amount of storage needed etc.
- ndb_record_size, a program that calculates the amount of storage (IndexMemory, DataMemory, and disk space for disk data tables) needed for a particular table in a database. The program takes the storage overhead into account when calculating the record size.
There are two versions of the spreadsheet, one for MySQL Cluster 6.2 and one for MySQL Cluster 6.3. The main difference is that the spreadsheet for MySQL Cluster 6.3 takes into account compression of LCPs and backups, if you choose to enable that.
The spreadsheet uses the information given by ndb_record_size, and from it you can determine how many data nodes you need, the amount of IndexMemory/DataMemory required, etc.
Please note that these tools are not supported by MySQL Support. However, I have used them with a number of customers, where they have proven very successful.
The ndb_record_size program needs to be compiled (let me know if problems exist on supported platforms other than Linux) before we can use it:
- tar xvfz ndb_record_size.tgz
- cd ndb_record_size
- edit the Makefile to change "INSTALL_DIR" (if you are using the Configuration Tool with the default INSTALL path there, you don't have to do anything)
- make
To run it:
> export LD_LIBRARY_PATH=/usr/local/mysql/mysql/lib/mysql:$LD_LIBRARY_PATH
Running the program without arguments prints some help information on how it should be executed:
ndb_record_size <connect> <database> <table>
- <connect> - the host where the management server is running
- <database> - name of the database where the <table> is stored
- <table> - the name of the table to dimension
Before we can run the program to dimension the tables, we need to have the tables created with storage engine=ndb.
Migrating an existing MyISAM/InnoDB database
If you have an existing database in MySQL (e.g. in MyISAM or InnoDB) that you would like to migrate to MySQL Cluster, you can change engine=MyISAM or engine=InnoDB to engine=ndb, e.g. by:
> cat databasedump.sql | sed -e 's/MyISAM/NDB/' > databasedump-ndb.sql
> use the mysql client and do "source databasedump-ndb.sql"
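Note that the one-liner above only rewrites MyISAM and is case-sensitive. A slightly more robust sketch that handles both engines (file names here are just examples, and the case-insensitive `I` flag assumes GNU sed):

```shell
# Create a tiny example dump (a stand-in for your real databasedump.sql).
printf 'CREATE TABLE t (a INT) ENGINE=MyISAM;\nCREATE TABLE u (b INT) ENGINE=InnoDB;\n' > databasedump.sql

# Rewrite both storage engines to NDB, case-insensitively (GNU sed's I flag).
sed -e 's/ENGINE=MyISAM/ENGINE=NDB/I' -e 's/ENGINE=InnoDB/ENGINE=NDB/I' \
    databasedump.sql > databasedump-ndb.sql

cat databasedump-ndb.sql
```

You would then source databasedump-ndb.sql from the mysql client as described above.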
Provision database with reference data
Best practice when using ndb_record_size is to provision the database with reference data.
The reason is to be able to correctly calculate the storage needed by VAR* attributes.
If you have no reference data, ndb_record_size will default to the maximum length of each VAR* attribute.
E.g., if you have a column "email varchar(255)", the record size will be calculated with the email column at its full length of 256B. In real life, however, an email address is usually not that long; the average might actually be only 64B.
If you populate the database with reference data, ndb_record_size will calculate the record size using the average size of each varchar/varbinary attribute over at most 1000 records.
If you have 5 records of reference data, it will average over those 5 records; if one record exists, it will use that.
Using reference data you will get more exact information on how much storage you will need!
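The averaging itself is straightforward. As a rough sketch (my own illustration, not the tool's actual code, with made-up reference values), this is the kind of calculation ndb_record_size performs for each VAR* column:

```shell
# Two hypothetical reference values for an "email varchar(255)" column.
printf 'bob@example.com\nalice@longer-domain.example.org\n' > emails.txt

# Average the stored lengths, as ndb_record_size does over (at most) 1000 rows.
awk '{ total += length($0); n++ } END { print int(total / n) }' emails.txt
# prints 23 -- far less than the 256B worst case
```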
I have created four tables in mysql's "test" database: t1, shown below, and a few variants of it. I have not inserted any reference data (bad practice).
CREATE TABLE `t1` (
`a` int(11) NOT NULL,
`b` bigint(20) DEFAULT NULL,
`c` char(21) DEFAULT NULL,
`d` varchar(22) DEFAULT NULL,
`e` bit(1) DEFAULT NULL,
PRIMARY KEY (`a`),
KEY `b` (`b`)
) ENGINE=ndbcluster DEFAULT CHARSET=latin1
I then run the ndb_record_size program on this table:
./ndb_record_size localhost test t1
As the output below shows, the ndb_record_size program recommends that I populate the table with average/reference data in order to calculate a good size for the VAR* attributes. It also prints how many ordered indexes and unique indexes you have on this table.
Calculating storage cost per record for table t1:
--- WARNING! No reference data found for VAR*. Defaulting to max size..23 Bytes
Consider loading database with average data for exact measurement
--- Ordered Index found named (PRIMARY). Additional cost per record is = 16 bytes of DataMemory.
--- Ordered Index found named (b). Additional cost per record is = 16 bytes of DataMemory.
Record size (incl OH):
#Rows found=0 records (--A required)
Appending the following to test.csv
Finally, ndb_record_size appends the storage required to a csv file called test.csv.
I run the program a few more times to size the rest of the four tables I have created in the test database:
./ndb_record_size localhost test t2
./ndb_record_size localhost test t3
./ndb_record_size localhost test t4
I then look at the output of the test.csv file:
Using the dimensioning spreadsheet
The information in the test.csv file is pasted/imported into the sizing spreadsheet.
As you see in the picture below, I have also added how many records I will have in each table.
When you have added how many records you will have in each table, you can navigate to the tab "Data node configuration", shown below:
As you can see there are two fields, DataMemory and IndexMemory.
These fields (and NoOfFragmentLogFiles etc) are automatically derived from the data provided in the tab "TableObjects".
The values in "red" can be altered. Values in "blue" boxes should be transferred to the [ndbd default] section of the config.ini.
E.g., you can now change the number of data nodes (it must be an even number: 2, 4, 6, ...), and the storage needed for each data node is then updated in the spreadsheet.
Thus, you can see how many data nodes you need to handle a particular database size.
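To illustrate the arithmetic behind those derived fields (my own sketch with made-up numbers, not the spreadsheet's exact formulas), per-node DataMemory follows the standard NDB sizing rule of rows × record size × NoOfReplicas, divided across the data nodes:

```shell
# Hypothetical inputs: 1,000,000 rows of 92-byte records (as reported in
# test.csv), with 2 replicas spread over 2 data nodes.
awk 'BEGIN {
  rows = 1000000; record_size = 92
  replicas = 2; data_nodes = 2
  printf "%.0f MB per data node\n", rows * record_size * replicas / data_nodes / 1024 / 1024
}'
# prints "88 MB per data node"
```

Doubling the number of data nodes to 4 would halve the per-node figure, which is exactly the kind of what-if the spreadsheet lets you explore.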
When you have determined how many data nodes you need, by checking the "Total RAM per data node" and "Total Disk space per data node" values, you can transfer these values into your config.ini file and restart the cluster.
If you have generated a configuration using the Configuration Tool then you can do like this:
1. cd mysqlcluster-XY
2. edit config/config.ini with the new values, then save and exit
3. If you have changed "NoOfFragmentLogFiles" you must run (in order to recreate the redo log on disk with the new size):
3.1. sh rolling-restart-initial.sh
Or if you haven't:
3.2. sh rolling-restart.sh
The above operation is online, meaning you can keep serving traffic while performing this step.
If you have changed the number of Data Nodes then it is not possible to update the configuration online (yet). You must generate a new configuration, e.g by using the Configuration Tool, with the correct number of data nodes! Don't forget to stop the old cluster before installing the new cluster!