LOAD DATA INFILE
.This can be useful if e.g:
- You want to extract data from a cluster table and load it into MyISAM/Innodb. This can be useful for reporting/data-mining purposes.
- You want to load data back into Cluster but into another table looking the same as the original table (not to spoil the original data) for some reason.
Here is an example on how you can do it generate a CSV file using ndb_restore.
I have the following setup:
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)] 4 node(s)
id=3 @10.128.22.129 (mysql-5.1.24 ndb-6.3.15, Nodegroup: 0, Master)
id=4 @10.128.22.130 (mysql-5.1.24 ndb-6.3.15, Nodegroup: 0)
id=5 @10.128.22.129 (mysql-5.1.24 ndb-6.3.15, Nodegroup: 1)
id=6 @10.128.22.130 (mysql-5.1.24 ndb-6.3.15, Nodegroup: 1)
[ndb_mgmd(MGM)] 2 node(s)
id=1 @10.128.22.127 (mysql-5.1.24 ndb-6.3.15)
id=2 @10.128.22.128 (mysql-5.1.24 ndb-6.3.15)
[mysqld(API)] 37 node(s)
id=7 @10.128.22.127 (mysql-5.1.24 ndb-6.3.15)
id=8 @10.128.22.128 (mysql-5.1.24 ndb-6.3.15)
id=9 (not connected, accepting connect from any host)
I created two tables, t1, and t2 and filled them with some data. I will not show this step here.
First you need to take a backup:
- ndb_mgm -e "start backup"
- sh start-backup /tmp
I have the following backup files in /tmp/
[root@ps-ndb01 BACKUP-2008Jun27-162551]# pwd
/tmp/BACKUP-2008Jun27-162551
[root@ps-ndb01 BACKUP-2008Jun27-162551]# ls -al
total 136
drwxr-xr-x 2 root root 4096 Jun 27 16:43 .
drwxrwxrwx 5 root root 4096 Jun 27 16:25 ..
-rw-r--r-- 1 root root 13116 Jun 27 16:25 BACKUP-2-0.3.Data
-rw-r--r-- 1 root root 13620 Jun 27 16:25 BACKUP-2-0.4.Data
-rw-r--r-- 1 root root 14216 Jun 27 16:25 BACKUP-2-0.5.Data
-rw-r--r-- 1 root root 12312 Jun 27 16:25 BACKUP-2-0.6.Data
-rw-r--r-- 1 root root 11160 Jun 27 16:25 BACKUP-2.3.ctl
-rw-r--r-- 1 root root 52 Jun 27 16:25 BACKUP-2.3.log
-rw-r--r-- 1 root root 11160 Jun 27 16:25 BACKUP-2.4.ctl
-rw-r--r-- 1 root root 52 Jun 27 16:25 BACKUP-2.4.log
-rw-r--r-- 1 root root 11160 Jun 27 16:25 BACKUP-2.5.ctl
-rw-r--r-- 1 root root 52 Jun 27 16:25 BACKUP-2.5.log
-rw-r--r-- 1 root root 11160 Jun 27 16:25 BACKUP-2.6.ctl
-rw-r--r-- 1 root root 52 Jun 27 16:25 BACKUP-2.6.log
[root@ps-ndb01 BACKUP-2008Jun27-162551]#
As you see, there are one data file (.Data) for each data node where the backup came from.
Now I want to generate a CSV file containg only the data from table t1! Since the backup is from four data nodes I have to run ndb_restore four times, once for each backup file (3, 4, 5, 6).
[root@ps-ndb01 BACKUP-2008Jun27-162551]# ndb_restore --verbose=0 -c "localhost" --print_data -b 2 -n 3 --fields-enclosed-by="'" --fields-terminated-by=',' --tab=. . test t1
[root@ps-ndb01 BACKUP-2008Jun27-162551]# ndb_restore --verbose=0 -c "localhost" --print_data -b 2 -n 4 --fields-enclosed-by="'" --fields-terminated-by=',' --append --tab=. . test t1
[root@ps-ndb01 BACKUP-2008Jun27-162551]# ndb_restore --verbose=0 -c "localhost" --print_data -b 2 -n 5 --fields-enclosed-by="'" --fields-terminated-by=',' --append --tab=. . test t1
[root@ps-ndb01 BACKUP-2008Jun27-162551]# ndb_restore --verbose=0 -c "localhost" --print_data -b 2 -n 6 --fields-enclosed-by="'" --fields-terminated-by=',' --append --tab=. . test t1
What has happened now is that a file, t1.txt, has been generated containing:
[root@ps-ndb01 BACKUP-2008Jun27-162551]# cat t1.txt
'9','hello'
'10','hello'
'15','hello'
'3','hello'
'6','hello'
'7','hello'
'16','hello'
'2','hello'
Voila! This file can now be loaded using LOAD DATA INFILE...
About the flags used in the ndb_restore program:
--verbose=0 - restore silently
--print_data - print the data, do not restore to the Cluster
-b - backup-id, in this case backup-id=2
-n - id of the data node where the backup came from (in this case we have data nodes 3,4,5 and 6.
--tab - directory to write the csv file (in this case . which means the cwd).
--append - only used for the last three runs of the ndb_restore. This means that the data will be appended to t1.txt
Good luck,
j