Below all of this you can find the setup I used.
Initial start of Cluster (40GB of redo log files):
6.3.19: 3min 27sec
Create a 20GB undo file:
6.3.19: 6min 17sec
Create a 128MB data file for the tablespace:
6.3.19: ~3 sec
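For reference, undo files and data files like these are created with the standard MySQL Cluster disk data DDL. A sketch (the logfile group name lg_1 and the file names are placeholders of mine; ts_1 matches the table definition in the setup below):

-- logfile group with a 20GB undo file and a 128MB undo buffer
create logfile group lg_1
add undofile 'undo_1.log'
initial_size 20G
undo_buffer_size 128M
engine ndb;

-- tablespace with its first 128MB data file
create tablespace ts_1
add datafile 'data_1.dat'
use logfile group lg_1
initial_size 128M
engine ndb;

-- additional 128MB data files are added one at a time
alter tablespace ts_1
add datafile 'data_2.dat'
initial_size 128M
engine ndb;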
Insert 1M records of 4096B each (1 thread, batches of five):
6.3.19: 286 sec (3721.12 QPS)
(we can probably provision faster with bigger batches or more threads)
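A batch here can be thought of as one five-row insert statement. A sketch of a single batch against the dd table defined in the setup below (the values are illustrative; this is not the actual benchmark client):

-- one batch of five rows, each carrying a 4096B payload
insert into dd (id, ts, data) values
(1, unix_timestamp(), repeat('x', 4096)),
(2, unix_timestamp(), repeat('x', 4096)),
(3, unix_timestamp(), repeat('x', 4096)),
(4, unix_timestamp(), repeat('x', 4096)),
(5, unix_timestamp(), repeat('x', 4096));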
I then provisioned another 4M records (total of 5M records in DB).
Evil test: 100K Random reads (read 4096B) (5M records in DB):
6.3.19: 1290.42 QPS (20 threads, io util is ~90%, so we are almost completely io bound). This result is in line with what we would expect when being io bound, especially since I used four data nodes, each having one disk.
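Each read is a point lookup on the primary key, along these lines (a sketch; 4711 stands in for a random id in [1, 5000000]):

-- read one 4096B record by primary key
select data from dd where id = 4711;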
Setup (the data node parameters are collected into a config.ini sketch after this list):
- Total of 4 data nodes on 4 computers.
- Another computer was used to generate the inserts/reads
- 8GB of RAM
- Gig-E
- 1 * 146GB SAS, 10KRPM
- 128MB IndexMemory
- 1024MB DataMemory
- ODirect=1
- SharedGlobalMemory=384M
- NoOfFragmentLogFiles=40
- FragmentLogFileSize=256M
- DiskPageBufferMemory=3072M
- Table space (one ts) with 100 data files of 128MB each (best practice is to use many small data files instead of one big one; this will be changed in 6.4 so that you can use one big data file).
The point here is that you should have a few data files. One data file is bad; more than 128 is overkill, since the data node won't keep more than that many data files open at once anyway. This affects how many data files the data node can write to in "parallel".
- Extent size=1MB (which is quite ok)
- Logfile group: one 20GB undo file and a 128MB undo buffer
The undo file was a bit too big (not that it mattered, I had the disk space): I used 5366054928 bytes out of 21474836480, so only ~25% was used.
- There is also a new configuration option in 6.3.19 which lets you create the data files
- The disk data table looks like (data column will be stored on disk):
create table dd (
id integer primary key,
ts integer,
data varbinary(4096),
index(ts)) engine=ndb TABLESPACE ts_1 storage disk;
- Fedora Core 9 ( uname -r --> 2.6.26.6-79.fc9.x86_64 )
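Collected into config.ini form, the data node parameters above would look roughly like this (a sketch: NoOfReplicas is an assumption of mine, the rest are the values from the list):

[ndbd default]
# assumption: 2 replicas across the 4 data nodes
NoOfReplicas=2
DataMemory=1024M
IndexMemory=128M
SharedGlobalMemory=384M
DiskPageBufferMemory=3072M
ODirect=1
NoOfFragmentLogFiles=40
FragmentLogFileSize=256M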
Now, especially for inserts/writes, there are quite a few things competing for the single disk:
- REDO log
- LCP
- the disk data itself (UNDO LOG + DATA FILES)
If you have two disks, I would split it like this:
- Disk 1: REDO + LCP
- Disk 2: UNDO LOG + DATA FILES
And with three disks:
- Disk 1: REDO + LCP
- Disk 2: UNDO LOG
- Disk 3: DATA FILES
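To actually get such a split: the REDO log and the LCPs end up under the data node's FileSystemPath (point it at disk 1 in config.ini), while the undo and data files can be given absolute paths directly in the DDL. A sketch of the three-disk layout (the paths and the lg_1 name are placeholders):

# config.ini, [ndbd default]: REDO log and LCPs go under this path
FileSystemPath=/disk1/ndb

-- undo log on disk 2
create logfile group lg_1
add undofile '/disk2/undo_1.log'
initial_size 20G
undo_buffer_size 128M
engine ndb;

-- data files on disk 3
create tablespace ts_1
add datafile '/disk3/data_1.dat'
use logfile group lg_1
initial_size 128M
engine ndb;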
IMPORTANT: When you have done an --initial start, the files for the UNDO LOG and the DATA FILES are NOT removed. You have to remove them by hand, otherwise you will get an error when you try to CREATE LOGFILE GROUP again.
5 comments:
Hi!
To split the I/O load among several disks, I wonder whether the following configuration is good or not:
o 3 disks total.
o 1 disk for LCP/GCP.
o 2 disks for data files and undo log files, where each disk has 1 data file and 1 undo file.
Or should I use separate disks for data files and undo files? Is the following configuration better than the above?
o 3 disks total.
o 1 disk for LCP/GCP.
o 1 disk for data files.
o 1 disk for undo files.
I guess the former can split the I/O load better.
I have changed the text so that the English is clearer on this.
The last option you have is the best.
I see you are using 128M tablespace files. Is it recommended not to use files larger than this? We tried adding our third 2G tablespace file, and one of the nodes crashed and restarted a few times, then both crashed and we had to revert to our backup.
What is the best practice for sizing the files? We want to make it about 80G total.
The trace file shows the same block IDs being read back and forth between DBTC and DBDIH for about 10,000 lines before a startphase 4 crash.
Hi trellph,
I have updated the post with
"Point here is that you should have a few data files. One data file is bad, more than 128 is overkill since the data node won't keep more than that many data files open at once anyways. This affects how many data files the data node can write to in "parallel".
What you describe sounds like a bug.
Can you please file a bug report and include an excerpt of the trace lines in it? That would be great. Which version are you on?
Bug #40993 has been updated!