Monday, October 30, 2006

Using CHAR/VARCHAR data types in NDBAPI

Many people have problem how to use the string data types (char/varchar) from the NDBAPI. Here I explain how you can use them.

First of all, let's create a table we can use as an example:

CREATE TABLE `t1` (
`a` int(11) NOT NULL,
`b` char(64) DEFAULT NULL,
`c` varchar(255) DEFAULT NULL,
`d` varchar(256) DEFAULT NULL,
PRIMARY KEY (`a`)
) ENGINE=ndbcluster DEFAULT CHARSET=latin1;


CHAR

In the MySQL server CHARs are space padded to the length of the CHAR.

Example:
insert into t1(a,b) values (1, 'mysql'); -- 'mysql' is 5 bytes long.

The MySQL server will now space-pad the string up to the length (in this case 64 bytes, given the character set used).

However, the NDBAPI does not respect this and does not care if it is space-padded, null-padded, or even 5-padded. But, if you want to be able to query the data that you have inserted/updated from the NDBAPI from the MySQL server, then you have to space-pad the data, because the MySQL Server requires this. Otherwise, the data will not display properly. So, in the NDBAPI you are responsible to space-pad the data!

From the NDBAPI, the insert above would be represented as (no error handling):

int a=1;
char b[64];
char str = "mysql";
memset(b,32,sizeof(b)); //ascii 32 is 'space'
memcpy(b,str,strlen(str));

NdbTransaction * t = ndb>startTransaction();
NdbOperation * op = t>getNdbOperation("t1");

op>insertTuple();
op>equal("a", (const char*)&a);
op>setValue("b",b);
t>execute(NdbTransaction::Commit, NdbTransaction::AbortOnError, 1);


Upon retrieval using the NDBAPI, you will have to strip the trailing spaces yourself (stripping is not shown in the example):


char b[65]; // length of column C + 1 for null character
memset(b,0,sizeof(b));

NdbTransaction * t = ndb>startTransaction();
NdbOperation * op = t>getNdbOperation("t1");
op>committedRead();
op>equal("a", (const char*)&a);
op>getValue("b",b));
t>execute(NdbTransaction::Commit, NdbTransaction::AbortOnError, 1);


cout << "a= " << a << "b=" << b << endl;


Will printout
a=1 b=mysql<59 trailing spaces>


VARCHAR

The VARCHARs data type stores the length of the string in the first byte (please not that different character sets exists with different storage requirements), followed by the actual string. There is an exception to this. If the VARCHAR is longer than 256 characters, then the first two bytes store the length of the string that follows.

Example:
insert into t1(a,c) values (2, 'mysql cluster'); -- 'mysql cluster' is 13 bytes long.

Internally, the length (13 bytes) is stored in the first byte, and then the string follows. In order to insert this string using the NDBAPI, then we will have to put together a buffer that stores the lenght in the first byte, and then the rest of the string. From the NDBAPI, the insert above would be represented as (no error handling):

char * str = "mysql cluster";
char c[strlen(str) + 1]; // +1 to store the length of the string. memset(c,0,sizeof(c));
memcpy(&c[1],str,strlen(str));
c[0]=strlen(str);
NdbTransaction * t = ndb>startTransaction();
NdbOperation * op = t>getNdbOperation("t1");
op>insertTuple();
op>equal("a", (const char*)&a);
op>setValue("c",c);
t>execute(NdbTransaction::Commit, NdbTransaction::AbortOnError, 1);

And to retrieve:

int a=2;
char c[255 + 1]; // length of column c + 1 byte for length

memset(c,0,sizeof(c));

NdbTransaction * t = ndb>startTransaction();
NdbOperation * op = t>getNdbOperation("t1");
op>committedRead();
op>equal("a", (const char*)&a);
op>getValue("c",c);
t>execute(NdbTransaction::Commit, NdbTransaction::AbortOnError, 1);

int len=c[0];
char c_real[len+1]; //+1B for null-terminator
c_real[len]=0;
memcpy(c_real, &c[1], len);

cout << "a= " << a << "c_real=" << c_real << endl;

Will printout:
a= 2 c_real= mysql cluster



LONG VARCHAR

If a VARCHAR is longer than 256B, then it is said to be a LONG VARCHAR. In a LONG VARCHAR the length is stored in the first two bytes, then the string follows. Below is an example on how to prepare a LONG VARCHAR in the NDBAPI.

insert into t1(a,d) values (3, 'how LONGVARCHAR works in the ndbapi);

would be represented as:

int a=3;
char * str = "how LONGVARCHAR works in the ndbapi";
int len = strlen(str);

char d[len + 2]; // +2B to store the length of the string.
memset(d,0,sizeof(d));
memcpy(&d[2],str, len);

d[0]=(char) ((len & 0x00FF))>>0;
d[1]= (char)((len & 0xFF00)>>8);

NdbTransaction * t = ndb>startTransaction();
NdbOperation * op = t>getNdbOperation("t1");
op>insertTuple();
op>equal("a", (const char*)&a);
op>setValue("d",d);
t>execute(NdbTransaction::Commit, NdbTransaction::AbortOnError, 1);


And to read it:

int a=3;

char d[256 + 2]; // length of column d + 2 byte for length

memset(d,0,sizeof(d));

NdbTransaction * t = ndb>startTransaction();
NdbOperation * op = t>getNdbOperation("t1");
op>committedRead();
op>equal("a", (const char*)&a);
op>getValue("d",d);
t>execute(NdbTransaction::Commit, NdbTransaction::AbortOnError, 1);

/*get the length of the string*/

int len = ((int) ((unsigned char) d[0]) + ((int) ((unsigned char) d[1]) << 8));
char d_real[len+1]; //+1B for null-terminator
d_real[len]=0;
memcpy(d_real, &d[2], len);

cout << "a= " << a <<" d_real=" << d_real << endl;


Will printout:
a= 3 d_real= how LONGVARCHAR works in the ndbapi


BINARY/VARBINARY/LONGVARBINARY

The same principles apply, but read the reference manual about the padding of BINARY/VARBINARY.

4 comments:

Anonymous said...

Um... If MySQL pads data with spaces... how could I store a value of "mysal "?

Note that there is a single space at the end of the string.

How can the engine tell the difference between the space *I* added, and the space *it* added?

(Hint: it can't)

Yet... I get my data back perfectly.

Johan Andersson said...

VARCHAR has the characteristics you are describing.

http://dev.mysql.com/doc/refman/5.1/en/char.html

"The length of a CHAR column is fixed to the length that you declare when you create the table. The length can be any value from 0 to 255. When CHAR values are stored, they are right-padded with spaces to the specified length. When CHAR values are retrieved, trailing spaces are removed."

Anonymous said...

If I create a table like this

CREATE TABLE foo (
a VARCHAR(253) CHARSET utf8
) engine = ndbcluster;

the column type will be LONGVARCHAR and Column::getLength() will return 759. This is three bytes per character, though the actual NDB strings are packed one byte per character.

This inconsistency breaks your example code.

Very annoying.

Johan Andersson said...

It is pretty std to have 3bytes per character for UTF8.
Latin1 (default charset) requires 1B per character.

mysql>SHOW CHARACTER SET
will show you the char sets and storage reqs.

moreover, you are responsible for preparing the string to the correct type, i.e, covert "xyz" to "xyz" in UTF8.