Header: christian-rossow.de
Apply for a PostDoc position in my System Security research group at CISPA, Germany.


If you can't explain it simply, you don't understand it well enough.

HowTo - Create PostgreSQL C library functions

The first thing you should do is reading the extensive official manual of PostgreSQL C library functions. This page serves as additional example only that demonstrates how you can handle bytea values in C.

When starting to program your own library, you need a Makefile. The example below is a very flexible one that I got from a geek on #postgres. Before using, please adapt the module and source file names accordingly.

Example Makefile for C library function:
MODULES = randomness

PG_CONFIG = pg_config
PGXS = $(shell $(PG_CONFIG) --pgxs)
INCLUDEDIR = $(shell $(PG_CONFIG) --includedir-server)
include $(PGXS)

randomness.so: randomness.o
	cc -shared -o randomness.so randomness.o

randomness.o: randomness.c
	cc -o randomness.o -c randomness.c $(CFLAGS) -I$(INCLUDEDIR)

Now let's start with some the fun stuff. I'll give an example of a C function int32 x := bytea_size(bytea) that returns the size of a bytea field:

Example C library function: bytea_size(bytea)
  1 #include "postgres.h"
  2 #include "fmgr.h"
  3 #include "miscadmin.h"
  5 #ifdef PG_MODULE_MAGIC
  7 #endif
  9 Datum bytea_size(PG_FUNCTION_ARGS);
 11 PG_FUNCTION_INFO_V1(bytea_size);
 12 Datum
 13 bytea_size(PG_FUNCTION_ARGS) {
 14     bytea *data = PG_GETARG_BYTEA_P(0);
 15     unsigned char *ptr = (unsigned char *) VARDATA(data);
 16     int32 tcount = 0, i;
 18     // count characters
 19     for (i = VARSIZE(data) - VARHDRSZ; i; i--) {
 20         if (!i%1000) CHECK_FOR_INTERRUPTS();
 21	        unsigned char c = *ptr++; // get the char
 22         tcount++;
 23     }
 25     PG_RETURN_INT32(tcount);
 26 }

Obviously, the code uses 'Version 1 Calling Conventions', what you should also always do. I will spare you all details that you understand since reading the official docs. Here what's probably new to you (and some lessons I learned):
  • CHECK_FOR_INTERRUPTS() (see line 20) should be called regularly from inside your function. It allows the C function to check, whether it should be interrupted, e.g. because somebody cancelled an operation. However, the macro brings some overhead, thus for performance reasons don't call it too often.
  • PG_GETARG_BYTEA_P() (see line 14) allows you to retrieve a pointer to a bytea datum. There are some more elegant solutions to iterate through pieces of a bytea datum instead of retrieving it all at once, though.
  • If you are wondering where all these macros, type mappings, etc. are defined, have a look at the PostgreSQL server header files. To find out where these resist type pg_config --includedir-server on your Linux machine.
  • *NEVER* return something using the common return x; statement, this screws up a lot of things. Always use the appropriate macro to return from a PostgreSQL C library function.

If you are finished writing the C function, try compiling it until you don't get compiler-warnings anymore. Then you should copy the new shared object (.so) file to your pgsql library directory. If you don't know where that is, check it out using (on Linux):

[user@db ~]# pg_config --pkglibdir

We're about to finish. The last thing you need to do is creating the SQL function that bases on your created C function. In our example, this would work with:

CREATE FUNCTION bytea_size(bytea) RETURNS integer
  AS 'randomness', 'bytea_size'

That's all! What regards the C function part, we're done. If you create something sensible and want to share this to the community, then publish your project on pgFoundry. What drove me to C language functions was the missing support of estimating the randomness of data in PostgreSQL. I'll publish another article about the 'pg-randomness' project as soon as it's public in pgFoundry.