HowTo: PostgreSQL - Custom Aggregate Functions
Basics about aggregate functionsThese nice functions support you by aggregating multiple rows of a table. Assume you have a table called employees with the column salary and you are interested in the sum of all salaries. You could either sum it manually (just a joke) or use a standard aggregate function like SUM, that performs an operation for each dataset found and so sums up a total sum.
Snip 1 - Using an standardized aggregate function:
SELECT SUM(salary) AS total_sum FROM employees;
So what does this sum operation require? The big advantage of aggregate functions is now, that they can combine multiple rows, instead of maximum combine columns. In our example the SUM operation sums up all salaries, which are part of the queried rows. There is no other possibility to subsume rows than aggregate functions, unless you handle something on the part of an application.
The big advantage of PostgreSQL is that you can create custom aggregate functions, which satisfy the combination you imagined.
Quick startBefore reading this passage, please preposition the official documentation about CREATE AGGREGATE. Based on this I try to explain the main points and assist you at the most important steps.
Snip 2 - Synopsis of CREATE AGGREGATE:
CREATE AGGREGATE name ( input_data_type [ , ... ] ) ( SFUNC = sfunc, STYPE = state_data_type [ , FINALFUNC = ffunc ] [ , INITCOND = initial_condition ] [ , SORTOP = sort_operator ] )
Imagine the AGGREGATE is a process, which has a defined state (of type STYPE, more later), which changes during iteration through the selected datasets. What specifically will be changed is defined through the function SFUNC. The final function FINALFUNC will be applied after aggregating and it's return value will be returned as result of the aggregation - the default of FINALFUNC is returning the last known state.
To assist the official documentation I give you some hints for finding the correct values.
ExampleAs I came into contact with custom aggregate functions, I had a table with many datasets containing IP-addresses. My idea was, simply put, to build up a hash code about IP-addresses, which I selected before, which can easily done by XOR-ing all IP-addresses.
Snip 4 - Finished aggregator for 'hashing' IPv4-addresses:
01 CREATE AGGREGATE hash_ips ( ip4 ) ( 02 SFUNC = ip4_xor, 03 STYPE = ip4, 04 INITCOND = '0.0.0.0' 05 )
The aggregator is called hash_ips (line 01) and expects one paramter of type ip4 (line 01, look at pgFoundry for more information about ip4). It's initial state itself is also of type ip4 (line 03) and has the initial value of the IP-address 0.0.0.0 (line 04). For each dataset this get's XORed (by passing the function ip4_xor, line 02)) and will be returned after the iteration (because no FINALFUNC is stated).
Snip 5 - After creation you can use the aggregator like other aggregators.
SELECT hash_ips(column_with_ip) FROM mytable;For more examples please have a look at the official examples.
ConclusionThe main idea was to simply create one-way-functions (called 'hash'-function), which succeeded smoothly with PostgreSQL. This shows once more, that an object relational DBMS puts developer in a position to enhance the existing database with exotic functions. We should start thinking about alternate all the MySQL installations for complex software solutions and start cooperating with a 'real' database - PostgreSQL.