RSS

PostgreSQL mini cookbook: Aggregate Query Tricks

PostgreSQL mini cookbook: PostgreSQL aggregate query tricks, those tricks back to 2001 and still works today. 😉

Sum a field

You want to calculate the total number of payments per customer, staff.

Solution

Use SUM() and GROUP BY:

SELECT a, b, sum(c) FROM t GROUP BY a ,b;

Example:

# SELECT * FROM payment;
 payment_id | customer_id | staff_id | rental_id | amount |        payment_date
------------+-------------+----------+-----------+--------+----------------------------
      17503 |         341 |        2 |      1520 |   7.99 | 2007-02-15 22:25:46.996577
      17504 |         341 |        1 |      1778 |   1.99 | 2007-02-16 17:23:14.996577
      17505 |         341 |        1 |      1849 |   7.99 | 2007-02-16 22:41:45.996577
      17506 |         341 |        2 |      2829 |   2.99 | 2007-02-19 19:39:56.996577
      17507 |         341 |        2 |      3130 |   7.99 | 2007-02-20 17:31:48.996577
      17508 |         341 |        1 |      3382 |   5.99 | 2007-02-21 12:33:49.996577

# SELECT customer_id, staff_id, SUM(amount) FROM payment GROUP BY customer_id, staff_id;
 customer_id | staff_id |  sum
-------------+----------+--------
         476 |        1 |  44.88
         205 |        2 |  48.91
         344 |        1 |  12.95
         598 |        1 |  43.90
          14 |        1 |  55.84

Discussion

You need to construct a SELECT query with a GROUP BY clause and apply a sum() function. So, to find the total number of orders we have per product in the system we can do:

SELECT p.pr_code, p.pr_desc, sum(o.ord_qty) AS quantity
  FROM products p, orders o
  WHERE p.pr_code=o.ord_product GROUP BY p.pr_code, p.pr_desc;

So why do we GROUP BY both the product-code and description, surely they match up? Well, they do, but if you return a field from a query it should either be mentioned in the GROUP BY or be subject to an aggregate function (such as sum).

As an alternative we could have used:

SELECT p.pr_code, min(p.pr_desc), sum(o.ord_qty) AS quantity
  FROM products p, orders o
  WHERE p.pr_code=o.ord_product GROUP BY p.pr_code;

Which achieves the same since for any given pr_code we only have one pr_desc but probably isn’t as clear.

Finding the maximum/minimum value

You’re not interested in how many widgets your biggest customer bought but you do want to know who they are.

Solution

Use a subquery that finds the max() of the quantity you want to measure and compare against it.

Discussion

This is similar to totalling, but you use the min/max functions instead of sum. In addition, we need to compare individual records against this maximum value. So, to get the company name and order details of the largest order in the system:

SELECT c.co_name, p.pr_code, p.pr_desc, o.ord_qty
  FROM companies c, products p, orders o
  WHERE c.co_id=o.ord_company AND p.pr_code=o.ord_product
    AND o.ord_qty = (SELECT max(ord_qty) FROM orders);

If there are two or more orders both have the same largest quantity, they will both be returned.

Counting doesn’t give me the total number of records in a table

You know there are 100 records in this table but count(field) returns less than that.

Solution

Use count(*) to avoid skipping NULL values in a specific field.

Discussion

You try to count the number of diary entries in the system:

SELECT count(dy_notes) FROM diary;

You find that entries with NULL dy_notes are being skipped. This is the correct behavior:

count() only counts non-null values.

The way of solving this is:

SELECT count(*) FROM diary;

I want to know how many types of X are in the system

You have a thousands of products in your range but suspect most don’t sell. You want a list of product-types that are in your orders list.

Solution

You can GROUP BY the product-code and sum quantity if you want to know that, or use count(distinct <field>).

Discussion

In practice, you will probably want to see how many of each product you’ve sold.

SELECT ord_product, sum(ord_qty) FROM orders GROUP BY ord_product;

If you don’t you can use

SELECT count(distinct ord_product) FROM orders;

This ignores null values of ord_product and throws away duplicates.

Defining your own aggregate function

You work for NASA and min(), max(), sum() etc. aren’t enough for you.

Solution

Create your own aggregate function with CREATE AGGREGATE and an accumulator function.

Discussion

If you’re planning on doing this without reading the manuals think again. Basically however you need to define:

  • The return-type of the aggregate function (e.g. int4)
  • The type of thing it aggregates (e.g. int4 again)
  • An initial value (probably 0)
  • A state-transition function to accumulate values (takes the running total and the next value)
  • Optionally a finalisation function to produce the final value (useful for averages or standard deviations)

You might well end up coding the state-transition and finalisation functions in C, but here is an trivial one-liner.

CREATE AGGREGATE catenate(sfunc1=textcat, basetype=text, stype1=text, initcond1='');

This sets the initial condition to '' (the empty string), accepts and returns text and uses the built-in textcat function to concatenate string values.

NOTE - order of concatenated text is not well-defined. You should only use commutable functions such as +, * etc (subnote - commutable might not be right I might mean ‘associative’. If you don’t know whether I’m right or wrong, make sure you test your aggregate function carefully).

References