PostgreSQL mini cookbook: Aggregate Query Tricks
Sum a field
You want to calculate the total number of payments per customer, staff.
Solution
Use SUM()
and GROUP BY
:
SELECT a, b, sum(c) FROM t GROUP BY a ,b;
Example:
# SELECT * FROM payment;
payment_id | customer_id | staff_id | rental_id | amount | payment_date
------------+-------------+----------+-----------+--------+----------------------------
17503 | 341 | 2 | 1520 | 7.99 | 2007-02-15 22:25:46.996577
17504 | 341 | 1 | 1778 | 1.99 | 2007-02-16 17:23:14.996577
17505 | 341 | 1 | 1849 | 7.99 | 2007-02-16 22:41:45.996577
17506 | 341 | 2 | 2829 | 2.99 | 2007-02-19 19:39:56.996577
17507 | 341 | 2 | 3130 | 7.99 | 2007-02-20 17:31:48.996577
17508 | 341 | 1 | 3382 | 5.99 | 2007-02-21 12:33:49.996577
# SELECT customer_id, staff_id, SUM(amount) FROM payment GROUP BY customer_id, staff_id;
customer_id | staff_id | sum
-------------+----------+--------
476 | 1 | 44.88
205 | 2 | 48.91
344 | 1 | 12.95
598 | 1 | 43.90
14 | 1 | 55.84
Discussion
You need to construct a SELECT
query with a GROUP BY
clause and apply a sum()
function.
So, to find the total number of orders we have per product in the system we can do:
SELECT p.pr_code, p.pr_desc, sum(o.ord_qty) AS quantity
FROM products p, orders o
WHERE p.pr_code=o.ord_product GROUP BY p.pr_code, p.pr_desc;
So why do we GROUP BY
both the product-code and description,
surely they match up?
Well, they do, but if you return a field from a query it
should either be mentioned in the GROUP BY or be subject to
an aggregate function (such as sum).
As an alternative we could have used:
SELECT p.pr_code, min(p.pr_desc), sum(o.ord_qty) AS quantity
FROM products p, orders o
WHERE p.pr_code=o.ord_product GROUP BY p.pr_code;
Which achieves the same since for any given pr_code we only have one pr_desc but probably isn’t as clear.
Finding the maximum/minimum value
You’re not interested in how many widgets your biggest customer bought but you do want to know who they are.
Solution
Use a subquery that finds the max()
of the quantity
you want to measure and compare against it.
Discussion
This is similar to totalling, but you use the min/max functions instead of sum. In addition, we need to compare individual records against this maximum value. So, to get the company name and order details of the largest order in the system:
SELECT c.co_name, p.pr_code, p.pr_desc, o.ord_qty
FROM companies c, products p, orders o
WHERE c.co_id=o.ord_company AND p.pr_code=o.ord_product
AND o.ord_qty = (SELECT max(ord_qty) FROM orders);
If there are two or more orders both have the same largest quantity, they will both be returned.
Counting doesn’t give me the total number of records in a table
You know there are 100 records in this table but count(field) returns less than that.
Solution
Use count(*)
to avoid skipping NULL values in a specific field.
Discussion
You try to count the number of diary entries in the system:
SELECT count(dy_notes) FROM diary;
You find that entries with NULL
dy_notes are being skipped.
This is the correct behavior:
count()
only counts non-null values.
The way of solving this is:
SELECT count(*) FROM diary;
I want to know how many types of X are in the system
You have a thousands of products in your range but suspect most don’t sell. You want a list of product-types that are in your orders list.
Solution
You can GROUP BY
the product-code and sum quantity if you want to know that,
or use count(distinct <field>)
.
Discussion
In practice, you will probably want to see how many of each product you’ve sold.
SELECT ord_product, sum(ord_qty) FROM orders GROUP BY ord_product;
If you don’t you can use
SELECT count(distinct ord_product) FROM orders;
This ignores null values of ord_product and throws away duplicates.
Defining your own aggregate function
You work for NASA and min()
, max()
, sum()
etc. aren’t enough for you.
Solution
Create your own aggregate function with CREATE AGGREGATE
and an accumulator function.
Discussion
If you’re planning on doing this without reading the manuals think again. Basically however you need to define:
- The return-type of the aggregate function (e.g.
int4
) - The type of thing it aggregates (e.g.
int4
again) - An initial value (probably
0
) - A state-transition function to accumulate values (takes the running total and the next value)
- Optionally a finalisation function to produce the final value (useful for averages or standard deviations)
You might well end up coding the state-transition and finalisation functions in C, but here is an trivial one-liner.
CREATE AGGREGATE catenate(sfunc1=textcat, basetype=text, stype1=text, initcond1='');
This sets the initial condition to '' (the empty string), accepts and returns text and uses the built-in textcat function to concatenate string values.
NOTE - order of concatenated text is not well-defined. You should only use commutable functions such as +, * etc (subnote - commutable might not be right I might mean ‘associative’. If you don’t know whether I’m right or wrong, make sure you test your aggregate function carefully).
References
Feedback
Was this page helpful?
Glad to hear it!
Sorry to hear that.