PostgreSQL mini cookbook: Performance Tuning, Debugging and Testing

PostgreSQL mini cookbook: Performance Tuning, Debugging and Testing. These tricks date back to 2001 and still work today. 😉

Keeping index statistics up to date

Performance has been steadily deteriorating as you use your Postgres system.

Solution

Use the VACUUM ANALYZE command in psql or the vacuumdb command-line tool.

vacuumdb is a command-line utility that garbage-collects and analyzes a PostgreSQL database.

Discussion

When you delete records, Postgres doesn’t reclaim the space; it just marks them deleted and gets on with other business. From psql you can issue a VACUUM command to reclaim that space from a table.

Its second function is to update the planner statistics. Your indexes will always be accurate without this, but Postgres uses the statistics to decide when and where to use an index rather than scanning the whole table.

If you want to see what is happening you can add the VERBOSE qualifier.

=# VACUUM ANALYZE inventory;
VACUUM

=# VACUUM VERBOSE ANALYZE inventory;
INFO:  vacuuming "public.inventory"
INFO:  index "inventory_pkey" now contains 4581 row versions in 15 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "idx_store_id_film_id" now contains 4581 row versions in 15 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "inventory": found 0 removable, 4581 nonremovable row versions in 25 out of 25 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO:  analyzing "public.inventory"
INFO:  "inventory": scanned 25 of 25 pages, containing 4581 live rows and 0 dead rows; 4581 rows in sample, 4581 estimated total rows
VACUUM

Note - those of us who speak the Queen’s English can use ANALYSE instead in recent versions of Postgres.

Note - a plain VACUUM runs alongside normal reads and writes; only VACUUM FULL takes an exclusive lock on the table, blocking both.

The vacuumdb command works similarly, but can process a whole database in one go.

vacuumdb --full <dbname>  # Perform "full" vacuuming.
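A few other useful invocations (mydb is a placeholder database name; the flags are documented on the vacuumdb man page):

vacuumdb --analyze mydb         # vacuum and update planner statistics
vacuumdb --all --analyze        # do the same for every database in the cluster
vacuumdb --full --verbose mydb  # full vacuum, reporting progress as it goes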

Analyzing a query-plan

A query is running much slower than you expected. You suspect an index is being ignored or you are missing an index on a particular column.

Solution

Use EXPLAIN <query>.

Discussion

The EXPLAIN command displays the query plan that Postgres uses. It shows the decisions it makes and the costs it calculated that led to those decisions.

=# EXPLAIN SELECT co_name, dy_type, dy_notes FROM companies JOIN diary ON co_id=dy_company;
NOTICE: QUERY PLAN:
Nested Loop  (cost=0.00..25.81 rows=30 width=44)
  ->  Seq Scan on diary  (cost=0.00..1.03 rows=3 width=28)
  ->  Index Scan using companies_co_id_key on companies  (cost=0.00..8.14 rows=10 width=16)
EXPLAIN

The costs are in arbitrary planner units and are only meaningful relative to each other. rows is an estimate of the number of rows each step returns, and width is the estimated size in bytes of each row. In the (trivial) example above, Postgres performs a sequential scan of the diary table and then uses the index on companies.co_id to perform the join.

Complex queries produce complex query plans - pay careful attention to the rows figures. If Postgres thinks it will fetch 1000 rows from a 1500-row table then it will ignore the index and use a Seq Scan instead. If it estimates the rows wrongly, it will make the wrong decision.
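If you want to see how those estimates compare with reality, EXPLAIN ANALYZE actually executes the query and reports real row counts and timings for each step alongside the estimates:

=# EXPLAIN ANALYZE SELECT co_name, dy_type, dy_notes FROM companies JOIN diary ON co_id=dy_company;

A large gap between estimated and actual rows usually means the statistics are stale - time for another VACUUM ANALYZE.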

Forcing use of an index

You have all the relevant indexes and you know it would be faster if Postgres used them but EXPLAIN is showing that Postgres is getting its cost calculations wrong.

Solution

Issue SET enable_seqscan TO off; before the query. This doesn’t force an index as such - it makes sequential scans so expensive that the planner will prefer any available index.

Discussion

Before forcing Postgres to use indexes, make sure you have run VACUUM ANALYZE recently and that EXPLAIN really does show unexpected cost calculations. It’s important to make sure Postgres is working from accurate information before overriding its decisions.
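In recent versions you can check when a table was last vacuumed and analyzed via the pg_stat_user_tables view (inventory is the example table from earlier):

=# SELECT relname, last_vacuum, last_autovacuum, last_analyze FROM pg_stat_user_tables WHERE relname = 'inventory';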

If that still doesn’t help, you can tell Postgres to always use an index if one is available. If you have a complex query with a mix of sequence scans and index searches this may actually make things worse.

SET enable_seqscan TO off;
SELECT ...
SET enable_seqscan TO on;

Check the runtime-configuration page for other settings you can change. Alter the various cost weightings at your own risk.
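For example, rather than banning sequential scans outright, you can adjust the weightings the planner works from. A minimal sketch - the values here are illustrative, not recommendations:

=# SET random_page_cost = 1.1;        -- default is 4; random I/O is much cheaper on SSDs
=# SET effective_cache_size = '4GB';  -- how much data the OS cache can realistically hold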

Setting logging

You can set logging options on the command line when starting postmaster, or in postgresql.conf in the data directory (examples below):

# This is a comment
log_connections = on
log_destination = 'syslog'

You can log to syslog, stderr etc. See log_min_messages and the other logging settings for controlling verbosity. You will need to consult the manual page for your syslogger to find out how to configure that end of things.
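Most logging settings take effect on a configuration reload, so a full restart usually isn’t needed (the data directory path is a placeholder):

$ pg_ctl reload -D /usr/local/pgsql/data

or, from psql:

=# SELECT pg_reload_conf();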

Debugging a complex query/function

You are trying to build a complex query/plpgsql function and keep getting parse errors referring to an error near ' or similar.

Solution

Build your query/function in a text editor and execute using

$ psql -f <file> <database>

This gives you line numbers with the error message, and lets you run through the test/debug cycle quickly.
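From inside psql you can get the same effect with the \i command, which reads and executes a file (the path is a placeholder):

=# \i /path/to/query.sql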

Tweaking for performance

You have your database design worked out but testing reveals performance problems - what hardware do you upgrade first?

Solution

There is no single answer to this - it depends on where the bottleneck is.

Discussion

Before you start spending money on hardware, make sure your database design is solid and you know where the bottlenecks are.

Some key points are (in no particular order):

  • Turn off synchronous flushing after each commit (synchronous_commit = off - see the WAL documentation; avoid fsync = off, which risks corruption after a crash)

  • Check shared-memory settings (e.g. shared_buffers) - see the sample configuration after this list

  • Check the maximum number of clients (max_connections) and increase shared memory accordingly

  • Amount of RAM / speed of disks (enough RAM for the whole DB if possible :-)

  • Processor speed and number (a query historically ran in a single process; recent versions can parallelise one query, but extra cores still help most with multiple concurrent clients)

  • Use replication if you feel it suits your needs (built into recent versions; older releases relied on add-ons such as those in the contrib/ directory of the source distribution)
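As a starting point, here is a minimal sketch of the relevant postgresql.conf knobs. The values assume a dedicated server with 16GB of RAM - tune them for your own hardware:

# postgresql.conf - illustrative starting points, not recommendations
shared_buffers = 4GB            # ~25% of RAM is a common rule of thumb
max_connections = 200           # raise shared memory if you raise this
effective_cache_size = 12GB     # roughly the RAM left over for the OS page cache
synchronous_commit = off        # trades a little durability for commit throughput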

Again - make sure you understand what the problem is before leaping into upgrades.
