PostgreSQL mini cookbook: Performance Tuning, Debugging and Testing
Keeping index statistics up to date
Performance has been steadily deteriorating as you use your Postgres system.
Run the VACUUM ANALYZE command from psql, or use the vacuumdb command-line tool.
vacuumdb garbage-collects and analyzes a PostgreSQL database.
When you delete records from Postgres it doesn't reclaim the space, it just marks them deleted and gets on with other business. From psql you can issue a VACUUM command to reclaim space from a table.
Its second function is to update the planner statistics. Your indexes will remain accurate without this, but Postgres relies on the statistics to decide when and where using an index beats scanning the whole table.
If you want to see what is happening you can add the VERBOSE option:

=# VACUUM ANALYZE inventory;
VACUUM
=# VACUUM VERBOSE ANALYZE inventory;
INFO:  vacuuming "public.inventory"
INFO:  index "inventory_pkey" now contains 4581 row versions in 15 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "idx_store_id_film_id" now contains 4581 row versions in 15 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "inventory": found 0 removable, 4581 nonremovable row versions in 25 out of 25 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO:  analyzing "public.inventory"
INFO:  "inventory": scanned 25 of 25 pages, containing 4581 live rows and 0 dead rows; 4581 rows in sample, 4581 estimated total rows
VACUUM
Note - those of us who speak the Queen's English can use ANALYSE instead in recent versions of Postgres.
Note - a plain VACUUM does not block reads or writes on the table; VACUUM FULL takes an exclusive lock and blocks both.
The vacuumdb command works similarly, but can process a whole database in one go.
vacuumdb --full <dbname> # Perform "full" vacuuming.
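As a sketch of other common invocations (the database name mydb and the table name are placeholders - substitute your own):

```shell
vacuumdb --analyze mydb                    # VACUUM ANALYZE every table in mydb
vacuumdb --verbose --table inventory mydb  # just one table, with progress output
vacuumdb --analyze --all                   # every database in the cluster
```

These need to run as a database user with sufficient permissions on the tables concerned.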
Analyzing a query-plan
A query is running much slower than you expected. You suspect an index is being ignored or you are missing an index on a particular column.
The EXPLAIN command displays the query plan that Postgres will use for a query.
It shows the decisions it makes and the costs it calculated that led to those decisions.
=# EXPLAIN SELECT co_name, dy_type, dy_notes FROM companies JOIN diary ON co_id=dy_company;
NOTICE:  QUERY PLAN:

Nested Loop  (cost=0.00..25.81 rows=30 width=44)
  ->  Seq Scan on diary  (cost=0.00..1.03 rows=3 width=28)
  ->  Index Scan using companies_co_id_key on companies  (cost=0.00..8.14 rows=10 width=16)

EXPLAIN
The costs are in arbitrary units and only meaningful relative to each other. The rows figure is an estimate of the number of rows returned, and the width represents the amount of data involved per record. In the (trivial) example above, Postgres performs a sequential scan of the diary table and then uses the index on companies.co_id to perform the join.
Complex queries produce complex query plans - pay careful attention to the rows part. If Postgres thinks it will fetch 1000 rows from a 1500 row table then it will ignore the index and use a Seq Scan instead. If it estimates the rows wrong it will make the wrong decision.
Forcing use of an index
You have all the relevant indexes and you know it would be faster if Postgres used them but EXPLAIN is showing that Postgres is getting its cost calculations wrong.
Issue SET enable_seqscan TO off; before the query to discourage sequential scans and push the planner towards the indexes.
Before forcing Postgres to use indexes, make sure you have run VACUUM ANALYZE recently and that the EXPLAIN shows unexpected cost calculations. It's important to make sure Postgres is using accurate information before overriding its decisions.
If that still doesn't help, you can tell Postgres to always use an index if one is available. If you have a complex query with a mix of sequence scans and index searches this may actually make things worse.
SET enable_seqscan TO off;
SELECT ... ;
SET enable_seqscan TO on;
Check the runtime-configuration page for other settings you can change. You alter the various cost weightings at your own risk.
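For example, the planner cost weightings in postgresql.conf include settings like these (the values shown are the usual defaults, for illustration only - not tuning recommendations):

```
# Planner cost settings (illustrative values)
enable_seqscan = on          # planner may consider sequential scans
random_page_cost = 4.0       # estimated cost of a non-sequential page fetch
cpu_tuple_cost = 0.01        # estimated cost of processing one row
```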
You can set these options on the command line to postmaster, or set them in postgresql.conf in the data directory (example below).
# This is a comment
log_connections = yes
syslog = 2
You can log to syslog, stdout etc. See DEBUG_LEVEL etc for more info. You will need to consult the manual page for your syslogger to find out how to configure that end of things.
Debugging a complex query/function
You are trying to build a complex query or plpgsql function and keep getting parse errors referring to an error near `'` or similar.
Build your query/function in a text editor and execute using
$ psql -f (file) (database)
This gives you line numbers with the error message, and lets you run through the test/debug cycle quickly.
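A sketch of the cycle (the file name report.sql and database name mydb are placeholders; the query reuses the tables from the EXPLAIN example above):

```shell
cat > report.sql <<'SQL'
SELECT co_name, dy_type, dy_notes
FROM companies JOIN diary ON co_id = dy_company;
SQL

# Re-run after every edit; errors come back with line numbers in report.sql
psql -f report.sql mydb
```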
Tweaking for performance
You have your database design worked out but testing reveals performance problems - what hardware do you upgrade first?
There is no one answer to this - it depends on where the bottleneck is.
Before you start spending money on hardware, make sure your database design is solid and you know where the bottlenecks are.
Some key points are (in no particular order):
- Turn off flushing after each write (the fsync setting - see the WAL documentation; this risks data loss if the machine crashes)
- Check shared-memory settings.
- Check the maximum number of clients (and raise shared memory accordingly).
- Amount of RAM / speed of disks (enough RAM for the whole DB if possible :-)
- Processor speed and number (only one process per query though, so extra processors only help with multiple clients).
- Use replication if you feel it suits your needs (see the contrib/ directory of the source distribution).
Again - make sure you understand what the problem is before leaping into upgrades.
- PostgreSQL SQL Explain
- PostgreSQL SQL Set
- PostgreSQL Server Config
- PostgreSQL psql
- PostgreSQL Managing Kernel Resources