A PostgreSQL table can grow quietly for weeks before anyone notices. The application team says they already deleted old data. Storage still looks high. Queries are touching more blocks than expected. Autovacuum is running, but the table does not seem to become smaller. This is where many DBAs first realize that DELETE in PostgreSQL is not the same as physically removing rows from the table file.
PostgreSQL uses MVCC, so old row versions remain inside the table until VACUUM can clean them. This is normal behavior, not a bug. The problem starts when dead tuples grow faster than VACUUM can remove them, or when long-running transactions prevent cleanup. Then you get table bloat, index bloat, stale statistics, poor plans, unnecessary I/O, and sometimes transaction ID wraparound pressure.
This article walks through PostgreSQL VACUUM from a production DBA angle.
We will create a realistic test table, generate churn, compare table size
before and after VACUUM, tune autovacuum for high-write tables, monitor
running VACUUM jobs, and discuss when VACUUM FULL,
REINDEX, partitioning, or
pg_repack makes more sense.
DELETE Does Not Mean Space Is Gone
In PostgreSQL, when a row is updated or deleted, the old version is not immediately removed from disk. PostgreSQL keeps it because another transaction may still need to see that older version for read consistency.
That is MVCC doing its job.
The cleanup happens later through VACUUM. VACUUM removes dead row versions when they are no longer visible to any active transaction. It also cleans dead index entries and updates visibility information.
A simple delete may look harmless:
DELETE FROM dba_lab.order_eventsWHERE created_at < now() - interval '90 days';
But physically, the table may still occupy almost the same space after this delete. Regular VACUUM usually marks the freed space as reusable inside PostgreSQL. It does not always return that space to the operating system.
This is one of the first PostgreSQL mindset shifts for Oracle DBAs. In Oracle, you think in terms of undo, segments, extents, high water mark, shrink, and space advisor. In PostgreSQL, you have to think about heap tuples, dead row versions, visibility, autovacuum, and bloat.
Practical DBA use case
After a purge job, do not only check the row count. Check dead tuples and relation size.
SELECT schemaname, relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze FROM pg_stat_user_tables WHERE relname = 'order_events';
Remember that n_live_tup and
n_dead_tup are estimates. They are still useful for
trend monitoring, but for exact inspection you can use extensions like
pgstattuple where permitted.
CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT * FROM pgstattuple('dba_lab.order_events');
Production caveat
VACUUM may find dead tuples but still not remove them if an old transaction is holding the visibility horizon.
Check old transactions:
SELECT pid, usename, application_name, client_addr, state, xact_start, now() - xact_start AS xact_age, backend_xmin, left(query, 120) AS query_text FROM pg_stat_activity WHERE backend_xmin IS NOT NULL ORDER BY xact_start;
An idle-in-transaction session from an application pool can hold back
cleanup much longer than expected.
Build a Realistic Test Table
A clean lab helps explain the behavior without touching production. The goal is not to create a perfect benchmark. The goal is to reproduce a common DBA situation: insert data, update some rows, delete some rows, then observe what VACUUM actually changes.
Create a schema:
CREATE SCHEMA IF NOT EXISTS dba_lab;
Create a table that looks closer to a real application event table:
CREATE TABLE dba_lab.order_events ( event_id BIGSERIAL PRIMARY KEY, tenant_id INTEGER NOT NULL, event_type TEXT NOT NULL, status TEXT NOT NULL DEFAULT 'NEW', payload TEXT NOT NULL, created_at TIMESTAMPTZ NOT NULL DEFAULT now(), processed_at TIMESTAMPTZ ) WITH (fillfactor = 90);
Create a few practical indexes:
CREATE INDEX idx_order_events_tenant_created ON dba_lab.order_events (tenant_id, created_at); CREATE INDEX idx_order_events_status ON dba_lab.order_events (status); CREATE INDEX idx_order_events_event_type ON dba_lab.order_events (event_type);
Load around 1.2 million rows:
INSERT INTO dba_lab.order_events ( tenant_id, event_type, status, payload, created_at ) SELECT (random() * 2000)::int + 1, (ARRAY['LOGIN', 'PAYMENT', 'API_CALL', 'BATCH_JOB'])[1 + floor(random() * 4)::int], 'NEW', repeat(md5(g::text), 24), now() - (g || ' seconds')::interval FROM generate_series(1, 1200000) AS g;
Run analyze:
ANALYZE dba_lab.order_events;
Check size:
SELECT pg_size_pretty(pg_relation_size('dba_lab.order_events')) AS table_size, pg_size_pretty(pg_indexes_size('dba_lab.order_events')) AS index_size, pg_size_pretty(pg_total_relation_size('dba_lab.order_events')) AS total_size;
Sample realistic output:table_size | index_size | total_size ------------+------------+------------ 1184 MB | 156 MB | 1340 MB
The exact number will differ depending on PostgreSQL version, block size, payload length, index structure, fillfactor, and storage.
Why this test is better
Instead of deleting only the highest IDs, we will create scattered churn. This is closer to real production, where updates and deletes happen across many pages, not always neatly at the end of the table.
Create Churn: Updates and Scattered Deletes
First, update a portion of the table:
UPDATE dba_lab.order_events SET status = 'PROCESSED', processed_at = now()
where event_id % 4 = 0 ;
This updates around 300,000 rows. In PostgreSQL, an update creates a new row version and leaves the old version behind until VACUUM can clean it.
Now delete scattered rows:
DELETE FROM dba_lab.order_events WHERE event_id % 7 = 0;
This removes around 171,000 rows logically.
Now,. Run analyze:
ANALYZE VERBOSE dba_lab.order_events;
Check table statistics:
SELECT relname, n_live_tup, n_dead_tup, last_analyze, last_autoanalyze FROM pg_stat_user_tables WHERE relname = 'order_events';
Sample output:
relname | n_live_tup | n_dead_tup | last_analyze --------------+------------+------------+------------------------------- order_events | 1028450 | 441720 | 2026-05-30 14:20:44.121+08
Do not expect the numbers to match exactly. PostgreSQL statistics are estimates unless you use deeper inspection tools.
SELECT pg_size_pretty(pg_relation_size('dba_lab.order_events')) AS table_size, pg_size_pretty(pg_indexes_size('dba_lab.order_events')) AS index_size, pg_size_pretty(pg_total_relation_size('dba_lab.order_events')) AS total_size;
Check size again:
table_size | index_size | total_size ------------+------------+------------ 1428 MB | 226 MB | 1654 MB
Why did size increase after updates and deletes?
Because updates created new row versions. Deletes marked rows as dead. Indexes also had to track old and new entries. This is exactly how table and index bloat starts.
Run Regular VACUUM and Read the Result Correctly
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum, vacuum_count, autovacuum_count FROM pg_stat_user_tables WHERE relname = 'order_events';
Now run:
VACUUM VERBOSE ANALYZE dba_lab.order_events;
Check table statistics again:
relname | n_live_tup | n_dead_tup | last_vacuum | vacuum_count --------------+------------+------------+------------------------------+-------------- order_events | 1029000 | 0 | 2026-05-30 14:27:18.921+08 | 1
Now check size:
SELECT pg_size_pretty(pg_relation_size('dba_lab.order_events')) AS table_size, pg_size_pretty(pg_indexes_size('dba_lab.order_events')) AS index_size, pg_size_pretty(pg_total_relation_size('dba_lab.order_events')) AS total_size;
Sample output after regular VACUUM:table_size | index_size | total_size ------------+------------+------------ 1416 MB | 219 MB | 1635 MB
This is the realistic part.
Dead tuples are gone, but the table size did not drop dramatically. That does not mean VACUUM failed. It means the freed space is mostly inside the table and can be reused by future inserts or updates.
If your workload continues inserting into the same table, this is fine. PostgreSQL can reuse that internal free space. If your business expectation is to return disk space to the OS immediately, regular VACUUM is not the tool for that.
What to check in VACUUM VERBOSE output
Look for lines related to:
tuples: removed, remain, dead but not yet removableindex scanspages removedpages remaindead item identifiers removedWAL usagebuffer usage
Pay special attention to: dead but not yet removable
If this is high, VACUUM found dead tuples but could not remove them. That usually points to old snapshots, long-running transactions, replication slot xmin, prepared transactions, or standby feedback behavior.
Monitor VACUUM While It Runs
For large tables, do not run VACUUM and wait blindly. Use
pg_stat_progress_vacuum.
SELECT p.pid, a.usename, a.application_name, p.datname, p.relid::regclass AS relation_name, p.phase, p.heap_blks_total, p.heap_blks_scanned, p.heap_blks_vacuumed, round( 100.0 * p.heap_blks_scanned / NULLIF(p.heap_blks_total, 0), 2 ) AS scanned_pct, p.index_vacuum_count, p.max_dead_tuples, p.num_dead_tuples FROM pg_stat_progress_vacuum p JOIN pg_stat_activity a ON a.pid = p.pid ORDER BY p.pid;
Common phases include:
initializingscanning heapvacuuming indexesvacuuming heapcleaning up indexestruncating heapperforming final cleanup
This view is for normal VACUUM activity. Do not depend on it for
VACUUM FULL.
Check wait events for the VACUUM session:
SELECT pid, wait_event_type, wait_event, state, query FROM pg_stat_activity WHERE query ILIKE 'vacuum%';
Check blockers:SELECT blocked.pid AS blocked_pid, blocked.query AS blocked_query, blocking.pid AS blocking_pid, blocking.query AS blocking_query, blocking.state AS blocking_state FROM pg_stat_activity blocked JOIN pg_locks blocked_locks ON blocked.pid = blocked_locks.pid JOIN pg_locks blocking_locks ON blocking_locks.locktype = blocked_locks.locktype AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid AND blocking_locks.pid <> blocked_locks.pid JOIN pg_stat_activity blocking ON blocking.pid = blocking_locks.pid WHERE NOT blocked_locks.granted;
Production caveat
VACUUM is designed to work with normal database activity, but it is not free. It consumes I/O, CPU, memory, and can generate WAL. If a huge cleanup runs during peak traffic, users may notice.
For PostgreSQL 14 and above, check WAL activity:
SELECT wal_records, wal_fpi, pg_size_pretty(wal_bytes) AS wal_bytes, stats_reset FROM pg_stat_wal;
Autovacuum Tuning for High-Churn Tables
Autovacuum is not the enemy. It is usually the reason PostgreSQL keeps running without manual maintenance every day.
But default autovacuum settings are generic. A large event table with frequent updates and deletes should not use the same cleanup behavior as a small lookup table.
The basic vacuum trigger is based on:
autovacuum_vacuum_threshold+autovacuum_vacuum_scale_factor * table_size
By default, the scale factor can be too relaxed for large tables. For a table with 20 million rows, waiting for 20 percent churn may mean millions of dead tuples before autovacuum becomes aggressive enough.
For a hot table, tune at table level:
ALTER TABLE dba_lab.order_events SET ( autovacuum_vacuum_threshold = 10000, autovacuum_vacuum_scale_factor = 0.02, autovacuum_analyze_threshold = 10000, autovacuum_analyze_scale_factor = 0.01, autovacuum_vacuum_cost_limit = 2000, autovacuum_vacuum_cost_delay = 2 );
Check table-level options:
SELECT n.nspname AS schema_name, c.relname AS table_name, c.reloptions FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace WHERE n.nspname = 'dba_lab' AND c.relname = 'order_events';
How to identify tables needing tuning
SELECT schemaname, relname, n_live_tup, n_dead_tup, round( 100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 2 ) AS dead_tuple_pct, last_autovacuum, autovacuum_count FROM pg_stat_user_tables WHERE n_live_tup + n_dead_tup > 0 ORDER BY dead_tuple_pct DESC LIMIT 20;
Production caveat
Do not apply aggressive autovacuum settings everywhere. More frequent vacuuming means more background I/O. Tune the top few high-churn tables first, monitor the effect, then adjust.
For insert-heavy tables in newer PostgreSQL versions, also review insert-triggered autovacuum parameters such as:
SHOW autovacuum_vacuum_insert_threshold;SHOW autovacuum_vacuum_insert_scale_factor;
These are useful for append-only tables where freezing and visibility map maintenance still matter.
Index Bloat Needs Separate Attention
VACUUM cleans dead index entries, but it does not always compact indexes the way DBAs expect. A heavily updated table may have indexes that remain much larger than necessary.
Check index size and usage:
SELECT schemaname, relname AS table_name, indexrelname AS index_name, pg_size_pretty(pg_relation_size(indexrelid)) AS index_size, idx_scan, idx_tup_read, idx_tup_fetch FROM pg_stat_user_indexes WHERE relname = 'order_events' ORDER BY pg_relation_size(indexrelid) DESC;
If the index is bloated and still required, rebuild it carefully.
REINDEX INDEX CONCURRENTLY dba_lab.idx_order_events_status;
For all indexes on a table:
REINDEX TABLE CONCURRENTLY dba_lab.order_events;
Production caveat
REINDEX CONCURRENTLY is safer for availability than
normal REINDEX, but it still needs extra disk space
and time. Always check free space before starting.
SELECT pg_size_pretty(pg_relation_size('dba_lab.idx_order_events_status')) AS index_size;
At and at OS level: df -h
Also remember that unused indexes create unnecessary VACUUM work. Every update and delete has to maintain more index structures.
Find low-usage indexes:
SELECT schemaname, relname AS table_name, indexrelname AS index_name, idx_scan, pg_size_pretty(pg_relation_size(indexrelid)) AS index_size FROM pg_stat_user_indexes WHERE idx_scan = 0 ORDER BY pg_relation_size(indexrelid) DESC;
Do not drop indexes only based on one snapshot. Validate over a business
cycle.
Oracle vs PostgreSQL: Same Pain, Different Internals
Oracle and PostgreSQL both provide read consistency, but they do it differently.
In Oracle, old versions are managed through undo. After large deletes, DBAs often think about undo retention, segment size, high water mark, shrink space, and reclaiming extents.
Example Oracle checks:
SELECT owner, segment_name, segment_type, bytes / 1024 / 1024 AS size_mb FROM dba_segments WHERE owner = 'APPUSER' ORDER BY bytes DESC;
Shrink example:
ALTER TABLE appuser.order_events ENABLE ROW MOVEMENT;ALTER TABLE appuser.order_events SHRINK SPACE;
In PostgreSQL, the old row versions remain in the heap until VACUUM removes them. The equivalent DBA thought process is different:
VACUUM VERBOSE ANALYZE dba_lab.order_events;
Then validate:
SELECT pg_size_pretty(pg_relation_size('dba_lab.order_events')) AS table_size, pg_size_pretty(pg_indexes_size('dba_lab.order_events')) AS index_size, pg_size_pretty(pg_total_relation_size('dba_lab.order_events')) AS total_size;
DBA mindset shift
Do not manage PostgreSQL like Oracle.
In PostgreSQL:
-
Regular VACUUM is normal maintenance
-
Autovacuum tuning is workload-specific
-
Long transactions can block cleanup
-
Index bloat may need REINDEX
-
Physical shrink usually needs rewrite-style operations
-
Partitioning is often the best purge strategy
Quick Takeaways
-
DELETE does not immediately remove physical row versions in PostgreSQL.
-
Regular VACUUM removes dead tuples and makes space reusable, but may not shrink the table file.
-
VACUUM FULLcan reclaim physical space, but it rewrites the table and needs a strong lock. -
Long-running transactions can prevent VACUUM from removing dead tuples.
-
Autovacuum defaults are not always suitable for large high-churn tables.
-
Index bloat needs separate monitoring and sometimes
REINDEX CONCURRENTLY. -
Partitioning is often better than massive recurring deletes.
Lessons From the Field: DBA Takeaways
Most VACUUM-related incidents do not start with someone saying, “VACUUM is failing.” They start with symptoms.
- Storage alerts.
- Slow reports.
- High I/O.
- Replication lag.
Autovacuum workers running for hours.
A purge job that deleted millions of rows but did not reduce filesystem usage.
The first DBA mistake is assuming that VACUUM failed because table size did not shrink. In many case s, VACUUM did exactly what it was supposed to do. It removed dead tuples and made space reusable. It just did not return that space to the OS.
The second mistake is tuning autovacuum globally without knowing which tables are causing the problem. A few high-churn tables usually create most of the pain. Tune those tables directly.
The third mistake is ignoring long transactions. One idle transaction can hold the xmin horizon and make autovacuum look ineffective.
The fourth mistake is forgetting indexes. A table may look acceptable, but indexes may have grown badly due to repeated updates and deletes.
The fifth mistake is running VACUUM FULL too
casually. It gives visible space back, but it can block application workload
and create replication lag.
A practical weekly DBA check should include:
SELECT schemaname, relname, n_live_tup, n_dead_tup, round( 100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 2 ) AS dead_tuple_pct, last_autovacuum, autovacuum_count FROM pg_stat_user_tables WHERE n_live_tup + n_dead_tup > 0 ORDER BY n_dead_tup DESC LIMIT 20;
Also check old transactions:SELECT pid, usename, state, now() - xact_start AS xact_age, backend_xmin, left(query, 100) AS query_text FROM pg_stat_activity WHERE xact_start IS NOT NULL ORDER BY xact_start;
Preventing bloat is much easier than cleaning it under pressure.
Conclusion
PostgreSQL VACUUM is not just a maintenance command. It is part of how PostgreSQL survives real OLTP workload.
Every update and delete can leave old row versions behind. That is normal MVCC behavior. The production risk begins when cleanup cannot keep up, when long transactions hold old snapshots, when indexes grow silently, or when teams expect DELETE to behave like an immediate file shrink operation.
The practical approach is simple. Monitor dead tuples. Track autovacuum
activity. Watch long-running transactions. Review relation and index size
trends. Tune autovacuum at table level for high-churn objects. Use regular
VACUUM for routine cleanup. Use
REINDEX CONCURRENTLY when indexes are the real
problem. Use VACUUM FULL only when physical space
must be returned and a maintenance window is available.
For large purge workloads, design matters more than cleanup. If you delete millions of old rows every month, partitioning may be a better solution than repeatedly creating dead tuples and then trying to clean them later.
Oracle DBAs moving into PostgreSQL should not map every concept directly. PostgreSQL bloat management has its own rules. Once you understand those rules, VACUUM becomes less mysterious and more predictable.
Review your busiest tables. Check which ones generate the most dead tuples. Confirm whether autovacuum is keeping up. Test table-level tuning before production rollout. Bloat is easier to prevent than to repair during an incident.
FAQs
1. Why did my PostgreSQL table not shrink after VACUUM?
Regular VACUUM removes dead tuples and marks space reusable inside the
table. It does not always return disk space to the operating system. For
physical shrink, you may need VACUUM FULL,
CLUSTER, pg_repack, or
partition-level cleanup.
2. Is VACUUM FULL safe for production?
It can be used in production only with planning.
VACUUM FULL rewrites the table and requires a strong
lock, so it can block application sessions. It also generates WAL and may
create replication lag.
3. How do I know if autovacuum is not keeping up?
Check n_dead_tup,
last_autovacuum, and
autovacuum_count in
pg_stat_user_tables. If dead tuples keep increasing
and autovacuum is infrequent or long-running, the table may need
tuning.
4. Can long-running queries stop VACUUM cleanup?
Yes. Long-running transactions, idle-in-transaction sessions, old snapshots, prepared transactions, and some replication-related xmin conditions can prevent VACUUM from removing dead tuples.
5. Should I tune autovacuum globally or per table?
Start per table. Most environments have a few high-churn tables causing most bloat. Table-level settings are safer than aggressively changing autovacuum behaviour for the entire database.
Have you seen a PostgreSQL purge job finish successfully but storage usage remain almost unchanged?
Was the real issue dead tuples, long transactions, index bloat, replication lag, or simply wrong expectations from DELETE?
Share your production experience. VACUUM problems are usually learned during real incidents, not from syntax alone.


No comments:
Post a Comment