A recent scenario:
We had a single table with about 50M rows using around 80 GB of disk space. The application recently deleted older records that were no longer needed, reducing the row count to 30M.
However, we did not have sufficient free space to reclaim that disk space by running VACUUM FULL, and we could not take the database offline, which is effectively what VACUUM FULL requires since it locks the table exclusively while rewriting it. So we were looking for alternatives we could try. The setup was a two-node Postgres cluster managed by Patroni.

Areas to look into:
- What is actually taking up the space?
28-30M rows isn't that many. We have tables with more rows and about the same size on disk. Maybe it's an index that isn't being used. 🤔 (See the size queries after this list.)
- Do you need to recover the disk space for the OS, or just for Postgres?
If you just want the space for Postgres to reuse, then a regular VACUUM is fine. If you need it back at the OS level... well, that's tougher.
- Is your space just table size, or is it also index size?
You might get a win from dropping some or all indexes and recreating them. If you have the space, you can create the new index concurrently first and then drop the old one with no loss in performance; otherwise, pick a window where you can tolerate degraded performance.
- Which version of Postgres?
Recent versions of Postgres can often recover enough space with a simple VACUUM VERBOSE to make it worthwhile.
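To see where the space actually lives, a couple of quick size queries help (a minimal sketch; big_table is a placeholder name, not the actual table from this scenario):

-- Overall footprint: heap, indexes, and TOAST
SELECT pg_size_pretty(pg_total_relation_size('big_table')) AS total,
       pg_size_pretty(pg_relation_size('big_table'))       AS heap_only,
       pg_size_pretty(pg_indexes_size('big_table'))        AS indexes;

-- Per-index sizes and usage, to spot a large index that is rarely scanned
SELECT indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
       idx_scan                                     AS times_used
FROM   pg_stat_user_indexes
WHERE  relname = 'big_table'
ORDER  BY pg_relation_size(indexrelid) DESC;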
Postgres 12 and up have REINDEX CONCURRENTLY. After running VACUUM VERBOSE, you can use a bloat query to find your bloated indexes and then reindex them concurrently without taking your database offline. In my experience that gets you back about 90% of the space that VACUUM FULL would, depending upon the size, number, and complexity of your indexes. But of course, you must have sufficient space to build a new index.
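As a rough sketch (the index and table names here are placeholders):

-- Postgres 12+: rebuild a single bloated index without blocking reads or writes.
-- The new index is built alongside the old one, so you need free space
-- roughly equal to the size of that index.
REINDEX INDEX CONCURRENTLY big_table_created_at_idx;

-- Or rebuild every index on the table in one go:
REINDEX TABLE CONCURRENTLY big_table;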
I have heard of pg_repack but have not used it: https://reorg.github.io/pg_repack/
Performing a full-table repack requires free disk space about twice as large as the target table(s) and its indexes. What it does is create a new copy of the table, plus a log table and a trigger to capture changes made to the original while the copy is in progress. It slowly copies over the old rows while new writes are recorded and applied to the copy, and then swaps the underlying relation files. There are locks involved; there is no avoiding that. And because concurrent changes are captured in the log table, if there are active writes happening while a repack is going on it will require more space than a VACUUM FULL run would.
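For reference, a typical invocation looks roughly like this (a sketch only, since I have not used it myself; the database and table names are placeholders, and the extension has to be installed in the target database first):

# Inside the database: CREATE EXTENSION pg_repack;
# Then, from the shell, repack a single table and its indexes online:
pg_repack --dbname=mydb --table=public.big_table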
Another solution which we could have implemented:
If the older data (~28M rows) is not actively used and the application can tolerate a few hours of unavailability, pg_dump it to some bigger disk or network location. Then:
- Delete that data from the table (TRUNCATE TABLE ... is enough and way faster) and try VACUUM FULL.
- Create new tables and load the old data into them using some partitioning strategy.
- Rename the old table and create a parent table with the old table's name.
- Attach all the tables, including the old one, under the parent as partitions (see the sketch after this list).
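A minimal sketch of the rename-and-attach steps, assuming a hypothetical events table partitioned by a created_at range (all names, columns, and ranges here are illustrative):

-- Rename the existing table and create a partitioned parent with the old name
ALTER TABLE events RENAME TO events_old;

CREATE TABLE events (
    id         bigint      NOT NULL,
    created_at timestamptz NOT NULL,
    payload    jsonb
) PARTITION BY RANGE (created_at);

-- Attach the old table as one partition (adding a matching CHECK constraint
-- beforehand lets Postgres skip the validation scan)
ALTER TABLE events ATTACH PARTITION events_old
    FOR VALUES FROM ('2020-01-01') TO ('2024-01-01');

-- New data goes into fresh partitions
CREATE TABLE events_2024 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');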
I was able to reclaim the space when I found that the indexes were huge and not frequently used by ongoing jobs; so I dropped some indexes, ran VACUUM FULL, and then recreated them.
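Roughly, that workflow looks like this (a sketch; the index and column names are placeholders, and VACUUM FULL still takes an exclusive lock and needs room for the rewritten copy of the table):

DROP INDEX big_table_status_idx;          -- drop the large, rarely used index
VACUUM FULL VERBOSE big_table;            -- rewrite the table compactly
CREATE INDEX CONCURRENTLY big_table_status_idx
    ON big_table (status);                -- rebuild it without blocking writes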
Also, it's possible to REINDEX individual indexes, which might already free up some space on its own.
I haven't used it, but perhaps give pgcompacttable a try: https://github.com/dataegret/pgcompacttable
It works by taking advantage of the fact that a regular, non-FULL VACUUM will actually give pages back to the OS if they are
1. completely empty, and
2. at the end of the table.
So it performs UPDATEs that shuffle rows from the end of the table into empty space in pages earlier in the table, clearing out those last pages so VACUUM can truncate them.
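The same trick can be illustrated in plain SQL (this is only a conceptual sketch of the idea, not how the tool itself is run; big_table and its id column are placeholders):

-- ctid is (page number, tuple index); the highest pages are at the end of the table
SELECT ctid, id FROM big_table ORDER BY ctid DESC LIMIT 100;

-- A no-op UPDATE writes a new row version, which can land in free space on an
-- earlier page (it may also stay on the same page if there is room there,
-- which is one of the details the tool has to work around)
UPDATE big_table SET id = id
WHERE  ctid = ANY (ARRAY(SELECT ctid FROM big_table ORDER BY ctid DESC LIMIT 100));

-- Once the trailing pages are empty, a plain VACUUM can truncate them
-- and return the space to the OS
VACUUM big_table;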