Friday 11 December 2020

Possible ways to recover space from deleted rows when there is insufficient disk space remaining for a full vacuum

A recent scenario:

We had a single table with about 50M records using around 80 GB of space. The application had recently deleted older records that were no longer needed, reducing the number of records to about 30M.

However, we did not have sufficient free disk space to reclaim that space by running VACUUM FULL, and we also could not take the database offline, which a full vacuum effectively requires (it holds an exclusive lock on the table while it runs). So we were looking for alternatives we could try. The setup was a two-node Postgres cluster managed by Patroni.

 

Areas to look into:

  •   What is actually taking up space?

28-30M rows isn't that many. We have tables with more rows than that at about the same size on disk. Maybe it's an index that isn't being used. 🤔
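For example, something along these lines can show how the 80 GB splits between the table heap and its indexes, and whether each index is actually being used (the table name below is a placeholder):

-- Heap vs. index vs. total size for one table ("my_big_table" is a placeholder).
SELECT pg_size_pretty(pg_total_relation_size('my_big_table')) AS total_size,
       pg_size_pretty(pg_relation_size('my_big_table'))       AS heap_size,
       pg_size_pretty(pg_indexes_size('my_big_table'))        AS index_size;

-- Per-index size and scan count since the last stats reset;
-- idx_scan = 0 suggests an index that is not being used.
SELECT indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
       idx_scan
FROM pg_stat_user_indexes
WHERE relname = 'my_big_table'
ORDER BY pg_relation_size(indexrelid) DESC;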

 

  • Do you need to recover the disk space for the OS, or just for Postgres?

Because if you just want the space available for Postgres to reuse, then a regular VACUUM is fine. If you need it back at the OS level... well, that's tougher.
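As a rough sketch of that distinction (table name is a placeholder): a plain VACUUM makes the dead-row space reusable inside the existing files, which you can see with the pgstattuple contrib extension rather than at the filesystem level.

-- Plain VACUUM: dead tuples become reusable by Postgres, but the data files
-- normally keep their size on disk.
VACUUM VERBOSE my_big_table;

-- pgstattuple (contrib) shows how much of the table is now free space
-- available for reuse by new rows.
CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT table_len, dead_tuple_percent, free_percent
FROM pgstattuple('my_big_table');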

 

  •  Is your space just table size or is it also index size?

You might get a win from dropping some or all indexes and recreating them. If you have the space, you can create the new index concurrently first and then drop the old one, with no loss in performance; otherwise, maybe just pick a downtime window where you can tolerate degraded performance.
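A minimal sketch of that index swap, assuming placeholder table, column, and index names; CREATE INDEX CONCURRENTLY avoids blocking writes but needs room for the old and new index to exist side by side for a while:

-- Build a fresh, unbloated copy of the index without blocking writes.
CREATE INDEX CONCURRENTLY my_index_new ON my_big_table (some_column);

-- Once the new index is valid, drop the bloated original and take over its name.
DROP INDEX CONCURRENTLY my_index;
ALTER INDEX my_index_new RENAME TO my_index;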

 

  •   Which version of Postgres?

Most recent versions of Postgres can recover enough space with a simple VACUUM VERBOSE to make it worthwhile.

Postgres 12 and up have REINDEX CONCURRENTLY. You can run VACUUM VERBOSE, use a bloat query to find your bloated indexes, and then reindex them concurrently without taking your database offline. In my experience that gets back about 90% of the space that VACUUM FULL would, depending on the size, number, and complexity of your indexes. But of course, you must have sufficient free space to build the new index.
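On Postgres 12 and later that roughly looks like the following (table and index names are placeholders; a bloat-estimation query, for example the ones shipped with check_postgres or the pgstattuple extension, can tell you which indexes are worth rebuilding):

-- Reclaim what a plain vacuum can and report the details.
VACUUM VERBOSE my_big_table;

-- Rebuild a bloated index without an exclusive lock on the table
-- (needs enough free space to hold the new copy of the index).
REINDEX INDEX CONCURRENTLY my_bloated_index;

-- Or rebuild every index on the table in one pass.
REINDEX TABLE CONCURRENTLY my_big_table;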

 

I have heard of pg_repack but have not used it: https://reorg.github.io/pg_repack/

Performing a full-table repack requires free disk space roughly twice the size of the target table(s) and their indexes. What it does is create a dummy table and a rule to push new records into this table; it slowly copies over the old records while new writes keep getting applied to the dummy table, and then it does a file-system swap. There are locks involved; there is no avoiding that. (Strictly speaking, it uses a trigger and a log table to capture the changes, not a rule, but yes, that's how it works.) It also means that if there are active writes happening while a repack is going on, it will require more space than a VACUUM FULL run.
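If you do have roughly twice the table-plus-index size free, my understanding is that the flow is: install the extension in the target database, then run the pg_repack client from the shell, e.g. something like pg_repack --dbname=mydb --table=public.my_big_table (names here are placeholders; check the flags against the pg_repack docs for your version).

-- Rough estimate of the free space a full-table repack would need
-- (about twice the table plus its indexes; table name is a placeholder).
SELECT pg_size_pretty(2 * pg_total_relation_size('my_big_table')) AS approx_space_needed;

-- pg_repack also requires its extension in the target database.
CREATE EXTENSION pg_repack;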

 

Another solution we could have implemented:

If the older data (28M rows) is not actively used and you can take a few hours of unavailability, pg_dump it to some bigger disk or network location, and then (a rough sketch follows after these steps):

· Delete that data from the table (TRUNCATE TABLE ... is enough and much faster) and try VACUUM FULL.

· Create new tables and load the old data into them using some partitioning strategy.

· Rename the old table and create a partitioned parent table with the old table's name.

· Attach all the tables, including the old one, under the parent as partitions.
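A minimal sketch of those steps using native partitioning (Postgres 10+), assuming a placeholder table my_big_table, a placeholder timestamp column created_at, a placeholder cutoff date, and that the old rows were already safely dumped with pg_dump beforehand:

-- 1. Remove the old rows and compact what remains (if the whole table was
--    dumped, TRUNCATE is faster and needs no VACUUM FULL at all; VACUUM FULL
--    locks the table while it rewrites it).
DELETE FROM my_big_table WHERE created_at < '2020-01-01';
VACUUM FULL my_big_table;

-- 2. Rename the old table and create a partitioned parent under its old name.
ALTER TABLE my_big_table RENAME TO my_big_table_recent;
CREATE TABLE my_big_table (LIKE my_big_table_recent)
    PARTITION BY RANGE (created_at);

-- 3. Attach the existing table (and any tables holding restored old data)
--    as partitions of the new parent.
ALTER TABLE my_big_table ATTACH PARTITION my_big_table_recent
    FOR VALUES FROM ('2020-01-01') TO (MAXVALUE);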

 

 

I was able to reclaim the space once I found that the indexes were huge and not frequently used by ongoing jobs: I dropped some of the indexes, ran VACUUM FULL, and then recreated them.

Also, it's possible to REINDEX individual indexes, which might already free up some space.
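A sketch of that drop-then-rebuild sequence (all names are placeholders); note that VACUUM FULL still needs enough free space for the compacted copy of the table and holds an exclusive lock while it runs:

-- Drop large, rarely used indexes first; their disk space is freed immediately.
DROP INDEX rarely_used_idx_1;
DROP INDEX rarely_used_idx_2;

-- Rewrite the table into a compact copy.
VACUUM FULL VERBOSE my_big_table;

-- Recreate the indexes afterwards; CONCURRENTLY avoids blocking writes.
CREATE INDEX CONCURRENTLY rarely_used_idx_1 ON my_big_table (col_a);
CREATE INDEX CONCURRENTLY rarely_used_idx_2 ON my_big_table (col_b);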

 

 

I haven't used it, but perhaps give pgcompacttable a try:

https://github.com/dataegret/pgcompacttable

It works by taking advantage of the fact that a regular, non-FULL VACUUM will actually free pages back to the OS if they are:

1. completely empty, and

2. at the end of the table.

So pgcompacttable performs UPDATEs that shuffle rows from the end of the table into empty space in pages earlier in the table, clearing out those last pages so VACUUM can truncate them.
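The underlying idea can be illustrated with plain SQL (table and column names are placeholders; this only shows the principle, while the tool batches the work carefully and handles details such as HOT updates that would keep a row on its original page):

-- Rewrite the rows that currently sit in the last pages of the table so their
-- new versions can land in free space earlier in the table
-- (ctid is the physical row location, so ORDER BY ctid DESC picks the tail).
UPDATE my_big_table
SET    some_column = some_column
FROM  (SELECT ctid FROM my_big_table ORDER BY ctid DESC LIMIT 1000) AS tail
WHERE  my_big_table.ctid = tail.ctid;

-- A plain VACUUM can then truncate the emptied pages at the end of the file
-- and give that space back to the OS.
VACUUM my_big_table;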


Found this post interesting? Subscribe to us :) 😊😉


Read more 

Configure Streaming Replication in PostgreSQL

All about Physical Replication and Log shipping in Postgres 

Streaming-Replication Sync and Async, benefits of streaming replication over Log-shipping

