In the recent time in one of our environment we observed that the auto vacuum process has been running for long time during the production hours which has impacted heavily. In this blog post we are going to discuss about the auto vacuum process, its types, how it works, how to reduce the time taken by the auto vacuum process and what are the parameters associated with it. Ok let’s get started.
Postgres uses MVCC - Multi-Version Concurrency Control to handle the consistency. When any insert or update operations are carried out, the row is not physically deleted. So after a series of multiple insert/update/delete operations there are lot of dead spaces available causing bloat. Hence the vacuum process is done as the cleanup for removing the bloat and reclaiming the space.
How to find bloat in a tupule?
Bloat refers to nothing but the dead spaces that are being accumulated on the database. Thus after a delete/update the new set of rows created are referred to as the "live" rows, and the older set are referred to as "dead" rows. We can use the below query to check the live and dead rows status.
Types of Vacuum:
Vacuum process generally involves two types which are explained below:
Standard VACUUM
This process runs in parallel along with the application hits to reclaim the space. It does not affects the performance since it does not hold any lock on the tables. While running auto vacuum even if indexing is carried on, then the auto vacuum process gets cancelled by itself. This is the most recommended option to perform space reclamation.
Full VACUUM
On the other hand Full Vacuum process holds an exclusive lock on the table that has been currently working on. So it creates a huge impact on the performance of the database. This process tends to clear more dead spaces and it happens slowly.
Autovacuum process:
It is a daemon process that runs automatically when the postgres service has been started. It is just a launcher process that runs every time and it performs clearance only when it achieves the threshold to clear the old dead space. We check the process by the below command:
It automates the execution of VACUUM and ANALYZE commands. The vacuum command is used to clear the dead spaces and analyze command is used to gather the statistical information about the contents of tables.
For example consider the below scenario in which we delete a row and reclaim the space for a table.
Checking the table count:
Deleting a record from the table:
Checking the table bloat now:
Applying Analyze command:
Applying vacuum command:
Checking the table bloat again and now we could see the dead tuples are cleared now.
How auto vacuum runs?
The auto vacuum(VACUUM+ANALYZE) process begins to reclaim the dead space when it reaches the threshold based on the below formula.
Analyze/Vacuum scale factor- Fraction of table of records
Analyze/Vacuum threshold- Number of records to trigger the auto vacuum process.
Increasing the frequency of auto vacuum process by increasing the threshold of the autovacuum_vacuum_scale_factor,autovacuum_vacuum_analyze_factor parameters will reduce the auto vacuum process execution time. Thus we can optimise the Auto Vacuum by avoiding frequent run.
Purpose of Auto Vacuuming:
To reclaim the dead tuple spaces
To speed up the index scans and query performance
To update data statistics used by the query planner
To protect data loss of old data
Hope you get a better idea when and how to use the auto vacuum in PostgreSQL. Please let us know if you have any queries in the comments section.