Improve Postgresql Performance by Reducing Dead Tuple without Auto-vacuum
This story is about how we faced critical performance issues with one of our Postgres databases and how our team resolved it.
A high-level overview of our system
![]() |
Today, it’s all about data. Our solution continuously synchronizes customers’ business data, processes it, and finally provide more business insights and enrich data with valuable information to facilitate customer.
How did the problem emerge?
We have watchers that inform us when something goes wrong with our system. Suddenly, all the watchers start firing for a specific application.
After doing some investigation, we noticed that our Postgres was heavily overloaded. Taking a deeper look at the AWS performance insights [1], I examined that an auto-vacuum was executing aggressively for a specific table. As a result, our system ran out of CPU credits, which led to significant performance degradation. Finally, it slowed down our whole application.
The obvious question was:
why did auto-vacuuming start so aggressively out of nowhere?
The answer to this question was an increasing number of dead tuples due to data synchronization.
My mistake
For data synchronization, I used UPSERT
(insert when no record exists and update if it exists). I was not aware that an UPDATE
command is implemented differently in Postgres.
It adds a new tuple (or row) and marks the old tuple as dead. Afterward, auto-vacuum removes these dead tuples based on auto vacuum settings.
Have a look at this excellent article [2] on this topic — Multi-Version Concurrency Control (MVCC).
Solution
Based on my findings, I came up with a solution to only upsert if and only if there is any change concerning an existing record in DB.
PSUEDO CODE
1. FETCH EXISTING DATA
2. SEPARATE DATA AS UNCHANGED AND DATA-TO-BE-UPSERTED
3. UPSERT DATA-TO-BE-UPSERTED
It drastically reduces dead tuples and auto-vacuum. It also helped in resolving performance issues.
Furthermore, my use-case was about data synchronization done at regular intervals, so I barely observed changes in data that often.
Resources
[2] basic-understanding-bloat-vacuum-postgresql-mvcc
Thanks for reading.
Originally published on medium