Improve PostgreSQL Performance by Reducing Dead Tuples without Auto-vacuum

Anas Anjaria
Sep 9, 2022 · 2 min read

This story is about how we ran into critical performance issues with one of our Postgres databases and how our team resolved them.

A high-level overview of our system

Import data, process it, and export it back

Today, it’s all about data. Our solution continuously synchronizes customers’ business data, processes it, and finally provides business insights, enriching the data with valuable information for our customers.

How did the problem emerge?

We have watchers that inform us when something goes wrong with our system. Suddenly, all the watchers started firing for a specific application.

After some investigation, we noticed that our Postgres instance was heavily overloaded. Taking a deeper look at AWS Performance Insights [1], I found that auto-vacuum was running aggressively on a specific table. As a result, our system ran out of CPU credits, which led to significant performance degradation and eventually slowed down our whole application.
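This is not our exact tooling, but as a rough sketch, the CPU-credit drain can be confirmed via CloudWatch; the instance identifier below is a placeholder:

```python
# A rough sketch (not our exact tooling) for confirming the CPU credit drain
# on a burstable RDS instance. "my-postgres-instance" is a placeholder.
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")
now = datetime.now(timezone.utc)

response = cloudwatch.get_metric_statistics(
    Namespace="AWS/RDS",
    MetricName="CPUCreditBalance",   # only reported for burstable instance classes
    Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "my-postgres-instance"}],
    StartTime=now - timedelta(hours=6),
    EndTime=now,
    Period=300,                      # 5-minute datapoints
    Statistics=["Average"],
)

for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"])
```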

The obvious question was:

Why did auto-vacuuming start so aggressively out of nowhere?

The answer was a growing number of dead tuples caused by data synchronization.
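As a quick check, dead-tuple growth can be confirmed from pg_stat_user_tables. A minimal sketch, assuming psycopg2 and a placeholder connection string:

```python
# Minimal sketch: list the tables with the most dead tuples and when
# auto-vacuum last ran on them. The connection string is a placeholder.
import psycopg2

QUERY = """
    SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
    FROM pg_stat_user_tables
    ORDER BY n_dead_tup DESC
    LIMIT 10;
"""

with psycopg2.connect("dbname=mydb user=myuser") as conn:
    with conn.cursor() as cur:
        cur.execute(QUERY)
        for relname, live, dead, last_autovacuum in cur.fetchall():
            print(f"{relname}: live={live} dead={dead} last_autovacuum={last_autovacuum}")
```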

My mistake

For data synchronization, I used UPSERT (insert when no record exists, update when it does). I was not aware of how an UPDATE is implemented in Postgres.

An UPDATE adds a new tuple (row version) and marks the old tuple as dead. Afterward, auto-vacuum removes these dead tuples based on the auto-vacuum settings.

For more background, have a look at this excellent article on Multi-Version Concurrency Control (MVCC) [2].
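For illustration, this is roughly what such an UPSERT looks like; the customer_data(id, payload) table is hypothetical. The important detail is that whenever ON CONFLICT ... DO UPDATE fires, Postgres writes a new row version and leaves the old one behind as a dead tuple, even if the new values are identical to the stored ones:

```python
# Hypothetical customer_data(id, payload) table; each conflicting row is
# rewritten on every sync run, producing one dead tuple per row, even when
# the payload has not changed.
import psycopg2

UPSERT = """
    INSERT INTO customer_data (id, payload)
    VALUES (%s, %s)
    ON CONFLICT (id) DO UPDATE SET payload = EXCLUDED.payload;
"""

def sync(conn, records):
    """records: iterable of (id, payload) tuples coming from the data sync."""
    with conn.cursor() as cur:
        cur.executemany(UPSERT, records)
    conn.commit()
```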

Solution

Based on my findings, I came up with a solution: upsert a record only if it has actually changed compared to what is already stored in the DB.

PSEUDOCODE

1. FETCH EXISTING DATA
2. SEPARATE UNCHANGED DATA FROM DATA-TO-BE-UPSERTED
3. UPSERT DATA-TO-BE-UPSERTED
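A minimal sketch of these three steps in Python, again assuming the hypothetical customer_data(id, payload) table and psycopg2; the exact diffing logic depends on your schema:

```python
# Sketch of "upsert only what changed", assuming the same hypothetical
# customer_data(id, payload) table as above.
import psycopg2

UPSERT = """
    INSERT INTO customer_data (id, payload)
    VALUES (%s, %s)
    ON CONFLICT (id) DO UPDATE SET payload = EXCLUDED.payload;
"""

def sync_changed_only(conn, incoming):
    """incoming: dict mapping id -> payload, produced by the data sync."""
    with conn.cursor() as cur:
        # 1. Fetch existing data for the incoming ids.
        cur.execute(
            "SELECT id, payload FROM customer_data WHERE id = ANY(%s);",
            (list(incoming.keys()),),
        )
        existing = dict(cur.fetchall())

        # 2. Separate unchanged records from the ones that need an upsert.
        to_upsert = [
            (record_id, payload)
            for record_id, payload in incoming.items()
            if existing.get(record_id) != payload
        ]

        # 3. Upsert only the new or changed records.
        if to_upsert:
            cur.executemany(UPSERT, to_upsert)
    conn.commit()
```

The extra SELECT is cheap compared to the write amplification of rewriting unchanged rows on every sync run.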

This drastically reduced the number of dead tuples and the amount of auto-vacuum work, which also resolved our performance issues.

Furthermore, my use case was data synchronization at regular intervals, and the data rarely changed between runs, so most records could simply be skipped.

Resources

[1] rds/performance-insights/

[2] basic-understanding-bloat-vacuum-postgresql-mvcc


Thanks for reading.

Originally published on Medium.


Want to connect?

anasanjaria.bio.link

 