Most PostgreSQL performance issues get blamed on slow queries, missing indexes, or hardware constraints. In practice, however, some of the most damaging problems in production systems are far quieter than that. Disk usage climbs. Queries get slower. I/O pressure builds steadily. No alert fires. By the time something is visibly wrong, the damage has already been done.
That quiet, gradual problem is almost always bloat.
What Is PostgreSQL Bloat?
PostgreSQL's concurrency model — MVCC, or Multi-Version Concurrency Control — means that updates never overwrite existing rows in place. Instead, each update creates a new version of the row, and the old version is marked as dead. Those dead tuples are eventually cleaned up by VACUUM, which reclaims the space they occupy so it can be reused.
When VACUUM cannot keep pace with the workload — whether due to misconfiguration, insufficient resources, or long-running transactions holding back cleanup — dead tuples accumulate inside table and index pages. The space they occupy becomes wasted overhead that PostgreSQL still has to read, cache, and manage.
That accumulated overhead is bloat.
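The mechanics are easy to observe on a hypothetical throwaway table (names here are placeholders; n_dead_tup comes from the statistics collector and may lag a moment behind the actual update):

```sql
-- Throwaway table purely to illustrate MVCC row versioning
CREATE TABLE mvcc_demo (id int PRIMARY KEY, val text);
INSERT INTO mvcc_demo VALUES (1, 'original');

-- The UPDATE writes a new row version; the old version becomes a dead tuple
UPDATE mvcc_demo SET val = 'changed' WHERE id = 1;

-- The dead tuple shows up in the per-table statistics
SELECT n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'mvcc_demo';
```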
Why Bloat Matters in Production
Bloat is not simply a storage accounting problem. It has a direct and measurable impact on how the database performs under load, and understanding that impact is essential to appreciating why monitoring it consistently matters.
As dead tuples accumulate inside heap pages, sequential scans grow more expensive because PostgreSQL must read more data pages than the live row count would justify. The same applies to indexes: a bloated index contains stale entries that inflate its size and reduce scan efficiency, making lookups slower even when the index is otherwise well-designed.
More pages on disk translate directly to higher I/O costs. PostgreSQL reads and processes more blocks per query, increasing pressure on both throughput and latency. At the memory level, shared buffers and OS page cache fill with less useful data, reducing cache hit rates and causing more frequent disk reads.
Maintenance operations compound the problem. VACUUM takes longer to process bloated tables, which can push it further behind schedule — a feedback loop where cleanup falls behind and bloat continues to grow. In high-throughput systems, even moderate bloat can introduce perceptible latency that worsens gradually and is difficult to attribute without the right monitoring in place.
Two Distinct Problems: Table Bloat and Index Bloat
Table Bloat
Table bloat occurs when dead tuples remain inside heap pages longer than they should, occupying space that VACUUM has not yet reclaimed. It is most prevalent in tables with high write activity — particularly workloads dominated by updates rather than inserts.
Consider an orders table where each row transitions through multiple states during its lifecycle: pending, processed, dispatched, completed. Every status update creates a new row version and marks the old one as dead. If autovacuum is not configured aggressively enough for that table's write rate, dead tuples accumulate over time. The table grows in physical size even though the count of active rows stays essentially constant. That divergence between physical size and useful data is table bloat.
Index Bloat
Index bloat is a subtler and often more damaging problem, in part because it is harder to detect and because PostgreSQL does not reclaim index space as efficiently as it reclaims heap space after row modifications.
When a row is updated on an indexed column, a new index entry is created pointing to the new row version. The old entry may remain in the index structure until a maintenance operation reorganizes it. Over time, this results in indexes that are significantly larger than they need to be, less efficient during lookups, and more expensive to maintain.
The impact is particularly pronounced in tables with high update rates on indexed columns. An index on a frequently changing column can grow to a size that is disproportionate to the table itself, slowing scans and increasing I/O for queries that should otherwise be fast.
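One mitigating mechanism is worth checking here: HOT (heap-only tuple) updates. When the modified columns are not indexed and the page has free space, PostgreSQL can avoid creating new index entries for the update. A sketch of how to gauge index churn per table, using the counters in pg_stat_user_tables — a low HOT percentage on a heavily updated table means nearly every update is also churning its indexes:

```sql
-- Share of updates that were heap-only (no new index entries required)
SELECT
    relname,
    n_tup_upd,
    n_tup_hot_upd,
    round(
        n_tup_hot_upd::numeric / nullif(n_tup_upd, 0) * 100, 2
    ) AS hot_update_pct
FROM pg_stat_user_tables
WHERE n_tup_upd > 0
ORDER BY n_tup_upd DESC;
```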
Detecting Bloat in Practice
PostgreSQL does not expose a single built-in metric that directly quantifies bloat. Detection always involves indirect indicators, estimation queries, or extensions — and interpreting those signals correctly requires context about the workload, not just the numbers themselves.
Starting Point: Dead Tuple Ratios
The most accessible entry point is pg_stat_user_tables, which tracks live and dead tuple counts per table. Extending the basic query with a dead ratio percentage makes it easier to prioritize:
SELECT
relname,
n_live_tup,
n_dead_tup,
round(
n_dead_tup::numeric / nullif(n_live_tup + n_dead_tup, 0) * 100, 2
) AS dead_ratio_pct
FROM pg_stat_user_tables
WHERE n_live_tup > 0
ORDER BY dead_ratio_pct DESC;
The critical caveat is that n_dead_tup is a statistics-based estimate, not an exact count. Its reliability depends on how recently ANALYZE or autovacuum has run. Running this query once tells you relatively little — tracking it over time reveals whether dead tuples are accumulating faster than VACUUM is cleaning them up, which is the actual signal worth acting on.
Cross-Referencing Size Against Row Count
A more intuitive approach is comparing physical table size against actual live row count. The bytes_per_live_row metric makes the imbalance between stored data and physical footprint explicit:
SELECT
relname,
pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
pg_size_pretty(pg_relation_size(relid)) AS table_size,
n_live_tup,
pg_total_relation_size(relid) / nullif(n_live_tup, 0) AS bytes_per_live_row
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(relid) DESC;
This is not a precise measurement, but it is highly effective at surfacing outliers. An unusually large bytes_per_live_row — especially compared against similar tables in the same schema — is a reliable first indicator that a table warrants closer investigation.
Accurate Measurement with pgstattuple
When estimates are insufficient for a decision, pgstattuple provides physically accurate data by scanning actual table pages rather than relying on statistics:
CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT
table_len,
tuple_count,
tuple_len,
dead_tuple_count,
dead_tuple_len,
free_space,
round(
(dead_tuple_len + free_space)::numeric / table_len * 100, 2
) AS bloat_pct
FROM pgstattuple('your_table');
The bloat_pct value expresses wasted space as a percentage of total physical table size. Values above 20–30% are generally worth addressing, though the right threshold depends on the table's access patterns and churn rate. The trade-off is cost: pgstattuple performs a full sequential scan, making it expensive on large tables and unsuitable for high-frequency use. It belongs in the investigation toolkit, not the continuous monitoring stack.
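A middle ground exists: the same extension ships pgstattuple_approx, which uses the visibility map to skip pages known to contain only visible tuples. On a mostly-static large table this can be dramatically cheaper than a full scan, at the cost of partly estimated figures (this sketch assumes PostgreSQL 9.5 or later):

```sql
-- Cheaper alternative: skips all-visible pages via the visibility map.
-- scanned_percent reports how much of the table was actually read.
SELECT
    table_len,
    scanned_percent,
    dead_tuple_count,
    dead_tuple_percent,
    approx_free_space
FROM pgstattuple_approx('your_table');
```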
Detecting Index Bloat
Index bloat requires a different approach because PostgreSQL does not expose internal index fragmentation directly. The most practical starting point is comparing index size against the table it covers:
SELECT
ui.schemaname,
ui.relname AS table_name,
ui.indexrelname AS index_name,
pg_size_pretty(pg_relation_size(ui.indexrelid)) AS index_size,
pg_size_pretty(pg_relation_size(i.indrelid)) AS table_size,
round(
pg_relation_size(ui.indexrelid)::numeric /
nullif(pg_relation_size(i.indrelid), 0) * 100, 2
) AS index_to_table_pct
FROM pg_stat_user_indexes ui
JOIN pg_index i ON ui.indexrelid = i.indexrelid
ORDER BY pg_relation_size(ui.indexrelid) DESC;
An index exceeding the size of its own table is not automatically a problem — multi-column indexes on wide tables can legitimately be large — but it warrants scrutiny. The more reliable signal is growth over time: a stable row count paired with steadily growing indexes is the pattern characteristic of index bloat.
For deeper structural analysis, pgstatindex exposes internal page-level details that make fragmentation concrete:
SELECT
version,
tree_level,
index_size,
leaf_pages,
empty_pages,
deleted_pages,
avg_leaf_density,
leaf_fragmentation
FROM pgstatindex('your_index_name');
leaf_fragmentation consistently above 30%, combined with a meaningful number of deleted_pages, is a reliable indicator that a rebuild will produce a measurable performance improvement.
Tracking VACUUM Activity Alongside Bloat
Dead tuple counts become far more useful when placed in context with VACUUM activity. A table where autovacuum runs frequently but dead tuples remain elevated is a table where autovacuum is underpowered for its workload — a very different diagnosis from a table that simply has not been vacuumed recently:
SELECT
relname,
n_live_tup,
n_dead_tup,
round(
n_dead_tup::numeric / nullif(n_live_tup + n_dead_tup, 0) * 100, 2
) AS dead_ratio_pct,
last_vacuum,
last_autovacuum,
vacuum_count,
autovacuum_count
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
Tables where last_autovacuum is NULL or many hours old deserve immediate attention. Tables with a high autovacuum_count but still-accumulating dead tuples need their per-table autovacuum parameters tuned, not just their workload reduced.
A Production Case Study
In one production system, a heavily updated table had grown to approximately 15 GB and was climbing steadily. At first glance, nothing appeared wrong — query performance was acceptable and no monitors had fired. Over time, however, I/O utilization increased noticeably and response times on certain queries began to drift upward.
Closer investigation revealed that the number of live rows had not changed significantly in months. The workload was almost entirely updates. Running pgstattuple confirmed what the size trend had been hinting at: actual useful data accounted for roughly 6–7 GB of the table's 15 GB footprint. The rest was bloat — dead tuples and free space that VACUUM had marked for reuse but that had never been returned to the operating system.
After running pg_repack online, the table shrank to just under 7 GB. Query latency dropped, I/O utilization fell, and the improvement was visible in application-level metrics within minutes of the operation completing.
Remediation
Standard VACUUM should be the first tool considered. It marks dead space for reuse and is safe to run at any time without heavy locking. It does not shrink the physical file, but for tables under continuous write load it is often sufficient to keep bloat from accumulating once autovacuum is properly tuned.
VACUUM ANALYZE your_table;
VACUUM FULL rewrites the entire table and returns space to the operating system, fully eliminating bloat. It requires an exclusive lock for the duration of the operation, however, making it inappropriate for most production tables during business hours. It is best reserved for maintenance windows or tables with low access rates.
VACUUM FULL your_table;
REINDEX addresses index bloat by rebuilding the index structure from scratch. The CONCURRENTLY option significantly reduces locking overhead and is almost always preferable in production:
REINDEX INDEX CONCURRENTLY your_index_name;
-- Rebuild all indexes on a table
REINDEX TABLE CONCURRENTLY your_table;
For environments where any locking overhead is unacceptable during normal operation, pg_repack is the standard production-grade solution. It rebuilds tables and indexes online without taking heavy locks, and it is the most practical tool available for eliminating severe bloat on live, actively-queried systems.
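As a rough sketch, a typical invocation looks like the following (database and relation names are placeholders; pg_repack requires both the client binary and the extension created in the target database via CREATE EXTENSION pg_repack):

```shell
# Rebuild one bloated table online, including its indexes
pg_repack --dbname=your_db --table=your_table

# Rebuild a single index only
pg_repack --dbname=your_db --index=your_index_name
```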
Prevention
Reacting to bloat after it has accumulated is always more expensive than preventing it in the first place. The two most important levers are autovacuum configuration and transaction discipline.
PostgreSQL's default autovacuum settings are intentionally conservative and are frequently insufficient for write-heavy tables. A table that receives a high volume of updates per minute needs autovacuum tuned to match that reality:
ALTER TABLE your_table SET (
autovacuum_vacuum_scale_factor = 0.01,
autovacuum_analyze_scale_factor = 0.005,
autovacuum_vacuum_cost_delay = 2
);
Long-running transactions are among the most common root causes of bloat accumulation. PostgreSQL cannot vacuum rows that were visible at the start of an open transaction, so a transaction left idle for hours can hold back cleanup across the entire database. Monitoring for stale transactions should be part of any production runbook:
SELECT
pid,
now() - xact_start AS duration,
state,
query
FROM pg_stat_activity
WHERE state != 'idle'
AND xact_start < now() - interval '5 minutes'
ORDER BY duration DESC;
Any transaction visible here beyond a few minutes is worth investigating. Beyond a few tens of minutes, it is almost certainly contributing to bloat.
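Monitoring can also be backed by enforcement. PostgreSQL's idle_in_transaction_session_timeout setting terminates sessions that sit idle inside an open transaction, capping the damage a forgotten transaction can do. The value below is illustrative, not a recommendation — choose one that fits your longest legitimate transactions:

```sql
-- Terminate sessions left idle inside a transaction for over 10 minutes
ALTER SYSTEM SET idle_in_transaction_session_timeout = '10min';
SELECT pg_reload_conf();
```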
Automating Bloat Monitoring
Manual checks are valuable for debugging, but they do not constitute a monitoring strategy. Bloat accumulates continuously, and without automated tracking it will consistently be discovered only after it has already begun affecting performance.
A practical monitoring setup tracks table and index sizes over time, correlates growth against vacuum activity, and surfaces tables where the dead-to-live ratio is trending in the wrong direction. The goal is to catch bloat early, when it is inexpensive to address — not mid-incident, when it has become the explanation for a performance regression that is already affecting users.
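A minimal version of that trend tracking can be built with nothing more than a history table and a scheduled INSERT (run via cron, pg_cron, or similar); the table and column names here are placeholders:

```sql
-- History table for size and dead-tuple trends
CREATE TABLE IF NOT EXISTS bloat_history (
    captured_at timestamptz NOT NULL DEFAULT now(),
    relname name NOT NULL,
    total_bytes bigint,
    n_live_tup bigint,
    n_dead_tup bigint
);

-- Run on a schedule; trend queries then compare snapshots over time
INSERT INTO bloat_history (relname, total_bytes, n_live_tup, n_dead_tup)
SELECT relname, pg_total_relation_size(relid), n_live_tup, n_dead_tup
FROM pg_stat_user_tables;
```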
Tools like pghealth automate this process by continuously evaluating storage patterns and highlighting areas where bloat is likely impacting performance, removing the need to run and interpret these queries manually on a regular cadence.
Closing Thoughts
Bloat is one of the most common and least visible problems in PostgreSQL production systems. It does not break anything overnight — it accumulates quietly, degrades performance gradually, and by the time it becomes obvious, the system has usually been underperforming for some time.
The teams that manage it best are not the ones who react fastest. They are the ones who never stop watching. Consistent monitoring, properly tuned autovacuum, and disciplined transaction management together form the only reliable defense against bloat becoming a recurring production problem.
For teams running production PostgreSQL systems, automated health checks provide a more reliable way to detect issues like bloat before they impact performance.
👉 PostgreSQL health check tool
Try pghealth Free Today 🚀
Start your journey toward a healthier PostgreSQL with pghealth.
You can explore all features immediately with a free trial — no agent installation required.

