PostgreSQLDatabaseBackend8 min read

PostgreSQL Performance: What I Learned from 50M-Row Tables

June 8, 2024

The 50M-row reality

When your tables grow beyond a few million rows, queries that used to be fast suddenly aren't. Index scans become sequential scans, joins become disk spills, and your application latency goes from 5ms to 500ms overnight.

Index strategies that work

Partial indexes

If you only query active records, index only those:

CREATE INDEX idx_orders_active ON orders (created_at)
WHERE status IN ('pending', 'processing');

Covering indexes

Include all columns your query needs in the index to avoid heap lookups:

CREATE INDEX idx_users_email_covering ON users (email)
INCLUDE (name, avatar_url, created_at);

Composite indexes

Order columns by selectivity — most selective first:

CREATE INDEX idx_orders_status_created ON orders (status, created_at DESC);

Connection pooling is critical

PostgreSQL forks a process per connection. At 200 connections, you're spending more time on context switching than on actual query execution. Use PgBouncer in transaction mode to multiplex thousands of application connections through a small pool of database connections.

Configuration that matters

work_mem: Set per-session for sort operations (4-8MB is a good start)
shared_buffers: 25% of available RAM
effective_cache_size: 75% of available RAM
random_page_cost: Lower to 1.1 if you're on SSD
max_parallel_workers_per_gather: Set to 2-4 for analytical queries

The biggest win

The single most impactful change we made was adding proper EXPLAIN (ANALYZE, BUFFERS) to our deployment checklist. Every query that ships to production gets its plan reviewed. It caught more performance issues than any monitoring tool.

All posts