PostgreSQL Performance: What I Learned from 50M-Row Tables
June 8, 2024
The 50M-row reality
When your tables grow beyond a few million rows, queries that used to be fast suddenly aren't. Index scans become sequential scans, joins become disk spills, and your application latency goes from 5ms to 500ms overnight.
Index strategies that work
Partial indexes
If you only query active records, index only those:
CREATE INDEX idx_orders_active ON orders (created_at)
WHERE status IN ('pending', 'processing');Covering indexes
Include all columns your query needs in the index to avoid heap lookups:
CREATE INDEX idx_users_email_covering ON users (email)
INCLUDE (name, avatar_url, created_at);Composite indexes
Order columns by selectivity — most selective first:
CREATE INDEX idx_orders_status_created ON orders (status, created_at DESC);Connection pooling is critical
PostgreSQL forks a process per connection. At 200 connections, you're spending more time on context switching than on actual query execution. Use PgBouncer in transaction mode to multiplex thousands of application connections through a small pool of database connections.
Configuration that matters
work_mem: Set per-session for sort operations (4-8MB is a good start)shared_buffers: 25% of available RAMeffective_cache_size: 75% of available RAMrandom_page_cost: Lower to 1.1 if you're on SSDmax_parallel_workers_per_gather: Set to 2-4 for analytical queries
The biggest win
The single most impactful change we made was adding proper EXPLAIN (ANALYZE, BUFFERS) to our deployment checklist. Every query that ships to production gets its plan reviewed. It caught more performance issues than any monitoring tool.