Skip to content

@tank/relational-db-mastery

1.1.0

Relational database performance for PostgreSQL and MySQL. Indexing (B-tree, GIN, GiST, BRIN, composite, partial, covering), query optimization (JOINs, CTEs, pagination, N+1), EXPLAIN ANALYZE, schema design, PostgreSQL tuning (VACUUM, pg_stat_statements), MySQL/InnoDB tuning, monitoring. Triggers: slow query, database performance, index, EXPLAIN ANALYZE, query plan, schema design, composite index, N+1, pagination, VACUUM, autovacuum, table bloat, PostgreSQL, MySQL, InnoDB, SQL performance.


name: "@tank/relational-db-mastery" description: | Relational database performance optimization for PostgreSQL and MySQL. Covers indexing strategies (B-tree, GIN, GiST, BRIN, partial, expression, covering, composite), query optimization (JOINs, CTEs, pagination, N+1), EXPLAIN ANALYZE interpretation, schema design for performance, PostgreSQL tuning (VACUUM, autovacuum, connection pooling, pg_stat_statements), MySQL/InnoDB tuning, and monitoring diagnostics.

Synthesizes Winand (SQL Performance Explained), Petrov (Database Internals), Karwin (SQL Antipatterns), Schwartz (High Performance MySQL), Kleppmann (Designing Data-Intensive Applications), and PostgreSQL/MySQL documentation.

Trigger phrases: "slow query", "query optimization", "database performance", "index", "indexing strategy", "EXPLAIN ANALYZE", "query plan", "schema design", "normalization", "denormalization", "B-tree", "GIN index", "composite index", "covering index", "partial index", "N+1 query", "pagination", "keyset pagination", "cursor pagination", "connection pool", "PgBouncer", "VACUUM", "autovacuum", "table bloat", "dead tuples", "cache hit ratio", "PostgreSQL tuning", "MySQL tuning", "InnoDB", "foreign key index", "missing index", "over-indexing", "database schema", "data types", "partitioning", "pg_stat_statements", "slow query log", "JOINs", "subquery vs CTE", "query rewriting", "database optimization", "SQL performance", "database design", "relational database"

Relational Database Mastery

Core Philosophy

  1. Measure before optimizing — Run EXPLAIN ANALYZE before changing anything. Intuition about query performance is unreliable.
  2. Index for queries, not columns — Design indexes based on actual query patterns, not schema structure.
  3. Normalize first, denormalize with evidence — Start with 3NF. Denormalize only when measurements prove a JOIN is the bottleneck.
  4. The database is not a black box — Understand the planner's decisions. Read query plans. Monitor cache hit ratios.
  5. PostgreSQL and MySQL are different engines — Tuning that works for one may hurt the other. Know which you're targeting.

Quick-Start: Common Problems

"My query is slow"

  1. Get the query → Run EXPLAIN (ANALYZE, BUFFERS) on it
  2. Read the plan bottom-to-top → Find the node with highest actual time
  3. Is it a Seq Scan on a large table? → Add an index (see references/indexing-strategies.md)
  4. Are row estimates wildly wrong? → Run ANALYZE on the table
  5. Is it a disk-bound Sort? → Increase work_mem or add matching index → See references/explain-analyze.md for full plan reading guide

"Which index should I create?"

  1. What's the query pattern? Equality, range, full-text, JSONB, geometric?
  2. Apply the Index Type Selection table → B-tree, GIN, GiST, or BRIN
  3. Composite index? → Apply E-R-S rule: Equality columns → Range columns → Sort columns
  4. Large table, skewed data? → Use partial index
  5. Need index-only scans? → Use covering index with INCLUDE → See references/indexing-strategies.md

"Database is generally slow"

  1. Check cache hit ratio → Below 99%? Increase shared_buffers
  2. Check active connections → Near max? Add connection pooling
  3. Check dead tuple ratio → Above 20%? Tune autovacuum
  4. Check pg_stat_statements → Find top queries by total time
  5. Check for lock contention → Long-running transactions blocking others → See references/monitoring-diagnostics.md and references/postgresql-tuning.md

"Should I normalize or denormalize?"

  1. Is this a new schema? → Normalize to 3NF
  2. Are JOINs proven slow via EXPLAIN? → Consider denormalization
  3. Is it read-heavy with complex aggregations? → Materialized views
  4. Is it write-heavy with consistency needs? → Stay normalized → See references/schema-design.md

Decision Trees

Index Type Selection

Query PatternIndex TypeExample
Equality (=)B-treeWHERE status = 'active'
Range (<, >, BETWEEN)B-treeWHERE created_at > '2024-01-01'
Pattern match (LIKE 'prefix%')B-treeWHERE name LIKE 'John%'
Full-text searchGINWHERE to_tsvector('english', body) @@ query
JSONB containmentGINWHERE metadata @> '{"type":"admin"}'
Array containsGINWHERE tags @> ARRAY['python']
Geometric/spatialGiSTWHERE ST_DWithin(geom, point, 1000)
Range overlapGiSTWHERE tsrange @> now()
Large time-series tableBRINWHERE created_at > '2024-01-01' (billions of rows)

Query Pattern Selection

SituationUseAvoid
Direct table relationshipJOINCorrelated subquery
Existence checkEXISTS (SELECT 1 ...)IN (SELECT ...) on large sets
Multi-step transformationCTE (PG 12+ inlines)Nested subqueries
Top-N per groupLATERAL JOIN or ROW_NUMBER()Correlated subquery per row
Large dataset paginationKeyset/cursor paginationOFFSET on large offsets
Bulk data loadingCOPY (PG) / LOAD DATA (MySQL)Individual INSERT statements

PostgreSQL vs MySQL Tuning

ConcernPostgreSQLMySQL
Main memory settingshared_buffers (25% RAM)innodb_buffer_pool_size (70-80% RAM)
Dead row cleanupVACUUM / autovacuumInnoDB purge thread (automatic)
Connection overheadHeavy (process per conn) → use PgBouncerLight (thread per conn)
Slow query identificationpg_stat_statementsSlow query log + pt-query-digest
Online index creationCREATE INDEX CONCURRENTLYALTER TABLE ... ADD INDEX (online by default)
SSD optimizationrandom_page_cost = 1.1innodb_io_capacity = 2000+

Anti-Patterns Quick Reference

Anti-PatternProblemFix
No index on foreign keysSlow JOINs, slow CASCADE deletesAlways index FKs
SELECT * everywhereExtra I/O, blocks index-only scansSelect only needed columns
WHERE function(column)Kills index usageExpression index or rewrite query
OFFSET 10000 LIMIT 20Scans and discards 10000 rowsKeyset pagination
N+1 queries in ORM1 + N round trips to databaseEager loading or batch fetch
FLOAT for moneyRounding errorsNUMERIC(precision, scale)
UUID v4 primary keysRandom B-tree inserts, fragmentationUUIDv7 or BIGSERIAL + public UUID
Over-indexingWrite amplification, disk wasteAudit with pg_stat_user_indexes
Skipping ANALYZEStale statistics, bad plansRun after bulk data changes

Reference Files

FileContents
references/indexing-strategies.mdAll index types, selection decision tree, composite index column order (E-R-S rule), GIN/GiST/BRIN deep dives, anti-patterns, PG vs MySQL differences
references/query-optimization.mdJOIN optimization, subquery vs CTE vs JOIN, N+1 problem, pagination strategies, batch operations, window functions, query rewriting
references/explain-analyze.mdReading EXPLAIN output, scan types, join types, cost model, red flags, estimation errors, MySQL differences, optimization workflow
references/schema-design.mdNormalization/denormalization, data type selection (UUID vs serial, JSONB vs columns), partitioning, temporal patterns, schema anti-patterns
references/postgresql-tuning.mdMemory config, SSD settings, VACUUM/autovacuum, connection pooling (PgBouncer), lock management, pg_stat_statements, WAL tuning
references/mysql-tuning.mdInnoDB architecture, buffer pool, flush settings, slow query log, optimizer hints, character sets, replication, PG vs MySQL comparison
references/monitoring-diagnostics.mdKey metrics dashboard, diagnostic queries (PG + MySQL), cache hit ratio, index usage, table bloat, alerting thresholds, diagnostic workflow

Command Palette

Search skills, docs, and navigate Tank