The query works fine in development with 1,000 rows. In production with 50 million rows it locks up the database for 3 minutes. One missing WHERE clause, one implicit type cast, one function wrapping an indexed column — and PostgreSQL ignores your index entirely.
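A minimal sketch of the last of those traps, assuming a plain B-Tree index on users(email); the table, column, and index names are illustrative:

```typescript
// The "function wrapping an indexed column" trap and the expression-index fix.
import { Pool } from "pg";

const pool = new Pool();

async function lookupByEmail(email: string) {
  // A plain B-Tree index on users(email) cannot be used here: lower() must be
  // applied to every row before the comparison, so PostgreSQL falls back to a scan.
  return pool.query("SELECT * FROM users WHERE lower(email) = $1", [email.toLowerCase()]);
}

async function addExpressionIndex() {
  // Fix: index the expression itself (or store emails pre-normalized and drop lower()).
  await pool.query(
    "CREATE INDEX IF NOT EXISTS idx_users_email_lower ON users (lower(email))"
  );
}
```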
You've been running backups for 18 months. The disk dies. You go to restore. The backup files are empty. Or corrupted. Or the backup job failed silently on month 4 and you've been running without a backup ever since. Untested backups are not backups.
Users see stale prices. Admins update settings but the old value is served for 10 minutes. You delete a record but it keeps appearing. Cache invalidation is famously hard — and most implementations have subtle bugs that serve wrong data long after the source changed.
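A minimal read-through sketch of one common mitigation (short TTL plus delete-on-write), assuming ioredis and node-postgres; the settings table and key scheme are illustrative:

```typescript
import Redis from "ioredis";
import { Pool } from "pg";

const redis = new Redis();
const pool = new Pool();

const key = (id: string) => `settings:${id}`;

export async function getSetting(id: string): Promise<string | null> {
  const cached = await redis.get(key(id));
  if (cached !== null) return cached;

  const { rows } = await pool.query("SELECT value FROM settings WHERE id = $1", [id]);
  const value = rows[0]?.value ?? null;
  if (value !== null) {
    // A short TTL caps how long a missed invalidation can keep serving stale data.
    await redis.set(key(id), value, "EX", 60);
  }
  return value;
}

export async function updateSetting(id: string, value: string): Promise<void> {
  await pool.query("UPDATE settings SET value = $2 WHERE id = $1", [id, value]);
  // Delete, don't overwrite: the next read repopulates from the source of truth.
  await redis.del(key(id));
}
```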
You add ON DELETE CASCADE to a foreign key. You delete a test organization. It cascades to users, which cascades to sessions, orders, invoices, activity_logs — 10,000 rows gone in milliseconds. No warning, no undo. Cascade deletes are powerful and dangerous.
You store a price as a JavaScript float. It reads back as 19.99, but one addition later the total displays as 20.000000000000004. Or you serialize a BigInt user ID to JSON and it comes back as a different number. Serialization bugs corrupt data silently — no error, just wrong values.
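A small sketch of two defensive habits that avoid both failure modes, in plain TypeScript; the Price type and formatPrice helper are made up for illustration:

```typescript
// Money as integer cents, and 64-bit IDs as strings so JSON can't round them.

type Price = { cents: number; currency: string };

// Float drift can't happen when the unit is an integer cent.
const subtotal: Price = { cents: 1999, currency: "USD" };
const fee: Price = { cents: 1, currency: "USD" };
const total: Price = { cents: subtotal.cents + fee.cents, currency: "USD" };

export function formatPrice(p: Price): string {
  return (p.cents / 100).toFixed(2); // formatting happens only at the display edge
}
console.log(formatPrice(total)); // "20.00"

// JSON has no 64-bit integer: 9007199254740993 would come back as
// 9007199254740992 if serialized as a number. Send big IDs as strings.
const user = { id: 9007199254740993n, name: "Ada" };
const body = JSON.stringify(user, (_key, value) =>
  typeof value === "bigint" ? value.toString() : value
);
console.log(body); // {"id":"9007199254740993","name":"Ada"}
```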
The email job has been failing silently for three months. 50,000 emails not sent. Or the background sync has been silently skipping records. Or the backup has been succeeding at creation but failing at upload. Silent failures are the most dangerous kind.
Your system handles 1,000 users today. You're designing for 10,000. Not 10 million — 10,000. Most "design for scale" advice is written for companies you're not. What actually changes at 10x, and what's over-engineering that will hurt more than help?
User updates their profile. Refreshes the page — old data shows. They update again. Still old data. They're furious. Your system is eventually consistent — but nobody told the user (or the developer who designed the UI). Here's how to manage consistency expectations in distributed systems.
"The app is slow. Fix it." — said by the founder, with no further context. Is the homepage slow? Checkout? API responses? For which users? On mobile? Under what conditions? Turning vague business pressure into actionable performance work requires measurement before code.
A user submits a GDPR deletion request. You have 30 days to comply. But their data is in the main DB, the analytics DB, S3, Redis, CloudWatch logs, third-party integrations, and three months of database backups. You have 30 days. Start now.
You shard by user ID. 80% of writes go to 20% of shards because your top customers are assigned to the same shards. Or you shard by date and all writes go to the current month's shard. Uneven distribution turns a scaling solution into a bottleneck.
The senior engineer proposes Kafka for the notification system. You have 500 users. The junior engineer proposes a direct function call. The senior engineer is technically correct and strategically wrong. Knowing when good architecture is overkill is the skill that separates senior from staff.
You need to export 10 million rows. You paginate with OFFSET, fetching 1,000 rows at a time. The first batch takes 50ms. By batch 5,000 the offset is 5 million rows and each batch takes 30 seconds. The total job takes 6 hours and gets slower as it goes.
Audit logs are critical for compliance and debugging. But an audit_logs table that grows without bounds will fill your disk, slow every query that touches it, and eventually crash your database. Here's how to keep your logs without letting them kill production.
You deploy a migration that runs ALTER TABLE on a 40-million row table. PostgreSQL rewrites the entire table. Your app is stuck waiting for the lock. Users see 503s for 8 minutes. Schema changes on large tables require a completely different approach.
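One hedged pattern for this kind of change (add the column without a rewrite, backfill in batches, validate the constraint separately), assuming node-postgres; table and constraint names are illustrative:

```typescript
import { Pool } from "pg";

const pool = new Pool();

async function migrate() {
  // Fast on modern PostgreSQL: no table rewrite, only a brief lock.
  await pool.query("ALTER TABLE orders ADD COLUMN IF NOT EXISTS status text");

  // Backfill in small batches so no single statement holds locks for minutes.
  let updated = 0;
  do {
    const res = await pool.query(
      `UPDATE orders SET status = 'legacy'
        WHERE id IN (SELECT id FROM orders WHERE status IS NULL LIMIT 5000)`
    );
    updated = res.rowCount ?? 0;
  } while (updated > 0);

  // NOT VALID skips the full-table scan now; VALIDATE runs later with a weaker lock.
  await pool.query(
    "ALTER TABLE orders ADD CONSTRAINT orders_status_not_null CHECK (status IS NOT NULL) NOT VALID"
  );
  await pool.query("ALTER TABLE orders VALIDATE CONSTRAINT orders_status_not_null");
}

migrate().finally(() => pool.end());
```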
Month 1 — queries are fast. Month 6 — users notice slowness. Month 12 — the dashboard times out. The data grew but the indexes didn''t. Finding and adding the right index is often a 10-minute fix that makes queries 1000x faster.
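A diagnostic sketch of that 10-minute fix: confirm the sequential scan with EXPLAIN ANALYZE, then build the index without blocking writes. The events query and index are illustrative, not from any real schema:

```typescript
import { Pool } from "pg";

const pool = new Pool();

async function diagnose() {
  const plan = await pool.query(
    "EXPLAIN ANALYZE SELECT * FROM events WHERE account_id = 'acct_123' ORDER BY created_at DESC LIMIT 50"
  );
  // Look for "Seq Scan on events" in the plan output: that is the smoking gun.
  plan.rows.forEach((r) => console.log(r["QUERY PLAN"]));

  // CONCURRENTLY builds the index without locking the table against writes.
  await pool.query(
    "CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_events_account_created ON events (account_id, created_at DESC)"
  );
}

diagnose().finally(() => pool.end());
```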
Error rate spikes after deploy. You need to roll back. But the migration already ran, the old binary can't read the new schema, and "reverting the deploy" means a data loss decision. Rollback is only possible if you design for it before you deploy.
Page 1 loads in 10ms. Page 100 loads in 500ms. Page 1000 loads in 5 seconds. OFFSET pagination forces the database to read and discard every skipped row before returning yours. Cursor-based pagination fixes this — same performance on page 1 and page 10,000.
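A minimal keyset (cursor) sketch of that fix, assuming node-postgres; the events table and cursor shape are illustrative, and it works best with an index on (created_at, id):

```typescript
import { Pool } from "pg";

const pool = new Pool();

type Cursor = { createdAt: string | Date; id: number } | null;

export async function nextPage(cursor: Cursor, pageSize = 1000) {
  const { rows } = cursor
    ? await pool.query(
        `SELECT id, created_at, payload FROM events
          WHERE (created_at, id) > ($1, $2)
          ORDER BY created_at, id
          LIMIT $3`,
        [cursor.createdAt, cursor.id, pageSize]
      )
    : await pool.query(
        "SELECT id, created_at, payload FROM events ORDER BY created_at, id LIMIT $1",
        [pageSize]
      );

  const last = rows[rows.length - 1];
  return {
    rows,
    // The caller passes this back on the next call; page 10,000 costs the same
    // index seek as page 1 because nothing is skipped.
    nextCursor: last ? { createdAt: last.created_at, id: last.id } : null,
  };
}
```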
Stripe times out at 30 seconds. Did the charge happen? You don't know. You charge again and double-charge the customer. Or you don't charge and ship for free. Payment idempotency and webhook reconciliation are the only reliable path through this.
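A hedged sketch of the idempotency-key half of that, using Stripe's Node SDK; the key scheme based on an order ID is an assumption, not a Stripe requirement:

```typescript
import Stripe from "stripe";

const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);

export async function chargeOrder(orderId: string, amountCents: number) {
  // Same key on every retry for this order: if the first attempt timed out but
  // actually succeeded, Stripe returns the original PaymentIntent instead of
  // creating a second charge.
  return stripe.paymentIntents.create(
    { amount: amountCents, currency: "usd", metadata: { orderId } },
    { idempotencyKey: `order-charge-${orderId}` }
  );
}
// After a timeout, retry with the same key, and reconcile against
// payment_intent.succeeded webhooks before deciding the charge never happened.
```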
Master PostgreSQL indexing strategies including B-Tree for general queries, GIN for JSONB/arrays, BRIN for time-series, partial indexes, covering indexes, and how to identify unused indexes with pg_stat_user_indexes.
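A small sketch of that unused-index check, assuming node-postgres; the zero-scan threshold is a starting point, not a rule:

```typescript
import { Pool } from "pg";

const pool = new Pool();

async function findUnusedIndexes() {
  const { rows } = await pool.query(`
    SELECT s.schemaname,
           s.relname      AS table_name,
           s.indexrelname AS index_name,
           s.idx_scan     AS scans,
           pg_size_pretty(pg_relation_size(s.indexrelid)) AS size
      FROM pg_stat_user_indexes s
      JOIN pg_index i ON i.indexrelid = s.indexrelid
     WHERE s.idx_scan = 0
       AND NOT i.indisunique   -- unique indexes enforce constraints; keep them
     ORDER BY pg_relation_size(s.indexrelid) DESC
  `);
  console.table(rows); // never-scanned indexes still cost disk and write amplification
}

findUnusedIndexes().finally(() => pool.end());
```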
Learn when sharding becomes necessary, compare hash vs range vs list partitioning, explore Citus for horizontal scaling, and understand the costs of distributed queries and shard key selection.
Two requests check inventory simultaneously — both see 1 item in stock. Both proceed to purchase. You ship 2 items from 1. Race conditions in distributed systems are subtler than single-process races because you can't use mutexes across services. Here's how to prevent them.
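One sketch of the single-statement fix, assuming node-postgres; the inventory schema is illustrative:

```typescript
import { Pool } from "pg";

const pool = new Pool();

export async function reserveItem(productId: string): Promise<boolean> {
  // The check and the decrement are one atomic statement, so the database
  // serializes them: either this request wins and stock goes 1 -> 0, or the
  // WHERE clause fails and rowCount is 0. There is no window where both
  // requests observe "1 in stock" and both proceed.
  const res = await pool.query(
    `UPDATE inventory
        SET stock = stock - 1
      WHERE product_id = $1 AND stock > 0`,
    [productId]
  );
  return (res.rowCount ?? 0) === 1;
}
// For multi-step flows, SELECT ... FOR UPDATE inside a transaction, or an
// advisory lock, plays the role a mutex would in a single process.
```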
User saves their profile. Page reloads. Shows old data. They save again — same thing. The write went to the primary. The read came from the replica. The replica is 2 seconds behind. Read-after-write consistency is the hardest problem with read replicas.
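A hedged sketch of one mitigation, pinning a user's reads to the primary for a few seconds after their own write; the 5-second window and the in-memory map are simplifying assumptions:

```typescript
import { Pool } from "pg";

const primary = new Pool({ connectionString: process.env.PRIMARY_URL });
const replica = new Pool({ connectionString: process.env.REPLICA_URL });

// userId -> epoch ms of last write (per-process; real setups keep this in the
// session or a shared store like Redis).
const lastWriteAt = new Map<string, number>();
const REPLICATION_WINDOW_MS = 5_000;

export async function saveProfile(userId: string, name: string) {
  await primary.query("UPDATE profiles SET name = $2 WHERE user_id = $1", [userId, name]);
  lastWriteAt.set(userId, Date.now());
}

export async function getProfile(userId: string) {
  const recentWrite =
    Date.now() - (lastWriteAt.get(userId) ?? 0) < REPLICATION_WINDOW_MS;
  // Read your own writes from the primary; everyone else can use the replica.
  const db = recentWrite ? primary : replica;
  const { rows } = await db.query("SELECT * FROM profiles WHERE user_id = $1", [userId]);
  return rows[0];
}
```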
The codebase is a mess. Nobody wants to touch it. The "obvious fix" requires changing 40 files. Every change breaks three things. Refactoring legacy code safely requires the strangler fig pattern, comprehensive tests before changing anything, and very small steps.
The disk dies at 2 AM. You have backups. But the restore takes 9 hours because nobody tested it, the database is 800GB, the download from S3 is throttled, and pg_restore runs single-threaded by default. You could have restored in 45 minutes with the right setup.
The codebase is painful. The team wants to rewrite it. The CTO wants to maintain velocity. Both are right. The rewrite vs refactor decision is one of the highest-stakes calls in software — get it wrong and you lose two years of productivity or two more years of compounding debt.
Traffic spikes 10x at 8 AM on Black Friday. Auto-scaling triggers but takes 4 minutes to add instances. The database connection pool is exhausted at minute 2. The checkout flow is down for your highest-traffic day of the year.
You rename a column. The new service version uses the new name. The old version, still running during the rolling deploy, tries to use the old name. Database error. The migration that passed all your tests breaks production because both old and new code run simultaneously during deployment.
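A hedged expand/contract sketch for that rename, assuming node-postgres; the users table and column names are illustrative:

```typescript
import { Pool } from "pg";

const pool = new Pool();

async function renameUserNameColumn() {
  // Step 1 (expand): add the new column; the old code keeps working.
  await pool.query("ALTER TABLE users ADD COLUMN IF NOT EXISTS full_name text");

  // Step 2: deploy code that writes both columns and reads the new one,
  // then backfill existing rows (batch this on large tables).
  await pool.query("UPDATE users SET full_name = name WHERE full_name IS NULL");

  // Step 3 (contract): only after no running version touches the old column,
  // drop it in a later deploy.
  await pool.query("ALTER TABLE users DROP COLUMN IF EXISTS name");
}

renameUserNameColumn().finally(() => pool.end());
```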
You split into microservices but all of them share the same PostgreSQL database. You have the operational overhead of microservices with none of the independent scalability. A schema migration blocks all teams. A bad query in Service A slows down Service B.
The database has a replica. The app has multiple pods. You think you're resilient. Then the single Redis instance goes down, and every service that depended on it — auth, sessions, rate limiting, caching — stops working simultaneously. SPOFs hide in plain sight.
Network partition splits your 3-node cluster into two halves. Both halves think they're the primary. Both accept writes. Network heals. You have two diverged databases with conflicting data. This is split brain — one of the most dangerous failure modes in distributed systems.
SQL injection persists in ORM applications. Learn why raw(), $executeRaw(), and stored procedures become injection vectors once user input is concatenated into the query string, and how to defend with parameterization.
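A minimal contrast of the two, using node-postgres; the same idea carries over to an ORM's raw helpers:

```typescript
import { Pool } from "pg";

const pool = new Pool();

export async function findUserUnsafe(email: string) {
  // DON'T: "alice@example.com' OR '1'='1" becomes part of the SQL text itself.
  return pool.query(`SELECT * FROM users WHERE email = '${email}'`);
}

export async function findUserSafe(email: string) {
  // DO: the value travels separately from the SQL, so it can never change the
  // shape of the query, only the value being compared.
  return pool.query("SELECT * FROM users WHERE email = $1", [email]);
}
```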
Your server is in UTC. Your database is in UTC. Your cron job runs at "9 AM" — but 9 AM where? Customer in Tokyo and customer in New York both get charged at your server's 9 AM. Your "end of day" reports include data from tomorrow. Timezone bugs are invisible until they're expensive.
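A small sketch of the "store UTC, localize at the edge" habit, assuming Luxon for timezone math; the per-customer billing hour is made up for illustration:

```typescript
import { DateTime } from "luxon";

// "Charge at 9 AM" only means something once a zone is attached.
export function nextChargeTimeUtc(customerZone: string): Date {
  let local = DateTime.now().setZone(customerZone).set({
    hour: 9, minute: 0, second: 0, millisecond: 0,
  });
  if (local < DateTime.now().setZone(customerZone)) local = local.plus({ days: 1 });
  return local.toUTC().toJSDate(); // store and compare in UTC
}

// Tokyo and New York now resolve to different UTC instants instead of the server's 9 AM.
nextChargeTimeUtc("Asia/Tokyo");       // ~00:00 UTC
nextChargeTimeUtc("America/New_York"); // ~13:00 or 14:00 UTC depending on DST
```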
Sessions table. Events table. Audit log. Each row is small. But with 100,000 active users each generating a few dozen events a day, it's 5 million rows per day. No one added a purge job. Six months later the disk is full and the database crashes.
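A minimal retention-job sketch, assuming node-postgres; the 90-day window, batch size, and events table are assumptions:

```typescript
import { Pool } from "pg";

const pool = new Pool();

export async function purgeOldEvents(batchSize = 10_000) {
  let deleted = 0;
  do {
    // Delete in bounded batches so the purge never holds long locks or writes
    // one giant transaction's worth of WAL.
    const res = await pool.query(
      `DELETE FROM events
        WHERE id IN (
          SELECT id FROM events
           WHERE created_at < now() - interval '90 days'
           LIMIT $1
        )`,
      [batchSize]
    );
    deleted = res.rowCount ?? 0;
  } while (deleted > 0); // run daily from a scheduler, before the disk fills
}
// At very high volume, time-based partitioning (dropping whole partitions) is
// cheaper than row-by-row deletes.
```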
The t3.micro database that "works fine in staging" OOMs under real load. The single-AZ deployment that's been fine for two years fails the week of your biggest launch. Underprovisioning is the other edge of the cost/reliability tradeoff — and it has a much higher price.
Most loggers are synchronous — they block your event loop while writing to disk or a remote service. logixia is async-first, with non-blocking transports for PostgreSQL, MySQL, MongoDB, SQLite, rotating files, Kafka, and WebSocket, plus log search, field redaction, and OpenTelemetry request tracing via AsyncLocalStorage.