The Overconfident Junior Breaking Prod — Guardrails That Protect Without Demoralizing
Author: Sanjeev Sharma (@webcoderspeed1)

Introduction
Junior engineers break production in predictable ways: they have the right intentions, incomplete mental models of system consequences, and access that should require more friction than it does. The temptation is to respond with surveillance or restriction that demoralizes the engineer and prevents them from growing. The better response is to design systems where the catastrophic mistakes require extra steps — production deploys require peer review, database migrations have mandatory backup steps, and irreversible operations require explicit confirmation — while the engineer retains full development autonomy.
- How Junior Engineers Break Production
- Fix 1: Branch Protection and Required Reviews
- Fix 2: Production Database Safety Rails
- Fix 3: IAM and Access Boundaries
- Fix 4: Incident as Learning, Not Punishment
- Fix 5: Production Access Ladder
- Junior Engineer Safety Checklist
- Conclusion
How Junior Engineers Break Production
Common junior-engineer production incidents:

1. "I'll fix this quickly in prod directly"
   → Direct production database change without a ticket or review
   → Wrong WHERE clause → deletes the wrong rows

2. "This migration is the same as staging"
   → Runs the migration on prod without checking prod-specific conditions
   → Table size: staging 1,000 rows, prod 50M rows → locks the table for hours

3. "I need to update this config quickly"
   → Modifies a production environment variable
   → Forgets to revert → the stale change causes an incident 3 weeks later

4. "I'll just force push to fix the merge conflict"
   → git push --force on main
   → 3 other engineers' commits gone

5. "This is staging, right?"
   → S3 bucket delete on the production bucket
   → All user uploads gone, no versioning enabled

Prevention philosophy:
- Make the dangerous action require extra steps
- Make the safe action require fewer steps
- Don't remove access; add friction to irreversible operations
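The "extra steps for dangerous actions" principle can be sketched as a small guard that a deploy or cleanup tool calls before any irreversible operation. This is a minimal illustration under stated assumptions, not a prescribed implementation: `guardDestructive`, `DestructiveRequest`, and the environment names are hypothetical and do not come from any real library. Non-production requests pass straight through, while production requests must retype the exact target name.

```typescript
// Hypothetical guard: all names here are illustrative, not from a real library.
type Environment = "dev" | "staging" | "production";

interface DestructiveRequest {
  environment: Environment;
  target: string;             // e.g. the bucket or table about to be deleted
  typedConfirmation?: string; // what the engineer typed at the prompt, if anything
}

type GuardResult = { allowed: true } | { allowed: false; reason: string };

function guardDestructive(req: DestructiveRequest): GuardResult {
  // Safe path stays short: no prompt outside production.
  if (req.environment !== "production") {
    return { allowed: true };
  }
  // Dangerous path gets friction: the engineer must retype the exact target name.
  if (req.typedConfirmation !== req.target) {
    return {
      allowed: false,
      reason: `Type "${req.target}" exactly to confirm this production operation.`,
    };
  }
  return { allowed: true };
}
```

Retyping the target name, rather than answering "yes", forces the engineer to read which resource is about to be affected, which is the check that the "This is staging, right?" scenario lacks.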
Fix 1: Branch Protection and Required Reviews
# Branch protection for main: block force pushes and direct commits
# Configure in GitHub Settings → Branches → Branch protection rules
# Rules for 'main':
# ✅ Require a pull request before merging
# ✅ Require 2 approvals (plus code owner review, so a senior engineer signs off on sensitive paths)
# ✅ Dismiss stale reviews when new commits are pushed
# ✅ Require status checks to pass (CI, lint, tests)
# ✅ Restrict who can push (no direct push, even for seniors)
# ✅ Require linear history (no merge commits)
# ✅ Include administrators (no exceptions for "quick fixes")
# ❌ Allow force pushes (leave disabled)
# CODEOWNERS for sensitive paths — senior engineer must review
# .github/CODEOWNERS
/db/migrations/ @senior-engineer @staff-engineer # Migration reviewer required
/infrastructure/ @platform-team # Infra change reviewer required
/scripts/deploy* @senior-engineer # Deploy script change reviewed
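Settings configured by hand tend to drift, so it can help to audit them periodically. The sketch below checks a branch protection object against the rules listed above. It assumes the object has the shape returned by GitHub's REST endpoint for branch protection (`GET /repos/{owner}/{repo}/branches/{branch}/protection`); the field names reflect that response as commonly documented and should be verified against the current API reference. `auditBranchProtection` and the `BranchProtection` interface are hypothetical names, and fetching the object is left out.

```typescript
// Subset of the branch protection response this sketch reads.
// Field names are assumed to match GitHub's REST API; verify before relying on them.
interface BranchProtection {
  required_pull_request_reviews?: {
    required_approving_review_count?: number;
    dismiss_stale_reviews?: boolean;
    require_code_owner_reviews?: boolean;
  };
  required_status_checks?: { contexts?: string[] };
  enforce_admins?: { enabled: boolean };
  required_linear_history?: { enabled: boolean };
  allow_force_pushes?: { enabled: boolean };
}

// Returns a list of human-readable problems; an empty list means every rule is in place.
function auditBranchProtection(p: BranchProtection): string[] {
  const problems: string[] = [];
  const reviews = p.required_pull_request_reviews;

  if (!reviews) {
    problems.push("pull request reviews are not required");
  } else {
    if ((reviews.required_approving_review_count ?? 0) < 2) {
      problems.push("fewer than 2 approvals required");
    }
    if (!reviews.dismiss_stale_reviews) {
      problems.push("stale reviews are not dismissed");
    }
    if (!reviews.require_code_owner_reviews) {
      problems.push("code owner review is not required");
    }
  }
  if ((p.required_status_checks?.contexts ?? []).length === 0) {
    problems.push("no required status checks");
  }
  if (!p.enforce_admins?.enabled) {
    problems.push("administrators can bypass the rules");
  }
  if (!p.required_linear_history?.enabled) {
    problems.push("linear history is not required");
  }
  if (p.allow_force_pushes?.enabled) {
    problems.push("force pushes are allowed");
  }
  return problems;
}
```

Run from a scheduled CI job, a check like this turns a silently loosened rule into a visible failure instead of a surprise during the next incident.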
Fix 2: Production Database Safety Rails
#!/bin/bash
# safe-migration.sh: migration wrapper that enforces safety steps on production
set -euo pipefail

ENV="${1:-}"
if [ "$ENV" != "staging" ] && [ "$ENV" != "production" ]; then
  echo "Usage: ./safe-migration.sh [staging|production]"
  exit 1
fi

# Production requires extra steps
if [ "$ENV" = "production" ]; then
  echo "⚠️ Production migration: extra checks required"

  # Step 1: Mandatory backup
  echo "Creating backup before migration..."
  BACKUP_NAME="pre-migration-$(date +%Y%m%d-%H%M%S)"
  aws rds create-db-snapshot \
    --db-instance-identifier myapp-prod \
    --db-snapshot-identifier "$BACKUP_NAME"
  echo "Backup requested: $BACKUP_NAME"
  echo "Waiting for backup to complete..."
  aws rds wait db-snapshot-available \
    --db-snapshot-identifier "$BACKUP_NAME"

  # Step 2: Estimate migration impact (planner estimate, not an exact count)
  # -A strips padding so the result is a bare number; it is empty if the table is not found
  ROW_COUNT=$(psql "$DATABASE_URL" -tA -c "SELECT reltuples::bigint FROM pg_class WHERE relname = '${MIGRATION_TABLE:-unknown}'")
  echo "Estimated rows affected: ${ROW_COUNT:-unavailable}"

  # Treat a missing or non-numeric estimate the same as a large table
  if ! [[ "$ROW_COUNT" =~ ^[0-9]+$ ]] || [ "$ROW_COUNT" -gt 1000000 ]; then
    echo "⚠️ Large (>1M rows) or unknown table size. This may lock the table."
    read -r -p "Continue? (type 'yes' to confirm): " CONFIRM
    if [ "$CONFIRM" != "yes" ]; then
      echo "Aborted."
      exit 1
    fi
  fi

  # Step 3: Require second engineer confirmation and record both names
  read -r -p "Enter your name (for audit log): " ENGINEER_NAME
  read -r -p "Enter the name of the second engineer who approved this run: " APPROVER_NAME
  if [ -z "$APPROVER_NAME" ] || [ "$APPROVER_NAME" = "$ENGINEER_NAME" ]; then
    echo "A second engineer must confirm production migrations. Aborted."
    exit 1
  fi
  echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) Production migration by $ENGINEER_NAME, approved by $APPROVER_NAME" >> /var/log/migration-audit.log
fi

# Run the migration
echo "Running migration..."
yarn migrate:latest
echo "✅ Migration complete"
Fix 3: IAM and Access Boundaries
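# Principle of least privilege: junior engineers get dev/staging access
# Production access requires explicit request and senior approval
# AWS IAM: Junior engineer policy
# - Full access to dev/staging environments
# - Read-only access to production
# - Cannot: delete production buckets, modify production RDS, change prod security groups
# Caveat: the aws:ResourceTag condition only takes effect for services and actions
# that support tag-based authorization. Check each service you rely on, and scope by
# ARN or by separate AWS accounts where tags are not supported.
# aws-iam-junior-engineer.json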
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "*",
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "aws:ResourceTag/Environment": ["dev", "staging"]
        }
      }
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2:Describe*",
        "ecs:Describe*",
        "ecs:List*",
        "rds:Describe*",
        "cloudwatch:Get*",
        "cloudwatch:List*",
        "logs:Get*",
        "logs:FilterLogEvents"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "aws:ResourceTag/Environment": "production"
        }
      }
    },
    {
      "Effect": "Deny",
      "Action": [
        "s3:DeleteBucket",
        "s3:DeleteObject",
        "rds:DeleteDBInstance",
        "rds:DeleteDBSnapshot",
        "ec2:TerminateInstances"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "aws:ResourceTag/Environment": "production"
        }
      }
    }
  ]
}
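To see why the policy lists an explicit Deny even though production is already read-only, it helps to model the evaluation order: an explicit Deny beats any Allow, and anything not explicitly allowed is denied. The sketch below is a toy model of that ordering only. It is not AWS's policy engine: it ignores resources, principals, and real condition operators, and it reduces the tag condition to a plain environment string. `Statement`, `juniorPolicy`, `actionMatches`, and `isAllowed` are hypothetical names, and the policy is abbreviated.

```typescript
// Toy model of evaluation order only: explicit Deny > Allow > implicit deny.
// NOT AWS's real policy engine; resources, principals, and condition operators are ignored.
type Effect = "Allow" | "Deny";

interface Statement {
  effect: Effect;
  actions: string[];      // supports a trailing "*" wildcard, e.g. "rds:Describe*"
  environments: string[]; // values of the Environment tag this statement applies to
}

// Abbreviated version of the junior engineer policy above.
const juniorPolicy: Statement[] = [
  { effect: "Allow", actions: ["*"], environments: ["dev", "staging"] },
  { effect: "Allow", actions: ["ec2:Describe*", "rds:Describe*", "logs:Get*"], environments: ["production"] },
  { effect: "Deny", actions: ["s3:DeleteBucket", "rds:DeleteDBInstance"], environments: ["production"] },
];

function actionMatches(pattern: string, action: string): boolean {
  // "prefix*" matches any action starting with the prefix; otherwise require equality.
  return pattern.endsWith("*")
    ? action.startsWith(pattern.slice(0, -1))
    : pattern === action;
}

function isAllowed(policy: Statement[], action: string, environment: string): boolean {
  const matching = policy.filter(
    (s) => s.environments.includes(environment) && s.actions.some((a) => actionMatches(a, action)),
  );
  if (matching.some((s) => s.effect === "Deny")) return false; // explicit deny always wins
  return matching.some((s) => s.effect === "Allow");           // otherwise an explicit allow is needed
}
```

With this policy alone, `s3:DeleteBucket` in production would already be refused by the implicit deny. The explicit Deny earns its place when a second, broader policy is later attached to the same engineer: the delete stays blocked regardless of what the other policy allows.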
Fix 4: Incident as Learning, Not Punishment
// When a junior engineer causes a production incident:
// The system failed before the person did
const incidentResponseForJuniorEngineer = {
  immediate: {
    priority: 'Resolve the incident first; do not assign blame',
    engineerRole: 'Junior engineer learns from the fix, doesn\'t own it alone',
    mentorRole: 'Senior engineer guides but doesn\'t push junior aside',
    afterAction: 'Thank the engineer for being transparent about what happened',
  },
  postmortems: {
    avoid: [
      'Calling out individual mistakes in the public postmortem',
      '"This wouldn\'t have happened if they\'d been more careful"',
      'Making the engineer feel they can\'t be trusted',
    ],
    do: [
      'Focus on which guardrail was missing',
      '"The migration script didn\'t require a backup step — that\'s what we\'re fixing"',
      'Include the junior engineer in designing the fix — they\'ll never forget it',
    ],
  },
  systemFixes: [
    'Add the missing guardrail to prevent this class of mistake',
    'Add the scenario to onboarding so future engineers learn before doing',
    'Review what other paths have the same missing guardrail',
  ],
}
Fix 5: Production Access Ladder
Graduated access that grows with demonstrated judgment:

Months 1-3 (Onboarding):
  - Full dev environment access
  - Staging with senior engineer pair
  - Read-only production access
  - No direct production deploys

Months 3-6 (Foundation):
  - Independent staging deploys
  - Production deploys via CI/CD with required approval
  - Can access production logs
  - Still no direct production database access

Months 6-12 (Growing):
  - Production deploys independently (still CI/CD, not manual)
  - Can run pre-approved maintenance scripts in production
  - Read access to production DB through read replica
  - No write access to production DB without approval

Year 1+ (Trusted):
  - All above, plus escalated access for incidents
  - Can run production DB queries with senior review
  - On-call rotation begins

The ladder gives junior engineers a clear path to more access while protecting the system from mistakes that happen during the learning period.
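The ladder is easier to enforce when it lives as data that access tooling can read rather than as a wiki page. The following is a minimal sketch of one possible encoding. The stage names come from the ladder above, but the `AccessStage` shape, the capability labels, the exact month boundaries, and the `approvedUpTo` cap are assumptions made for illustration. The cap exists because tenure alone only makes an engineer eligible; a lead still has to sign off that judgment has been demonstrated.

```typescript
// Illustrative encoding of the ladder. Stage names follow the text; the capability
// labels, month thresholds, and approval cap are assumptions made for this sketch.
interface AccessStage {
  name: string;
  fromMonth: number; // inclusive lower bound of tenure, in months
  prodDeploy: "none" | "ci-with-approval" | "ci-independent";
  prodDb: "none" | "read-replica" | "queries-with-review";
  onCall: boolean;
}

const ladder: AccessStage[] = [
  { name: "Onboarding", fromMonth: 0, prodDeploy: "none", prodDb: "none", onCall: false },
  { name: "Foundation", fromMonth: 3, prodDeploy: "ci-with-approval", prodDb: "none", onCall: false },
  { name: "Growing", fromMonth: 6, prodDeploy: "ci-independent", prodDb: "read-replica", onCall: false },
  { name: "Trusted", fromMonth: 12, prodDeploy: "ci-independent", prodDb: "queries-with-review", onCall: true },
];

// Returns the stage an engineer is granted: the highest stage their tenure makes them
// eligible for, capped at the highest stage index a lead has explicitly approved.
function stageFor(monthsOfTenure: number, approvedUpTo: number = ladder.length - 1): AccessStage {
  let eligible = 0;
  for (let i = 0; i < ladder.length; i++) {
    if (monthsOfTenure >= ladder[i].fromMonth) eligible = i;
  }
  const granted = Math.max(0, Math.min(eligible, approvedUpTo));
  return ladder[granted];
}
```

For example, `stageFor(14, 1)` returns the Foundation stage: the engineer has the tenure for Trusted, but approval has only been given up to the second rung.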
Junior Engineer Safety Checklist
- ✅ Branch protection: no direct push to main, required reviews for merges
- ✅ CODEOWNERS: migrations and infrastructure require senior reviewer
- ✅ IAM least privilege: production is read-only until access is earned
- ✅ Migration scripts have mandatory backup steps for production
- ✅ Large table migrations warn and require explicit confirmation
- ✅ Graduated access ladder: responsibility grows with demonstrated judgment
- ✅ Incidents treated as system failures, not personal failures — junior engineers design the fix
Conclusion
Overconfident juniors breaking production is a systems design problem, not a people problem. The guardrails that prevent most of these incidents — protected branches, required reviews, IAM least privilege, migration safety scripts, and gradual access ladders — take a few hours to implement and eliminate whole categories of incidents. The right question after an incident isn't "how do we watch this engineer more closely?" but "what made this action too easy to take, and how do we add appropriate friction?" Designed well, those guardrails don't restrict junior engineers — they give them a safe environment to learn and build the judgment that earns greater access over time.