Bot Traffic Killing Your APIs — When 80% of Your Traffic Isn't Human
Author: Sanjeev Sharma (@webcoderspeed1)
Introduction
Bot traffic is the background noise of the internet, but for most APIs, it's not background — it's the majority of traffic. Content scrapers, price monitors, competitor intelligence bots, credential stuffers, and inventory hoarders collectively generate more traffic than real users on most public-facing services. They pay nothing, strain your infrastructure, and if you're not distinguishing them from legitimate traffic, they're affecting your real users' experience. The defense is layered: rate limiting, bot fingerprinting, behavioral analysis, and CAPTCHAs at the right friction points.
- Bot Traffic Patterns
- Fix 1: Rate Limiting With Multiple Dimensions
- Fix 2: Bot Fingerprinting
- Fix 3: Behavioral Analysis for Credential Stuffing
- Fix 4: Protect Scraping-Prone Endpoints
- Fix 5: Cloudflare as the First Line of Defense
- Bot Defense Checklist
- Conclusion
Bot Traffic Patterns
How to identify bot traffic:
1. Velocity attacks — clearly non-human
→ 1,000 requests in 10 seconds from one IP
→ Same endpoint, same payload, sequential IDs
→ No pause between requests
2. Credential stuffing — bots testing stolen credentials
→ Login endpoint: many failed attempts
→ Rotating IPs, but same user-agent
→ Attempts distributed across IPs but same timing pattern
3. Scrapers — taking your content
→ All product pages visited in rapid sequence
→ No CSS/image requests (headless browser or direct HTTP)
→ No session cookies carried between requests
4. Inventory manipulation
→ Add-to-cart without checkout (hoarding)
→ Price change events trigger immediate response
→ No "browse" behavior before purchase
5. Account creation abuse
→ Signup with disposable email domains
→ Same IP, slightly varied user data
→ No email verification follow-through
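The velocity pattern is the easiest of these to detect mechanically. A minimal sketch (a hypothetical helper, not part of the fixes below): slide a window over an IP's sorted request timestamps and flag it when too many fall inside one window.

```typescript
// Hypothetical helper: flag an IP as a velocity attacker if more than
// `maxRequests` of its request timestamps fall within any `windowMs` span.
function isVelocityAttack(timestampsMs: number[], windowMs: number, maxRequests: number): boolean {
  const sorted = [...timestampsMs].sort((a, b) => a - b)
  let start = 0
  for (let end = 0; end < sorted.length; end++) {
    // Shrink the window from the left until it spans at most windowMs
    while (sorted[end] - sorted[start] > windowMs) start++
    if (end - start + 1 > maxRequests) return true
  }
  return false
}
```

With the thresholds from the pattern above, 1,000 requests in 10 seconds trips the check, while one request every few seconds never does.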
Fix 1: Rate Limiting With Multiple Dimensions
```typescript
// Rate limit by IP, by user, and by endpoint — not just one dimension
import { RateLimiterRedis } from 'rate-limiter-flexible'
import { createClient } from 'redis'
import type { Request, Response, NextFunction } from 'express'

const redis = createClient({ url: process.env.REDIS_URL })
await redis.connect()

// Burst limiter: short window, catches immediate spikes
const burstLimiter = new RateLimiterRedis({
  storeClient: redis,
  keyPrefix: 'rl_burst',
  points: 30,        // 30 requests
  duration: 10,      // per 10 seconds
  blockDuration: 60, // Block for 60 seconds if exceeded
})

// Sustained limiter: longer window, catches slow-and-steady bots
const sustainedLimiter = new RateLimiterRedis({
  storeClient: redis,
  keyPrefix: 'rl_sustained',
  points: 500,    // 500 requests
  duration: 3600, // per hour
  blockDuration: 3600,
})

async function rateLimitMiddleware(req: Request, res: Response, next: NextFunction) {
  const ip = req.ip
  const userId = (req as any).user?.id
  try {
    // Check burst and sustained limits; also limit per user when authenticated
    await Promise.all([
      burstLimiter.consume(ip),
      sustainedLimiter.consume(ip),
      userId ? burstLimiter.consume(`user_${userId}`) : Promise.resolve(),
    ])
    next()
  } catch (err) {
    // On rejection, rate-limiter-flexible sets msBeforeNext on the error object.
    // Note: Math.ceil never returns null, so `Math.ceil(...) ?? 60` would be a no-op;
    // apply the fallback to msBeforeNext instead.
    const msBeforeNext = (err as any).msBeforeNext ?? 60_000
    const retryAfter = Math.ceil(msBeforeNext / 1000)
    res.set({
      'Retry-After': String(retryAfter),
      'X-RateLimit-Limit': String(burstLimiter.points),
      'X-RateLimit-Reset': new Date(Date.now() + msBeforeNext).toISOString(),
    })
    return res.status(429).json({
      error: 'Too many requests',
      retryAfter,
    })
  }
}
```
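To see why two windows matter, here is a process-local sketch of the same burst + sustained idea (hypothetical classes, for illustration only — production needs the Redis version above so limits hold across instances):

```typescript
// In-memory sketch of dual-window rate limiting. A fixed window resets its
// counter once the window duration has elapsed.
interface CounterWindow { count: number; resetAt: number }

class FixedWindowLimiter {
  private windows = new Map<string, CounterWindow>()
  constructor(private points: number, private durationMs: number) {}

  // Returns true if the request is allowed, false if the limit is exhausted.
  consume(key: string, now = Date.now()): boolean {
    const w = this.windows.get(key)
    if (!w || now >= w.resetAt) {
      this.windows.set(key, { count: 1, resetAt: now + this.durationMs })
      return true
    }
    if (w.count >= this.points) return false
    w.count += 1
    return true
  }
}

const burst = new FixedWindowLimiter(30, 10_000)          // 30 req / 10 s
const sustained = new FixedWindowLimiter(500, 3_600_000)  // 500 req / hour
const allowed = (ip: string, now: number) => burst.consume(ip, now) && sustained.consume(ip, now)
```

The burst window alone would let a patient bot make 10,800 requests per hour; the sustained window caps that at 500.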
Fix 2: Bot Fingerprinting
```typescript
// Identify bots by their request characteristics
interface BotSignal {
  signal: string
  weight: number
}

// Last-seen timestamp per IP (process-local; move to Redis if you run multiple instances)
const recentRequests = new Map<string, number>()

function calculateBotScore(req: Request): number {
  const signals: BotSignal[] = []

  // Missing common browser headers
  if (!req.headers['accept-language']) {
    signals.push({ signal: 'no_accept_language', weight: 20 })
  }
  if (!req.headers['accept-encoding']) {
    signals.push({ signal: 'no_accept_encoding', weight: 20 })
  }

  // Known bot user agents
  const ua = req.headers['user-agent'] ?? ''
  const botPatterns = [/bot/i, /crawler/i, /spider/i, /curl/i, /wget/i, /python-requests/i, /go-http/i]
  if (botPatterns.some(p => p.test(ua))) {
    signals.push({ signal: 'bot_user_agent', weight: 60 })
  }

  // Missing referer on API calls that browsers normally make from a page
  if (req.path.startsWith('/api/') && !req.headers['referer']) {
    signals.push({ signal: 'no_referer_on_api', weight: 10 })
  }

  // Request timing too fast (< 50ms since last request from this IP)
  const lastRequest = recentRequests.get(req.ip)
  if (lastRequest && Date.now() - lastRequest < 50) {
    signals.push({ signal: 'too_fast', weight: 30 })
  }
  recentRequests.set(req.ip, Date.now())

  const score = signals.reduce((sum, s) => sum + s.weight, 0)
  if (score > 40) {
    logger.warn({ ip: req.ip, score, signals }, 'High bot score detected')
  }
  return score
}

// Apply the bot score to routing
app.use((req, res, next) => {
  const botScore = calculateBotScore(req)
  ;(req as any).botScore = botScore

  if (botScore >= 80) {
    // High-confidence bot — block or challenge
    return res.status(403).json({ error: 'Request blocked' })
  }
  if (botScore >= 50) {
    // Suspected bot — rate limit more aggressively
    ;(req as any).rateLimit = 'strict'
  }
  next()
})
```
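The thresholds (40 to log, 50 to tighten limits, 80 to block) are easiest to sanity-check against canned header sets. A simplified, standalone re-implementation of just the header signals (timing omitted, regexes collapsed into one):

```typescript
// Simplified, standalone version of the header-based signals above,
// for testing score thresholds without an Express request object.
type HeaderBag = Record<string, string | undefined>

function scoreHeaders(headers: HeaderBag): number {
  let score = 0
  if (!headers['accept-language']) score += 20
  if (!headers['accept-encoding']) score += 20
  if (/bot|crawler|spider|curl|wget|python-requests|go-http/i.test(headers['user-agent'] ?? '')) score += 60
  return score
}
```

A default `curl` request sends neither Accept-Language nor Accept-Encoding and has an identifiable user agent, so it scores 100 and is blocked outright; a normal browser request scores 0.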
Fix 3: Behavioral Analysis for Credential Stuffing
```typescript
// credential-stuffing-detector.ts
// Multiple failed logins across many accounts from the same IP cluster
interface LoginAttempt {
  ip: string
  email: string
  success: boolean
  timestamp: number // epoch ms
}

async function detectCredentialStuffing(ip: string, email: string, success: boolean): Promise<void> {
  // node-redis v4 exposes camelCase commands (lPush, lTrim, lRange)
  await redis.lPush(`login_attempts:${ip}`, JSON.stringify({
    ip,
    email,
    success,
    timestamp: Date.now(),
  }))
  await redis.lTrim(`login_attempts:${ip}`, 0, 99) // Keep only the latest 100 attempts
  await redis.expire(`login_attempts:${ip}`, 3600)

  const attempts = await redis.lRange(`login_attempts:${ip}`, 0, 99)
  const parsed: LoginAttempt[] = attempts.map(a => JSON.parse(a))

  // Credential stuffing pattern: many different emails, mostly failures
  const uniqueEmails = new Set(parsed.map(a => a.email)).size
  const failureRate = parsed.filter(a => !a.success).length / parsed.length
  if (uniqueEmails > 10 && failureRate > 0.8) {
    // This IP is credential stuffing
    await redis.set(`blocked_ip:${ip}`, '1', { EX: 86400 }) // Block for 24 hours
    await alerting.critical(`Credential stuffing detected from ${ip}: ${uniqueEmails} unique emails, ${(failureRate * 100).toFixed(0)}% failure rate`)
  }

  // Single-account stuffing: too many failures on one account
  const accountAttempts = parsed.filter(a => a.email === email)
  if (accountAttempts.length > 10) {
    // Lock the account temporarily
    await redis.set(`account_locked:${email}`, '1', { EX: 900 }) // 15 minutes
    logger.warn({ email, ip, attempts: accountAttempts.length }, 'Account temporarily locked')
  }
}
```
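The classification rule itself is pure and worth unit-testing separately from the Redis plumbing. A sketch (hypothetical helper, same thresholds as the detector above):

```typescript
// Pure classifier over a window of login attempts: many distinct emails
// with a high failure rate indicates credential stuffing.
interface AttemptRecord { email: string; success: boolean }

function looksLikeStuffing(attempts: AttemptRecord[]): boolean {
  if (attempts.length === 0) return false
  const uniqueEmails = new Set(attempts.map(a => a.email)).size
  const failureRate = attempts.filter(a => !a.success).length / attempts.length
  return uniqueEmails > 10 && failureRate > 0.8
}
```

Note the two conditions together: a user fat-fingering their own password fails repeatedly on one email and is handled by the account lockout, not the IP block.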
Fix 4: Protect Scraping-Prone Endpoints
```typescript
// Anti-scraping for product catalog or content pages
// Make scraping expensive without blocking legitimate users

// 1. Require a browser challenge for high-value pages
//    (Use Cloudflare Turnstile, hCaptcha, or similar — not reCAPTCHA v2, which hurts UX)

// 2. Return data gradually — force pagination that bots find expensive
const sleep = (ms: number) => new Promise(resolve => setTimeout(resolve, ms))

router.get('/api/products', async (req, res) => {
  // parseInt returns NaN (not null) on bad input, so use || rather than ??
  const page = parseInt(req.query.page as string) || 1
  const limit = Math.min(parseInt(req.query.limit as string) || 20, 20) // Max 20 per page
  const botScore = (req as any).botScore ?? 0

  // Add artificial delay for unauthenticated, bot-scored requests to raise scraping cost
  if (!(req as any).user && botScore > 30) {
    await sleep(500 + Math.random() * 500) // 500–1000ms delay
  }

  const products = await db.query(
    'SELECT id, name, price FROM products LIMIT $1 OFFSET $2',
    [limit, (page - 1) * limit]
  )

  // Don't include a next_page cursor in the response for high bot-score requests
  const nextPage = botScore < 30 ? page + 1 : undefined
  res.json({ products: products.rows, nextPage })
})

// 3. Honeypot endpoint — only bots will visit it
router.get('/api/internal/all-products', (req, res) => {
  // Any request here is a bot — log and block the IP
  logger.warn({ ip: req.ip, userAgent: req.headers['user-agent'] }, 'Honeypot triggered')
  redis.set(`blocked_ip:${req.ip}`, '1', { EX: 86400 })
  res.status(404).json({ error: 'Not found' })
})
```
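For the honeypot to catch only abusive bots, well-behaved crawlers need a way to avoid it. One common approach (an assumption about your setup, not shown in the code above): disallow the honeypot path in robots.txt and link to it invisibly from your pages — polite crawlers skip it, while scrapers that ignore robots.txt walk straight in.

```
# robots.txt — polite crawlers will never request the honeypot
User-agent: *
Disallow: /api/internal/all-products
```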
Fix 5: Cloudflare as the First Line of Defense
```yaml
# cloudflare-rules.yaml — WAF rules to block obvious bots
# (Even on the free plan, Cloudflare handles much of this automatically)
rules:
  # Block known bad user agents
  - name: Block known crawlers
    expression: >-
      (http.user_agent contains "python-requests") or
      (http.user_agent contains "Go-http-client") or
      (http.user_agent contains "curl") or
      (http.user_agent eq "")
    action: block

  # Rate limit the login endpoint specifically
  - name: Login rate limit
    expression: 'http.request.uri.path eq "/api/auth/login"'
    action: rate_limit
    ratelimit:
      requests_per_period: 5
      period: 60
      mitigation_timeout: 300

  # Challenge traffic from anonymizing proxies and hosting providers
  - name: Challenge traffic from anonymizing proxies
    expression: '(ip.geoip.asnum in {396982 14061 16276})' # Known VPN/hosting ASNs
    action: managed_challenge
```
Bot Defense Checklist
- ✅ Rate limiting: burst (30 req/10s) + sustained (500/hour) per IP
- ✅ Bot fingerprinting scores requests by header patterns and timing
- ✅ Credential stuffing detection: block IPs with high failure rates across many accounts
- ✅ Account lockout after repeated failures (with CAPTCHA to unlock, not just time)
- ✅ Honeypot endpoints catch aggressive scrapers
- ✅ Cloudflare or equivalent WAF handles volumetric bot traffic at the edge
- ✅ Bot traffic metrics separated from legitimate traffic in dashboards
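For the last item, one simple convention (a sketch — the bucket names and the metrics call are illustrative) is to derive a traffic class from the bot score and attach it as a label to every request metric, so dashboards can split the populations:

```typescript
// Map the bot score onto a metric label so dashboards can separate
// human, suspect, and bot traffic (bucket names are illustrative).
type TrafficClass = 'human' | 'suspect' | 'bot'

function trafficClass(botScore: number): TrafficClass {
  if (botScore >= 80) return 'bot'
  if (botScore >= 50) return 'suspect'
  return 'human'
}
// e.g. metrics.increment('http_requests_total', { class: trafficClass(botScore) })
```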
Conclusion
Bot traffic is a cost and reliability problem disguised as a security problem. The bots that scrape your product catalog are running up your RDS and egress bills. The credential stuffers are hammering your authentication service. The inventory hoarders are degrading your real users' experience. The defense is layered: rate limiting handles volume, bot fingerprinting handles suspicious clients, behavioral analysis catches sophisticated attacks, and Cloudflare catches everything else at the edge before it hits your servers. No single layer is sufficient — but all five together make your service economically uninteresting to scrape.