Security Glossary

Rate Limiting

Rate limiting is a technique that controls the number of requests a client can make to an API or service within a defined time window, protecting against abuse, brute force attacks, and resource exhaustion.

Understanding Rate Limiting

Rate limiting serves multiple purposes: it prevents brute force attacks on authentication endpoints, protects against denial-of-service by limiting resource consumption, controls costs for APIs that charge per request, and ensures fair usage across all clients. Without rate limiting, a single malicious or misconfigured client can overwhelm your service.

Common algorithms include fixed window (count requests per fixed time period, e.g. 100 per minute), sliding window (a smoother variant that avoids burst spikes at window boundaries), token bucket (tokens accrue at a fixed rate and are consumed per request, allowing controlled bursts), and leaky bucket (requests are processed at a fixed rate, with excess requests queued or rejected).
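As a concrete illustration, the token bucket can be sketched in a few lines of JavaScript. This is a simplified, framework-free sketch; the class and parameter names are our own, not from any library.

```javascript
// Token bucket sketch: tokens accrue at `refillRate` per second up to
// `capacity`; each request consumes one token. A full bucket therefore
// permits a burst of `capacity` requests before throttling kicks in.
class TokenBucket {
  constructor(capacity, refillRate, now = Date.now) {
    this.capacity = capacity;
    this.refillRate = refillRate; // tokens added per second
    this.tokens = capacity;       // start full: bursts allowed immediately
    this.now = now;               // injectable clock (ms) for testing
    this.last = now();
  }

  allow() {
    const t = this.now();
    const elapsedSec = (t - this.last) / 1000;
    this.last = t;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillRate);
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true; // request admitted
    }
    return false;  // bucket empty: reject (or queue)
  }
}

// With a fake clock: capacity 3 allows a burst of 3, then rejects
// until enough time passes for tokens to refill.
let fakeMs = 0;
const bucket = new TokenBucket(3, 1, () => fakeMs);
console.log(bucket.allow(), bucket.allow(), bucket.allow(), bucket.allow());
// → true true true false
fakeMs += 2000; // 2 seconds later: ~2 tokens have refilled
console.log(bucket.allow()); // → true
```

Note the trade-off the sketch makes visible: unlike a fixed window, a token bucket lets a well-behaved client burst briefly without ever exceeding the long-run average rate.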

Rate limits are typically identified by client IP address, API key, authenticated user, or a combination. IP-based limiting is simplest but can be bypassed with distributed requests and may block multiple users behind a NAT. User-based limiting is more accurate but requires authentication first. Multi-tier approaches apply different limits to different endpoints — a login endpoint might allow 5 attempts per minute while a search endpoint allows 100.

Rate limit responses should use HTTP 429 (Too Many Requests) with a Retry-After header indicating when the client can try again. Well-designed rate limiting includes informational headers (X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset) so clients can self-regulate rather than hitting the wall repeatedly.
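A minimal sketch of assembling those headers follows. The header names are the de facto X-RateLimit-* convention described above; the function and its parameters are our own illustration, not any library's API.

```javascript
// Build the informational headers for a rate-limited API response.
// When the quota is exhausted, add Retry-After so clients know how
// long to back off instead of retrying immediately.
function rateLimitHeaders({ limit, remaining, resetEpochSeconds, nowEpochSeconds }) {
  const headers = {
    "X-RateLimit-Limit": String(limit),
    "X-RateLimit-Remaining": String(Math.max(0, remaining)),
    "X-RateLimit-Reset": String(resetEpochSeconds), // when the window resets
  };
  if (remaining <= 0) {
    // Serve alongside HTTP status 429 (Too Many Requests).
    headers["Retry-After"] = String(Math.max(0, resetEpochSeconds - nowEpochSeconds));
  }
  return headers;
}

const h = rateLimitHeaders({
  limit: 100,
  remaining: 0,
  resetEpochSeconds: 1_700_000_060,
  nowEpochSeconds: 1_700_000_000,
});
console.log(h["Retry-After"]); // → 60
```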

Why This Matters for Vibe-Coded Apps

AI-generated APIs frequently omit rate limiting because it is not part of the core functionality the developer asks for. This leaves vibe-coded applications vulnerable to brute force login attempts, API abuse, and cost explosions when using pay-per-request services. An unprotected authentication endpoint can be attacked with thousands of password attempts per second.

Adding rate limiting to a vibe-coded app is straightforward with middleware packages. For Express, use express-rate-limit (NestJS also ships a built-in throttler, @nestjs/throttler). For Next.js API routes, use a Redis-backed solution such as @upstash/ratelimit. When reviewing your AI-generated app before deployment, adding rate limiting to authentication and other sensitive endpoints should be near the top of your security checklist.
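To make the idea concrete, here is roughly what such middleware does under the hood, reduced to a framework-free sketch: a fixed-window counter keyed by client identifier. Real packages like express-rate-limit add response headers, pluggable stores, and cleanup; the names here are our own.

```javascript
// Fixed-window limiter: allow at most `max` requests per `windowMs`
// per key (e.g. an IP address or user ID). In-memory only, so this
// tracks a single process; use a shared store behind a load balancer.
function createLimiter({ windowMs, max }) {
  const hits = new Map(); // key -> { count, windowStart }
  return function isAllowed(key, now = Date.now()) {
    const entry = hits.get(key);
    if (!entry || now - entry.windowStart >= windowMs) {
      hits.set(key, { count: 1, windowStart: now }); // start a new window
      return true;
    }
    entry.count += 1;
    return entry.count <= max;
  };
}

// Allow 5 login attempts per minute per IP.
const loginAllowed = createLimiter({ windowMs: 60_000, max: 5 });
for (let i = 0; i < 6; i++) {
  console.log(loginAllowed("203.0.113.9", 1000 + i)); // 6th prints false
}
```

In a real Express app this check would run in middleware before the route handler, returning 429 when it fails.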

Real-World Examples

GitHub API Rate Limiting

GitHub implements a sophisticated rate limiting system: 60 requests per hour for unauthenticated users, 5,000 for authenticated users, and higher limits for GitHub Apps. This tiered approach balances accessibility with abuse prevention and serves as a model for API rate limiting design.

Cloudflare Rate Limiting

Cloudflare offers rate limiting at the CDN edge, blocking abusive traffic before it reaches origin servers. Their system identifies attacks based on request patterns and can apply different rules per URL path, HTTP method, or client characteristic, protecting millions of websites from automated abuse.

Instagram Brute Force via Rate Limit Bypass (2019)

A researcher discovered that Instagram's rate limiting on the login endpoint could be bypassed by rotating IP addresses and modifying request parameters. This allowed brute force attacks against accounts, leading Instagram to implement stricter multi-factor rate limiting that combined IP, device, and account-based restrictions.

Frequently Asked Questions

What rate limit values should I use?

It depends on your use case. For login endpoints, 5-10 attempts per minute per IP/user is standard. For general API endpoints, 60-100 requests per minute per user is common. For search or computationally expensive endpoints, lower limits may be appropriate. Start generous and tighten based on actual usage patterns. Monitor your rate limiting to ensure legitimate users are not being blocked.

Should I rate limit by IP or by user?

Use both. IP-based limiting catches unauthenticated abuse and brute force login attempts. User-based limiting prevents authenticated users from abusing their access. For login endpoints specifically, rate limit by both IP and target username to prevent credential stuffing attacks that distribute attempts across many IPs targeting one account.
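A sketch of that dual-key approach, assuming simple per-key counters with no time window (the helper names and key formats are our own convention, not a library API):

```javascript
// Track two independent budgets: one per source IP, one per target
// account. A login attempt must fit within BOTH, so neither one IP
// spraying many accounts nor many IPs hammering one account gets through.
function makeCounter(max) {
  const counts = new Map();
  return (key) => {
    const n = (counts.get(key) || 0) + 1;
    counts.set(key, n);
    return n <= max;
  };
}

const ipBudget = makeCounter(10);  // attempts per IP (any account)
const userBudget = makeCounter(5); // attempts per account (any IP)

function allowLoginAttempt(ip, username) {
  // Count against both budgets, then require both to pass.
  const okIp = ipBudget(`ip:${ip}`);
  const okUser = userBudget(`user:${username.toLowerCase()}`);
  return okIp && okUser;
}

// Credential stuffing from many IPs against one account still stops
// once the per-account budget (5) is exhausted.
for (let i = 0; i < 6; i++) {
  console.log(allowLoginAttempt(`198.51.100.${i}`, "alice")); // 6th prints false
}
```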

How do I handle rate limiting behind a load balancer or CDN?

Use distributed rate limiting backed by a shared store like Redis, not in-memory counters. In-memory counters only track requests hitting one specific server instance. Redis-backed rate limiting (using libraries like @upstash/ratelimit or ioredis with custom logic) shares the counter across all instances. Also ensure you read the client IP from the correct header (X-Forwarded-For, CF-Connecting-IP) rather than the load balancer's IP.
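For the header-reading step, here is a simplified sketch of resolving the real client IP behind a proxy. Only trust X-Forwarded-For when your own proxy or CDN sets or sanitizes it, since clients can spoof the header on direct connections.

```javascript
// X-Forwarded-For is a comma-separated chain; the left-most entry is
// the original client as reported by the first hop. CF-Connecting-IP
// is Cloudflare-specific and single-valued, so prefer it when present.
function clientIp(headers, socketRemoteAddress) {
  const cf = headers["cf-connecting-ip"];
  if (typeof cf === "string" && cf.length > 0) return cf.trim();

  const xff = headers["x-forwarded-for"];
  if (typeof xff === "string" && xff.length > 0) {
    return xff.split(",")[0].trim(); // left-most = original client
  }
  return socketRemoteAddress; // direct connection, no proxy in front
}

console.log(clientIp(
  { "x-forwarded-for": "203.0.113.9, 10.0.0.1" },
  "10.0.0.2"
)); // → 203.0.113.9
```

Getting this wrong in either direction is costly: reading the raw socket address behind a proxy rate-limits the load balancer itself, while blindly trusting X-Forwarded-For lets attackers spoof fresh identities per request.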

Is rate limiting sufficient DDoS protection?

No. Application-level rate limiting protects against individual abusive clients but cannot handle volumetric DDoS attacks that flood your network infrastructure. For DDoS protection, you need network-level defenses like Cloudflare, AWS Shield, or similar services that can absorb and filter massive traffic volumes before they reach your servers. Rate limiting complements DDoS protection by handling application-layer abuse that gets through.

Is Your App Protected?

VAS automatically scans for vulnerabilities related to rate limiting and provides detailed remediation guidance. Our scanner targets issues common in AI-generated applications.

Scans from $5, results in minutes. Get actionable fixes tailored to your stack.

Get Starter Scan