Today’s Lesson
Security for Legal SaaS — Episode 14: API Gateway Patterns and Rate Limiting
The Single Enforcement Point
As your legal SaaS grows from a monolith into multiple services, security logic scatters. Authentication checks are duplicated in every microservice (each a small, independently deployed backend service), rate limiting is reimplemented per endpoint, and logging is inconsistent across teams. An API gateway consolidates these concerns into a single enforcement layer: every request passes through one point where policy is applied consistently.
Key stat: Akamai's State of the Internet report found that API attacks grew 109% year-over-year, with credential abuse and resource exhaustion as the primary vectors. A gateway with proper rate limiting targets both vectors directly, throttling much of this automated traffic before it reaches your services.
For legal tech, where a single unthrottled API call might export thousands of privileged documents, the gateway isn’t a performance optimisation — it’s a security boundary.
Gateway as Security Layer
What the Gateway Enforces
| Concern | Without Gateway | With Gateway |
|---|---|---|
| Authentication | Each service validates tokens independently | Gateway validates once; services trust internal traffic |
| Rate limiting | Per-service, inconsistent, often missing | Centralised, per-user, per-endpoint, global |
| Request validation | Scattered schema checks | Single schema enforcement point |
| Logging/audit | Inconsistent formats, gaps | Every request logged uniformly |
| TLS termination | Each service manages certificates | Gateway terminates TLS; internal traffic on private network |
| IP allowlisting | Firewall rules per service | Gateway-level enforcement |
Kong, AWS API Gateway, Envoy, and Traefik are production-grade options — each with different tradeoffs between managed simplicity and configurability.
Rate Limiting Strategies
Rate limiting prevents resource exhaustion, credential stuffing, data scraping, and denial-of-service. The strategy you choose depends on what you’re protecting.
Algorithm Comparison
| Algorithm | Behaviour | Best For |
|---|---|---|
| Fixed window | Count requests per time window (e.g., 100/minute). Resets at window boundary. | Simple, predictable limits |
| Sliding window | Weighted average of current and previous window. Smooths boundary spikes. | Most API rate limiting |
| Token bucket | Tokens refill at steady rate; requests consume tokens. Allows bursts up to bucket size. | Allowing legitimate traffic bursts |
| Leaky bucket | Requests queue and drain at a fixed rate. Requests arriving when the queue is full are dropped. | Smoothing traffic to backend |
Fixed window edge case: A user sends 100 requests at 11:59:59 and 100 more at 12:00:01. Both bursts pass a 100/minute fixed window, yet 200 requests hit your backend in 2 seconds. A sliding window counter mitigates this by weighting the previous window's count by the fraction of it that still overlaps the trailing minute.
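To make the edge case concrete, here is a minimal in-memory sketch of both algorithms, assuming a single process and timestamps in seconds since midnight. The class names and the `burst_at_boundary` helper are illustrative, not taken from any particular gateway.

```python
from collections import defaultdict

WINDOW = 60   # seconds
LIMIT = 100   # requests per window


class FixedWindow:
    """Counts requests per aligned window; the count resets at each boundary."""

    def __init__(self, limit: int, window: int) -> None:
        self.limit = limit
        self.window = window
        self.counts: dict[int, int] = defaultdict(int)

    def allow(self, now: float) -> bool:
        index = int(now // self.window)
        if self.counts[index] >= self.limit:
            return False
        self.counts[index] += 1
        return True


class SlidingWindowCounter:
    """Estimates the trailing-window count by weighting the previous window."""

    def __init__(self, limit: int, window: int) -> None:
        self.limit = limit
        self.window = window
        self.counts: dict[int, int] = defaultdict(int)

    def allow(self, now: float) -> bool:
        index = int(now // self.window)
        elapsed = (now % self.window) / self.window  # fraction of current window elapsed
        # The previous window still overlaps the trailing interval by (1 - elapsed).
        estimate = self.counts[index - 1] * (1 - elapsed) + self.counts[index]
        if estimate >= self.limit:
            return False
        self.counts[index] += 1
        return True


def burst_at_boundary(limiter) -> int:
    """Send 100 requests at 11:59:59 and 100 at 12:00:01; return how many pass."""
    t1 = 11 * 3600 + 59 * 60 + 59
    t2 = 12 * 3600 + 1
    first = sum(limiter.allow(t1) for _ in range(100))
    second = sum(limiter.allow(t2) for _ in range(100))
    return first + second
```

Fed the burst from the example, the fixed window admits all 200 requests, while the sliding window counter admits 102: at 12:00:01 the previous window still carries a weight of 59/60, so only two extra requests fit under the limit.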
The Token Bucket for Legal SaaS
The token bucket algorithm is ideal for legal applications because it accommodates legitimate bursts — a lawyer batch-downloading case files at the start of their day — while maintaining a steady-state limit:
- Bucket capacity: Maximum burst size (e.g., 50 document downloads)
- Refill rate: Sustained throughput (e.g., 10 downloads per minute)
- Result: A lawyer can immediately download 50 files, then continue at 10/minute
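A minimal token bucket sketch, assuming a single process and caller-supplied timestamps in seconds. `TokenBucket` and its parameter names are illustrative; the example uses the 50-download burst and 10-per-minute refill from the list above.

```python
class TokenBucket:
    """Holds up to `capacity` tokens, refilled continuously at `refill_per_sec`."""

    def __init__(self, capacity: float, refill_per_sec: float, now: float = 0.0) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity  # start full so an initial burst is allowed
        self.last = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Lazily add the tokens accrued since the last call, capped at capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Burst of 50 downloads, sustained 10 per minute.
bucket = TokenBucket(capacity=50, refill_per_sec=10 / 60)
burst = sum(bucket.allow(now=0.0) for _ in range(50))          # the initial burst
extra = bucket.allow(now=0.0)                                  # bucket is now empty
after_minute = sum(bucket.allow(now=60.0) for _ in range(20))  # 10 tokens refilled
```

Refilling lazily on each call, rather than on a background timer, means each user's state is just two numbers: the token count and the last-seen timestamp.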
Per-User, Per-Endpoint, and Global Limits
A single rate limit is insufficient. Layer them:
| Scope | Example | Purpose |
|---|---|---|
| Per-user | 1000 API calls/hour per authenticated user | Prevent compromised accounts from bulk exfiltration |
| Per-endpoint | 10 document downloads/minute on /api/documents/{id}/download | Protect expensive operations |
| Per-IP | 100 unauthenticated requests/minute per IP | Block credential stuffing before auth |
| Global | 10,000 requests/second across all users | Protect infrastructure from DDoS |
Stripe's engineering write-up on its rate limiters describes a similar multi-layer approach in production: per-user request limits for fairness, concurrency limits that protect expensive endpoints, and fleet-wide load shedding for stability.
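The layering can be sketched as a single check that walks the applicable scopes and only consumes quota when every scope has room. This assumes in-memory fixed windows and the example figures from the table; `LIMITS`, `check_request`, and the rule that paths ending in `/download` count as downloads are hypothetical.

```python
from collections import defaultdict
from typing import Optional

# (limit, window_seconds) per scope, using the example figures from the table.
LIMITS = {
    "ip": (100, 60),          # unauthenticated requests per IP
    "user": (1000, 3600),     # API calls per authenticated user
    "download": (10, 60),     # document downloads per user
    "global": (10_000, 1),    # all traffic
}

counters: dict[tuple, int] = defaultdict(int)


def check_request(now: float, ip: str, user: Optional[str], path: str) -> Optional[str]:
    """Return None if allowed, or the name of the first scope that is exhausted."""
    if user is None:
        scopes = [("ip", ip)]                      # pre-auth: limit by source IP
    else:
        scopes = [("user", user)]
        if path.endswith("/download"):             # hypothetical download route rule
            scopes.append(("download", user))
    scopes.append(("global", "all"))

    keys = []
    for scope, subject in scopes:
        limit, window = LIMITS[scope]
        key = (scope, subject, int(now // window))
        if counters[key] >= limit:
            return scope  # reject without consuming quota in any scope
        keys.append(key)
    for key in keys:
        counters[key] += 1
    return None
```

Reporting which scope tripped lets the gateway log and alert differently: a user hitting the download limit is a security signal, while the global limit firing is a capacity problem.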
Legal SaaS Attack Scenarios
Document Download Abuse
A compromised user account (or malicious insider) attempts to bulk-download all client files. Without per-endpoint rate limiting on the download API, thousands of privileged documents can be exfiltrated in minutes.
Mitigation: Per-user download limit of 50 documents/hour. Alert on any user exceeding 80% of the limit. Require step-up authentication (MFA re-verification) for bulk exports exceeding 20 documents in a session.
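The three mitigations can be combined into one policy decision, sketched below under assumed inputs: the caller supplies the user's download count for the current hour, the documents exported so far in the session, and whether MFA was re-verified recently. `evaluate_download` and `DownloadDecision` are hypothetical names.

```python
from dataclasses import dataclass

HOURLY_LIMIT = 50        # per-user downloads per hour
ALERT_RATIO = 0.8        # alert once a user exceeds 80% of the limit
STEP_UP_THRESHOLD = 20   # session exports beyond this need fresh MFA


@dataclass
class DownloadDecision:
    allowed: bool
    alert: bool = False
    needs_step_up: bool = False


def evaluate_download(downloads_this_hour: int, session_exports: int,
                      mfa_recent: bool) -> DownloadDecision:
    """Decide on one more download given the user's current counts."""
    if downloads_this_hour >= HOURLY_LIMIT:
        # Hard limit reached: block and alert, since this is the exfiltration pattern.
        return DownloadDecision(allowed=False, alert=True)
    if session_exports + 1 > STEP_UP_THRESHOLD and not mfa_recent:
        # The 21st document in a session requires MFA re-verification first.
        return DownloadDecision(allowed=False, needs_step_up=True)
    alert = downloads_this_hour + 1 > HOURLY_LIMIT * ALERT_RATIO
    return DownloadDecision(allowed=True, alert=alert)
```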
Search Endpoint Abuse
Full-text search endpoints are computationally expensive and reveal data through query patterns. An attacker systematically queries terms to map your document corpus without downloading files directly.
Mitigation: Rate limit search to 30 queries/minute per user. Log query patterns. Alert on systematic enumeration (sequential date ranges, alphabetical name lists). Consider differential privacy on result counts.
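One way to approach the alphabetical-enumeration signal is a sliding check over each user's recent queries. This is a hypothetical heuristic, not a standard detector: it flags a user whose last eight search terms are in strictly ascending order, and a real deployment would combine it with other signals such as date-range sweeps and query volume.

```python
from collections import deque

WINDOW_SIZE = 8  # number of recent queries to inspect (illustrative)


class EnumerationDetector:
    """Flags a user whose recent search terms march in strict alphabetical order."""

    def __init__(self, window_size: int = WINDOW_SIZE) -> None:
        self.recent: deque = deque(maxlen=window_size)

    def observe(self, query: str) -> bool:
        """Record a query; return True if the last `window_size` terms are strictly ascending."""
        self.recent.append(query.strip().lower())
        if len(self.recent) < self.recent.maxlen:
            return False  # not enough history yet
        terms = list(self.recent)
        return all(a < b for a, b in zip(terms, terms[1:]))
```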
E-Filing Deadline Denial
An attacker targets your e-filing API with legitimate-looking but invalid requests during known deadline periods (discovery deadlines, motion filing windows). The goal isn’t data theft — it’s consuming your rate limit slots so legitimate filings fail.
Mitigation: Priority queues for authenticated users with pending deadlines. Separate rate limit pools for filing operations vs. general API calls. Circuit breakers that shed load from non-critical endpoints to protect filing pathways.
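Separate pools can be sketched as shared per-minute capacities keyed by route, so a flood of general traffic never touches filing capacity. Instead of a true priority queue, this sketch approximates priority by reserving part of the filing pool for users with pending deadlines. The capacities, the `/api/filings` prefix, and `PooledLimiter` are all assumptions for illustration.

```python
from collections import defaultdict

# Illustrative shared capacities per one-minute window.
POOL_CAPACITY = {"filing": 100, "general": 5000}
DEADLINE_RESERVE = 20           # filing slots only deadline-critical users may use
FILING_PREFIX = "/api/filings"  # hypothetical route prefix for e-filing operations


class PooledLimiter:
    """Shared fixed-window pools: general traffic can never consume filing capacity."""

    def __init__(self) -> None:
        self.used: dict[tuple[str, int], int] = defaultdict(int)

    def allow(self, path: str, now: float, has_pending_deadline: bool = False) -> bool:
        pool = "filing" if path.startswith(FILING_PREFIX) else "general"
        key = (pool, int(now // 60))
        cap = POOL_CAPACITY[pool]
        if pool == "filing" and not has_pending_deadline:
            cap -= DEADLINE_RESERVE  # leave headroom for deadline-critical filings
        if self.used[key] >= cap:
            return False
        self.used[key] += 1
        return True
```

Under this scheme, an attacker flooding the filing endpoint can exhaust at most the 80 unreserved slots per minute, leaving 20 for users with pending deadlines.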
Circuit Breakers
When a downstream service fails, cascading retries can collapse your entire system. The circuit breaker pattern prevents this:
| State | Behaviour |
|---|---|
| Closed (normal) | Requests flow through; failures counted |
| Open (tripped) | Requests fail immediately without calling the downstream service |
| Half-open (testing) | Limited requests probe whether the service has recovered |
For legal SaaS: if your document storage service becomes slow, a circuit breaker prevents the API gateway from queuing thousands of requests that overwhelm memory. Users see a fast “service temporarily unavailable” instead of a 30-second timeout followed by an error.
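A minimal single-threaded circuit breaker following the three states in the table. The thresholds, `CircuitOpenError`, and the injectable `clock` are assumptions for this sketch. In the half-open state it lets the next call through as the probe, whereas a concurrent gateway would also cap how many probes run at once.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised immediately while the breaker is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("downstream unavailable")  # fail fast
            self.state = "half_open"  # timeout elapsed: let a probe through
        try:
            result = fn()
        except Exception:
            self._record_failure()
            raise
        # Success: close the circuit and reset the failure count.
        self.state = "closed"
        self.failures = 0
        return result

    def _record_failure(self) -> None:
        self.failures += 1
        # A failed probe, or too many failures while closed, trips the breaker.
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()
```

At the gateway, catching `CircuitOpenError` and returning a 503 immediately is what gives users the fast “service temporarily unavailable” response instead of a long timeout.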
Response Headers for Rate Limit Transparency
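The IETF RateLimit header fields draft (draft-ietf-httpapi-ratelimit-headers) aims to standardise how servers communicate rate limit status. It is still an Internet-Draft and field names have changed between revisions; the earlier three-field form looks like this, where RateLimit-Reset is the number of seconds until the quota resets, not a Unix timestamp:
RateLimit-Limit: 100
RateLimit-Remaining: 67
RateLimit-Reset: 42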
Include these on every response. Legitimate clients use them to self-throttle, and your own frontend can use them to display “please wait” instead of an error message. GitHub's API demonstrates a similar pattern with its own x-ratelimit-* headers (where the reset value is a Unix epoch timestamp), and clients can query their remaining quota before making expensive calls.
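On the server side, the headers and the 429 response can be built from the limiter's state; on the client side, the same fields drive self-throttling. This sketch assumes the earlier three-field draft syntax with `RateLimit-Reset` as delta-seconds, and the function names are hypothetical. Plain dicts stand in for HTTP header maps, which are case-insensitive in real frameworks.

```python
def rate_limit_headers(limit: int, remaining: int, reset_seconds: int) -> dict[str, str]:
    """Build RateLimit-* fields in the earlier draft style (reset is delta-seconds)."""
    return {
        "RateLimit-Limit": str(limit),
        "RateLimit-Remaining": str(max(0, remaining)),
        "RateLimit-Reset": str(max(0, reset_seconds)),
    }


def too_many_requests(limit: int, reset_seconds: int) -> tuple[int, dict[str, str]]:
    """A 429 response: quota exhausted, with Retry-After telling clients when to retry."""
    headers = rate_limit_headers(limit, 0, reset_seconds)
    headers["Retry-After"] = str(max(0, reset_seconds))
    return 429, headers


def client_wait_seconds(headers: dict[str, str]) -> int:
    """Client-side self-throttling: wait out the reset only when quota is exhausted."""
    if int(headers.get("RateLimit-Remaining", "1")) > 0:
        return 0
    return int(headers.get("Retry-After", headers.get("RateLimit-Reset", "0")))
```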
Implementation Architecture
Recommended stack:
- **Gateway:** Kong, AWS API Gateway, or Envoy (depending on cloud/self-hosted preference)
- **Rate limit store:** Redis (atomic INCR with TTL; see Redis rate limiting patterns)
- **Counters:** Sliding window with per-user and per-endpoint dimensions
- **Response:** 429 Too Many Requests with RateLimit headers and Retry-After
- **Monitoring:** Alert when any user consistently hits 80% of their limit
- **Bypass:** Internal service-to-service traffic bypasses user rate limits (uses separate auth; covered in Episode 15)
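The Redis pieces of this stack can be sketched as a sliding window counter built from two fixed-window keys: the current window is incremented with INCR, and the previous window's count is read and weighted. The function accepts any client exposing `incr`, `expire`, and `get`, which redis-py's `Redis` client provides; the key naming scheme and `sliding_window_allow` are assumptions.

```python
import time
from typing import Optional, Tuple


def sliding_window_allow(client, key_prefix: str, limit: int, window: int,
                         now: Optional[float] = None) -> Tuple[bool, float]:
    """Sliding-window-counter check using Redis-style INCR/EXPIRE/GET.

    Returns (allowed, estimated_count) for the trailing `window` seconds.
    """
    now = time.time() if now is None else now
    index = int(now // window)
    current_key = f"{key_prefix}:{index}"
    previous_key = f"{key_prefix}:{index - 1}"

    current = client.incr(current_key)          # atomic; creates the key at 1
    if current == 1:
        client.expire(current_key, 2 * window)  # survive long enough to be "previous"
    previous = int(client.get(previous_key) or 0)  # redis-py returns bytes or None

    elapsed = (now % window) / window
    estimate = previous * (1 - elapsed) + current
    return estimate <= limit, estimate
```

Two simplifications are worth flagging. Rejected requests still increment the counter, so a client that keeps hammering stays throttled; and INCR followed by a separate EXPIRE is not atomic, so a production version would typically wrap both in a pipeline transaction or a Lua script.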
Conclusion
An API gateway is where security policy becomes enforceable infrastructure. Rate limiting protects against both external attacks and internal abuse — compromised accounts, malicious insiders, and automated scraping. For legal SaaS, where a single unthrottled endpoint can exfiltrate an entire case file repository, the gateway is not optional architecture — it’s a security control as fundamental as authentication.
Next episode: Service-to-Service Authentication — because your internal services need to prove their identity to each other too.