Security for Legal SaaS

Episode 14 · Module 4 · Transport Security

API Gateway Patterns and Rate Limiting

18 May 2026 · 9:11 · Security for Legal SaaS

As legal SaaS grows from monolith to microservices, security logic scatters. Alice and Dan cover API gateway architecture as a single enforcement point, rate limiting algorithms (fixed window, sliding window, token bucket, leaky bucket), per-user and per-endpoint limits, circuit breakers, and the legal-specific attack scenarios — document download abuse, search endpoint enumeration, and e-filing deadline denial — that make rate limiting a security control, not just a performance optimisation.

Today’s Lesson

Security for Legal SaaS — Episode 14: API Gateway Patterns and Rate Limiting

The Single Enforcement Point

As your legal SaaS grows from a monolith to multiple services, security logic scatters. Authentication checks are duplicated in every microservice (each a small, independently deployed backend service), rate limiting is reimplemented per endpoint, and logging becomes inconsistent across teams. An API gateway consolidates these concerns into a single enforcement layer: every request passes through one point where policy is applied consistently.

Key stat: Akamai's State of the Internet report found that API attacks grew 109% year-over-year, with credential abuse and resource exhaustion as the primary vectors. A gateway with proper rate limiting blunts much of this automated traffic before it reaches your services.

For legal tech, where a single unthrottled API call might export thousands of privileged documents, the gateway isn’t a performance optimisation — it’s a security boundary.

Gateway as Security Layer

What the Gateway Enforces

| Concern | Without Gateway | With Gateway |
| --- | --- | --- |
| Authentication | Each service validates tokens independently | Gateway validates once; services trust internal traffic |
| Rate limiting | Per-service, inconsistent, often missing | Centralised, per-user, per-endpoint, global |
| Request validation | Scattered schema checks | Single schema enforcement point |
| Logging/audit | Inconsistent formats, gaps | Every request logged uniformly |
| TLS termination | Each service manages certificates | Gateway terminates TLS; internal traffic on private network |
| IP allowlisting | Firewall rules per service | Gateway-level enforcement |

Kong, AWS API Gateway, Envoy, and Traefik are production-grade options — each with different tradeoffs between managed simplicity and configurability.

Rate Limiting Strategies

Rate limiting prevents resource exhaustion, credential stuffing, data scraping, and denial-of-service. The strategy you choose depends on what you’re protecting.

Algorithm Comparison

| Algorithm | Behaviour | Best For |
| --- | --- | --- |
| Fixed window | Count requests per time window (e.g., 100/minute). Resets at window boundary. | Simple, predictable limits |
| Sliding window | Weighted average of current and previous window. Smooths boundary spikes. | Most API rate limiting |
| Token bucket | Tokens refill at steady rate; requests consume tokens. Allows bursts up to bucket size. | Allowing legitimate traffic bursts |
| Leaky bucket | Requests queue and drain at fixed rate. Excess requests are dropped. | Smoothing traffic to backend |

Fixed window edge case: A user sends 100 requests at 11:59:59 and 100 more at 12:00:01. Both pass a 100/minute fixed window — but 200 requests hit your backend in 2 seconds. The sliding window algorithm solves this by weighting the overlapping window.
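
To make the boundary case concrete, here is a minimal sketch of a sliding window counter. It is one common way to approximate a sliding window, not a reference implementation: the class name, the injected `now` clock (seconds), and the in-memory counters are all assumptions made for illustration.

```python
class SlidingWindowCounter:
    """Approximate sliding window using the current and previous fixed windows."""

    def __init__(self, limit: int, window: int = 60):
        self.limit = limit
        self.window = window          # window length in seconds
        self.window_start = 0         # start time of the current fixed window
        self.current = 0              # requests counted in the current window
        self.previous = 0             # requests counted in the previous window

    def allow(self, now: float) -> bool:
        start = int(now // self.window) * self.window
        if start != self.window_start:
            # Rolled into a new window. The old count only matters if the
            # old window is the one immediately before the new one.
            adjacent = (start - self.window_start) == self.window
            self.previous = self.current if adjacent else 0
            self.current = 0
            self.window_start = start
        # Fraction of the previous window still covered by the sliding window.
        overlap = 1.0 - (now - start) / self.window
        estimated = self.previous * overlap + self.current
        if estimated + 1 > self.limit:
            return False
        self.current += 1
        return True


limiter = SlidingWindowCounter(limit=100, window=60)
# 100 requests one second before the boundary, 100 more one second after it.
first_burst = sum(limiter.allow(now=59.0) for _ in range(100))
second_burst = sum(limiter.allow(now=61.0) for _ in range(100))
```

In this run the whole first burst passes, but the second burst is almost entirely rejected because nearly all of the previous window still overlaps the sliding window. A fixed window would have admitted both bursts in full.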

The Token Bucket for Legal SaaS

The token bucket algorithm is ideal for legal applications because it accommodates legitimate bursts — a lawyer batch-downloading case files at the start of their day — while maintaining a steady-state limit:

- Bucket capacity: Maximum burst size (e.g., 50 document downloads)

- Refill rate: Sustained throughput (e.g., 10 downloads per minute)

- Result: A lawyer can immediately download 50 files, then continue at 10/minute
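
The capacity and refill numbers above can be sketched as a small token bucket. This is an illustrative in-memory version under stated assumptions: the class name and the injected `now` clock (seconds) are hypothetical, and a production limiter would keep this state in a shared store rather than in process memory.

```python
class TokenBucket:
    """Token bucket with an injectable clock so the behaviour is easy to test."""

    def __init__(self, capacity: int, refill_per_minute: float, now: float = 0.0):
        self.capacity = capacity
        self.refill_per_minute = refill_per_minute
        self.tokens = float(capacity)   # start with a full bucket
        self.last = now                 # time of the last refill calculation

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Add the tokens earned since the last call, capped at capacity.
        elapsed = max(0.0, now - self.last)
        earned = elapsed * self.refill_per_minute / 60.0
        self.tokens = min(float(self.capacity), self.tokens + earned)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket(capacity=50, refill_per_minute=10)
# 60 download attempts at once: only the bucket capacity gets through.
burst = sum(bucket.allow(now=0.0) for _ in range(60))
# One minute later the bucket has refilled by the sustained rate.
after_one_minute = sum(bucket.allow(now=60.0) for _ in range(60))
```

Here `burst` comes out at 50 and `after_one_minute` at 10, matching the capacity and refill rate in the list above.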

Per-User, Per-Endpoint, and Global Limits

A single rate limit is insufficient. Layer them:

| Scope | Example | Purpose |
| --- | --- | --- |
| Per-user | 1000 API calls/hour per authenticated user | Prevent compromised accounts from bulk exfiltration |
| Per-endpoint | 10 document downloads/minute on /api/documents/{id}/download | Protect expensive operations |
| Per-IP | 100 unauthenticated requests/minute per IP | Block credential stuffing before auth |
| Global | 10,000 requests/second across all users | Protect infrastructure from DDoS |

Stripe’s rate limiting approach documents this multi-layer pattern in production — per-user limits for fairness, per-endpoint limits for protection, and global limits for stability.
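
One way to picture the layering is a check that admits a request only if every applicable scope still has room. The sketch below uses simple fixed-window counters in memory and the example numbers from the table. The function name, the choice to count the download limit per user, and applying the per-IP limit only to unauthenticated requests are assumptions for illustration, not a description of any particular gateway.

```python
from collections import defaultdict
from typing import Optional

DOWNLOAD_ENDPOINT = "/api/documents/{id}/download"

# scope -> (limit, window in seconds), using the example values above
LIMITS = {
    "user": (1000, 3600),
    "endpoint": (10, 60),
    "ip": (100, 60),
    "global": (10_000, 1),
}

# (scope, identity, window index) -> count. Never pruned here; a real store
# would expire old windows.
_counts = defaultdict(int)


def allow(now: float, endpoint: str, ip: str, user: Optional[str] = None) -> Optional[str]:
    """Return None if the request is allowed, else the scope that rejected it."""
    scopes = {"global": "*"}
    if user is None:
        scopes["ip"] = ip                         # unauthenticated traffic
    else:
        scopes["user"] = user
        if endpoint == DOWNLOAD_ENDPOINT:
            scopes["endpoint"] = f"{user}:{endpoint}"

    keys = []
    for scope, identity in scopes.items():
        limit, window = LIMITS[scope]
        key = (scope, identity, int(now // window))
        if _counts[key] >= limit:
            return scope                          # reject without consuming quota
        keys.append(key)
    for key in keys:                              # all scopes passed: consume
        _counts[key] += 1
    return None


results = [allow(0.0, DOWNLOAD_ENDPOINT, "203.0.113.7", "alice") for _ in range(11)]
other = allow(0.0, "/api/matters", "203.0.113.7", "alice")
```

The eleventh download in the same minute is rejected by the endpoint scope, while the same user can still call other endpoints. Checking every scope before incrementing any counter avoids charging a request against one limit when another limit rejects it.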

Legal SaaS Attack Scenarios

Document Download Abuse

A compromised user account (or malicious insider) attempts to bulk-download all client files. Without per-endpoint rate limiting on the download API, thousands of privileged documents can be exfiltrated in minutes.

Mitigation: Per-user download limit of 50 documents/hour. Alert on any user exceeding 80% of the limit. Require step-up authentication (MFA re-verification) for bulk exports exceeding 20 documents in a session.
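
The three thresholds in this mitigation can be expressed as a single policy check. The following is a sketch only: the `Decision` type, the function name, and the idea that the caller supplies counts taken before the current request are all assumptions made for illustration.

```python
from dataclasses import dataclass

HOURLY_LIMIT = 50          # per-user downloads per hour
ALERT_FRACTION = 0.8       # alert once usage exceeds 80% of the limit
STEP_UP_THRESHOLD = 20     # downloads per session before MFA re-verification


@dataclass
class Decision:
    allowed: bool
    alert: bool
    require_mfa: bool


def evaluate_download(downloads_this_hour: int,
                      downloads_this_session: int,
                      mfa_verified: bool) -> Decision:
    """Decide on one more download, given counts from before this request."""
    hour_after = downloads_this_hour + 1
    session_after = downloads_this_session + 1
    if hour_after > HOURLY_LIMIT:
        # Over the hard limit: reject and alert.
        return Decision(allowed=False, alert=True, require_mfa=False)
    alert = hour_after > ALERT_FRACTION * HOURLY_LIMIT
    needs_mfa = session_after > STEP_UP_THRESHOLD and not mfa_verified
    return Decision(allowed=not needs_mfa, alert=alert, require_mfa=needs_mfa)
```

For example, `evaluate_download(20, 20, False)` blocks the 21st download in a session until the user re-verifies, and `evaluate_download(40, 40, True)` allows the download but raises an alert.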

Search Endpoint Abuse

Full-text search endpoints are computationally expensive and reveal data through query patterns. An attacker systematically queries terms to map your document corpus without downloading files directly.

Mitigation: Rate limit search to 30 queries/minute per user. Log query patterns. Alert on systematic enumeration (sequential date ranges, alphabetical name lists). Consider differential privacy on result counts.

E-Filing Deadline Denial

An attacker targets your e-filing API with legitimate-looking but invalid requests during known deadline periods (discovery deadlines, motion filing windows). The goal isn’t data theft — it’s consuming your rate limit slots so legitimate filings fail.

Mitigation: Priority queues for authenticated users with pending deadlines. Separate rate limit pools for filing operations vs. general API calls. Circuit breakers that shed load from non-critical endpoints to protect filing pathways.
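
As a rough illustration of the priority queue idea, the sketch below keeps a bounded queue for filing requests and lets a higher-priority request displace the lowest-priority one when the queue is full. The class, the three priority tiers, and the `pending_deadline` flag are hypothetical; how a real system learns that a user has a pending deadline is outside this sketch.

```python
import heapq
import itertools
from typing import Optional


def priority(authenticated: bool, pending_deadline: bool) -> int:
    """Lower number means served first."""
    if authenticated and pending_deadline:
        return 0
    return 1 if authenticated else 2


class FilingQueue:
    def __init__(self, max_size: int):
        self.max_size = max_size
        self._heap = []                   # entries: (priority, arrival order, request id)
        self._order = itertools.count()   # keeps FIFO order within a priority tier

    def submit(self, request_id: str, authenticated: bool, pending_deadline: bool) -> bool:
        entry = (priority(authenticated, pending_deadline), next(self._order), request_id)
        if len(self._heap) < self.max_size:
            heapq.heappush(self._heap, entry)
            return True
        worst = max(self._heap)           # lowest priority, most recent arrival
        if entry[0] < worst[0]:
            # Full queue: evict the worst entry to make room for a better one.
            self._heap.remove(worst)
            heapq.heapify(self._heap)
            heapq.heappush(self._heap, entry)
            return True
        return False                      # full, and nothing worse to evict

    def next(self) -> Optional[str]:
        return heapq.heappop(self._heap)[2] if self._heap else None


queue = FilingQueue(max_size=2)
accepted = [
    queue.submit("junk-1", authenticated=False, pending_deadline=False),
    queue.submit("junk-2", authenticated=False, pending_deadline=False),
    queue.submit("junk-3", authenticated=False, pending_deadline=False),
    queue.submit("filing-A", authenticated=True, pending_deadline=True),
]
served = [queue.next(), queue.next(), queue.next()]
```

Even with the queue already full of junk requests, the deadline filing is accepted and served first, which is the property the mitigation is after.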

Circuit Breakers

When a downstream service fails, cascading retries can collapse your entire system. The circuit breaker pattern prevents this:

| State | Behaviour |
| --- | --- |
| Closed (normal) | Requests flow through; failures counted |
| Open (tripped) | Requests fail immediately without calling the downstream service |
| Half-open (testing) | Limited requests probe whether the service has recovered |

For legal SaaS: if your document storage service becomes slow, a circuit breaker prevents the API gateway from queuing thousands of requests that overwhelm memory. Users see a fast “service temporarily unavailable” instead of a 30-second timeout followed by an error.
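
The three states can be sketched as a small state machine. This is a simplified illustration, not a drop-in library: the class name, the consecutive-failure threshold, the single-probe rule in the half-open state, and the injected `now` clock (seconds) are all assumptions.

```python
class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, recovery_timeout: float = 30.0):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout   # seconds to stay open
        self.state = "closed"
        self.failures = 0                          # consecutive failures
        self.opened_at = 0.0
        self.probe_in_flight = False

    def allow_request(self, now: float) -> bool:
        if self.state == "closed":
            return True
        if self.state == "open" and now - self.opened_at >= self.recovery_timeout:
            # Timeout elapsed: move to half-open and allow one probe.
            self.state = "half-open"
            self.probe_in_flight = False
        if self.state == "half-open" and not self.probe_in_flight:
            self.probe_in_flight = True
            return True
        return False                               # open, or probe already running

    def record_success(self) -> None:
        self.failures = 0
        self.state = "closed"

    def record_failure(self, now: float) -> None:
        self.failures += 1
        # A failed probe, or too many failures while closed, trips the breaker.
        if self.state == "half-open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = now


breaker = CircuitBreaker(failure_threshold=3, recovery_timeout=30.0)
for t in (0.0, 1.0, 2.0):
    breaker.allow_request(t)
    breaker.record_failure(t)
rejected_while_open = not breaker.allow_request(10.0)
probe_allowed = breaker.allow_request(32.0)
second_probe_allowed = breaker.allow_request(32.5)
breaker.record_success()
```

While the breaker is open, the gateway can return the fast "service temporarily unavailable" response immediately instead of holding the request.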

Response Headers for Rate Limit Transparency

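The IETF RateLimit header fields specification (developed as an Internet-Draft in the HTTPAPI working group) proposes a standard way for servers to communicate rate limit status. The three-field form from earlier revisions of the draft looks like this:

RateLimit-Limit: 100
RateLimit-Remaining: 67
RateLimit-Reset: 33

In that form, RateLimit-Reset is the number of seconds until the quota resets, not a Unix timestamp. Later revisions of the draft restructure these fields, so check the current revision before committing to a format.

Include these on every response. Legitimate clients use them to self-throttle. Your own frontend uses them to display “please wait” instead of error messages. GitHub’s API demonstrates the pattern well with its own x-ratelimit-* headers (where the reset value is a Unix epoch timestamp): clients can check remaining quota before making expensive calls.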
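
A minimal sketch of building these headers for a response follows. It assumes the three-field header form and treats RateLimit-Reset as seconds remaining in the window, as the earlier revisions of the IETF draft define it. The function name and its parameters are hypothetical; `used` is assumed to count requests in the current window including this one.

```python
import math


def rate_limit_response(limit: int, used: int, window_ends_at: float, now: float):
    """Return (status code, headers) for a request under a simple window limit."""
    remaining = max(0, limit - used)
    reset_in = max(0, math.ceil(window_ends_at - now))   # seconds until reset
    headers = {
        "RateLimit-Limit": str(limit),
        "RateLimit-Remaining": str(remaining),
        "RateLimit-Reset": str(reset_in),
    }
    if used > limit:
        # Over quota: 429 Too Many Requests, plus Retry-After in seconds.
        headers["Retry-After"] = str(reset_in)
        return 429, headers
    return 200, headers


ok_status, ok_headers = rate_limit_response(limit=100, used=33, window_ends_at=160.0, now=127.0)
limited_status, limited_headers = rate_limit_response(limit=100, used=101, window_ends_at=160.0, now=127.0)
```

Sending the headers on successful responses as well as on 429s is what lets well-behaved clients slow down before they hit the limit.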

Implementation Architecture

Recommended stack:

- Gateway: Kong, AWS API Gateway, or Envoy (depending on cloud/self-hosted preference)

- Rate limit store: Redis (atomic INCR with TTL); see Redis rate limiting patterns

- Counters: Sliding window with per-user and per-endpoint dimensions

- Response: 429 Too Many Requests with RateLimit headers and Retry-After

- Monitoring: Alert when any user consistently hits 80% of their limit

- Bypass: Internal service-to-service traffic bypasses user rate limits (uses separate auth; covered in Episode 15)

Conclusion

An API gateway is where security policy becomes enforceable infrastructure. Rate limiting protects against both external attacks and internal abuse — compromised accounts, malicious insiders, and automated scraping. For legal SaaS, where a single unthrottled endpoint can exfiltrate an entire case file repository, the gateway is not optional architecture — it’s a security control as fundamental as authentication.

Next episode: Service-to-Service Authentication — because your internal services need to prove their identity to each other too.

Sources & references

  1. NGINX, "What is an API Gateway?" Gateway architecture overview
  2. Akamai, "State of the Internet / Security Report." API attack trends and statistics
  3. Kong, "Kong Gateway Documentation." Open-source API gateway
  4. AWS, "Amazon API Gateway Developer Guide." Managed API gateway service
  5. Envoy Proxy, "Envoy Documentation." High-performance service proxy
  6. Traefik Labs, "Traefik Documentation." Cloud-native reverse proxy
  7. Cloudflare, "Counting Things." Sliding window rate limiting at scale
  8. Wikipedia, "Token bucket." Algorithm description and properties
  9. Stripe Engineering, "Scaling your API with rate limiters." Production rate limiting patterns
  10. OWASP, "API4:2023 Unrestricted Resource Consumption"
  11. Microsoft Azure, "Circuit Breaker pattern." Resilience pattern for distributed systems
  12. IETF, "RateLimit header fields for HTTP." Standardised rate limit response headers
  13. GitHub, "Rate limiting for the REST API." Production rate limit header implementation
  14. Redis, "Rate Limiting." Redis-based rate limiting patterns with atomic operations