Today’s Lesson
Security for Legal SaaS — Episode 38: LLM API Key Isolation and Inference Gateways
One Key to Rule Them All — and That's the Problem
In Episode 37, we established that AI should draft and humans should approve. This episode focuses on the infrastructure connecting your legal AI to the cloud providers that power it — specifically, the API keys that authenticate every request and the gateways that should sit between your application and those providers.
An API key, a programmatic credential that proves your application's identity to a service, is the single token that authorises your application to call an LLM provider like OpenAI, Anthropic, or Google. If that key is stolen, the attacker has everything they need to run up your bill, read any prompts, responses, or files the provider stores behind that key, and potentially reach other data accessible through the API.
The LLMjacking Threat
"LLMjacking" is a term coined by the Sysdig Threat Research Team to describe attackers stealing cloud credentials to hijack access to large language model services.[1] It follows the pattern of cryptojacking (stealing compute for cryptocurrency mining) and proxyjacking (reselling stolen bandwidth), but the economics are different, and worse.
| Attack | What's Stolen | Typical Cost to Victim |
|---|---|---|
| Cryptojacking | GPU/CPU cycles | Electricity + cloud compute bills |
| Proxyjacking | Network bandwidth | Bandwidth overages |
| LLMjacking | LLM API access | $100,000+ per day at scale |
Sysdig documented a 376% increase in credential theft specifically targeting AI services between Q4 2025 and Q1 2026.[2] The financial impact is staggering: one startup's monthly OpenAI bill went from $400 to $67,000 after their API key was exposed in a public GitHub repository for 11 days.[3]
Case study: LiteLLM Supply Chain Compromise (March 2026). LiteLLM, the most popular open-source LLM proxy in the Python ecosystem with approximately 97 million monthly downloads, was compromised when attackers hijacked the maintainer's PyPI account. Version 1.82.8 included a malicious file that silently exfiltrated cloud credentials, SSH keys, and Kubernetes secrets. Because LiteLLM runs as a centralised API gateway, compromising it yielded not just cloud credentials but LLM API keys for OpenAI, Anthropic, Azure AI, and others simultaneously.[4]
Case study: Microsoft v. Storm-2139 (January 2025). Microsoft filed a civil lawsuit against a criminal syndicate called Storm-2139 that had industrialised LLMjacking across Azure, OpenAI, AWS Bedrock, Anthropic, Google Vertex AI, and Mistral. The syndicate built a custom tool called de3u that allowed users to generate images using stolen Azure API keys, with features designed to circumvent content safety filters.[5]
Why Law Firms Are Particularly Vulnerable
Legal AI deployments face specific risks that make API key isolation critical:
- High-value prompts. Legal prompts contain privileged information, case strategy, client names, and confidential deal terms. A stolen key doesn't just cost money — it exposes every prompt and response flowing through it.
- Long-lived keys. Many firms set up API keys once and forget them. Unlike passwords, API keys rarely expire by default, and few organisations rotate them regularly.
- Shared across environments. Development, staging, and production environments often share a single API key — meaning a key leaked from a developer's laptop compromises the production system.
- Embedded in code. Despite decades of warnings, API keys still appear in source code, configuration files, and CI/CD pipelines. GitHub's secret scanning detects millions of leaked secrets annually.[6]
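One cheap defence against the "embedded in code" risk is a local scan before code is committed. The sketch below is a minimal illustration, not a substitute for a dedicated secret scanner: the two regular expressions and the `find_suspected_keys` helper are assumptions made for this example, and real provider key formats vary and change over time.

```python
import re

# Hypothetical patterns for illustration only. Real key formats differ by
# provider and change over time, so treat these as a starting point.
KEY_PATTERNS = {
    "openai-style": re.compile(r"\bsk-[A-Za-z0-9_-]{20,}\b"),
    "generic-assignment": re.compile(
        r"(api[_-]?key|secret)\w*\s*[:=]\s*['\"][^'\"]{16,}['\"]",
        re.IGNORECASE,
    ),
}


def find_suspected_keys(text: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) for each line that looks like it holds a key.

    Only the first matching pattern is reported per line.
    """
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in KEY_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
                break
    return hits
```

Run against staged files in a pre-commit hook, a non-empty result would block the commit. False positives are expected with patterns this loose, which is usually the right trade for a firm handling privileged material.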
Per-Environment Key Isolation
The first principle: every environment gets its own API key. Development, staging, and production must never share credentials.
| Environment | Key Scope | Budget Limit | Monitoring |
|---|---|---|---|
| Development | Individual developer sandboxes | $50/day hard cap | Alert on any production model usage |
| Staging | Shared test environment | $200/day hard cap | Alert on usage exceeding test patterns |
| Production | Production application only | Based on projected usage + 20% buffer | Real-time anomaly detection |
Each key should have the minimum permissions required. If your production application only calls GPT-4 for contract review, the production key should not have access to DALL-E, Whisper, or other models. This limits the blast radius if a key is compromised.[7]
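The per-environment table and the least-privilege rule can be expressed as a small policy map that the gateway consults before forwarding a request. This is a sketch under stated assumptions: the secret paths, the model names, and the projected production spend of $1,000 per day are all hypothetical placeholders, and `KeyPolicy` and `check_request` are names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyPolicy:
    secret_name: str                 # hypothetical path in the secrets manager
    allowed_models: frozenset[str]   # models this environment's key may call
    daily_budget_usd: float


# Assumed projected production spend; the cap adds the 20% buffer from the table.
PROJECTED_PROD_DAILY_USD = 1000.0

POLICIES = {
    "development": KeyPolicy(
        "llm/dev/provider-key", frozenset({"small-test-model"}), 50.0
    ),
    "staging": KeyPolicy(
        "llm/staging/provider-key",
        frozenset({"small-test-model", "contract-review-model"}),
        200.0,
    ),
    "production": KeyPolicy(
        "llm/prod/provider-key",
        frozenset({"contract-review-model"}),
        round(PROJECTED_PROD_DAILY_USD * 1.2, 2),
    ),
}


def check_request(environment: str, model: str) -> KeyPolicy:
    """Return the policy for this environment, or raise if the model is out of scope."""
    policy = POLICIES.get(environment)
    if policy is None:
        raise PermissionError(f"unknown environment: {environment!r}")
    if model not in policy.allowed_models:
        raise PermissionError(f"model {model!r} is not allowed in {environment}")
    return policy
```

Because each environment maps to a different secret, a key leaked from a developer laptop resolves only to the development policy and its $50 cap.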
The Inference Gateway Pattern
An inference gateway is a proxy layer that sits between your application and the LLM provider. Every request flows through the gateway, which adds authentication, logging, rate limiting, and policy enforcement before forwarding the request to the provider.
Your Application → Inference Gateway → LLM Provider (OpenAI, Anthropic, etc.)
                           ↓
        Logs, rate limits, budget controls, PII scanning, key rotation
What the Gateway Enforces
| Control | Without Gateway | With Gateway |
|---|---|---|
| Authentication | Application holds raw API key | Application authenticates to gateway; gateway holds the provider key |
| Rate limiting | Provider's default limits only | Custom per-user, per-team, per-matter limits |
| Budget controls | Monthly bill surprise | Hard spending caps with real-time tracking |
| Logging | Whatever the provider retains | Full prompt/response logging under your control |
| PII scanning | None | Pre-flight redaction before data reaches the provider |
| Key rotation | Manual, risky | Gateway-level rotation; applications never see the provider key |
The critical architectural point: your application code never holds the LLM provider's API key. It authenticates to the gateway using internal credentials (a service token, mTLS certificate, or OAuth client credentials; we covered these patterns in Episode 15 and Episode 20). The gateway holds and manages the provider key. If your application is compromised, the attacker gets an internal token that only works through your gateway, not a raw OpenAI key they can use from anywhere.[8]
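The credential swap at the heart of this pattern can be sketched in a few lines. The example below assumes a bearer-style internal service token and a provider that accepts its key in an `Authorization: Bearer` header (some providers use a different header). The token, the service name, and the `build_upstream_headers` helper are hypothetical, and in a real deployment both secrets would come from a secrets manager rather than source code.

```python
import hashlib
import hmac

# Hypothetical demo values. Only a hash of the internal token is stored, and the
# provider key lives in the gateway process alone.
INTERNAL_TOKEN_HASHES = {
    "contract-review-app": hashlib.sha256(b"internal-token-for-demo").hexdigest(),
}
PROVIDER_KEY = "provider-key-held-only-by-gateway"


def authenticate(service: str, presented_token: str) -> bool:
    """Check an internal service token using a constant-time comparison."""
    expected = INTERNAL_TOKEN_HASHES.get(service)
    if expected is None:
        return False
    presented = hashlib.sha256(presented_token.encode()).hexdigest()
    return hmac.compare_digest(expected, presented)


def build_upstream_headers(service: str, incoming: dict[str, str]) -> dict[str, str]:
    """Validate the internal credential, then replace it with the provider key."""
    scheme, _, token = incoming.get("Authorization", "").partition(" ")
    if scheme != "Bearer" or not authenticate(service, token):
        raise PermissionError("invalid internal credentials")
    # Drop the internal credential so it never leaves the gateway.
    upstream = {k: v for k, v in incoming.items() if k.lower() != "authorization"}
    upstream["Authorization"] = f"Bearer {PROVIDER_KEY}"
    return upstream
```

The application only ever sees `internal-token-for-demo`. Revoking that token at the gateway cuts off a compromised application without touching the provider key.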
Budget Controls and Anomaly Detection
Inference gateways should enforce hard budget caps — not just alerts, but actual request rejection when spending exceeds thresholds. Track usage in real time and flag anomalies:
- A user who normally makes 50 API calls per day suddenly making 5,000
- API calls at 3 AM when no attorneys are working
- Requests to models your application doesn't normally use
- Prompt patterns that don't match your application's template structure
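A hard cap differs from an alert in that the request is refused before it reaches the provider. The sketch below illustrates that distinction along with two of the anomaly signals listed above (call-volume spikes and off-hours traffic). The class name, the spike factor of 10, and the 22:00 to 06:00 off-hours window are assumptions chosen for the example, not recommended values.

```python
from collections import defaultdict


class BudgetExceeded(Exception):
    """Raised when accepting a request would push spend past the hard cap."""


class UsageTracker:
    def __init__(self, daily_cap_usd: float, baseline_calls: dict[str, int],
                 spike_factor: float = 10.0):
        self.daily_cap_usd = daily_cap_usd
        self.baseline_calls = baseline_calls   # typical calls per day, per user
        self.spike_factor = spike_factor       # assumed threshold multiplier
        self.spent_usd = 0.0
        self.calls_today = defaultdict(int)

    def record(self, user: str, cost_usd: float, hour: int) -> list[str]:
        """Reject over-budget requests; return anomaly flags for accepted ones."""
        if self.spent_usd + cost_usd > self.daily_cap_usd:
            raise BudgetExceeded(
                f"daily cap of ${self.daily_cap_usd:.2f} would be exceeded"
            )
        self.spent_usd += cost_usd
        self.calls_today[user] += 1

        flags = []
        baseline = self.baseline_calls.get(user)
        if baseline and self.calls_today[user] > baseline * self.spike_factor:
            flags.append("call-volume-spike")
        if hour < 6 or hour >= 22:             # assumed off-hours window
            flags.append("off-hours")
        return flags
```

With a baseline of 50 calls per day and the default factor, the 501st call from that user is flagged, well before the 5,000 in the example above. A production gateway would also reset the counters daily and persist them across restarts, which this sketch omits.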
Overlap with secrets management (Episode 30): LLM API keys are high-rotation, high-value secrets. They belong in your secrets manager (HashiCorp Vault, AWS Secrets Manager, or equivalent), not in environment variables, configuration files, or — worst case — source code. The inference gateway retrieves keys from the secrets manager, and the keys are rotated on a schedule without any application changes.
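For scheduled rotation to happen "without any application changes", the gateway has to re-read the key from the secrets manager periodically instead of loading it once at startup. One way to sketch that is a small time-limited cache around whatever fetch call your secrets manager client provides. `CachedSecret` and its parameters are hypothetical names for this example, and the `fetch` callable stands in for the real client call.

```python
import time
from typing import Callable, Optional


class CachedSecret:
    """Hold a secret in memory for a limited time, then fetch it again."""

    def __init__(self, fetch: Callable[[], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch            # wraps the secrets manager client call
        self._ttl = ttl_seconds
        self._clock = clock            # injectable so the cache can be tested
        self._value: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._value is None or now >= self._expires_at:
            # Cache miss or expiry: pick up whatever key is current.
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value
```

A short time-to-live bounds how long the gateway keeps using a key after it has been rotated in the secrets manager, at the cost of more frequent lookups.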
Key Rotation Without Downtime
The gateway architecture makes key rotation simple. The process:
1. Generate a new API key from the provider.
2. Add the new key to the gateway's configuration. During the overlap, both old and new keys are valid.
3. Verify the new key works by routing a percentage of traffic through it.
4. Remove the old key from the gateway.
5. Revoke the old key at the provider.
At no point does the application need to change. It authenticates to the gateway with its internal credentials, which remain stable. The provider key rotation is entirely transparent.[9]
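The overlap window in the rotation process can be sketched as a two-slot key pool inside the gateway. `ProviderKeyPool` and its method names are assumptions made for this example. Generating the new key and revoking the old one happen at the provider and are outside the sketch.

```python
import random
from typing import Optional


class ProviderKeyPool:
    """Hold the current provider key and, during rotation, a candidate replacement."""

    def __init__(self, current_key: str, rng: Optional[random.Random] = None):
        self._current = current_key
        self._candidate: Optional[str] = None
        self._candidate_share = 0.0
        self._rng = rng or random.Random()

    def add_candidate(self, key: str, share: float) -> None:
        """Start the overlap: send roughly `share` of requests through the new key."""
        if not 0.0 <= share <= 1.0:
            raise ValueError("share must be between 0 and 1")
        self._candidate = key
        self._candidate_share = share

    def pick(self) -> str:
        """Choose the key for one outgoing request."""
        if self._candidate is not None and self._rng.random() < self._candidate_share:
            return self._candidate
        return self._current

    def promote(self) -> str:
        """Make the candidate the only key; return the retired key for revocation."""
        if self._candidate is None:
            raise RuntimeError("no candidate key to promote")
        retired = self._current
        self._current, self._candidate, self._candidate_share = self._candidate, None, 0.0
        return retired
```

A rotation would call `add_candidate(new_key, 0.1)`, watch error rates for requests that used the new key, then call `promote()` and revoke the returned key at the provider.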
Open-Source and Commercial Options
| Tool | Type | Key Feature |
|---|---|---|
| LiteLLM | Open-source proxy | Multi-provider support, budget tracking (note: verify package integrity post-2026 compromise)[4] |
| Portkey | Commercial gateway | Policy enforcement, caching, observability |
| Helicone | Commercial gateway | Usage analytics, prompt management, cost tracking |
| Kong AI Gateway | Commercial gateway | Enterprise API management with AI-specific plugins |
| Custom (nginx/Envoy + middleware) | Self-built | Full control, but maintenance burden |
For firms that prefer to build their own, an Envoy or nginx reverse proxy with custom middleware for authentication, logging, and rate limiting is a viable approach. The tradeoff is maintenance overhead: the commercial gateways handle provider API changes, new model support, and billing integration automatically.[10]
What's Next
Episode 39 covers Redaction Pipelines for Cloud AI — how to strip sensitive information from prompts before they reach a cloud provider, and how to reassemble the redacted information when the response comes back. Because the best way to protect client data from cloud AI is to never send it in the first place.
Sources & Further Reading
1. Sysdig, LLMjacking: Stolen Cloud Credentials Used in New AI Attack (May 2024).
2. Sysdig, What is LLMjacking?
3. Prompt Guardrails, LLMjacking: The $100K-Per-Day Attack Draining Enterprise AI Budgets.
4. Trend Micro, Your AI Gateway Was a Backdoor: Inside the LiteLLM Supply Chain Compromise (March 2026).
5. CSO Online, Microsoft Files Lawsuit Against LLMjacking Gang That Bypassed AI Safeguards (January 2025).
6. GitHub Blog, Secret Scanning.
7. API7.ai, How AI Gateways Enforce Security and Compliance for LLMs.
8. Noma Security, How an AI Agent Vulnerability in LangSmith Could Lead to Stolen API Keys.
9. BeyondScale, LLMjacking: AI API Key Theft Defense Guide.
10. DreamFactory, The LiteLLM Supply Chain Attack: A Complete Technical Breakdown.
11. Aikido, GPT-Proxy Backdoor in npm and PyPI Turns Servers into Chinese LLM Relays.