Security for Legal SaaS

Episode 38 · Module 8 · AI Security

LLM API Key Isolation and Inference Gateways

19 May 2026 · 7:59 · Security for Legal SaaS

One Key to Rule Them All — and That's the Problem

In Episode 37, we established that AI should draft and humans should approve. This episode focuses on the infrastructure connecting your legal AI to the cloud providers that power it — specifically, the API keys that authenticate every request and the gateways that should sit between your application and those providers.

An API key (a programmatic credential that proves your application's identity to a service) is the single token that authorises your application to call an LLM provider such as OpenAI, Anthropic, or Google. If that key is stolen, the attacker has everything they need to run up your bill and, depending on what the provider retains under that account, to read your stored data as well.

The LLMjacking Threat

"LLMjacking" is a term coined by the Sysdig Threat Research Team to describe attackers stealing cloud credentials to hijack access to large language model services [1]. It follows the pattern of cryptojacking (stealing compute for cryptocurrency mining) and proxyjacking (reselling stolen bandwidth), but the economics are different, and worse.

Attack | What's Stolen | Typical Cost to Victim
Cryptojacking | GPU/CPU cycles | Electricity + cloud compute bills
Proxyjacking | Network bandwidth | Bandwidth overages
LLMjacking | LLM API access | $100,000+ per day at scale

Sysdig documented a 376% increase in credential theft specifically targeting AI services between Q4 2025 and Q1 2026 [2]. The financial impact is staggering: one startup's monthly OpenAI bill went from $400 to $67,000 after their API key was exposed in a public GitHub repository for 11 days [3].

Case study: LiteLLM Supply Chain Compromise (March 2026). LiteLLM, the most popular open-source LLM proxy in the Python ecosystem with approximately 97 million monthly downloads, was compromised when attackers hijacked the maintainer's PyPI account. Version 1.82.8 included a malicious file that silently exfiltrated cloud credentials, SSH keys, and Kubernetes secrets. Because LiteLLM runs as a centralised API gateway, compromising it yielded not just cloud credentials but LLM API keys for OpenAI, Anthropic, Azure AI, and others simultaneously [4].
Case study: Microsoft v. Storm-2139 (January 2025). Microsoft filed a civil lawsuit against a criminal syndicate called Storm-2139 that had industrialised LLMjacking across Azure, OpenAI, AWS Bedrock, Anthropic, Google Vertex AI, and Mistral. The syndicate built a custom tool called de3u that allowed users to generate images using stolen Azure API keys, with features designed to circumvent content safety filters [5].

Why Law Firms Are Particularly Vulnerable

Legal AI deployments face specific risks that make API key isolation critical:

  1. High-value prompts. Legal prompts contain privileged information, case strategy, client names, and confidential deal terms. A stolen key doesn't just cost money; it can also expose any prompts, responses, and documents the provider retains under that account.
  2. Long-lived keys. Many firms set up API keys once and forget them. Unlike passwords, API keys rarely expire by default, and few organisations rotate them regularly.
  3. Shared across environments. Development, staging, and production environments often share a single API key — meaning a key leaked from a developer's laptop compromises the production system.
  4. Embedded in code. Despite decades of warnings, API keys still appear in source code, configuration files, and CI/CD pipelines. GitHub's secret scanning detects millions of leaked secrets annually [6].
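A basic pre-commit check can catch the most obvious case: a key pasted straight into source. The sketch below is a minimal illustration, not a replacement for GitHub's secret scanning. The regular expressions are assumptions based on commonly seen key prefixes (`sk-` and `sk-proj-` for OpenAI, `sk-ant-` for Anthropic, `AIza` for Google API keys), and providers can change their formats at any time.

```python
import re

# Illustrative patterns only (assumption): based on commonly seen key prefixes,
# not an authoritative or exhaustive list. Providers may change formats.
KEY_PATTERNS = {
    "openai": re.compile(r"\bsk-(?!ant-)(?:proj-)?[A-Za-z0-9_-]{20,}"),
    "anthropic": re.compile(r"\bsk-ant-[A-Za-z0-9_-]{20,}"),
    "google": re.compile(r"AIza[0-9A-Za-z_-]{35}"),
}


def scan_text(text: str, source: str = "<input>") -> list[tuple[str, int, str]]:
    """Return (source, line_number, provider) for every line that looks like it holds a key."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for provider, pattern in KEY_PATTERNS.items():
            if pattern.search(line):
                findings.append((source, lineno, provider))
    return findings


if __name__ == "__main__":
    # The fake key is assembled at runtime so this file does not itself trip scanners.
    sample = 'OPENAI_API_KEY = "sk-proj-' + "x" * 40 + '"\nprint("hello")\n'
    for finding in scan_text(sample, source="settings.py"):
        print(finding)  # ('settings.py', 1, 'openai')
```

In practice you would run a check like this over staged files in a pre-commit hook and fail the commit on any finding, then let server-side scanning catch whatever slips through.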

Per-Environment Key Isolation

The first principle: every environment gets its own API key. Development, staging, and production must never share credentials.

Environment | Key Scope | Budget Limit | Monitoring
Development | Individual developer sandboxes | $50/day hard cap | Alert on any production model usage
Staging | Shared test environment | $200/day hard cap | Alert on usage exceeding test patterns
Production | Production application only | Based on projected usage + 20% buffer | Real-time anomaly detection
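One way to keep the table above honest is to lint the gateway's environment configuration in CI. The sketch below assumes each environment's key is referenced by a path in a secrets manager (the `secret_ref` values are hypothetical) and treats the production figure as a daily cap equal to projected daily spend plus the 20% buffer. It flags any two environments that point at the same key and any environment without a hard cap.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EnvironmentKeyPolicy:
    name: str
    secret_ref: str                  # hypothetical secrets-manager path for this environment's key
    daily_cap_usd: Optional[float]   # None means "no hard cap", which the lint rejects


def production_cap(projected_daily_usd: float, buffer: float = 0.20) -> float:
    """Projected usage plus a buffer (20% by default), rounded to cents."""
    return round(projected_daily_usd * (1 + buffer), 2)


def lint_policies(policies: list[EnvironmentKeyPolicy]) -> list[str]:
    """Return human-readable problems; an empty list means the config passes."""
    problems = []
    owner_of: dict[str, str] = {}
    for policy in policies:
        if policy.secret_ref in owner_of:
            problems.append(f"{policy.name} shares a key with {owner_of[policy.secret_ref]}")
        else:
            owner_of[policy.secret_ref] = policy.name
        if policy.daily_cap_usd is None:
            problems.append(f"{policy.name} has no hard daily cap")
    return problems


policies = [
    EnvironmentKeyPolicy("development", "llm/dev/openai", 50.0),
    EnvironmentKeyPolicy("staging", "llm/dev/openai", 200.0),  # mistake: reuses the dev key
    EnvironmentKeyPolicy("production", "llm/prod/openai", production_cap(1000.0)),
]
print(lint_policies(policies))  # ['staging shares a key with development']
```

Per-developer sandbox keys would simply be more entries, each with its own `secret_ref`; the point of the check is that a copy-pasted reference fails the build instead of silently linking two environments.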

Each key should have the minimum permissions required. If your production application only calls GPT-4 for contract review, the production key should not have access to DALL-E, Whisper, or other models. This limits the blast radius if a key is compromised [7].
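How finely a key can be scoped depends on the provider, so a gateway can enforce its own model allowlist as a second layer. A minimal sketch, assuming a hypothetical mapping from internal key IDs to the models each may call. Unknown keys get an empty allowlist, so the default is deny.

```python
# Hypothetical scopes: internal key ID -> models that key may call.
ALLOWED_MODELS: dict[str, frozenset[str]] = {
    "prod-contract-review": frozenset({"gpt-4"}),
    "staging-contract-review": frozenset({"gpt-4", "gpt-4o-mini"}),
}


def is_model_allowed(key_id: str, model: str) -> bool:
    """Default deny: an unknown key ID or an unlisted model is rejected."""
    return model in ALLOWED_MODELS.get(key_id, frozenset())


for key_id, model in [
    ("prod-contract-review", "gpt-4"),
    ("prod-contract-review", "dall-e-3"),
    ("unknown-key", "gpt-4"),
]:
    print(key_id, model, "allowed" if is_model_allowed(key_id, model) else "rejected")
```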

The Inference Gateway Pattern

An inference gateway is a proxy layer that sits between your application and the LLM provider. Every request flows through the gateway, which adds authentication, logging, rate limiting, and policy enforcement before forwarding the request to the provider.

Your Application → Inference Gateway → LLM Provider (OpenAI, Anthropic, etc.)
                        ↓
                  Logs, rate limits,
                  budget controls,
                  PII scanning,
                  key rotation
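The flow in the diagram can be sketched as an ordered list of checks that every request must pass before the gateway attaches the provider key and forwards it. This is a minimal sketch under stated assumptions: `send` stands in for whatever HTTP client actually calls the provider, and the email regex is a deliberately narrow stand-in for real PII scanning.

```python
import re
from dataclasses import dataclass, field
from typing import Callable

# Deliberately narrow stand-in for PII scanning: email addresses only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


class Rejected(Exception):
    """Raised by a check to stop a request before it reaches the provider."""


@dataclass(frozen=True)
class GatewayRequest:
    caller: str
    model: str
    prompt: str


def reject_empty(req: GatewayRequest) -> GatewayRequest:
    if not req.prompt.strip():
        raise Rejected("empty prompt")
    return req


def redact_emails(req: GatewayRequest) -> GatewayRequest:
    return GatewayRequest(req.caller, req.model, EMAIL.sub("[EMAIL]", req.prompt))


@dataclass
class Gateway:
    provider_key: str                              # held only by the gateway
    send: Callable[[str, GatewayRequest], str]     # hypothetical transport: (key, request) -> response
    checks: list[Callable[[GatewayRequest], GatewayRequest]] = field(default_factory=list)
    audit_log: list[tuple[str, str, str]] = field(default_factory=list)

    def handle(self, req: GatewayRequest) -> str:
        for check in self.checks:
            try:
                req = check(req)
            except Rejected as exc:
                self.audit_log.append(("rejected", req.caller, str(exc)))
                raise
        response = self.send(self.provider_key, req)
        # Only the checked (and redacted) prompt is logged.
        self.audit_log.append(("forwarded", req.caller, req.prompt))
        return response


def fake_send(key: str, req: GatewayRequest) -> str:
    return f"echo: {req.prompt}"


gateway = Gateway(provider_key="placeholder-provider-key", send=fake_send,
                  checks=[reject_empty, redact_emails])
print(gateway.handle(GatewayRequest("contract-review-app", "gpt-4", "Email jane@example.com the draft")))
# echo: Email [EMAIL] the draft
```

Keeping each control as a separate function means rate limits, budget checks, and model allowlists can be added to `checks` without touching the forwarding logic.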

What the Gateway Enforces

Control | Without Gateway | With Gateway
Authentication | Application holds raw API key | Application authenticates to gateway; gateway holds the provider key
Rate limiting | Provider's default limits only | Custom per-user, per-team, per-matter limits
Budget controls | Monthly bill surprise | Hard spending caps with real-time tracking
Logging | Whatever the provider retains | Full prompt/response logging under your control
PII scanning | None | Pre-flight redaction before data reaches the provider
Key rotation | Manual, risky | Gateway-level rotation; applications never see the provider key
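The per-user, per-team, and per-matter limits in the table can be implemented with a token bucket keyed by scope. A minimal in-memory sketch, assuming string keys such as `matter:<id>` (hypothetical naming) and an injectable clock so the behaviour is easy to test. A real gateway running several replicas would keep this state in a shared store.

```python
import time
from typing import Callable


class RateLimiter:
    """Token bucket per key, e.g. 'user:<id>', 'team:<id>' or 'matter:<id>' (hypothetical naming)."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self._buckets: dict[str, tuple[float, float]] = {}  # key -> (tokens, last update time)

    def allow(self, key: str, cost: float = 1.0) -> bool:
        now = self.clock()
        tokens, last = self._buckets.get(key, (float(self.capacity), now))
        # Refill in proportion to elapsed time, never above capacity.
        tokens = min(float(self.capacity), tokens + (now - last) * self.refill_per_sec)
        allowed = tokens >= cost
        if allowed:
            tokens -= cost
        self._buckets[key] = (tokens, now)
        return allowed


fake_now = [0.0]
limiter = RateLimiter(capacity=2, refill_per_sec=0.5, clock=lambda: fake_now[0])
print([limiter.allow("matter:2026-0142") for _ in range(3)])  # [True, True, False]
fake_now[0] = 2.0  # two seconds later: 2 s x 0.5 tokens/s = 1 token refilled
print(limiter.allow("matter:2026-0142"))  # True
print(limiter.allow("matter:2026-0077"))  # True (separate bucket)
```

Checking one request against several scopes at once needs care: taking a token from the user bucket and then failing on the matter bucket would quietly spend the user's allowance.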

The critical architectural point: your application code never holds the LLM provider's API key. It authenticates to the gateway using internal credentials (a service token, mTLS certificate, or OAuth client credentials, patterns we covered in Episode 15 and Episode 20). The gateway holds and manages the provider key. If your application is compromised, the attacker gets an internal token that only works through your gateway, not a raw OpenAI key they can use from anywhere [8].
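For the service-token option, the gateway side might look like the sketch below. It is illustrative only and covers just that option, not mTLS or OAuth. Tokens are stored as SHA-256 digests and compared in constant time with `hmac.compare_digest`, and the provider key is attached only after the internal token checks out. The `Authorization: Bearer` header matches OpenAI's convention; other providers differ (Anthropic uses an `x-api-key` header).

```python
import hashlib
import hmac
import secrets


class InternalAuth:
    """Gateway-side registry of internal service tokens, stored only as SHA-256 digests."""

    def __init__(self) -> None:
        self._token_digests: dict[str, bytes] = {}  # service name -> digest of its token

    def issue(self, service: str) -> str:
        token = secrets.token_urlsafe(32)
        self._token_digests[service] = hashlib.sha256(token.encode()).digest()
        return token  # handed to the application; the gateway keeps only the digest

    def verify(self, service: str, token: str) -> bool:
        expected = self._token_digests.get(service)
        if expected is None:
            return False
        presented = hashlib.sha256(token.encode()).digest()
        return hmac.compare_digest(expected, presented)  # constant-time comparison


def build_upstream_headers(auth: InternalAuth, service: str, token: str,
                           provider_key: str) -> dict[str, str]:
    if not auth.verify(service, token):
        raise PermissionError("unknown service or bad token")
    # The provider key is attached here, inside the gateway; the caller never sees it.
    # Header name follows OpenAI's convention; adjust per provider.
    return {"Authorization": f"Bearer {provider_key}"}


auth = InternalAuth()
app_token = auth.issue("contract-review-app")
headers = build_upstream_headers(auth, "contract-review-app", app_token,
                                 provider_key="placeholder-provider-key")
print(sorted(headers))  # ['Authorization']
```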

Budget Controls and Anomaly Detection

Inference gateways should enforce hard budget caps: not just alerts, but actual rejection of requests once spending exceeds a threshold. Track usage in real time and flag anomalies, such as a key whose hourly spend suddenly jumps to several times its recent baseline.
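A sketch of both ideas, assuming the gateway can estimate a request's cost before forwarding it (for example from prompt length and the model's published price); in practice you would reconcile against actual usage once the response arrives. The cap rejects outright, while the spike check compares the hour just finished with the average of recent hours. The 5x multiplier and 24-hour window are illustrative assumptions, not recommendations.

```python
from collections import deque
from statistics import mean


class SpendGuard:
    """Hard daily cap plus a simple spike check against recent hourly spend."""

    def __init__(self, daily_cap_usd: float, spike_multiplier: float = 5.0,
                 history_hours: int = 24) -> None:
        self.daily_cap_usd = daily_cap_usd
        self.spike_multiplier = spike_multiplier
        self.hourly_history: deque = deque(maxlen=history_hours)  # spend of completed hours
        self.spent_today = 0.0
        self.spent_this_hour = 0.0

    def authorise(self, estimated_cost_usd: float) -> bool:
        """Return False (reject) if this request would push today's spend over the cap."""
        if self.spent_today + estimated_cost_usd > self.daily_cap_usd:
            return False
        self.spent_today += estimated_cost_usd
        self.spent_this_hour += estimated_cost_usd
        return True

    def close_hour(self) -> bool:
        """Roll the hour over; return True if the hour just finished looks anomalous."""
        baseline = mean(self.hourly_history) if self.hourly_history else None
        anomalous = (baseline is not None and baseline > 0
                     and self.spent_this_hour > self.spike_multiplier * baseline)
        self.hourly_history.append(self.spent_this_hour)
        self.spent_this_hour = 0.0
        return anomalous

    def reset_day(self) -> None:
        """Call at the start of each billing day."""
        self.spent_today = 0.0


guard = SpendGuard(daily_cap_usd=50.0)
for _ in range(3):             # three quiet hours at $2 each
    guard.authorise(2.0)
    guard.close_hour()
print(guard.authorise(30.0))   # True: $6 + $30 is under the $50 cap
print(guard.close_hour())      # True: $30 is more than 5x the $2 hourly baseline
print(guard.authorise(20.0))   # False: $36 + $20 would exceed the cap
```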

Overlap with secrets management (Episode 30): LLM API keys are high-rotation, high-value secrets. They belong in your secrets manager (HashiCorp Vault, AWS Secrets Manager, or equivalent), not in environment variables, configuration files, or — worst case — source code. The inference gateway retrieves keys from the secrets manager, and the keys are rotated on a schedule without any application changes.
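Because the gateway, not the application, reads the key, rotation only requires the gateway to notice the new value. A minimal sketch, assuming a `fetch` callable that reads the current key from your secrets manager. With AWS Secrets Manager via boto3, that could be `lambda: boto3.client("secretsmanager").get_secret_value(SecretId="llm/prod/openai")["SecretString"]`, where the secret name is hypothetical. The five-minute TTL is an illustrative assumption.

```python
import time
from typing import Callable, Optional


class CachedSecret:
    """Caches a secret for ttl_seconds so a rotated key is picked up without redeploying."""

    def __init__(self, fetch: Callable[[], str], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # e.g. a secrets-manager lookup (assumption)
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[str] = None
        self._fetched_at: Optional[float] = None

    def get(self) -> str:
        now = self._clock()
        if self._value is None or now - self._fetched_at >= self._ttl:
            self._value = self._fetch()
            self._fetched_at = now
        return self._value


store = {"llm/prod/openai": "key-v1"}     # stand-in for the secrets manager
fake_now = [0.0]
provider_key = CachedSecret(lambda: store["llm/prod/openai"], ttl_seconds=300,
                            clock=lambda: fake_now[0])
print(provider_key.get())                 # key-v1
store["llm/prod/openai"] = "key-v2"       # rotated in the secrets manager
print(provider_key.get())                 # key-v1 (still cached)
fake_now[0] = 301.0
print(provider_key.get())                 # key-v2
```

The TTL has a consequence for rotation: the old key must stay valid at the provider for at least one TTL after the new value is stored, or cached gateway instances will briefly fail.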

Key Rotation Without Downtime

The gateway architecture makes key rotation simple. The process:

  1. Generate a new API key from the provider
  2. Add the new key to the gateway's configuration (both old and new keys are valid)
  3. Verify the new key works by routing a percentage of traffic through it
  4. Remove the old key from the gateway
  5. Revoke the old key at the provider

At no point does the application need to change. It authenticates to the gateway with its internal credentials, which remain stable. The provider key rotation is entirely transparent to the application [9].
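The five steps can be sketched as a small key ring inside the gateway. This is a minimal illustration under assumptions: generating the new key (step 1) and revoking the old one at the provider (step 5) happen outside this code, the 10% canary fraction is arbitrary, and a production gateway would also watch error rates on the canary traffic before promoting.

```python
import random
from typing import Optional


class KeyRing:
    """Active provider key plus an optional candidate key during rotation."""

    def __init__(self, active_key: str, rng: Optional[random.Random] = None) -> None:
        self.active = active_key
        self.candidate: Optional[str] = None
        self.canary_fraction = 0.0
        self._rng = rng or random.Random()

    def add_candidate(self, new_key: str, canary_fraction: float = 0.05) -> None:
        """Steps 2-3: both keys are valid; send a fraction of traffic to the new one."""
        self.candidate = new_key
        self.canary_fraction = canary_fraction

    def pick(self) -> str:
        """Choose the key for the next upstream request."""
        if self.candidate is not None and self._rng.random() < self.canary_fraction:
            return self.candidate
        return self.active

    def promote(self) -> str:
        """Step 4: drop the old key; return it so it can be revoked at the provider (step 5)."""
        if self.candidate is None:
            raise RuntimeError("no candidate key to promote")
        old, self.active = self.active, self.candidate
        self.candidate, self.canary_fraction = None, 0.0
        return old


ring = KeyRing("key-old", rng=random.Random(0))
ring.add_candidate("key-new", canary_fraction=0.10)
picks = [ring.pick() for _ in range(1000)]
print(picks.count("key-new"))   # roughly 100; the exact value depends on the RNG
old = ring.promote()
print(old, ring.pick())         # key-old key-new
```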

Open-Source and Commercial Options

Tool | Type | Key Feature
LiteLLM | Open-source proxy | Multi-provider support, budget tracking (note: verify package integrity after the March 2026 compromise [4])
Portkey | Commercial gateway | Policy enforcement, caching, observability
Helicone | Commercial gateway | Usage analytics, prompt management, cost tracking
Kong AI Gateway | Commercial gateway | Enterprise API management with AI-specific plugins
Custom (nginx/Envoy + middleware) | Self-built | Full control, but maintenance burden

For firms that prefer to build their own, an Envoy or nginx reverse proxy with custom middleware for authentication, logging, and rate limiting is a viable approach. The trade-off is maintenance overhead: the commercial gateways handle provider API changes, new model support, and billing integration automatically [10].

What's Next

Episode 39 covers Redaction Pipelines for Cloud AI — how to strip sensitive information from prompts before they reach a cloud provider, and how to reassemble the redacted information when the response comes back. Because the best way to protect client data from cloud AI is to never send it in the first place.

Sources & Further Reading

  1. Sysdig, LLMjacking: Stolen Cloud Credentials Used in New AI Attack (May 2024).
  2. Sysdig, What is LLMjacking?
  3. Prompt Guardrails, LLMjacking: The $100K-Per-Day Attack Draining Enterprise AI Budgets.
  4. Trend Micro, Your AI Gateway Was a Backdoor: Inside the LiteLLM Supply Chain Compromise (March 2026).
  5. CSO Online, Microsoft Files Lawsuit Against LLMjacking Gang That Bypassed AI Safeguards (January 2025).
  6. GitHub Blog, Secret Scanning.
  7. API7.ai, How AI Gateways Enforce Security and Compliance for LLMs.
  8. Noma Security, How an AI Agent Vulnerability in LangSmith Could Lead to Stolen API Keys.
  9. BeyondScale, LLMjacking: AI API Key Theft Defense Guide.
  10. DreamFactory, The LiteLLM Supply Chain Attack: A Complete Technical Breakdown.
  11. Aikido, GPT-Proxy Backdoor in npm and PyPI Turns Servers into Chinese LLM Relays.