Episode 38 · Module 8 · AI Security

LLM API Key Isolation and Inference Gateways

19 May 2026 · 7:59 · Security for Legal SaaS

In Episode 37, we established that AI should draft and humans should approve. This episode focuses on the infrastructure connecting your legal AI to the cloud providers that power it: specifically, the API keys that authenticate every request and the gateways that should sit between your application and those providers.

Today’s Lesson

One Key to Rule Them All — and That's the Problem

An API key — a programmatic credential that proves your application's identity to a service — is the single token that authorises your application to call an LLM provider like OpenAI, Anthropic, or Google. If that key is stolen, the attacker has everything they need to run up your bill, exfiltrate your prompts and responses, and potentially access any data flowing through the API.

The LLMjacking Threat

"LLMjacking" is a term coined by the Sysdig Threat Research Team to describe attackers stealing cloud credentials to hijack access to large language model services.¹ It follows the pattern of cryptojacking (stealing compute for cryptocurrency mining) and proxyjacking (reselling stolen bandwidth), but the economics are different — and worse.

Attack	What's Stolen	Typical Cost to Victim
Cryptojacking	GPU/CPU cycles	Electricity + cloud compute bills
Proxyjacking	Network bandwidth	Bandwidth overages
LLMjacking	LLM API access	$100,000+ per day at scale

Sysdig documented a 376% increase in credential theft specifically targeting AI services between Q4 2025 and Q1 2026.² The financial impact is staggering: one startup's monthly OpenAI bill went from $400 to $67,000 after their API key was exposed in a public GitHub repository for 11 days.³

Case study: LiteLLM Supply Chain Compromise (March 2026). LiteLLM — the most popular open-source LLM proxy in the Python ecosystem, with approximately 97 million monthly downloads — was compromised when attackers hijacked the maintainer's PyPI account. Version 1.82.8 included a malicious file that silently exfiltrated cloud credentials, SSH keys, and Kubernetes secrets. Because LiteLLM runs as a centralised API gateway, compromising it yielded not just cloud credentials but LLM API keys for OpenAI, Anthropic, Azure AI, and others simultaneously.⁴

Case study: Microsoft v. Storm-2139 (January 2025). Microsoft filed a civil lawsuit against a criminal syndicate called Storm-2139 that had industrialised LLMjacking across Azure, OpenAI, AWS Bedrock, Anthropic, Google Vertex AI, and Mistral. The syndicate built a custom tool called de3u that allowed users to generate images using stolen Azure API keys, with features designed to circumvent content safety filters.⁵

Why Law Firms Are Particularly Vulnerable

Legal AI deployments face specific risks that make API key isolation critical:

High-value prompts. Legal prompts contain privileged information, case strategy, client names, and confidential deal terms. A stolen key doesn't just cost money — it exposes every prompt and response flowing through it.
Long-lived keys. Many firms set up API keys once and forget them. Unlike passwords, API keys rarely expire by default, and few organisations rotate them regularly.
Shared across environments. Development, staging, and production environments often share a single API key — meaning a key leaked from a developer's laptop compromises the production system.
Embedded in code. Despite decades of warnings, API keys still appear in source code, configuration files, and CI/CD pipelines. GitHub's secret scanning detects millions of leaked secrets annually.⁶

Per-Environment Key Isolation

The first principle: every environment gets its own API key. Development, staging, and production must never share credentials.

Environment	Key Scope	Budget Limit	Monitoring
Development	Individual developer sandboxes	$50/day hard cap	Alert on any production model usage
Staging	Shared test environment	$200/day hard cap	Alert on usage exceeding test patterns
Production	Production application only	Based on projected usage + 20% buffer	Real-time anomaly detection

Each key should have the minimum permissions required. If your production application only calls GPT-4 for contract review, the production key should not have access to DALL-E, Whisper, or other models. This limits the blast radius if a key is compromised.⁷
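As a rough illustration of the table and the least-privilege rule above, a per-environment policy check might look like the following. This is a minimal sketch, not a provider feature: the `KeyPolicy` class, the model names, and the production cap (which assumes $1,000 of projected daily usage plus the 20% buffer) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyPolicy:
    """Limits attached to one environment's provider key (illustrative values)."""
    environment: str
    allowed_models: frozenset
    daily_cap_usd: float


# Hypothetical policies mirroring the table above; model names are placeholders.
POLICIES = {
    "development": KeyPolicy("development", frozenset({"small-test-model"}), 50.0),
    "staging": KeyPolicy(
        "staging", frozenset({"small-test-model", "contract-review-model"}), 200.0
    ),
    # Assumption: $1,000 projected daily usage plus a 20% buffer.
    "production": KeyPolicy("production", frozenset({"contract-review-model"}), 1200.0),
}


def is_request_allowed(
    environment: str, model: str, spent_today_usd: float, est_cost_usd: float
) -> bool:
    """Allow a call only if the model is allowlisted and the cap would hold."""
    policy = POLICIES.get(environment)
    if policy is None:
        return False  # unknown environments are denied by default
    if model not in policy.allowed_models:
        return False  # e.g. a development key asking for a production model
    return spent_today_usd + est_cost_usd <= policy.daily_cap_usd
```

Denying unknown environments by default keeps a misconfigured deployment from silently inheriting production access.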

The Inference Gateway Pattern

An inference gateway is a proxy layer that sits between your application and the LLM provider. Every request flows through the gateway, which adds authentication, logging, rate limiting, and policy enforcement before forwarding the request to the provider.

Your Application → Inference Gateway → LLM Provider (OpenAI, Anthropic, etc.)
                        ↓
                  Logs, rate limits,
                  budget controls,
                  PII scanning,
                  key rotation

What the Gateway Enforces

Control	Without Gateway	With Gateway
Authentication	Application holds raw API key	Application authenticates to gateway; gateway holds the provider key
Rate limiting	Provider's default limits only	Custom per-user, per-team, per-matter limits
Budget controls	Monthly bill surprise	Hard spending caps with real-time tracking
Logging	Whatever the provider retains	Full prompt/response logging under your control
PII scanning	None	Pre-flight redaction before data reaches the provider
Key rotation	Manual, risky	Gateway-level rotation; applications never see the provider key

The critical architectural point: your application code never holds the LLM provider's API key. It authenticates to the gateway using internal credentials (a service token, mTLS certificate, or OAuth client credentials — patterns we covered in Episode 15 and Episode 20). The gateway holds and manages the provider key. If your application is compromised, the attacker gets an internal token that only works through your gateway — not a raw OpenAI key they can use from anywhere.⁸
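The credential swap described above can be sketched in a few lines. This is an assumption-laden illustration rather than a real gateway: the `INTERNAL_TOKENS` table, the token value, and the function names are hypothetical, and the `Authorization: Bearer` scheme for the upstream call is assumed (some providers use a different header).

```python
import hmac
from typing import Dict, Optional

# Hypothetical internal service tokens mapped to caller identity. In practice these
# would be issued and verified by your identity system, not hard-coded.
INTERNAL_TOKENS = {"svc-token-contract-review": "contract-review-service"}


def authenticate_caller(headers: Dict[str, str]) -> Optional[str]:
    """Return the caller's service name if its bearer token is recognised."""
    auth = headers.get("Authorization", "")
    if not auth.startswith("Bearer "):
        return None
    presented = auth[len("Bearer "):]
    for token, service in INTERNAL_TOKENS.items():
        # compare_digest avoids leaking the match position through timing
        if hmac.compare_digest(presented.encode(), token.encode()):
            return service
    return None


def build_upstream_headers(headers: Dict[str, str], provider_key: str) -> Dict[str, str]:
    """Swap the internal credential for the provider key before forwarding."""
    if authenticate_caller(headers) is None:
        raise PermissionError("unknown or missing internal credential")
    # Drop the caller's Authorization header so the internal token never leaves
    # the gateway, then attach the provider key that only the gateway holds.
    upstream = {k: v for k, v in headers.items() if k.lower() != "authorization"}
    upstream["Authorization"] = f"Bearer {provider_key}"
    return upstream
```

The application only ever sees the internal token; the provider key exists solely inside the gateway process.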

Budget Controls and Anomaly Detection

Inference gateways should enforce hard budget caps — not just alerts, but actual request rejection when spending exceeds thresholds. Track usage in real time and flag anomalies:

A user who normally makes 50 API calls per day suddenly making 5,000
API calls at 3 AM when no attorneys are working
Requests to models your application doesn't normally use
Prompt patterns that don't match your application's template structure
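A hard cap plus a few of the anomaly signals listed above could be combined roughly as follows. This is a sketch under stated assumptions: the `UsageTracker` class, the 07:00 to 19:00 working window, and the ten-times-baseline spike threshold are hypothetical choices, and cost estimates are assumed to be supplied by the caller.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class UsageTracker:
    """Tracks one caller's spend for the day and flags unusual requests."""
    daily_cap_usd: float
    expected_models: frozenset
    baseline_calls_per_day: int
    spent_usd: float = 0.0
    calls_today: int = 0

    def check(self, model: str, est_cost_usd: float, hour: int) -> Tuple[bool, List[str]]:
        """Return (allowed, flags). Flags raise alerts; only the cap rejects."""
        flags = []
        if model not in self.expected_models:
            flags.append("unexpected_model")
        if hour < 7 or hour >= 19:  # assumed working hours, local time
            flags.append("off_hours")
        if self.calls_today + 1 > 10 * self.baseline_calls_per_day:
            flags.append("volume_spike")
        # Hard cap: reject the request outright instead of merely alerting.
        allowed = self.spent_usd + est_cost_usd <= self.daily_cap_usd
        if allowed:
            self.spent_usd += est_cost_usd
            self.calls_today += 1
        return allowed, flags
```

Keeping rejection (the cap) separate from flagging (the anomalies) lets a firm tune alert thresholds without risking blocked legitimate work.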

Overlap with secrets management (Episode 30): LLM API keys are high-rotation, high-value secrets. They belong in your secrets manager (HashiCorp Vault, AWS Secrets Manager, or equivalent), not in environment variables, configuration files, or — worst case — source code. The inference gateway retrieves keys from the secrets manager, and the keys are rotated on a schedule without any application changes.
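One way a gateway might pick up rotated keys without a restart is to fetch the key through a short-lived cache. The sketch below assumes a `fetch` callable that wraps whatever secrets manager client the firm uses; `CachedProviderKey`, the five-minute default TTL, and the injectable clock are hypothetical.

```python
import time
from typing import Callable, Optional


class CachedProviderKey:
    """Fetch the provider key from a secrets backend and cache it briefly."""

    def __init__(
        self,
        fetch: Callable[[], str],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch  # stand-in for a real secrets manager lookup
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return the cached key, refetching once the TTL has elapsed."""
        now = self._clock()
        if self._value is None or now >= self._expires_at:
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value
```

With this shape, a key rotated in the secrets manager reaches the gateway within one TTL, and the key is held only in memory rather than in a file or environment variable.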

Key Rotation Without Downtime

The gateway architecture makes key rotation simple. The process:

1. Generate a new API key from the provider
2. Add the new key to the gateway's configuration, so that both old and new keys are valid during the changeover
3. Verify the new key works by routing a percentage of traffic through it
4. Shift all remaining traffic to the new key, then remove the old key from the gateway
5. Revoke the old key at the provider

At no point does the application need to change. It authenticates to the gateway with its internal credentials, which remain stable. The provider key rotation is entirely transparent.⁹
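The rotation process above amounts to a weighted pool of provider keys inside the gateway. The following is a minimal sketch of that idea, assuming a hypothetical `ProviderKeyPool` class; revoking the old key at the provider remains a separate, manual step outside this code.

```python
import random
from typing import Optional


class ProviderKeyPool:
    """Holds one or more provider keys and splits traffic between them."""

    def __init__(self, active_key: str, rng: Optional[random.Random] = None) -> None:
        self._weights = {active_key: 1.0}
        self._rng = rng or random.Random()

    def add_key(self, key: str, traffic_share: float) -> None:
        """Add a new key that takes the given share of traffic (canary stage)."""
        if not 0.0 < traffic_share < 1.0:
            raise ValueError("traffic_share must be strictly between 0 and 1")
        # Scale existing keys down so the new key receives exactly its share.
        for existing in self._weights:
            self._weights[existing] *= 1.0 - traffic_share
        self._weights[key] = traffic_share

    def remove_key(self, key: str) -> None:
        """Retire a key; the remaining keys absorb its traffic."""
        if len(self._weights) == 1:
            raise ValueError("cannot remove the last key")
        del self._weights[key]

    def pick(self) -> str:
        """Choose the key for the next upstream request."""
        keys = list(self._weights)
        weights = [self._weights[k] for k in keys]
        return self._rng.choices(keys, weights=weights, k=1)[0]
```

A rotation would then be `add_key("new-key", 0.1)`, a period of monitoring, and `remove_key("old-key")`, with no change to any application that calls the gateway.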

Open-Source and Commercial Options

Tool	Type	Key Feature
LiteLLM	Open-source proxy	Multi-provider support, budget tracking (note: verify package integrity post-2026 compromise)⁴
Portkey	Commercial gateway	Policy enforcement, caching, observability
Helicone	Commercial gateway	Usage analytics, prompt management, cost tracking
Kong AI Gateway	Commercial gateway	Enterprise API management with AI-specific plugins
Custom (nginx/Envoy + middleware)	Self-built	Full control, but maintenance burden

For firms that prefer to build their own, an Envoy or nginx reverse proxy with custom middleware for authentication, logging, and rate limiting is a viable approach. The tradeoff is maintenance overhead — the commercial gateways handle provider API changes, new model support, and billing integration automatically.¹⁰
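For a self-built gateway, the rate-limiting middleware is often a token bucket per caller. The sketch below is one possible shape, not a drop-in component: `TokenBucket`, `PerCallerLimiter`, and the capacity and refill values are hypothetical, and the state is kept in process memory, which would need shared storage if several gateway instances run side by side.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows short bursts up to `capacity`, refilling at a steady rate."""

    def __init__(
        self,
        capacity: int,
        refill_per_second: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self) -> bool:
        """Consume one token if available; otherwise reject the request."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_second)
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False


class PerCallerLimiter:
    """Keeps an independent bucket per user, team, or matter identifier."""

    def __init__(
        self,
        capacity: int,
        refill_per_second: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._capacity = capacity
        self._refill = refill_per_second
        self._clock = clock
        self._buckets = {}

    def allow(self, caller_id: str) -> bool:
        bucket = self._buckets.get(caller_id)
        if bucket is None:
            bucket = TokenBucket(self._capacity, self._refill, self._clock)
            self._buckets[caller_id] = bucket
        return bucket.allow()
```

Keying the limiter by matter rather than by user is a design choice: it caps how much any single matter can send to the provider regardless of how many people work on it.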

What's Next

Episode 39 covers Redaction Pipelines for Cloud AI — how to strip sensitive information from prompts before they reach a cloud provider, and how to reassemble the redacted information when the response comes back. Because the best way to protect client data from cloud AI is to never send it in the first place.

Sources & Further Reading

1. Sysdig, LLMjacking: Stolen Cloud Credentials Used in New AI Attack (May 2024).
2. Sysdig, What is LLMjacking?
3. Prompt Guardrails, LLMjacking: The $100K-Per-Day Attack Draining Enterprise AI Budgets.
4. Trend Micro, Your AI Gateway Was a Backdoor: Inside the LiteLLM Supply Chain Compromise (March 2026).
5. CSO Online, Microsoft Files Lawsuit Against LLMjacking Gang That Bypassed AI Safeguards (January 2025).
6. GitHub Blog, Secret Scanning.
7. API7.ai, How AI Gateways Enforce Security and Compliance for LLMs.
8. Noma Security, How an AI Agent Vulnerability in LangSmith Could Lead to Stolen API Keys.
9. BeyondScale, LLMjacking: AI API Key Theft Defense Guide.
10. DreamFactory, The LiteLLM Supply Chain Attack: A Complete Technical Breakdown.
11. Aikido, GPT-Proxy Backdoor in npm and PyPI Turns Servers into Chinese LLM Relays.

Alice: Welcome back to Security for Legal SaaS. I'm Alice.

Dan: And I'm Dan. Episode 38 — LLM API key isolation and inference gateways. Alice, I'll be honest, when I first saw this topic I thought — how much is there to say about an API key? It's just a password for a service, right?

Alice: It's a credential, yes — a long string of characters that proves your application is authorised to use a service like OpenAI or Anthropic. We covered API keys way back in Episode 3. But here's why this deserves its own episode: there's now a named attack category called LLMjacking, coined by Sysdig's threat research team. It's exactly what it sounds like — attackers steal your API key and hijack your access to cloud AI services. And the economics are brutal.

Dan: Mm. How brutal are we talking?

Alice: One startup had a $400-a-month OpenAI bill. Their API key ended up in a public GitHub repository for eleven days. The next invoice was $67,000. The attacker had been running commercial AI services through the stolen key for over a week. And that's a small case. Sysdig documented attacks costing over $100,000 per day when targeting enterprise accounts with access to models like GPT-4 or Claude Opus.

Dan: Right. And for a law firm, the cost isn't just the bill — it's every prompt that went through that key.

Alice: Exactly. Every prompt your legal AI sends contains something — a client name, a contract clause, a case strategy question, a privileged communication. If the attacker has your key, they can set up a proxy that logs every request and response flowing through it. The financial damage is real, but the data exposure is the existential risk.

Dan: Hmm. So how do these keys get stolen in the first place?

Alice: The usual ways. Hardcoded in source code that gets pushed to a public repository. Stored in environment variable files that get committed accidentally. Left in CI/CD pipeline configurations. Shared over Slack or email. And here's one that's newer — in March 2026, a popular open-source LLM proxy called LiteLLM was compromised. Attackers hijacked the maintainer's account on PyPI — that's the Python package repository — and pushed a version that silently exfiltrated cloud credentials, SSH keys, and API keys. LiteLLM had 97 million monthly downloads. Because it runs as a centralised gateway, compromising it gave the attackers access to LLM API keys for every provider its users connected to — OpenAI, Anthropic, Azure, all of them.

Dan: Mm. That's a supply chain attack on the infrastructure that's supposed to manage the keys.

Alice: Which brings us to the solution — and the irony. The right architecture is an inference gateway, which is conceptually similar to what LiteLLM does. The difference is how you deploy and secure it.

Dan: Yeah. Walk me through what an inference gateway actually is.

Alice: Think of it as a proxy layer between your application and the LLM provider. Your application never holds the OpenAI or Anthropic API key directly. Instead, it authenticates to your gateway using internal credentials — a service token, an mTLS certificate, the kind of internal authentication we covered in Episode 15. The gateway holds and manages the provider's API key. Every request flows through the gateway, which adds logging, rate limiting, budget controls, and policy enforcement before forwarding the request to the provider.

Dan: Mm-hmm. So if someone compromises your application, they get an internal token that only works through your gateway — not a raw API key they can use from anywhere?

Alice: Exactly. And the gateway gives you controls you don't have when your application talks directly to the provider. Hard budget caps — not just alerts, but actual request rejection when spending exceeds a threshold. Per-user and per-team rate limits. Anomaly detection — someone making five thousand API calls at 3 AM when no attorneys are working should trigger an alert. And critically, the gateway is where you can intercept prompts and scan them for sensitive information before they ever leave your network. That's the redaction pipeline we'll cover next episode.

Dan: Right. What about key rotation? I know from Episode 30 on secrets management that you should rotate credentials regularly. How does that work when your whole AI pipeline depends on one API key?

Alice: This is one of the biggest advantages of the gateway pattern. Key rotation happens at the gateway level, completely invisible to your application. You generate a new key from the provider, add it to the gateway configuration alongside the old one, verify it works by routing some traffic through it, then remove the old key and revoke it. Your application code doesn't change at all — it still authenticates to the gateway with the same internal credentials. Compare that to rotating a key that's hardcoded in fifteen different microservices. That's a deployment event. With a gateway, it's a configuration change.

Dan: Yeah. So the first principle is per-environment isolation?

Alice: It should be obvious, but it still isn't in practice. Development, staging, and production must each have their own API key. A developer's sandbox key should have a hard cap of maybe fifty dollars a day and should not have access to production-grade models. If a developer's key leaks (and developer keys leak constantly), the blast radius is a small bill and no exposure to production data. The production key should be locked down with the minimum permissions required. If your application only uses one specific model for contract review, the key shouldn't have access to image generation, speech-to-text, or any other service.

Dan: Mm. Principle of least privilege, applied to AI services.

Alice: Exactly what we covered in Episode 8, now applied to a new category of credential. And store these keys in your secrets manager — HashiCorp Vault, AWS Secrets Manager, whatever your firm uses. Not in environment variables. Not in configuration files. Definitely not in source code. The gateway retrieves keys from the secrets manager at runtime, and the keys are rotated on a schedule. Microsoft filed a lawsuit in January 2025 against a criminal syndicate called Storm-2139 that had built an entire commercial hacking-as-a-service operation using stolen Azure API keys. This is organised crime, not script kiddies.

Dan: Hmm. For someone building a legal AI tool today — maybe a contract review platform or a research assistant — what's the minimum they should implement?

Alice: Three things. First, separate keys per environment with hard budget caps on each. Second, an inference gateway — even a simple one using an nginx reverse proxy with basic auth, logging, and rate limiting. Third, never put the provider's API key in your application code or anywhere a developer can see it in plaintext. Those three things would have prevented most of the LLMjacking incidents we've seen.

Dan: Next episode — Redaction Pipelines for Cloud AI. How to strip sensitive data from prompts before they leave your network.

Alice: Until then, I'm Alice.

Dan: And I'm Dan.

Alice: Security for Legal SaaS is a series written with AI assistance. Alice and Dan are AI-generated voices — no professional advice here, just education.
