Security for Legal SaaS

Episode 40 · Module 8 · AI Security

Local vs. Cloud AI — Security Boundaries

19 May 2026 · 8:30 · Security for Legal SaaS




Your Data Never Leaves. But "Local" Has Its Own Risks.

In Episode 39, we built a redaction pipeline to strip sensitive data before sending prompts to cloud AI. This episode asks the next logical question: what if you skip the cloud entirely and run the AI model on your own hardware?

Local AI deployment, meaning running large language models on servers you own and control, is increasingly practical. Open-weight models such as Llama, Mistral, Qwen, and Gemma can run on consumer-grade GPUs in their smaller sizes. For legal AI handling the most sensitive data (privileged communications, litigation strategy, government investigation materials), keeping everything within your network perimeter can eliminate entire categories of risk. But "local" introduces a different threat surface, and understanding the tradeoffs is the difference between genuine security and security theatre.

Cloud AI: What You Gain and What You Give Up

When you call a cloud LLM provider (OpenAI, Anthropic, Google, AWS Bedrock), your data traverses networks and is processed on infrastructure outside your control. Even with contractual protections (data processing agreements, zero-retention policies, SOC 2 attestations), several fundamental risks remain [1]:

| Cloud Risk | Description |
|---|---|
| Data in transit | Prompts and responses travel over the public internet; TLS encrypts them, but they still leave your network and are decrypted on the provider's infrastructure |
| Provider-side breach | Your data is on someone else's servers; their breach is your breach |
| Subpoena exposure | A subpoena served on the provider could capture your data |
| Jurisdictional issues | Data may be processed in regions with different privacy laws |
| Model training risk | Without zero-retention agreements, data may be used to improve models |
| Third-party access | Provider employees with administrative access could theoretically access data |

What you gain is substantial: state-of-the-art model quality, automatic scaling, zero infrastructure management, and access to the largest and most capable models that require hundreds of GPUs to run.

Local AI: What You Gain and What You Give Up

Running a model locally means your data never leaves your network. No prompts cross the internet. No third party processes your privileged communications. For compliance and privilege protection, this is the strongest possible posture [2].

But the threat surface shifts — it doesn't disappear:

| Local Risk | Description |
|---|---|
| You own the patching | Model vulnerabilities, OS patches, driver updates: all your responsibility |
| Physical security | The hardware running the model needs the same physical protection as your file servers |
| Model provenance | Who built the model? Was the model file tampered with between download and deployment? |
| Network exposure | A locally deployed model with a network-accessible API is an attack surface |
| Insider threats | Internal users with access to the model endpoint can extract data through the same techniques we covered in EP36 (model inversion, membership inference) |
| Capability gap | Smaller models make more errors; lower quality increases the risk of incorrect legal analysis |
| Operational burden | GPU procurement, cooling, power, monitoring, failover: all on you |

Vitalik Buterin's local LLM setup (April 2026): Ethereum co-founder Vitalik Buterin published his personal local AI security setup, arguing that for sensitive personal data, "running the model locally means the data never leaves your machine", while noting that local deployment requires ongoing vigilance around model integrity, network isolation, and access controls [3].

The Hybrid Architecture

The most practical approach for legal AI is hybrid: route sensitive data to local models and general tasks to cloud AI. This combines the privacy of local inference with the capability of cloud models.

| Data Category | Routing | Rationale |
|---|---|---|
| Privileged communications | Local only | Privilege waiver risk from cloud exposure |
| Litigation strategy memos | Local only | Highest-sensitivity work product |
| Client PII (names, financials) | Local, or cloud with redaction (EP39) | Confidentiality obligations |
| General legal research | Cloud (no client data in prompt) | Public information, no confidentiality risk |
| Document formatting/structure | Cloud (with redaction) | Low-sensitivity task, benefits from larger models |
| Contract template analysis | Cloud (anonymised) | Templates contain no client-specific data |

Decision Framework

Ask four questions for each AI task:

  1. Data sensitivity: Does the prompt contain privileged, confidential, or personally identifiable information?
  2. Model capability: Does the task require a frontier model (GPT-4, Claude Opus), or can a smaller local model handle it?
  3. Compliance mandate: Do applicable regulations (GDPR, HIPAA, data localisation laws) require data to stay within your jurisdiction?
  4. Acceptable risk: If the data were exposed, what is the worst-case consequence?

If the answer to question 1 is "yes" and the answer to question 2 is "a local model can handle it," the choice is clear: run locally [4].
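The four questions can be expressed as a small routing function. This is a minimal sketch: the text states only one rule (sensitive data plus a capable local model means run locally), so the other branches, including the hypothetical `ESCALATE` outcome for tasks with no safe automated route, are assumptions to replace with your own policy. All names are illustrative.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    LOCAL = "local"
    CLOUD_REDACTED = "cloud_with_redaction"  # redaction pipeline from EP39
    CLOUD = "cloud"
    ESCALATE = "escalate"  # hypothetical: no automated route, a person decides


@dataclass(frozen=True)
class TaskAssessment:
    sensitive_data: bool          # Q1: privileged, confidential, or PII in the prompt?
    local_model_sufficient: bool  # Q2: can a smaller local model handle the task?
    jurisdiction_mandate: bool    # Q3: must the data stay within your jurisdiction?
    severe_if_exposed: bool       # Q4: is the worst-case exposure consequence severe?


def choose_route(task: TaskAssessment) -> Route:
    # Assumption: a compliance mandate overrides capability; if no local model
    # is good enough, the task goes to a person rather than to the cloud.
    if task.jurisdiction_mandate:
        return Route.LOCAL if task.local_model_sufficient else Route.ESCALATE
    # No sensitive data in the prompt: use the most capable model.
    if not task.sensitive_data:
        return Route.CLOUD
    # The rule stated in the text: sensitive data + capable local model -> local.
    if task.local_model_sufficient:
        return Route.LOCAL
    # Assumption: severe worst cases are not sent to the cloud even after redaction.
    if task.severe_if_exposed:
        return Route.ESCALATE
    # Otherwise fall back to cloud with redaction.
    return Route.CLOUD_REDACTED


memo = TaskAssessment(sensitive_data=True, local_model_sufficient=True,
                      jurisdiction_mandate=False, severe_if_exposed=True)
print(choose_route(memo))  # Route.LOCAL
```

Keeping the answers in a frozen dataclass means the inputs to each routing decision can be logged alongside the outcome.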

Securing Local Inference

"Local" does not mean "automatically secure." A model running on your network needs the same security controls as any other service:

Network Isolation

The local model endpoint should not be accessible from the public internet. Deploy it in an isolated network segment, following the same principle we covered in Episode 15 on network segmentation. Only your application servers should be able to reach the model API. Use mTLS (mutual TLS, from Episode 15) to authenticate clients connecting to the model endpoint [5].
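A minimal Python sketch of the two layers, using the standard-library `ssl` and `ipaddress` modules. `build_mtls_server_context` sets `verify_mode = ssl.CERT_REQUIRED`, which is what makes the handshake mutual: clients without a certificate issued by your client CA are rejected. The source-address check is a hypothetical application-level backstop; the real isolation belongs in the network segment and its firewall rules. The subnet and file paths are placeholders.

```python
import ipaddress
import ssl

# Hypothetical segment for the application servers; use your own addressing plan.
APP_SERVER_NETWORKS = [ipaddress.ip_network("10.20.0.0/24")]


def peer_is_app_server(peer_ip: str) -> bool:
    """Defence in depth: reject connections from outside the application segment."""
    addr = ipaddress.ip_address(peer_ip)
    # An IPv6 address is never "in" an IPv4 network, so it is rejected here.
    return any(addr in net for net in APP_SERVER_NETWORKS)


def build_mtls_server_context(server_cert: str, server_key: str,
                              client_ca: str) -> ssl.SSLContext:
    """Server-side TLS context that refuses clients without a cert from client_ca."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.verify_mode = ssl.CERT_REQUIRED  # this is what makes it mutual TLS
    ctx.load_cert_chain(certfile=server_cert, keyfile=server_key)
    ctx.load_verify_locations(cafile=client_ca)  # CA that issues app-server certs
    return ctx
```

In a server, the context would be passed to `ctx.wrap_socket(sock, server_side=True)` (or to your web server's TLS settings), and `peer_is_app_server` called on the connecting address before the request is handled.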

Access Controls

Not every user in your organisation should have access to the local model. Apply role-based access controls. Log every inference request with the authenticated identity of the requester, the prompt content (encrypted), and the response. These logs feed into the audit trail we'll design in Episode 41.
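One way to combine the role check and the log entry, as a sketch. The role names are hypothetical, and `encrypt` is an assumption: a caller-supplied function (for example, one backed by your key management service) so that plaintext prompts never reach the log. The sketch also encrypts the response, on the assumption that it can contain the same privileged material, and logs denied requests as well as allowed ones.

```python
import json
from datetime import datetime, timezone
from typing import Callable

# Hypothetical role names; map them to groups in your identity provider.
ROLES_ALLOWED_TO_INFER = frozenset({"associate", "partner", "litigation_support"})


def is_authorized(user_roles: set[str]) -> bool:
    """True if the user holds at least one role permitted to call the model."""
    return not ROLES_ALLOWED_TO_INFER.isdisjoint(user_roles)


def audit_record(user_id: str, user_roles: set[str], allowed: bool,
                 prompt: str, response: str,
                 encrypt: Callable[[bytes], bytes]) -> str:
    """One JSON log line per inference request, including denied ones."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "roles": sorted(user_roles),
        "allowed": allowed,
        # Only ciphertext (hex-encoded) is stored, never the plaintext.
        "prompt_enc": encrypt(prompt.encode("utf-8")).hex(),
        "response_enc": encrypt(response.encode("utf-8")).hex(),
    }
    return json.dumps(record)
```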

Model File Integrity

When you download a model from Hugging Face, Ollama, or any other source, you are trusting that the file has not been tampered with. Verify checksums. Use signed model files where available. Store models in a read-only filesystem. Monitor for unexpected changes to model files: a compromised model could exfiltrate data through its outputs or produce subtly incorrect legal analysis [6].
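Checksum verification needs only the standard library. This sketch streams the file through SHA-256 and compares the result with a checksum obtained from the model's publisher through a separate channel; the function names, and where that expected checksum comes from, are assumptions. Running the same check on a schedule is one way to implement the monitoring step.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks (1 MiB by default) so large model files never load whole."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model_file(path: Path, expected_sha256: str) -> bool:
    """Compare against the publisher's checksum; refuse to load the model on False."""
    actual = sha256_of_file(path)
    return hmac.compare_digest(actual, expected_sha256.strip().lower())
```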

Inference Endpoint Security

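Even a local model API should have the controls you would apply to any internal service, including authentication of every caller and per-client rate limiting.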
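Rate limiting is one of the endpoint controls; the recommendations at the end of this episode list it alongside authentication. A common technique is a per-client token bucket, sketched here under the assumption that `client_id` is the authenticated identity (for example, taken from the mTLS client certificate). The class names and limits are illustrative.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PerClientRateLimiter:
    """One bucket per authenticated client, so one noisy caller cannot starve others."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self._buckets: dict[str, TokenBucket] = {}

    def allow(self, client_id: str) -> bool:
        bucket = self._buckets.get(client_id)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.rate, self.clock)
            self._buckets[client_id] = bucket
        return bucket.allow()
```

A request handler would call `limiter.allow(client_id)` and return HTTP 429 when it is `False`. The injectable clock exists only to make the behaviour testable.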

The Capability-Security Tradeoff

The honest challenge with local AI: smaller models are less capable. A 7-billion-parameter model running on a single GPU will not match GPT-4 or Claude Opus on complex legal reasoning, nuanced contract analysis, or multi-jurisdictional research. The security benefit of local deployment must be weighed against the quality risk of using a less capable model for consequential legal work [7].

| Model Size | Typical Hardware | Rough Capability | Best For |
|---|---|---|---|
| 7B parameters | Single consumer GPU (24GB VRAM) | Basic summarisation, simple Q&A | Document triage, simple classification |
| 14-35B parameters | Single high-end GPU (48GB VRAM; the larger sizes need quantisation) | Competent drafting, clause analysis | Contract review, privilege screening |
| 70B+ parameters | Multi-GPU or cloud | Near-frontier quality | Complex legal reasoning, research memos |
| Frontier (200B+) | Cloud only | State of the art | Everything; required for some tasks |
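The hardware column follows largely from the memory the weights occupy: parameter count times bytes per parameter. This sketch estimates that figure. The bytes-per-parameter values are standard for each precision, while the 20% headroom for the KV cache, activations, and runtime is a rough assumption, not a measured value. It shows why a 7B model fits a 24GB card at 16-bit precision and why the top of the 14-35B band needs quantisation to fit in 48GB.

```python
# Approximate storage per parameter at common precisions.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billions: float, precision: str) -> float:
    """Weights only: billions of parameters x bytes each = gigabytes (1 GB = 1e9 bytes)."""
    return params_billions * BYTES_PER_PARAM[precision]


def fits_on_gpu(params_billions: float, precision: str, vram_gb: float,
                headroom: float = 0.2) -> bool:
    """Assumption: keep `headroom` of VRAM free for the KV cache, activations, and runtime."""
    return weight_memory_gb(params_billions, precision) <= vram_gb * (1 - headroom)


print(weight_memory_gb(7, "fp16"))   # 14.0
print(fits_on_gpu(7, "fp16", 24))    # True
print(fits_on_gpu(35, "fp16", 48))   # False: 70 GB of weights
print(fits_on_gpu(35, "int8", 48))   # True: 35 GB of weights
```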
Cost reality check: A cloud server with 8x NVIDIA H100 GPUs costs approximately $98 per hour. The same hardware on-premises costs about $0.87 per hour in electricity, not counting the purchase price or the operational burden described above. The breakeven point is roughly 12 months of continuous use, after which on-premises is dramatically cheaper [8]. For a firm running AI workloads continuously, the economics of self-hosting are compelling. For occasional use, cloud is more practical.
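The breakeven figure depends on an input the text does not state: the upfront cost of buying the server. This sketch makes that input explicit. `breakeven_hours` divides the upfront cost by the hourly saving, and `implied_upfront_cost` works backwards from a breakeven period. The function names are illustrative; the only figures used are the two hourly rates above.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours of continuous use


def breakeven_hours(upfront_cost: float, cloud_rate: float, onprem_rate: float) -> float:
    """Hours of continuous use after which owning costs less than renting."""
    saving_per_hour = cloud_rate - onprem_rate
    if saving_per_hour <= 0:
        raise ValueError("on-premises hourly cost must be below the cloud hourly rate")
    return upfront_cost / saving_per_hour


def implied_upfront_cost(breakeven_years: float, cloud_rate: float, onprem_rate: float) -> float:
    """Work backwards: the upfront cost that a stated breakeven period implies."""
    return breakeven_years * HOURS_PER_YEAR * (cloud_rate - onprem_rate)


# Hourly rates from the text: $98 cloud (8x H100), $0.87 on-premises electricity.
CLOUD_RATE, ONPREM_RATE = 98.0, 0.87
print(round(implied_upfront_cost(1, CLOUD_RATE, ONPREM_RATE)))  # 850859
```

At these two rates, a 12-month breakeven corresponds to roughly $850,000 of upfront and other fixed costs. Passing your own hardware quote to `breakeven_hours` gives the breakeven for your situation.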

Regulatory Drivers

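Several regulatory frameworks push toward local or on-premises AI for sensitive data, among them the GDPR, HIPAA, and the data localisation laws raised in question 3 of the decision framework, which can restrict where, and by whom, that data is processed.

For law firms handling international matters, the safest approach is often to keep the AI local and avoid most of the cross-border analysis that cloud processing would require.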

Practical Recommendations

  1. Default to cloud with redaction (EP39) for general-purpose tasks where client data can be effectively stripped
  2. Deploy a local model for privileged, high-sensitivity, or regulated data that should not leave your network under any circumstances
  3. Use the inference gateway (EP38) as the routing decision point — it examines each prompt, checks data classification, and routes to local or cloud accordingly
  4. Secure the local endpoint with network isolation, authentication, rate limiting, and model file integrity verification
  5. Monitor both paths — audit logs should capture every inference request regardless of whether it went to cloud or local, with the routing decision and its rationale logged
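A sketch of the gateway's routing step from recommendations 3 and 5, encoding the routing table from earlier in this episode. It assumes each request arrives with a classification label from an upstream system (the labels are hypothetical), copies the rationales from the table, and, as a further assumption, fails closed to the local model when the label is missing or unknown. Every decision goes to one log stream whichever path it takes.

```python
import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger("inference_gateway")

# The routing table from this episode, keyed by hypothetical classification labels.
# Each entry: (route, rationale logged with the decision).
ROUTING_POLICY = {
    "privileged": ("local", "Privilege waiver risk from cloud exposure"),
    "litigation_strategy": ("local", "Highest-sensitivity work product"),
    # The table also permits cloud with redaction here; this sketch takes the stricter option.
    "client_pii": ("local", "Confidentiality obligations"),
    "general_research": ("cloud", "Public information, no confidentiality risk"),
    "formatting": ("cloud_redacted", "Low-sensitivity task, benefits from larger models"),
    "contract_template": ("cloud_anonymised", "Templates contain no client-specific data"),
}

# Assumption: anything unlabelled or unrecognised fails closed to the local model.
FAIL_CLOSED = ("local", "Unknown classification; defaulting to local")


def route_request(request_id: str, user_id: str, classification: str) -> dict:
    route, rationale = ROUTING_POLICY.get(classification, FAIL_CLOSED)
    decision = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "request_id": request_id,
        "user_id": user_id,
        "classification": classification,
        "route": route,
        "rationale": rationale,
    }
    # Same log stream for both paths, so local and cloud requests are audited alike.
    logger.info(json.dumps(decision))
    return decision
```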

What's Next

Episode 41 moves to Module 9 — Audit and Logging, starting with Audit Log Design — the structured records that capture who did what, when, and to which resource. Every control we've discussed across 40 episodes depends on logs. If you can't prove it happened, it didn't happen.

Sources & Further Reading


  1. Prediction Guard, Self-Hosted vs. Third-Party Deployment: A Technical Evaluation Guide for Regulated Enterprises.
  2. DataNorth AI, Local LLM: Privacy, Security, and Control.
  3. Vitalik Buterin, My Self-Sovereign / Local / Private / Secure LLM Setup (April 2026).
  4. AIMultiple, Cloud LLM vs Local LLMs: Examples & Benefits.
  5. Digital Applied, Local LLM Deployment: Privacy-First AI Complete Guide.
  6. EPAM SolutionsHub, Open LLM Security Risks and Best Practices.
  7. Unified AI Hub, On-Prem LLMs vs Cloud APIs: When to Run Models Locally.
  8. GodOfPrompt, Local LLM Setup for Privacy-Conscious Businesses.
  9. Spellbook, Most Private AI for Lawyers: Why Zero Data Retention Wins in 2026.
  10. Matillion, Public vs Private LLMs: Secure AI for Enterprises.