Today’s Lesson
Security for Legal SaaS — Episode 40: Local vs. Cloud AI — Security Boundaries
Your Data Never Leaves. But "Local" Has Its Own Risks.
In Episode 39, we built a redaction pipeline to strip sensitive data before sending prompts to cloud AI. This episode asks the next logical question: what if you skip the cloud entirely and run the AI model on your own hardware?
Local AI deployment (running large language models on servers you own and control) is increasingly practical. Smaller variants of open-weight models like Llama, Mistral, Qwen, and Gemma can run on consumer-grade GPUs. For legal AI handling the most sensitive data, such as privileged communications, litigation strategy, and government investigation materials, keeping everything within your network perimeter can eliminate entire categories of risk. But "local" introduces a different threat surface, and understanding the tradeoffs is the difference between genuine security and security theatre.
Cloud AI: What You Gain and What You Give Up
When you call a cloud LLM provider — OpenAI, Anthropic, Google, AWS Bedrock — your data traverses networks and is processed on infrastructure outside your control. Even with contractual protections (data processing agreements, zero-retention policies, SOC 2 attestations), several fundamental risks remain:1
| Cloud Risk | Description |
|---|---|
| Data in transit | Prompts and responses travel over the public internet (encrypted, but still crossing networks you do not control) |
| Provider-side breach | Your data is on someone else's servers; their breach is your breach |
| Subpoena exposure | A subpoena served on the provider could capture your data |
| Jurisdictional issues | Data may be processed in regions with different privacy laws |
| Model training risk | Without zero-retention agreements, data may be used to improve models |
| Third-party access | Provider employees with administrative access could theoretically access data |
What you gain is substantial: state-of-the-art model quality, automatic scaling, zero infrastructure management, and access to the largest and most capable models that require hundreds of GPUs to run.
Local AI: What You Gain and What You Give Up
Running a model locally means your data never leaves your network. No prompts cross the internet. No third party processes your privileged communications. For compliance and privilege protection, this is the strongest possible posture.2
But the threat surface shifts — it doesn't disappear:
| Local Risk | Description |
|---|---|
| You own the patching | Model vulnerabilities, OS patches, driver updates — all your responsibility |
| Physical security | The hardware running the model needs the same physical protection as your file servers |
| Model provenance | Who built the model? Was the model file tampered with between download and deployment? |
| Network exposure | A locally deployed model with a network-accessible API is an attack surface |
| Insider threats | Internal users with access to the model endpoint can extract data through the same techniques we covered in EP36 (model inversion, membership inference) |
| Capability gap | Smaller models make more errors; lower quality increases the risk of incorrect legal analysis |
| Operational burden | GPU procurement, cooling, power, monitoring, failover — all on you |
Vitalik Buterin's local LLM setup (April 2026): Ethereum co-founder Vitalik Buterin published his personal local AI security setup, arguing that for sensitive personal data, "running the model locally means the data never leaves your machine", while noting that local deployment requires ongoing vigilance around model integrity, network isolation, and access controls.3
The Hybrid Architecture
The most practical approach for legal AI is hybrid: route sensitive data to local models and general tasks to cloud AI. This combines the privacy of local inference with the capability of cloud models.
| Data Category | Routing | Rationale |
|---|---|---|
| Privileged communications | Local only | Privilege waiver risk from cloud exposure |
| Litigation strategy memos | Local only | Highest-sensitivity work product |
| Client PII (names, financials) | Local, or cloud with redaction (EP39) | Confidentiality obligations |
| General legal research | Cloud (no client data in prompt) | Public information, no confidentiality risk |
| Document formatting/structure | Cloud (with redaction) | Low-sensitivity task, benefits from larger models |
| Contract template analysis | Cloud (anonymised) | Templates contain no client-specific data |
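The routing table above can be sketched as a small lookup. This is a minimal illustration, not a production classifier: the category labels, the `Route` names, and the `local_available` flag are all hypothetical, and the sketch assumes each prompt has already been classified upstream.

```python
from enum import Enum


class Route(Enum):
    LOCAL = "local"
    CLOUD_SANITISED = "cloud_sanitised"  # cloud, after redaction or anonymisation
    CLOUD = "cloud"


# Hypothetical category labels mirroring the rows of the routing table.
ROUTING_POLICY = {
    "privileged_communication": Route.LOCAL,
    "litigation_strategy": Route.LOCAL,
    "client_pii": Route.LOCAL,
    "general_research": Route.CLOUD,
    "document_formatting": Route.CLOUD_SANITISED,
    "contract_template": Route.CLOUD_SANITISED,
}

# Only client PII has a documented fallback: cloud with redaction.
REDACTION_FALLBACK = {"client_pii"}


def route_for(category: str, local_available: bool = True) -> Route:
    """Return the route for a data category, failing closed on unknowns."""
    # Unknown categories are treated as sensitive and kept local (an assumption).
    route = ROUTING_POLICY.get(category, Route.LOCAL)
    if route is Route.LOCAL and not local_available:
        if category in REDACTION_FALLBACK:
            return Route.CLOUD_SANITISED
        # Local-only data must never fall through to the cloud.
        raise RuntimeError(f"no permitted route for {category!r}")
    return route
```

Treating unknown categories as local-only is a design choice of this sketch: a misclassified prompt then costs quality, not confidentiality.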
Decision Framework
Ask four questions for each AI task:
1. Data sensitivity: Does the prompt contain privileged, confidential, or personally identifiable information?
2. Model capability: Does the task require a frontier model (GPT-4, Claude Opus), or can a smaller local model handle it?
3. Compliance mandate: Do applicable regulations (GDPR, HIPAA, data localisation laws) require data to stay within your jurisdiction?
4. Acceptable risk: If the data were exposed, what is the worst-case consequence?
If the answer to question 1 is "yes" and the answer to question 2 is "a local model can handle it," the choice is clear: run locally.4
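One way to make the four questions concrete is a small decision helper. Only the first rule below (sensitive data plus a capable local model means run locally) comes from the text. The `"review"` outcome for the remaining sensitive or regulated cases, the `"cloud"` default, and the low/medium/high impact scale are assumptions added for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskAssessment:
    contains_sensitive_data: bool    # question 1
    local_model_sufficient: bool     # question 2
    must_stay_in_jurisdiction: bool  # question 3
    worst_case_impact: str           # question 4: "low", "medium", or "high" (hypothetical scale)


def recommend(assessment: TaskAssessment) -> str:
    """Return "local", "review", or "cloud" for a single AI task."""
    # The one rule stated in the text: sensitive data and a capable local model.
    if assessment.contains_sensitive_data and assessment.local_model_sufficient:
        return "local"
    # Assumption: anything else that is sensitive, regulated, or high-impact
    # is escalated to a human decision instead of being routed automatically.
    if (
        assessment.contains_sensitive_data
        or assessment.must_stay_in_jurisdiction
        or assessment.worst_case_impact == "high"
    ):
        return "review"
    # Assumption: non-sensitive, unregulated, low-impact work may use the cloud.
    return "cloud"
```

For example, `recommend(TaskAssessment(True, False, False, "high"))` returns `"review"`: the data is sensitive but the task needs a frontier model, which is exactly the case the framework leaves to judgment.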
Securing Local Inference
"Local" does not mean "automatically secure." A model running on your network needs the same security controls as any other service:
Network Isolation
The local model endpoint should not be accessible from the public internet. Deploy it in an isolated network segment — the same principle we covered in Episode 15 on network segmentation. Only your application servers should be able to reach the model API. Use mTLS (mutual TLS, from Episode 15) to authenticate clients connecting to the model endpoint.5
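As a rough illustration of the mTLS requirement, the sketch below builds a server-side TLS context with Python's standard `ssl` module that rejects any client lacking a certificate signed by your internal CA. The function name and file-path parameters are hypothetical, and the paths are optional only so the context can be inspected without real certificates on disk.

```python
import ssl
from typing import Optional


def build_mtls_server_context(
    cert_file: Optional[str] = None,
    key_file: Optional[str] = None,
    client_ca_file: Optional[str] = None,
) -> ssl.SSLContext:
    """Create a server TLS context that requires a client certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # CERT_REQUIRED makes the handshake fail unless the client presents
    # a certificate that chains to a trusted CA.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file is not None:
        # The model endpoint's own certificate and private key.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if client_ca_file is not None:
        # Trust only the internal CA that issues application-server certificates.
        ctx.load_verify_locations(cafile=client_ca_file)
    return ctx
```

In a real deployment all three paths would be supplied, and the returned context would wrap the listening socket of whatever server fronts the model API.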
Access Controls
Not every user in your organisation should have access to the local model. Apply role-based access controls. Log every inference request with the authenticated identity of the requester, the prompt content (encrypted), and the response. These logs feed into the audit trail we'll design in Episode 41.
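The access check and the log record described above might look like the following sketch. The role names, the permission map, and the record fields are hypothetical. The `encrypt` callable stands in for whatever encryption your key-management setup provides, and encrypting the response as well as the prompt is an assumption of this sketch.

```python
import json
from datetime import datetime, timezone
from typing import Callable

# Hypothetical role-to-permission map; a real system would load this from policy.
ROLE_PERMISSIONS = {
    "attorney": {"infer"},
    "paralegal": {"infer"},
    "facilities": set(),
}


def is_allowed(role: str, action: str = "infer") -> bool:
    """Role-based check: unknown roles get no permissions."""
    return action in ROLE_PERMISSIONS.get(role, set())


def build_audit_record(
    user_id: str,
    role: str,
    prompt: str,
    response: str,
    encrypt: Callable[[bytes], bytes],
) -> str:
    """Serialise one inference request as a JSON log line."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "role": role,
        "allowed": is_allowed(role),
        # Content is stored only in encrypted form, hex-encoded for JSON.
        "prompt_encrypted": encrypt(prompt.encode("utf-8")).hex(),
        "response_encrypted": encrypt(response.encode("utf-8")).hex(),
    }
    return json.dumps(record)
```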
Model File Integrity
When you download a model from Hugging Face, Ollama, or any other source, you are trusting that the file has not been tampered with. Verify checksums. Use signed model files where available. Store models in a read-only filesystem. Monitor for unexpected changes to model files — a compromised model could exfiltrate data through its outputs or produce subtly incorrect legal analysis.6
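Checksum verification can be sketched with the standard library alone. The example assumes the publisher provides a SHA-256 digest for the model file through a channel you trust; the function names are illustrative. The file is hashed in chunks because model files are often many gigabytes.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in fixed-size chunks so large models never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model_file(path: str, expected_sha256: str) -> bool:
    """Return True only if the file's digest matches the published one."""
    actual = sha256_of_file(path)
    # Normalise the published value, then compare in constant time.
    return hmac.compare_digest(actual, expected_sha256.strip().lower())
```

Running the same check on a schedule, not just at download time, is one way to detect unexpected changes to a deployed model file.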
Inference Endpoint Security
Even a local model API should have:
- Authentication: Every request must prove the caller's identity
- Rate limiting: Prevent abuse and detect anomalous query patterns (EP36 inference attacks)
- Input validation: Reject prompts that exceed expected length or contain suspicious patterns
- Output filtering: Scan responses for unintended data leakage before returning to the user
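Two of the controls in this list, rate limiting and input validation, are simple enough to sketch directly. The limits below are placeholder values, not recommendations, and the class and function names are hypothetical. The limiter keeps per-caller timestamps in memory, so it assumes a single-process gateway.

```python
import time
from collections import defaultdict, deque
from typing import Callable

MAX_PROMPT_CHARS = 8000  # placeholder limit; tune to your workloads


class SlidingWindowRateLimiter:
    """Allow at most max_requests per caller within any window_seconds span."""

    def __init__(
        self,
        max_requests: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max = max_requests
        self._window = window_seconds
        self._clock = clock
        self._hits = defaultdict(deque)  # caller_id -> timestamps of accepted requests

    def allow(self, caller_id: str) -> bool:
        now = self._clock()
        hits = self._hits[caller_id]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self._window:
            hits.popleft()
        if len(hits) >= self._max:
            return False
        hits.append(now)
        return True


def validate_prompt(prompt: str, max_chars: int = MAX_PROMPT_CHARS) -> bool:
    """Reject empty prompts and prompts longer than the expected maximum."""
    return 0 < len(prompt) <= max_chars
```

Rejected requests are worth logging as well as blocking, since a burst of denials from one caller is the kind of anomalous query pattern the list refers to.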
The Capability-Security Tradeoff
The honest challenge with local AI: smaller models are less capable. A 7-billion-parameter model running on a single GPU will not match GPT-4 or Claude Opus on complex legal reasoning, nuanced contract analysis, or multi-jurisdictional research. The security benefit of local deployment must be weighed against the quality risk of using a less capable model for consequential legal work.7
| Model Size | Typical Hardware | Rough Capability | Best For |
|---|---|---|---|
| 7B parameters | Single consumer GPU (24GB VRAM) | Basic summarisation, simple Q&A | Document triage, simple classification |
| 14-35B parameters | Single high-end GPU (48GB VRAM) | Competent drafting, clause analysis | Contract review, privilege screening |
| 70B+ parameters | Multi-GPU or cloud | Near-frontier quality | Complex legal reasoning, research memos |
| Frontier (200B+) | Cloud only | State of the art | Everything; required for some tasks |
Cost reality check: A cloud server with 8x NVIDIA H100 GPUs costs approximately $98 per hour. Once purchased, the same hardware on-premises costs about $0.87 per hour in electricity. After accounting for the upfront hardware cost, the breakeven point is roughly 12 months of continuous use, after which on-premises is dramatically cheaper.8 For a firm running AI workloads continuously, the economics of self-hosting are compelling. For occasional use, cloud is more practical.
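The breakeven arithmetic can be checked with a few lines. The hourly rates and the 12-month figure are the ones quoted above; the purchase price is not stated, so the sketch back-solves the upfront cost those numbers imply. It deliberately ignores staffing, cooling, and depreciation, which is a simplifying assumption.

```python
CLOUD_RATE_PER_HOUR = 98.00         # quoted cloud price for 8x H100
ONPREM_POWER_RATE_PER_HOUR = 0.87   # quoted on-premises electricity cost
HOURS_PER_YEAR = 24 * 365           # continuous use


def hourly_saving() -> float:
    """Money saved for each hour run on-premises instead of in the cloud."""
    return CLOUD_RATE_PER_HOUR - ONPREM_POWER_RATE_PER_HOUR


def implied_upfront_cost(breakeven_hours: float) -> float:
    """Upfront spend that would be recovered after breakeven_hours of use."""
    return hourly_saving() * breakeven_hours


def breakeven_hours(upfront_cost: float) -> float:
    """Hours of continuous use needed to recover a given upfront spend."""
    return upfront_cost / hourly_saving()


if __name__ == "__main__":
    cost = implied_upfront_cost(HOURS_PER_YEAR)
    print(f"Implied upfront cost for a 12-month breakeven: ${cost:,.0f}")
```

Substituting a real hardware quote into `breakeven_hours` shows how sensitive the 12-month figure is to the purchase price and to how many hours per day the hardware is actually busy.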
Regulatory Drivers
Several regulatory frameworks push toward local or on-premises AI for sensitive data:
- GDPR Articles 44-49: Cross-border data transfers require adequate protections. Keeping inference local, within the originating jurisdiction, avoids the transfer question for that data.
- HIPAA (for health-adjacent legal work): Protected health information processed through an AI system must be covered by a Business Associate Agreement with the provider — or processed locally.
- Data localisation laws: Jurisdictions including China, Russia, India, and increasingly the EU mandate that certain data categories remain within national borders.
- Legal professional privilege (UK/SG): Disclosure to a third-party processor may not inherently waive privilege, but it complicates privilege claims and adds litigation risk.9
For law firms handling international matters, the safest approach is often to keep the AI local and avoid the jurisdictional analysis entirely.
Practical Recommendations
- Default to cloud with redaction (EP39) for general-purpose tasks where client data can be effectively stripped
- Deploy a local model for privileged, high-sensitivity, or regulated data that should not leave your network under any circumstances
- Use the inference gateway (EP38) as the routing decision point — it examines each prompt, checks data classification, and routes to local or cloud accordingly
- Secure the local endpoint with network isolation, authentication, rate limiting, and model file integrity verification
- Monitor both paths — audit logs should capture every inference request regardless of whether it went to cloud or local, with the routing decision and its rationale logged
What's Next
Episode 41 moves to Module 9 — Audit and Logging, starting with Audit Log Design — the structured records that capture who did what, when, and to which resource. Every control we've discussed across 40 episodes depends on logs. If you can't prove it happened, it didn't happen.
Sources & Further Reading
- Prediction Guard, Self-Hosted vs. Third-Party Deployment: A Technical Evaluation Guide for Regulated Enterprises.
- DataNorth AI, Local LLM: Privacy, Security, and Control.
- Vitalik Buterin, My Self-Sovereign / Local / Private / Secure LLM Setup (April 2026).
- AIMultiple, Cloud LLM vs Local LLMs: Examples & Benefits.
- Digital Applied, Local LLM Deployment: Privacy-First AI Complete Guide.
- EPAM SolutionsHub, Open LLM Security Risks and Best Practices.
- Unified AI Hub, On-Prem LLMs vs Cloud APIs: When to Run Models Locally.
- GodOfPrompt, Local LLM Setup for Privacy-Conscious Businesses.
- Spellbook, Most Private AI for Lawyers: Why Zero Data Retention Wins in 2026.
- Matillion, Public vs Private LLMs: Secure AI for Enterprises.