Today’s Lesson
Security for Legal SaaS — Episode 34: RAG Poisoning and Document Trust Tiers
Your AI's Knowledge Base Is an Attack Surface
Retrieval-Augmented Generation (RAG) is the most common architecture for AI-powered legal tools. Instead of relying solely on the model's training data, a RAG system retrieves relevant documents from a knowledge base and includes them in the model's context before generating a response. This is how legal research tools find relevant case law, how contract review systems reference clause libraries, and how document summarisation tools pull from firm knowledge bases.
The security assumption underpinning most RAG systems is dangerously wrong: retrieved content is implicitly trusted. User input is filtered, validated, and treated as potentially adversarial. But documents pulled from the knowledge base are treated as authoritative context — fed directly into the model without scrutiny. As security researcher Christian Schneider documented, "the architecture creates an implicit trust distinction that most security teams never question."1
This episode covers how attackers exploit that trust assumption, and how to build a document trust hierarchy that prevents contaminated sources from corrupting AI outputs.
How RAG Works — and Where Trust Enters
A standard RAG pipeline has three stages:
| Stage | What Happens | Trust Assumption |
|---|---|---|
| Indexing | Documents are converted to vector embeddings (numerical representations) and stored in a vector database | Assumes documents are legitimate and unmodified |
| Retrieval | User query is converted to an embedding; nearest neighbours in the vector database are retrieved | Assumes retrieved documents are relevant and trustworthy |
| Generation | Retrieved documents are inserted into the model's context alongside the user query; the model generates a response | Assumes all context is authoritative |
The vulnerability exists at every stage. A malicious document inserted at the indexing stage will be retrieved and trusted at the generation stage. The model has no mechanism to distinguish a firm's internal legal memorandum from a poisoned document planted by an adversary.2
Poisoning Vectors
Research has demonstrated several practical approaches to corrupting a RAG knowledge base:
1. Direct Document Injection
An attacker adds malicious documents to the knowledge base. In legal SaaS, this could occur through:
- Client document uploads. A client (or someone impersonating a client) uploads a document containing adversarial content designed to manipulate future AI queries.
- Third-party data feeds. If your knowledge base ingests content from external sources — case law databases, regulatory feeds, legal news — any of those sources could be compromised.
- Shared knowledge bases. Multi-tenant platforms where multiple firms contribute to a shared clause library or precedent database create cross-contamination risk.
The efficiency of this attack is striking. Academic research has demonstrated that as few as 5 adversarial documents in a corpus of millions can achieve over 90% attack success rates.3 A separate study found that 10 adversarial passages — just 0.04% of a corpus — can achieve 98.2% retrieval success.4
2. Metadata Manipulation
Vector databases store metadata alongside embeddings — document titles, dates, authors, categories. An attacker who can modify metadata can change which documents get retrieved for which queries, even without altering document content. Mislabelling a poisoned document as "firm-authored" or tagging it with relevant case topics ensures it appears in retrieval results.5
3. Embedding Space Manipulation
A sophisticated attacker can craft documents whose embeddings (the numerical vectors that represent their meaning) are specifically optimised to be retrieved for target queries. The document's readable text might appear benign, but its embedding is engineered to sit close to the embeddings of queries about specific legal topics — ensuring the poisoned document is always retrieved when lawyers research those topics.6
Document Trust Tiers
The architectural defence against RAG poisoning is a trust-tiered retrieval system that classifies every document by provenance and enforces different policies per tier:
| Trust Tier | Source | Policy | Examples |
|---|---|---|---|
| Tier 1: Firm-Authored | Created internally by the firm's lawyers | Highest trust; included in all retrieval contexts | Internal memos, clause libraries, firm precedents |
| Tier 2: Client-Provided | Uploaded by verified clients | Moderate trust; included in retrieval for the specific matter | Client contracts, instructions, evidence packages |
| Tier 3: Third-Party Verified | From authenticated external sources with integrity checks | Lower trust; results marked with provenance | Court filings (verified via e-filing API), regulatory databases |
| Tier 4: Third-Party Unverified | External content without integrity guarantees | Lowest trust; results always flagged; never mixed with Tier 1 context | Web-sourced case commentaries, news articles, public datasets |
Retrieval Filtering
The trust tier must be enforced at retrieval time, not just at display time. When the model asks for relevant documents, the retrieval layer should:
- Filter by the maximum trust tier appropriate for the query context.
- Never mix Tier 4 content with Tier 1 context in the same retrieval result set.
- Include provenance metadata in the context provided to the model: "Source: firm precedent library, authored by [partner name], dated [date]" vs. "Source: external web content, unverified."
This ensures the model has information about the reliability of its sources — even if it cannot perfectly act on that information, the provenance is available for human review.7
Provenance Tracking: Every Output Should Cite Its Sources
A RAG system should maintain a complete chain from input query to retrieved documents to generated output. For every AI-generated response:
- Which documents were retrieved? List the specific documents, with trust tiers and metadata.
- Which passages were most influential? Modern RAG frameworks can identify which retrieved chunks the model relied on most heavily.
- What was the generation confidence? Some frameworks provide confidence scores or uncertainty estimates.
This provenance chain serves three purposes: it enables human review ("the AI cited this document — is that the right source?"), it supports audit requirements ("show me the basis for this AI-generated summary"), and it enables forensic investigation when something goes wrong ("which document caused the model to produce an incorrect output?").8
The RAGShield approach: Recent research has proposed provenance-verified defence systems specifically for government RAG systems. RAGShield implements cryptographic attestation of document provenance, ensuring that every retrieved document can be traced back to its original source with tamper evidence.9
Practical Architecture: Trust-Aware RAG
A minimal trust-aware RAG architecture for legal SaaS:
- Ingest pipeline with classification. Every document is assigned a trust tier at ingestion based on its source. Client uploads go through validation (authenticated upload, verified client identity). External sources go through integrity checks (hash verification, source authentication).
- Separate vector collections per tier. Tier 1 and Tier 4 documents live in different vector database collections (or namespaces). This prevents similarity search from accidentally returning poisoned content alongside authoritative precedent.
- Query-time tier enforcement. The retrieval API accepts a maximum trust tier parameter. Queries about firm precedent search only Tier 1. Queries about opposing party documents search Tier 2-4 with provenance labels.
- Output provenance injection. Every AI response includes citations with trust tier labels visible to the reviewing lawyer.
- Anomaly detection on retrieval patterns. Monitor for sudden changes in which documents are being retrieved, new documents appearing with suspicious metadata, or retrieval patterns that diverge from historical norms.
What's Next
Episode 35 covers Embedding Security and Vector Database Isolation — the infrastructure layer beneath RAG, where multi-tenant vector databases can leak data between clients through cosine similarity queries if isolation is not enforced at the embedding level.
Sources & Further Reading
Sources & references
- Christian Schneider, RAG Security: The Forgotten Attack Surface.
- arXiv, Towards Secure Retrieval-Augmented Generation: A Comprehensive Review of Threats, Defenses and Benchmarks.
- arXiv, Secure Retrieval-Augmented Generation against Poisoning Attacks.
- arXiv, Securing Retrieval-Augmented Generation: A Taxonomy of Attacks, Defenses, and Future Directions.
- arXiv, RAG Security and Privacy: Formalizing the Threat Model and Attack Surface.
- USENIX, Machine Against the RAG: Jamming Retrieval.
- arXiv, Benchmarking Poisoning Attacks against Retrieval-Augmented Generation.
- OWASP, LLM08:2025 Vector and Embedding Weaknesses.
- arXiv, RAGShield: Provenance-Verified Defense-in-Depth Against Knowledge Base Poisoning.
- arXiv, RAG Safety: Exploring Knowledge Poisoning Attacks.