Today’s Lesson
Security for Legal SaaS — Episode 31: PII Handling and Anonymisation
Personal Data in Legal Tech Is Double-Sensitive
Most SaaS platforms handle personally identifiable information (PII) — names, email addresses, phone numbers. Legal SaaS platforms handle PII that is often simultaneously legally privileged. A client's name attached to a litigation strategy memo. A witness's home address in a deposition transcript. Financial records produced in discovery. This is not just personal data under privacy regulations — it is data protected by attorney-client privilege, work product doctrine, or court-ordered confidentiality.
The sensitivity is compounded, and so is the obligation. Get PII handling wrong in a generic SaaS product and you face regulatory fines. Get it wrong in legal SaaS and you face regulatory fines, malpractice liability, and potential waiver of privilege.
What Counts as PII in Legal Tech
PII is any information that can identify a specific individual, either directly or in combination with other data. In the legal technology context, the scope is broader than most developers expect:
| Category | Examples | Why It Matters in Legal Tech |
|---|---|---|
| Direct identifiers | Full name, SSN, passport number, email address | Present in virtually every legal document |
| Case metadata | Case numbers, court filing dates, docket entries | Combined with public records, can identify parties |
| Financial records | Bank statements, tax returns, billing records | Common in discovery; regulated by multiple frameworks |
| Health information | Medical records, insurance claims | Subject to HIPAA in the US; frequent in personal injury and employment cases |
| Communications metadata | Email timestamps, call logs, IP addresses | Can reveal attorney-client relationships even without content |
| Biometric data | Voice recordings (depositions), facial images | Increasingly regulated; used in remote hearings |
Key principle: In legal SaaS, always assume data is PII unless proven otherwise. A document number that seems anonymous can become identifying when cross-referenced with a public court docket.1
Data Minimisation: Collect Less, Protect Less
The first line of defence is not a technical control — it is restraint. GDPR Article 5(1)(c) codifies data minimisation: personal data must be "adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed."2
For legal SaaS, this means:
- Collect only what your feature requires. If your document review tool needs to classify documents by type, it does not need to index the full text of every exhibit.
- Retain only as long as necessary. Define retention periods per data category. Case data may need to be retained for the statute of limitations period. Audit logs may have a different retention schedule. Activity analytics should expire after months, not years.
- Delete completely when the period expires. This means database records, search indices, backups, caches, log files, and — as we discussed in Episode 29 — crypto-shredding of encrypted data by destroying its encryption keys.
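The per-category retention rule above can be sketched as a small expiry check. This is a minimal illustration, not a recommended schedule: the category names, the periods in `RETENTION_DAYS`, and the `Record` type are all hypothetical, and real periods depend on jurisdiction and legal advice.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention periods per data category (illustrative values only).
RETENTION_DAYS = {
    "case_data": 6 * 365,        # e.g. a limitation period
    "audit_log": 2 * 365,
    "activity_analytics": 90,    # months, not years
}


@dataclass(frozen=True)
class Record:
    record_id: str
    category: str
    created: date


def is_expired(record: Record, today: date) -> bool:
    """True when the record has outlived its category's retention period."""
    limit = timedelta(days=RETENTION_DAYS[record.category])
    return today - record.created > limit


def due_for_deletion(records, today: date):
    """Return the ids of records whose retention period has passed."""
    return [r.record_id for r in records if is_expired(r, today)]


records = [
    Record("r1", "activity_analytics", date(2024, 1, 1)),
    Record("r2", "case_data", date(2024, 1, 1)),
    Record("r3", "audit_log", date(2021, 1, 1)),
]
expired = due_for_deletion(records, today=date(2024, 6, 1))
```

A job like this only identifies candidates. The actual deletion still has to reach every copy listed above, including search indices, backups, and caches.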
Pseudonymisation vs. Anonymisation: The Difference Matters
These two terms are frequently confused, and the distinction has direct legal consequences under GDPR:3
| Property | Pseudonymisation | Anonymisation |
|---|---|---|
| Definition | Replace identifiers with artificial tokens; the mapping exists somewhere | Remove or transform identifiers so that re-identification is impossible |
| Reversible? | Yes — with the mapping key | No — by definition, irreversible |
| Still personal data under GDPR? | Yes — GDPR still applies in full | No — falls outside GDPR scope entirely |
| Example | Replace "Jane Smith" with "Client-7A3F" in a case file; store the mapping in a separate secure system | Aggregate case outcomes into statistics with no individual-level records |
Pseudonymisation is a security measure, not a form of anonymisation. GDPR Article 4(5) defines it as processing personal data so it "can no longer be attributed to a specific data subject without the use of additional information," provided that the additional information is kept separately and is subject to technical and organisational measures that prevent re-attribution.4
Anonymisation removes the data from GDPR scope entirely, but the bar is high. GDPR Recital 26 requires the assessment to take account of "all the means reasonably likely to be used" to identify a person, and the Article 29 Working Party (since replaced by the EDPB) read that test broadly in its opinion on anonymisation techniques. In practice it covers cross-referencing with other datasets, future technological advances, and publicly available information.5
The misclassification trap: Organisations frequently confuse pseudonymisation with anonymisation. If your legal SaaS platform pseudonymises client data (replacing names with tokens) but still stores the mapping, that data is personal data under GDPR. Treating it as anonymous (for example, sharing it with analytics vendors or using it for model training without a lawful basis) is a compliance violation.6
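To make the distinction concrete, here is a minimal sketch of pseudonymisation with a separately held mapping. `PseudonymVault` and its methods are hypothetical names used for illustration. In a real system the mapping would live in a separate, access-controlled service and not in application memory.

```python
import secrets


class PseudonymVault:
    """Hypothetical mapping store. In a real deployment this would be a
    separate, access-controlled service, not an in-process dictionary."""

    def __init__(self) -> None:
        self._token_by_name = {}
        self._name_by_token = {}

    def pseudonymise(self, name: str) -> str:
        """Return a stable random token for a name, creating one if needed."""
        if name not in self._token_by_name:
            token = "Client-" + secrets.token_hex(2).upper()
            while token in self._name_by_token:  # retry on a rare collision
                token = "Client-" + secrets.token_hex(2).upper()
            self._token_by_name[name] = token
            self._name_by_token[token] = name
        return self._token_by_name[name]

    def reidentify(self, token: str) -> str:
        """Reverse the mapping. This is why the data is still personal data."""
        return self._name_by_token[token]


vault = PseudonymVault()
memo = "Litigation strategy for Jane Smith: settle before trial."
token = vault.pseudonymise("Jane Smith")
redacted = memo.replace("Jane Smith", token)
```

As long as `reidentify` can succeed anywhere in your organisation, `redacted` remains personal data under GDPR, however opaque the token looks.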
Tokenisation for Search: Maintaining Functionality
A practical challenge in legal SaaS is maintaining search functionality without exposing raw PII. If a lawyer needs to search for "all documents mentioning Jane Smith," but Jane Smith's name has been pseudonymised to "Client-7A3F" in the search index, how does the search work?
Tokenisation addresses this. The application:
- Tokenises PII at ingestion (replacing "Jane Smith" with a deterministic token derived from a keyed hash).
- Stores the token in the search index alongside non-PII metadata.
- At search time, applies the same tokenisation function to the search query, so the lawyer searching for "Jane Smith" generates the same token and matches the indexed records.
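- De-tokenises at the display layer for authorised users by looking the token up in a separately secured mapping store (a keyed hash cannot be reversed on its own), showing the original name.
The search index never contains raw PII. The tokenisation key is managed through the key management infrastructure covered in Episode 29. If the tokenisation key and the token-to-name mapping are both destroyed, the tokens in the search index can no longer be linked back to a person, which is another form of crypto-shredding.7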
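The ingestion and query steps above can be sketched with Python's standard `hmac` module. This is an illustrative sketch under stated assumptions: the key value, the `tok_` prefix, the normalisation rules, and the in-memory `search_index` are all hypothetical, and a real key would be fetched from the key management service and never hard-coded.

```python
import hashlib
import hmac
import unicodedata

# Assumption: in production the key is fetched from a key management
# service; a hard-coded value is used here only to keep the sketch runnable.
TOKEN_KEY = b"demo-key-do-not-use-in-production"


def normalise(value: str) -> str:
    """Fold case and collapse whitespace so equivalent spellings match."""
    folded = unicodedata.normalize("NFKC", value).casefold()
    return " ".join(folded.split())


def tokenise(value: str, key: bytes = TOKEN_KEY) -> str:
    """Derive a deterministic token from a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(key, normalise(value).encode("utf-8"), hashlib.sha256)
    return "tok_" + digest.hexdigest()[:32]


search_index = {}  # token -> set of document ids; holds no raw names


def index_document(doc_id: str, names: list) -> None:
    """Store only tokens in the index at ingestion time."""
    for name in names:
        search_index.setdefault(tokenise(name), set()).add(doc_id)


def search(name: str) -> set:
    """Tokenise the query the same way as ingestion, then look it up."""
    return search_index.get(tokenise(name), set())


index_document("doc-1", ["Jane Smith"])
index_document("doc-2", ["John Doe"])
index_document("doc-3", ["jane  smith"])
hits = search("JANE SMITH")
```

One design consequence is that deterministic tokens support exact matching only. Prefix, fuzzy, or phonetic search on names needs a different technique. Equal tokens also reveal that two documents mention the same person, which is a deliberate trade-off for searchability.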
Data Subject Access Requests: The Technical Architecture
Under GDPR, individuals have the right to request access to their personal data (Article 15), correction (Article 16), and deletion (Article 17). For legal SaaS platforms, this means you need a technical architecture that can:8
- Discover all PII for a given individual across all systems (database, search index, backups, logs, third-party integrations).
- Export that data in a commonly used, machine-readable format (a copy under Article 15(3); Article 20 data portability where its conditions are met).
- Delete it completely, or document why the data is exempt from deletion (e.g., legal hold, regulatory retention requirement).
- Respond within one month of receiving the request (extendable by two further months for complex or numerous requests, under Article 12(3)).
This requires a PII inventory — a mapping of every data store that contains personal data, what PII fields it holds, and how to query it for a specific individual. Without this inventory, DSAR fulfilment becomes a manual, error-prone scramble across dozens of systems.
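A PII inventory can start as a registry that pairs each data store with the PII fields it holds and a way to query it by data subject. The sketch below is a hypothetical illustration: `PiiStore`, `discover`, the store names, and the in-memory lists stand in for real databases, indices, and vendor APIs.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PiiStore:
    name: str                       # which system holds the data
    pii_fields: tuple               # which PII fields it contains
    lookup: Callable[[str], list]   # subject id -> matching records


def discover(inventory, subject_id: str) -> dict:
    """Query every registered store and collect the subject's records."""
    report = {}
    for store in inventory:
        hits = store.lookup(subject_id)
        if hits:
            report[store.name] = hits
    return report


# In-memory stand-ins for real systems (illustrative data only).
cases_db = [
    {"subject_id": "s-1", "name": "Jane Smith", "case": "C-100"},
    {"subject_id": "s-2", "name": "John Doe", "case": "C-200"},
]
access_log = [{"subject_id": "s-1", "ip": "203.0.113.7"}]
analytics = []

inventory = [
    PiiStore("cases_db", ("name",),
             lambda sid: [r for r in cases_db if r["subject_id"] == sid]),
    PiiStore("access_log", ("ip",),
             lambda sid: [r for r in access_log if r["subject_id"] == sid]),
    PiiStore("analytics", (),
             lambda sid: [r for r in analytics if r["subject_id"] == sid]),
]

report = discover(inventory, "s-1")
```

The value of the registry is that a new data store cannot silently escape DSAR handling. If a store is not in the inventory, that is a finding in itself.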
Legal Holds and the Deletion Tension
Legal SaaS faces a unique tension: data protection law says "delete personal data when no longer needed," but litigation holds say "preserve everything potentially relevant to pending or anticipated litigation." These obligations can conflict directly.
The resolution is documented exception handling:
- Legal holds override deletion timelines for data within the hold's scope.
- Hold metadata (which case, which custodian, which date range) must be tracked so that once the hold is released, deletion resumes.
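- Hold records documenting why specific data was retained (distinct from a privilege log, which lists documents withheld from production) give both the data protection authority and the court evidence that the exception was justified.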
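The exception handling above can be expressed as a gate that every deletion job passes through. This is a minimal sketch under simplifying assumptions: the `LegalHold` and `Document` types are hypothetical, and hold scope is reduced to custodian plus date range, whereas real holds are often scoped by matter, keyword, or data source as well.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class LegalHold:
    case_id: str
    custodian: str
    start: date          # start of the date range in scope
    end: date            # end of the date range in scope
    released: bool = False


@dataclass(frozen=True)
class Document:
    doc_id: str
    custodian: str
    created: date


def blocking_hold(doc: Document, holds) -> Optional[LegalHold]:
    """Return the first active hold whose scope covers the document."""
    for hold in holds:
        in_scope = (hold.custodian == doc.custodian
                    and hold.start <= doc.created <= hold.end)
        if in_scope and not hold.released:
            return hold
    return None


def deletion_decision(doc: Document, holds):
    """Decide whether to delete, recording the reason when retaining."""
    hold = blocking_hold(doc, holds)
    if hold is None:
        return ("delete", None)
    return ("retain", f"legal hold for case {hold.case_id}")


doc = Document("d-1", "j.smith", date(2022, 5, 1))
active = [LegalHold("C-100", "j.smith", date(2022, 1, 1), date(2022, 12, 31))]
released = [LegalHold("C-100", "j.smith", date(2022, 1, 1),
                      date(2022, 12, 31), released=True)]

while_held = deletion_decision(doc, active)
after_release = deletion_decision(doc, released)
```

Because the decision is recomputed on every run, deletion resumes automatically once a hold is marked released, and the returned reason can be written to the retention record.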
Real-World PII Failures in Legal Technology
| Incident | What Went Wrong | Impact |
|---|---|---|
| Orrick, Herrington & Sutcliffe (2023) | Attackers accessed personal data — names, SSNs, financial data, health information — of over 600,000 individuals stored in the firm's systems | $8 million class action settlement; reputational damage9 |
| Latitude Financial (2023) | Retained driver's licence numbers and passport data years beyond any legitimate business need; attackers exfiltrated 14 million records | Showed that data minimisation failures multiply breach impact: PII that should have been deleted years earlier was exposed |
| Clearview AI (multiple, 2020-2024) | Scraped billions of facial images from public sources for biometric identification; regulators in multiple jurisdictions found GDPR violations | Fines exceeding EUR 20 million; demonstrated that "publicly available" does not mean "lawfully processable" |
For legal SaaS developers, the Orrick breach illustrates a pattern: the firm stored PII belonging to individuals who were not its clients. Their data was held because Orrick served as legal counsel to other companies that had been breached. The firm became a secondary target because it accumulated PII through its advisory role. Data minimisation, meaning deletion of data as soon as the advisory engagement ends, would have limited the exposure.
The privilege multiplier: When PII in a legal SaaS system is also covered by attorney-client privilege, a breach creates a dual harm: the privacy violation under data protection law and a potential waiver of privilege. Courts have held that failure to maintain reasonable security for privileged communications can constitute waiver. The PII breach becomes a privilege breach.10
What's Next
Episode 32 covers Database Security Hardening — protecting the system where all this PII ultimately lives, from network isolation to audit logging to backup encryption.
Sources & Further Reading
- GDPR Local, Data Pseudonymisation vs Anonymisation: Key Differences.
- GDPR, Article 5(1)(c) — Principles Relating to Processing of Personal Data.
- MOSTLY AI, Pseudonymization vs Anonymization: Ensure GDPR Compliance.
- GDPR, Article 4(5) — Definition of Pseudonymisation.
- ENISA, Pseudonymisation Techniques and Best Practices.
- Privacy Company, What Are the Differences Between Anonymisation and Pseudonymisation.
- Protecto AI, Pseudonymization vs Anonymization: Key Differences, Examples & GDPR Guide.
- GDPR, Article 15 — Right of Access by the Data Subject.
- GDPR Register, Is Pseudonymised Data Personal Data? 2025 Guide.
- Xata, Data Pseudonymization Explained: When Anonymization Isn't Enough.
- Piwik PRO, The Most Important Benefits of Data Pseudonymization and Anonymization Under GDPR.