Security for Legal SaaS

Episode 7 · Module 3 · App Security

Input Validation and Sanitisation

18 May 2026 · 10:12 · Security for Legal SaaS

Every piece of data entering your legal SaaS arrives through a gate. Alice and Dan cover allowlists versus denylists, server-side versus client-side validation, domain-specific patterns for legal citations, and why silent sanitisation masks the attacks you need to see.

Today’s Lesson

The Gatekeeper Problem

Every piece of data that enters your legal SaaS application arrives through a gate. A form field, an API (Application Programming Interface — the way software components communicate with each other) parameter, a file upload, a webhook payload. Input validation is the practice of ensuring that data conforms to expected formats, types, and ranges before your application processes it.

Get this wrong, and you open the door to the injection attacks that follow: SQL injection, XSS, command injection, path traversal. The OWASP Top 10:2021 lists injection as A03, and virtually every injection vulnerability begins with insufficient input validation.

Key principle: Input validation is your first line of defence, but not your only one. OWASP advises treating validation as defence-in-depth — layer it with parameterised queries, output encoding, and architectural controls.

Allowlists vs Denylists

The most fundamental decision in validation design: do you define what's allowed, or what's forbidden?

| Approach | Definition | Example |
|---|---|---|
| Allowlist (positive validation) | Only explicitly permitted patterns pass | Court ID must match [A-Z]{2}\d{4}/\d{4} |
| Denylist (negative validation) | Block known-bad patterns, allow everything else | Strip <script> tags from input |

OWASP recommends allowlisting over denylisting for input validation. Denylists fail because they block only the patterns their authors anticipated, and an attacker needs just one variant that the list missed.

Legal SaaS example: A case reference field that strips angle brackets (< and >) to prevent XSS can still be bypassed when the value is rendered inside an HTML attribute. A payload such as " onmouseover="alert(1) contains no angle brackets, yet it closes the attribute and adds an event handler. An allowlist that requires the pattern ^[A-Z]{2}\s?\d{1,5}/\d{4}$ rejects that payload along with anything else that is not a well-formed case reference.
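The contrast can be sketched in TypeScript. This is a minimal illustration, not a production validator: the function names and the sample payload are hypothetical, and the pattern is the case reference format from the example, anchored at both ends.

```typescript
// Denylist approach: remove characters known to be dangerous.
// Anything the list did not anticipate passes through untouched.
function stripAngleBrackets(input: string): string {
  return input.replace(/[<>]/g, "");
}

// Allowlist approach: accept only the expected shape, reject everything else.
// Anchoring with ^ and $ stops a valid reference being padded with extra content.
const CASE_REFERENCE = /^[A-Z]{2}\s?\d{1,5}\/\d{4}$/;

function isValidCaseReference(input: string): boolean {
  return CASE_REFERENCE.test(input);
}

// Hypothetical attacker input that contains no angle brackets at all.
const attributePayload = '" onmouseover="alert(1)';

const afterDenylist = stripAngleBrackets(attributePayload); // payload survives unchanged
const passesAllowlist = isValidCaseReference(attributePayload); // false
const validPasses = isValidCaseReference("HC 1234/2024"); // true
```

The denylist has nothing to strip, so the payload survives intact, while the allowlist rejects it without needing to know anything about XSS.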

Boundary Validation

Where should validation happen? The answer is at every trust boundary — but the mandatory boundary is the server.

The Validation Stack

| Layer | Purpose | Enforces Security? |
|---|---|---|
| Client-side (browser) | UX feedback, reduce round-trips | No: attacker bypasses trivially |
| API gateway / middleware | Schema validation, rate limiting | Partially: coarse-grained |
| Application layer (server) | Business rule validation | Yes: primary enforcement point |
| Database layer | Type constraints, foreign keys | Yes: last-resort constraint |

Client-side validation is never a security control. An attacker can disable JavaScript, modify the DOM (Document Object Model — the browser’s internal representation of a webpage that JavaScript can read and modify), or bypass the browser entirely with direct API calls. Every validation the client performs must be repeated on the server, independently.

Boundary validation means validating data every time it crosses a trust boundary — not just at initial input. A value that was safe when received might become dangerous after transformation. If you URL-decode user input and then use it in a database query, you must validate after decoding, not before.
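A minimal sketch of "decode first, validate second", assuming a query parameter that should hold a court code. The pattern, the result type, and the function name are hypothetical choices made for illustration.

```typescript
// Hypothetical allowlist for a court code query parameter.
const COURT_CODE = /^[A-Z]{2,6}$/;

// Either the canonical value or a reason for rejection.
type DecodeResult =
  | { ok: true; value: string }
  | { ok: false; reason: string };

function decodeThenValidate(raw: string): DecodeResult {
  let decoded: string;
  try {
    // Step 1: canonicalise. decodeURIComponent throws URIError on malformed escapes.
    decoded = decodeURIComponent(raw);
  } catch {
    return { ok: false, reason: "malformed percent-encoding" };
  }
  // Step 2: validate the decoded form, which is what later code will actually use.
  if (!COURT_CODE.test(decoded)) {
    return { ok: false, reason: "not a court code" };
  }
  return { ok: true, value: decoded };
}
```

Here `decodeThenValidate("SGCA%27%20OR%201%3D1")` is rejected because the decoded string contains a quote and spaces. A validator that inspected only the raw, still-encoded string would never have seen those characters.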

Legal-Specific Input Patterns

Legal SaaS handles distinctive input types that demand domain-specific validation rules:

| Input Type | Expected Format | Validation Approach |
|---|---|---|
| Singapore case citation | [YYYY] SGCA 15 | Regex: \[\d{4}\]\s+[A-Z]{2,6}\s+\d{1,5} |
| UK neutral citation | [YYYY] UKSC 42 | Regex: \[\d{4}\]\s+[A-Z]{2,6}\s+\d{1,5} |
| US case docket | 1:23-cv-04521 | Regex: \d{1,2}:\d{2}-[a-z]{2,3}-\d{4,6} |
| Court identifier | SGCA, SGHC, UKSC | Allowlist of known court codes |
| Statute references | 34(1)(a) | Pattern with known section formats |
| Date fields | ISO 8601 or jurisdiction-specific | Strict date parsing, range validation |
| Client matter number | Firm-specific format | Configurable regex per firm tenant |

CWE-20 (Improper Input Validation) is the root weakness behind dozens of more specific vulnerabilities. For legal identifiers, always validate against the jurisdiction's actual format specification — not a loose pattern that happens to work for your test data.
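As a sketch of how two rows of the table combine, the following TypeScript pairs the neutral citation regex with an allowlist of court codes. The court list holds only the three codes named in the table, and the year bounds are hypothetical. A real deployment would take both from the jurisdiction's own specification.

```typescript
// Court codes the application knows about (the examples given in the table).
const KNOWN_COURTS = new Set(["SGCA", "SGHC", "UKSC"]);

// Neutral citation shape from the table, anchored and with capture groups.
const NEUTRAL_CITATION = /^\[(\d{4})\]\s+([A-Z]{2,6})\s+(\d{1,5})$/;

interface NeutralCitation {
  year: number;
  court: string;
  serial: number;
}

// Returns the parsed citation, or null if any check fails.
function parseNeutralCitation(input: string): NeutralCitation | null {
  const match = NEUTRAL_CITATION.exec(input);
  if (match === null) return null;

  const year = Number(match[1]);
  const court = match[2];
  const serial = Number(match[3]);

  // The regex alone would accept "[2020] ABCD 1"; the allowlist does not.
  if (!KNOWN_COURTS.has(court)) return null;
  // Range check: hypothetical lower bound, and no citations from the future.
  if (year < 1900 || year > new Date().getFullYear()) return null;
  if (serial < 1) return null;

  return { year, court, serial };
}
```

Returning a parsed object rather than the original string means downstream code works with typed, already-checked fields instead of re-parsing untrusted text.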

Server-Side Validation Is Mandatory

The OWASP Application Security Verification Standard (ASVS) includes input validation among its Level 1 requirements, the baseline level of verification.

Why not just sanitise? Sanitisation (modifying input to make it safe) is riskier than rejection because:

  1. You might miss an encoding your sanitiser doesn't handle
  2. Modified input might have unintended meaning in business context
  3. Silent sanitisation masks attacks — you should log and alert on validation failures

The safe pattern: validate first (reject if invalid), then sanitise where necessary for specific output contexts (HTML encoding, SQL parameterisation).
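The safe pattern can be sketched as two separate steps. This is an illustration under stated assumptions: the matter title allowlist, the `ValidationError` class, and the function names are all hypothetical.

```typescript
// Hypothetical allowlist for a matter title: letters, digits, spaces and a
// few punctuation marks, 1 to 120 characters.
const MATTER_TITLE = /^[A-Za-z0-9 .,&'()-]{1,120}$/;

class ValidationError extends Error {}

// Step 1: validate and reject. The input is never modified here.
function requireMatterTitle(input: string): string {
  if (!MATTER_TITLE.test(input)) {
    throw new ValidationError("matter title failed validation");
  }
  return input;
}

// Step 2: encode for one specific output context, here HTML text and
// quoted attribute values.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

const safeTitle = requireMatterTitle("Smith & Jones (Appeal)");
const cellHtml = "<td>" + escapeHtml(safeTitle) + "</td>";
// cellHtml is "<td>Smith &amp; Jones (Appeal)</td>"
```

The stored value keeps its original ampersand. Encoding happens only at the point of output, so the same value can be encoded differently for HTML, JSON, or a log line.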

Schema Validation with ORMs

Modern legal SaaS typically uses an ORM (Object-Relational Mapper: a library that lets you write database queries in your programming language instead of raw SQL, the standard language for querying databases) like Prisma, SQLAlchemy, or TypeORM. ORMs provide schema-level validation as a byproduct of their type system. Prisma, for example, enforces the types declared in its schema at the application boundary.

Best practice: Use Zod (TypeScript) or Pydantic (Python) as a validation layer between your API endpoints and your ORM. These schema validation libraries catch malformed input before it reaches your business logic:

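import { z } from "zod";

// Schema for incoming case data; input that fails is rejected before business logic runs.
const CaseInput = z.object({
  citation: z.string().regex(/^\[\d{4}\]\s+[A-Z]{2,6}\s+\d{1,5}$/), // neutral citation shape
  courtCode: z.enum(['SGCA', 'SGHC', 'SGDC', 'UKSC', 'EWCA']), // allowlist of court codes
  filedAt: z.string().datetime(), // ISO 8601 date-time string
});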

Common Validation Failures in Legal Tech

| Failure Mode | Consequence | Fix |
|---|---|---|
| Trusting client-side validation | Injection attacks bypass browser checks | Duplicate all validation server-side |
| Validating before decoding | Encoded payloads bypass checks | Canonicalise first, validate second |
| Type coercion assumptions | "0" == false is true in JS, so loose comparisons bypass boolean checks | Use strict type checking (===) |
| Missing length limits | Buffer overflows, DoS via megabyte-sized inputs | Enforce max lengths on all string inputs |
| Unicode normalisation gaps | Homoglyph attacks, normalisation-based bypasses | Normalise to NFC before validation |
| Allowing null bytes | Null byte injection truncates strings in C-based systems | Reject null bytes unconditionally |

PortSwigger's research on encoding-based bypasses demonstrates that sophisticated attackers routinely exploit the gap between what validators check and what interpreters execute. Double-encoding, Unicode escapes, and mixed encoding schemes bypass denylists consistently.
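Several of the fixes in the table can be combined into one pre-validation step. The sketch below assumes a free-text search term. The 200-character limit and the names are hypothetical, and NFC is used because that is the form the table names.

```typescript
const MAX_SEARCH_LENGTH = 200; // hypothetical limit

type Checked =
  | { ok: true; value: string }
  | { ok: false; reason: string };

function checkSearchTerm(input: unknown): Checked {
  // Strict type check: no coercion of numbers, booleans or arrays.
  if (typeof input !== "string") {
    return { ok: false, reason: "not a string" };
  }
  // Reject null bytes outright rather than stripping them.
  if (input.indexOf("\u0000") !== -1) {
    return { ok: false, reason: "null byte" };
  }
  // Canonicalise before validating: NFC composes "e" followed by a
  // combining acute accent into the single code point U+00E9.
  const normalised = input.normalize("NFC");
  // Length limit, applied to the canonical form.
  if (normalised.length === 0 || normalised.length > MAX_SEARCH_LENGTH) {
    return { ok: false, reason: "length out of range" };
  }
  return { ok: true, value: normalised };
}
```

Format-specific allowlist checks would run on the returned `value`, so they always see the same canonical form that the rest of the application uses.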

Validation as Logging Trigger

OWASP's Logging Cheat Sheet recommends logging all input validation failures. In legal SaaS, a sudden spike in validation failures on a case search endpoint could indicate that someone is probing the endpoint for injection flaws.

Log the failure, the input (sanitised for the log itself), the source IP, and the authenticated user. Feed this into your SIEM (Security Information and Event Management — the system that aggregates and analyses security logs). NIST SP 800-92 provides guidance on log management for security monitoring.
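A sketch of what such a log record might look like. The event shape, field names, and the 256-character cap are assumptions for illustration, not a SIEM-specific format. The IP address comes from a range reserved for documentation.

```typescript
interface ValidationFailureEvent {
  event: "input_validation_failure";
  timestamp: string; // ISO 8601
  endpoint: string;
  field: string;
  input: string; // already sanitised for the log
  sourceIp: string;
  userId: string | null; // null when the request is unauthenticated
}

// Make untrusted input safe for the log itself: escape control characters
// (including CR and LF, which would otherwise allow forged log lines)
// and truncate very long values.
function sanitiseForLog(value: string, maxLength: number = 256): string {
  const escaped = value.replace(/[\u0000-\u001f\u007f]/g, (ch) =>
    "\\x" + ch.charCodeAt(0).toString(16).padStart(2, "0")
  );
  return escaped.length > maxLength
    ? escaped.slice(0, maxLength) + "...[truncated]"
    : escaped;
}

function buildValidationFailureEvent(
  endpoint: string,
  field: string,
  rawInput: string,
  sourceIp: string,
  userId: string | null,
  now: Date = new Date()
): ValidationFailureEvent {
  return {
    event: "input_validation_failure",
    timestamp: now.toISOString(),
    endpoint,
    field,
    input: sanitiseForLog(rawInput),
    sourceIp,
    userId,
  };
}

// One JSON object per line is a format most log pipelines can ingest.
const logLine = JSON.stringify(
  buildValidationFailureEvent(
    "/api/cases/search", "citation", "x\r\nFAKE ENTRY", "203.0.113.7", "user-42"
  )
);
```

Escaping rather than deleting the control characters keeps the evidence of what was sent while preventing the input from breaking the log format.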

Conclusion

Input validation is where security begins — the point where untrusted data from the outside world meets your application's internal logic. Allowlist over denylist. Server-side over client-side. Reject over sanitise. Log over ignore. Combined with schema validation from ORMs and runtime validators like Zod or Pydantic, you create multiple layers of defence before data reaches your business logic.

Next episode: SQL Injection and ORM Safety — what happens when input validation fails and untrusted data reaches your database queries.

Sources & references

  1. OWASP, "Input Validation Cheat Sheet." Primary reference for validation strategies
  2. OWASP Top 10:2021, "A03 Injection."
  3. PortSwigger, "Obfuscating attacks using encodings." Research on encoding-based validation bypasses
  4. OWASP, "Threat Modeling." Trust boundaries as validation points
  5. PortSwigger, "Business logic vulnerabilities." Client-side validation bypasses
  6. MITRE CWE-20, "Improper Input Validation." Root cause classification
  7. OWASP Application Security Verification Standard (ASVS). Server-side validation requirements
  8. OWASP, "Logging Cheat Sheet." Logging validation failures for security monitoring
  9. Prisma, "Prisma Schema." Type-safe schema validation at the ORM layer
  10. Zod, TypeScript-first schema validation library
  11. Pydantic, Python data validation using type annotations
  12. Unicode Technical Report #15, "Unicode Normalization Forms."
  13. NIST SP 800-92, "Guide to Computer Security Log Management."