Episode 55 · Module 11 · Monitoring & Incident Response

Disaster Recovery and Business Continuity

19 May 2026 · 8:50 · Security for Legal SaaS

On a Monday morning, the cloud region hosting your legal SaaS platform experiences a catastrophic failure. Power outage, network partition, natural disaster: the cause doesn't matter. What matters is that hundreds of lawyers at dozens of firms cannot access their case files, their court deadlines are in hours, and your phone is ringing.

Today’s Lesson

Your Primary Region Is Underwater

Disaster recovery (DR) and business continuity (BC) planning answer two linked questions: how long until your clients can work again, and how much data do they lose? For most SaaS platforms, downtime is inconvenient. For legal SaaS, it can be professionally catastrophic. Court filing deadlines don't move because your server crashed. A claim that misses its limitation period generally cannot be brought at all. GDPR Article 32(1)(c) explicitly requires "the ability to restore the availability and access to personal data in a timely manner in the event of a physical or technical incident", which makes disaster recovery a data protection compliance requirement as well as an operational concern.¹

RTO and RPO: The Two Numbers That Define Your Recovery

Every disaster recovery plan is built around two metrics:

RTO (Recovery Time Objective): The maximum acceptable time your system can be down. If your RTO is four hours, your DR plan must be capable of restoring service within four hours of a disaster declaration.

RPO (Recovery Point Objective): The maximum acceptable amount of data loss, measured in time. If your RPO is one hour, you must have backups or replication no more than one hour old. Any data created in the gap between the last backup and the disaster is lost.
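To make the two definitions concrete, here is a minimal sketch that measures one incident against an RTO and an RPO. The function name, the timestamps, and the split between the moment of failure and the moment of declaration are illustrative assumptions, not part of any particular tool.

```python
from datetime import datetime, timedelta

def measure_incident(last_recovery_point: datetime,
                     disaster_at: datetime,
                     declared_at: datetime,
                     restored_at: datetime,
                     rpo: timedelta,
                     rto: timedelta) -> dict:
    """Compare one (hypothetical) incident against the RTO and RPO targets."""
    # Data written after the last backup or replica sync is lost.
    data_loss = disaster_at - last_recovery_point
    # The RTO clock starts at the disaster declaration.
    downtime = restored_at - declared_at
    return {
        "data_loss": data_loss,
        "downtime": downtime,
        "rpo_met": data_loss <= rpo,
        "rto_met": downtime <= rto,
    }

# Hypothetical timeline: last sync 08:20, failure 09:00, declared 09:10, restored 12:30.
result = measure_incident(
    last_recovery_point=datetime(2026, 5, 18, 8, 20),
    disaster_at=datetime(2026, 5, 18, 9, 0),
    declared_at=datetime(2026, 5, 18, 9, 10),
    restored_at=datetime(2026, 5, 18, 12, 30),
    rpo=timedelta(hours=1),
    rto=timedelta(hours=4),
)
print(result["data_loss"], result["downtime"], result["rpo_met"], result["rto_met"])
```

With a one-hour RPO and a four-hour RTO, this timeline passes both. Tighten the RPO to 15 minutes and the same incident fails on data loss.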

Platform Type	Typical RTO	Typical RPO	Justification
Case management SaaS	1-4 hours	15 minutes	Court deadlines; active matters need near-continuous access
Document review platform	4-8 hours	1 hour	Large datasets; some delay tolerable during off-peak
E-filing integration	< 1 hour	Near-zero	Filing deadlines are absolute; missed filings have legal consequences
Contract repository	4-12 hours	1 hour	Reference material; lower urgency than active litigation tools

The legal-specific risk: A general-purpose SaaS platform might tolerate a 24-hour RTO because users can wait. Legal SaaS cannot. A lawyer who can't access case files before a hearing, can't retrieve a contract before a signing deadline, or can't file a document before a court deadline faces professional liability, potentially including malpractice claims. Your DR plan must account for the legal urgency of your users' work, not just the technical difficulty of restoring service.

Backup Strategies

Backups are the foundation of any DR plan. But "we have backups" is not a DR plan — it's the beginning of one.

Backup Types

Strategy	How It Works	RPO Achievable	Restore Speed
Full backups	Complete copy of all data at a point in time	Depends on frequency (daily = 24hr RPO)	Fastest — single restore
Incremental backups	Only changes since the last backup	Depends on frequency	Slower — must replay from last full + all increments
Continuous Data Protection (CDP)	Every write is replicated in near-real-time	Seconds	Fast — minimal data loss
Snapshot-based	Point-in-time copy of storage volume	Minutes to hours	Fast — restore from snapshot
Cross-region replication	Data continuously replicated to another geographic region	Seconds to minutes	Fastest failover — region already has current data
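The restore-speed difference between full and incremental backups comes from the restore chain: the most recent full backup plus every increment taken after it. The sketch below builds that chain for a target point in time. The `Backup` record and the schedule are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Backup:
    kind: str            # "full" or "incremental"
    taken_at: datetime

def restore_chain(backups: list[Backup], target: datetime) -> list[Backup]:
    """Return the backups to apply, oldest first, to restore state as of `target`."""
    usable = sorted((b for b in backups if b.taken_at <= target),
                    key=lambda b: b.taken_at)
    # Locate the most recent full backup at or before the target time.
    full_positions = [i for i, b in enumerate(usable) if b.kind == "full"]
    if not full_positions:
        raise ValueError("no full backup exists at or before the target time")
    # Everything after that full backup is an increment to replay in order.
    return usable[full_positions[-1]:]

# Hypothetical schedule: weekly full backup, daily increments.
schedule = [
    Backup("full", datetime(2026, 5, 17, 0, 0)),
    Backup("incremental", datetime(2026, 5, 18, 0, 0)),
    Backup("incremental", datetime(2026, 5, 19, 0, 0)),
    Backup("full", datetime(2026, 5, 24, 0, 0)),
]
chain = restore_chain(schedule, target=datetime(2026, 5, 19, 12, 0))
print([b.kind for b in chain])
```

The longer the chain, the longer the restore, and one unreadable increment breaks everything after it. That is one reason to take full backups periodically instead of relying on a long run of increments.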

The 3-2-1 Rule

At minimum, maintain:

3 copies of your data
On 2 different types of storage media
With 1 copy offsite (different geographic region)

For legal SaaS handling privileged communications, add encryption at rest for all backup copies. AWS, Azure, and GCP all provide managed backup services with built-in encryption and cross-region replication.²
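The 3-2-1 rule and the encryption requirement can be checked mechanically against an inventory of where your data lives. This is a minimal sketch: the `DataCopy` fields, the region names, and the media labels are assumptions, and the inventory is assumed to include the production copy.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataCopy:
    region: str       # hypothetical region name
    media: str        # hypothetical label, e.g. "block-storage"
    encrypted: bool

def check_3_2_1(copies: list[DataCopy], primary_region: str) -> dict[str, bool]:
    """Evaluate the 3-2-1 rule plus encryption at rest for every copy."""
    return {
        "three_copies": len(copies) >= 3,
        "two_media_types": len({c.media for c in copies}) >= 2,
        "one_offsite": any(c.region != primary_region for c in copies),
        "all_encrypted": all(c.encrypted for c in copies),
    }

inventory = [
    DataCopy("eu-west", "block-storage", True),    # production volume
    DataCopy("eu-west", "object-storage", True),   # same-region backup
    DataCopy("eu-north", "object-storage", False), # cross-region copy, unencrypted
]
report = check_3_2_1(inventory, primary_region="eu-west")
print(report)
```

In this example the layout satisfies 3-2-1, but the check flags the unencrypted offsite copy.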

The Backup Nobody Tests

An untested backup is not a backup. Regularly test that you can actually restore from your backups to a functioning system. At least quarterly, perform a full restore test: take a backup, restore it to a clean environment, verify the application works and the data is intact. The number of organisations that discover their backups are corrupted or incomplete during an actual disaster is sobering. Testing is the only proof.
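One part of a restore test is easy to automate: confirming that every file captured at backup time comes back byte-for-byte. The sketch below does this with a SHA-256 manifest. The function names and sample files are hypothetical, and it covers data integrity only. You still need to confirm separately that the application runs on the restored data.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {str(p.relative_to(root)): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}

def verify_restore(manifest: dict[str, str], restored_root: Path) -> dict[str, list[str]]:
    """Report files that are missing or altered after a restore."""
    restored = build_manifest(restored_root)
    missing = sorted(set(manifest) - set(restored))
    corrupted = sorted(name for name in manifest.keys() & restored.keys()
                       if manifest[name] != restored[name])
    return {"missing": missing, "corrupted": corrupted}

# Demonstration with throwaway directories standing in for source and restore target.
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    src_root, dst_root = Path(src), Path(dst)
    (src_root / "matter-001.txt").write_text("engagement letter")
    (src_root / "matter-002.txt").write_text("draft pleading")
    manifest = build_manifest(src_root)          # taken at backup time
    # Simulate a bad restore: one file altered, one file absent.
    (dst_root / "matter-001.txt").write_text("engagement lettex")
    report = verify_restore(manifest, dst_root)
print(report)
```

An empty report is necessary but not sufficient. The quarterly test should still end with someone opening real client records in the restored environment.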

Multi-Region Failover

For RTOs below four hours, single-region deployments with backup restoration are usually too slow. You need multi-region architecture.

Active-Passive

Your application runs in a primary region. A secondary region has infrastructure provisioned but not actively serving traffic. Data is continuously replicated from primary to secondary. When the primary fails, traffic is redirected to the secondary.

Pros: Lower cost than active-active. Simpler to implement.

Cons: Failover takes time (minutes to an hour). Data in the secondary region may lag the primary by up to the RPO window, so the most recent writes can be lost.

Buxton Consulting's DR guide for SaaS recommends active-passive as the pragmatic starting point for most SaaS platforms, with active-active reserved for mission-critical workloads with near-zero RTO requirements.³
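In an active-passive setup, someone or something has to decide when the primary is truly down, not just briefly unreachable. A common pattern is to require several consecutive failed health checks before redirecting traffic. The sketch below illustrates that pattern. The class name and the threshold of three are assumptions, and a real deployment would act through your DNS or load balancer.

```python
from dataclasses import dataclass

@dataclass
class FailoverController:
    failure_threshold: int = 3        # assumed value; tune to your RTO
    consecutive_failures: int = 0
    active_region: str = "primary"

    def record_health_check(self, primary_healthy: bool) -> str:
        """Record one health probe and return the region that should serve traffic."""
        if self.active_region == "secondary":
            # Failback is treated as a deliberate manual step in this sketch.
            return self.active_region
        if primary_healthy:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                self.active_region = "secondary"
        return self.active_region

controller = FailoverController()
probes = [False, False, True, False, False, False]
history = [controller.record_health_check(p) for p in probes]
print(history)
```

A single successful probe resets the counter, so a brief blip does not trigger a failover. Three failures in a row do.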

Active-Active

Both regions actively serve traffic simultaneously. Data is replicated bidirectionally. If one region fails, the other absorbs the full load with zero or near-zero downtime.

Pros: Near-zero RTO. No failover delay.

Cons: Significantly more expensive. Bidirectional data replication introduces conflict resolution complexity — what happens when two users edit the same document in different regions simultaneously?
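One common way to at least detect the two-regions-edit-one-document case is a version vector: each region keeps a counter per document, and two versions conflict when neither has seen all of the other's edits. This technique is not something the sources above prescribe. The sketch is only an illustration, with hypothetical region names.

```python
def compare_versions(a: dict[str, int], b: dict[str, int]) -> str:
    """Compare two version vectors (region name -> edit counter)."""
    regions = a.keys() | b.keys()
    a_ahead = any(a.get(r, 0) > b.get(r, 0) for r in regions)
    b_ahead = any(b.get(r, 0) > a.get(r, 0) for r in regions)
    if a_ahead and b_ahead:
        return "conflict"      # concurrent edits; neither version contains the other
    if a_ahead:
        return "a_newer"
    if b_ahead:
        return "b_newer"
    return "equal"

base = {"eu": 1}                      # version both regions started from
edited_in_eu = {"eu": 2}              # user A saves in the EU region
edited_in_us = {"eu": 1, "us": 1}     # user B saves in the US region at the same time

print(compare_versions(edited_in_eu, base))
print(compare_versions(edited_in_eu, edited_in_us))
```

Detection is the easy half. What to do with a conflict (keep both versions, merge, or ask the user) is a product decision, and for legal documents silently discarding one lawyer's edit is rarely acceptable.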

AWS Well-Architected DR Strategies

The AWS Well-Architected Framework's Reliability Pillar defines four DR strategies mapped to RTO/RPO ranges:⁴

Strategy	RTO	RPO	Cost
Backup & Restore	Hours	Hours	Lowest
Pilot Light	Minutes to hours	Minutes	Low-medium
Warm Standby	Minutes	Seconds to minutes	Medium-high
Multi-Site Active-Active	Near-zero	Near-zero	Highest
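The table can be read as a selection rule: pick the cheapest strategy whose achievable RTO and RPO fit your targets. The sketch below encodes that rule. The numeric bounds are illustrative assumptions chosen to fit the qualitative ranges in the table. They are not figures published by AWS, and you should replace them with measurements from your own drills.

```python
from datetime import timedelta
from typing import Optional

# Ordered cheapest first. The (RTO, RPO) bounds are ASSUMED for illustration only.
STRATEGIES = [
    ("Backup & Restore", timedelta(hours=24), timedelta(hours=24)),
    ("Pilot Light", timedelta(hours=2), timedelta(minutes=15)),
    ("Warm Standby", timedelta(minutes=15), timedelta(minutes=1)),
    ("Multi-Site Active-Active", timedelta(minutes=1), timedelta(seconds=5)),
]

def cheapest_strategy(required_rto: timedelta, required_rpo: timedelta) -> Optional[str]:
    """Return the first (cheapest) strategy that satisfies both targets, if any."""
    for name, achievable_rto, achievable_rpo in STRATEGIES:
        if achievable_rto <= required_rto and achievable_rpo <= required_rpo:
            return name
    return None

case_management = cheapest_strategy(timedelta(hours=4), timedelta(minutes=15))
e_filing = cheapest_strategy(timedelta(hours=1), timedelta(seconds=10))
print(case_management, "|", e_filing)
```

Under these assumed bounds, a case management tier with a four-hour RTO lands on Pilot Light, while an e-filing tier with a near-zero RPO is pushed to Multi-Site Active-Active.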

DR Testing: Proving Your Plan Works

Tabletop Exercises

Gather your team around a table (or a video call). Present a disaster scenario: "It's 9 AM Monday. AWS us-east-1 is completely down. Your monitoring shows all services unreachable. What do you do?" Walk through the response step by step. Who makes the failover decision? Who communicates with clients? How long does each step take?

Tabletop exercises are cheap, fast, and reveal gaps in your plan that aren't visible on paper. Run them quarterly.

Actual Failover Drills

At least annually, execute a real failover. Simulate a primary region failure and switch to your DR region. Verify:

Application functions correctly in the DR region
Data is complete and current (within RPO)
External integrations (court filing systems, email providers, payment processors) reconnect
Performance is acceptable under full load
The failback to the primary region works cleanly
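The verification list above translates naturally into a drill script that runs each check and records the outcome. This is a minimal sketch. The check names mirror the list, but the lambdas are stand-ins for real probes and the replication lag value is a hypothetical measurement.

```python
from datetime import timedelta
from typing import Callable

def run_drill(checks: dict[str, Callable[[], bool]]) -> dict:
    """Run every check; a check that raises is recorded as a failure."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    return {"results": results, "passed": all(results.values())}

RPO = timedelta(minutes=15)
observed_replication_lag = timedelta(minutes=4)   # hypothetical measurement

drill = run_drill({
    "application_functions": lambda: True,
    "data_within_rpo": lambda: observed_replication_lag <= RPO,
    "integrations_reconnect": lambda: False,      # e.g. a court filing endpoint stayed down
    "performance_under_load": lambda: True,
    "failback_clean": lambda: True,
})
failed = [name for name, ok in drill["results"].items() if not ok]
print(drill["passed"], failed)
```

Keeping the results by name gives you a record of exactly which step failed, which is what the post-drill review needs.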

Cloud4C's 2026 business resilience guide recommends automated failover drills with synthetic traffic, reducing the risk and effort of manual testing.⁵

Chaos Engineering

For mature teams, introduce controlled failures in production: terminate a database replica, simulate a network partition, corrupt a cache. This validates not just your DR plan but your system's day-to-day resilience. Netflix's Chaos Monkey pioneered this approach; tools like AWS Fault Injection Service (formerly AWS Fault Injection Simulator) and Gremlin provide managed chaos engineering platforms.

Legal-Specific DR Considerations

Court Filing Deadlines

Court filing systems have hard deadlines that cannot be extended because your platform is down. Your DR plan must ensure that e-filing integrations have independent failover paths — either through redundant integration endpoints or manual filing procedures that lawyers can follow when the platform is unavailable.
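An independent failover path for e-filing can be as simple as trying each integration endpoint in turn and, if all are unreachable, telling the lawyer immediately to switch to the manual procedure. The sketch below shows that flow. The endpoint functions, document ID, and status strings are hypothetical.

```python
from typing import Callable

def submit_filing(document_id: str,
                  endpoints: list[tuple[str, Callable[[str], str]]]) -> dict:
    """Try each e-filing endpoint in order; fall back to manual filing if all fail."""
    for name, submit in endpoints:
        try:
            receipt = submit(document_id)
            return {"status": "filed", "via": name, "receipt": receipt}
        except ConnectionError:
            continue   # endpoint unreachable; try the next path
    # No automated path worked: surface the manual procedure right away.
    return {"status": "manual_filing_required", "via": None, "receipt": None}

def primary_endpoint(document_id: str) -> str:
    raise ConnectionError("primary e-filing endpoint unreachable")

def secondary_endpoint(document_id: str) -> str:
    return f"receipt-for-{document_id}"

outcome = submit_filing("doc-123", [("primary", primary_endpoint),
                                    ("secondary", secondary_endpoint)])
all_down = submit_filing("doc-123", [("primary", primary_endpoint)])
print(outcome["status"], outcome["via"], "|", all_down["status"])
```

The important property is that the failure is explicit. With a deadline two hours away, a lawyer needs to learn in seconds that manual filing is required, not find out from a spinner that never resolves.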

Privileged Communication Continuity

If your platform stores attorney-client privileged communications, DR replication must maintain the same access controls and encryption in the DR region as in production. A failover that exposes privileged documents to users who shouldn't see them is worse than downtime.
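Access-control parity between regions can be checked before a failover by comparing who may read each document in production against the DR replica. This is a minimal sketch. The document IDs, user names, and the idea of exporting permissions as plain sets are all assumptions about your system.

```python
def acl_drift(primary: dict[str, set[str]], dr: dict[str, set[str]]) -> dict[str, dict]:
    """Return documents whose allowed-user sets differ between primary and DR."""
    drift = {}
    for doc_id in sorted(primary.keys() | dr.keys()):
        allowed_primary = primary.get(doc_id, set())
        allowed_dr = dr.get(doc_id, set())
        if allowed_primary != allowed_dr:
            drift[doc_id] = {
                # Users who could read the document only after failover: exposure risk.
                "extra_in_dr": sorted(allowed_dr - allowed_primary),
                # Users who would lose access after failover: availability risk.
                "missing_in_dr": sorted(allowed_primary - allowed_dr),
            }
    return drift

primary_acl = {
    "privileged-memo-17": {"partner.a", "associate.b"},
    "engagement-letter-3": {"partner.a"},
}
dr_acl = {
    "privileged-memo-17": {"partner.a", "associate.b", "paralegal.c"},
    "engagement-letter-3": {"partner.a"},
}
drift = acl_drift(primary_acl, dr_acl)
print(drift)
```

Any non-empty `extra_in_dr` entry is the case described above: a failover that would expose a privileged document to someone who should not see it.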

Regulatory Requirements

Beyond GDPR's availability requirement, the EU's Digital Operational Resilience Act (DORA), enforceable since January 2025, requires financial entities to maintain ICT continuity policies with defined RTOs and RPOs.⁵ While DORA targets financial services, legal SaaS platforms serving regulated industries must meet their clients' compliance requirements — which increasingly include DR certification.

The DR Minimum for Legal SaaS

Control	Purpose
Defined RTO and RPO for each service tier	Sets recovery expectations and drives architecture decisions
Automated daily backups with cross-region replication	Ensures data survives regional failures
Encrypted backup storage with access controls	Protects client data at rest in backup locations
Quarterly backup restore tests	Proves backups actually work
Documented failover procedures	Enables fast, reliable region switches
Annual failover drill	Validates the entire DR chain under realistic conditions
Client communication template for outages	Prepares clear, consistent messaging during incidents
Manual filing procedures for e-filing integrations	Provides lawyers a fallback when automation fails

Disaster recovery isn't about preventing disasters — it's about ensuring your clients can keep working when one happens. For legal SaaS, "keep working" means meeting court deadlines, accessing case files, and preserving privileged communications. Plan for the worst. Test regularly. The drill that feels unnecessary is the one that saves you.

Next episode: security testing in your development process — how to find vulnerabilities before attackers do, from code commit to production.

Sources & references

Konfirmity, GDPR Incident Response Plan: A Practical Guide — Article 32 DR Requirements.
Microsoft, Business Continuity, High Availability, and Disaster Recovery.
Buxton Consulting, Building and Testing Disaster Recovery Plans for SaaS Applications.
AWS, Well-Architected Framework: Plan for Disaster Recovery.
Cloud4C, Business Resilience in 2026: A Cross-Sector Guide to Rapid Disaster Recovery.
ATOZDEBUG, Disaster Recovery for SaaS — A Complete 2025 Strategy Guide.
GainHQ, Disaster Recovery SaaS Guide For Business Continuity In 2026.
Opsio Cloud, Disaster Recovery & Business Continuity in the Cloud.
Exodata, IT Disaster Recovery Plan Template (2026).
N2W Software, Best Cloud Recovery Tools for Business Continuity: Top 5 in 2026.

Alice: Welcome back to Security for Legal SaaS. I'm Alice.

Dan: And I'm Dan. Episode 55 — disaster recovery and business continuity. Alice, last two episodes we covered monitoring and incident response — detecting attacks and containing them. This episode is different. This isn't about an attacker. This is about everything just... going down.

Alice: Exactly. Your primary cloud region fails. Power outage, network partition, maybe a natural disaster. The cause is irrelevant. What matters is that hundreds of lawyers at dozens of firms cannot access their case files. Court deadlines are in hours. Your phone is ringing. What happens next depends entirely on what you planned before this moment.

Dan: Mm. And for legal SaaS, downtime is worse than for most platforms, right?

Alice: Much worse. For a social media app, four hours of downtime is embarrassing. For a legal SaaS platform, four hours could mean a missed court filing deadline. A missed limitation period that can never be refiled. A lawyer who can't access case documents before a hearing. These aren't inconveniences — they're potential malpractice claims. Court deadlines don't move because your server crashed.

Dan: Right. So where does the planning start?

Alice: Two numbers. RTO and RPO. RTO is Recovery Time Objective — the maximum acceptable time your system can be down. If your RTO is four hours, your plan must restore service within four hours. RPO is Recovery Point Objective — the maximum acceptable data loss, measured in time. If your RPO is one hour, your last backup must be no more than one hour old. Anything created between the last backup and the disaster is gone.

Dan: Mm-hmm. And those numbers are different for different types of legal platforms?

Alice: Significantly. An e-filing integration — where missing a deadline has direct legal consequences — needs an RTO under one hour and an RPO near zero. A case management system might tolerate one to four hours of downtime. A contract repository that's mostly reference material might be acceptable at four to twelve hours. But you can't set these numbers in a vacuum. You need to talk to your customers about what they can tolerate, and then build architecture that delivers it.

Dan: Yeah. Let's talk about backups. Everyone says they have backups. But having backups and having a recovery plan are different things.

Alice: "We have backups" is the beginning of a plan, not the plan itself. The critical questions are: how often? How far back? Are they tested? Where are they stored? A daily backup gives you a 24-hour RPO — meaning you could lose a full day of work. For a platform handling active litigation, that's probably not acceptable. Continuous data protection replicates every write in near-real-time — RPO of seconds. Cross-region replication keeps a copy of your data in a completely different geographic area, so a regional disaster doesn't destroy both your primary and your backup.

Dan: Mm. The 3-2-1 rule?

Alice: Three copies of your data, on two different types of storage, with one copy offsite. That's the minimum. And for legal SaaS, add encryption at rest for every backup copy. Your backups contain the same privileged communications and client data as your production system. They need the same protection.

Dan: Mm. And testing. I imagine most teams don't test their restores.

Alice: <sigh> Most teams discover their backups are corrupted or incomplete during an actual disaster. That's why you test quarterly — at minimum. Take a backup, restore it to a clean environment, verify the application starts, verify the data is intact, verify client records are accessible. An untested backup is not a backup. It's a hope.

Dan: Right. What about multi-region? When do you need that?

Alice: If your RTO is below four hours, restoring from backups in a single region is usually too slow. You need your application running — or ready to run — in a second geographic region. There are two approaches. Active-passive: your app runs in one region, a second region has infrastructure ready but not serving traffic, and data replicates continuously. If the primary fails, you switch traffic to the secondary. It takes minutes to about an hour. Active-active: both regions serve traffic simultaneously. If one fails, the other absorbs the load with near-zero downtime.

Dan: Mm. Active-active sounds better. Why wouldn't everyone do it?

Alice: Cost and complexity. Active-active means running your entire stack twice. And bidirectional data replication introduces conflict resolution problems. What happens when two users edit the same document in different regions at the same time? That's a hard problem. AWS actually maps this out nicely — their Well-Architected Framework defines four DR strategies with increasing cost and decreasing RTO. Backup and restore is the cheapest with the longest recovery. Pilot light keeps minimal infrastructure running. Warm standby runs a scaled-down version. Multi-site active-active is the most expensive with near-zero downtime. You pick based on your RTO requirements and your budget.

Dan: Yeah. And you actually drill this? Like a fire drill?

Alice: At least annually. Simulate a primary region failure and actually switch to your DR region. Verify the application works, the data is current, external integrations reconnect — court filing systems, email providers, payment processors. Then verify the failback to the primary works cleanly. Between drills, do quarterly tabletop exercises. Gather the team, present a scenario: "It's 9 AM Monday, AWS us-east-1 is down, what do you do?" Walk through the response step by step. Who makes the failover decision? Who communicates to clients? How long does each step take? Tabletops are cheap and they reveal gaps that aren't visible on paper.

Dan: Mm-hmm. For legal SaaS specifically — what are the unique considerations?

Alice: Three big ones. First: court filing deadlines. Your e-filing integrations need independent failover paths. If the platform is down and a lawyer has a filing due in two hours, there must be a documented manual procedure they can follow. Second: privilege continuity. When you fail over to a DR region, the access controls and encryption must be identical to production. A failover that exposes privileged documents to users who shouldn't see them is worse than downtime. Third: regulatory compliance. GDPR Article 32 explicitly requires the ability to restore availability in a timely manner. The EU's Digital Operational Resilience Act, or DORA, which became enforceable in January 2025, requires financial entities to maintain ICT continuity policies with defined RTOs and RPOs. If your clients are in regulated industries, your DR certification becomes part of their compliance story.

Dan: Right. So the minimum DR checklist for a legal SaaS platform?

Alice: Defined RTO and RPO for each service tier. Automated backups with cross-region replication. Encrypted backup storage with proper access controls. Quarterly restore tests. Documented failover procedures. Annual failover drills. A client communication template for outages — so you're not writing emails under pressure. And manual procedures for critical integrations like court filing systems. Plan for the worst. Test regularly. The drill that feels unnecessary is the one that saves you.

Dan: Next episode — security testing in your development process. How to find vulnerabilities before attackers do.

Alice: Until then, I'm Alice.

Dan: And I'm Dan.

Alice: Security for Legal SaaS is a series written with AI assistance. Alice and Dan are AI-generated voices — no professional advice here, just education.
