What 40 Years of Exercises Tell Us

A synthesis of cross-cutting findings from 20 pandemic exercises, crisis simulations, and strategic war games (1983–2025). This is the evidence chain — from prediction to failure to consequence.

1

Surge Capacity Collapses Within Hours, Not Days

Every exercise that tested hospital surge found catastrophic failure: ICU capacity was exhausted within 24–48 hours, ventilator supply chains collapsed, and staff ratios became untenable. COVID-19 bore this out in country after country, including Italy (March 2020), the UK (April 2020), and the US (winter 2020–21).
Dark Winter (2001): Hospitals overwhelmed within first scenario turn. Smallpox cases exceeded all available isolation beds.
TOPOFF 1–4 (2000–07): Real hospital surge tested. Denver hospitals hit capacity in TOPOFF 1 within hours of simulated plague release.
Crimson Contagion (2019): Draft AAR found “federal government lacks the capacity to surge”, 4 months before COVID.
Exercise Cygnus (2016): UK NHS surge failed in simulation. 22 recommendations made. Zero implemented before COVID.
Sources: CSIS Dark Winter AAR (2001) • FEMA TOPOFF 1 AAR (2000) • HHS Crimson Contagion Draft AAR (2019, leaked) • UK Cygnus AAR (2017, FOI 2020) • UK COVID-19 Inquiry Module 1 (2024)
2

Vaccine Nationalism Overrides International Cooperation Every Time

When vaccine supply is limited, every exercise shows nations prioritising domestic populations over global equity. Atlantic Storm (2005) predicted this 15 years before COVAX struggled. Event 201 (2019) explicitly warned about equitable distribution. Lock Step (2010) described authoritarian pandemic governance. The pattern is: cooperation collapses at the point of scarcity.
Atlantic Storm (2005): 10 heads of state chose national stockpiling over WHO allocation. Vaccine inequality predicted 15 years before COVID.
Event 201 (2019): Recommendation 6 called for equitable global distribution. It failed in COVID: wealthy nations secured 4B+ doses before low- and middle-income countries (LMICs) gained access.
Lock Step (2010): Scenario predicted authoritarian pandemic governance and national-first responses. Validated by export bans in 2020–21.
NTI Bio (2021): Munich exercise found no mechanism for equitable distribution of medical countermeasures (MCMs). Recommended international stockpile, still unbuilt.
Sources: Smith et al. “Navigating the Storm” (2005) • JHU/CHS Event 201 Recommendations (2019) • Rockefeller Foundation “Scenarios for the Future” (2010) • NTI Munich AAR (2022)
3

Communication Fragments Under Pressure

Interoperable communications fail technically (TOPOFF), strategically (SPARS predicted social media misinformation), and politically (Cygnus showed Whitehall messaging chaos). COVID-19 saw contradictory guidance between agencies, nations, and levels of government simultaneously.
TOPOFF 1 (2000): Radio interoperability failed between first responder agencies in Denver. The same failure occurred on 9/11.
SPARS (2017): Predicted vaccine misinformation, social media amplification, and public trust erosion. The COVID-era anti-vaccine movement bore this out.
Winter Willow (2007): UK cross-government communication failed under simulated pandemic load. Messaging coordination absent.
Sources: FEMA TOPOFF AAR (2000) • JHU SPARS Pandemic Scenario (2017) • UK Winter Willow AAR (2007, partial release)
4

International Coordination Exists on Paper Only

The IHR (2005) was adopted partly because of Atlantic Storm. The JEE process was designed to verify compliance. But exercises consistently show that multilateral response mechanisms are untested and non-operational. WHO authority is advisory, not directive. Regional cooperation depends on individual relationships, not systems.
Atlantic Storm (2005): WHO budget for bioterror: $6.3M. “Like a mid-sized hospital.” Directly influenced IHR 2005 adoption.
Global Mercury (2003): WHO-led smallpox tabletop exercise (TTX) found zero mechanism for coordinated international bioterror response.
Mataika (2023): Pacific regional exercise showed small island states lack basic surveillance capacity. Regional coordination depends on Australia/NZ.
Catastrophic Contagion (2022): Post-COVID TTX still found no operational mechanism for coordinated pandemic response at WHO level.
Sources: Atlantic Storm Report (2005) • IHR Review Committee (2005) • WHO/SPC Mataika AAR (2023) • JHU/WHO Catastrophic Contagion Summary (2023)
5

The System Fails to Learn From Its Own Tests

This is the meta-finding. The most damning pattern is not any individual failure — it is the systematic inability to convert exercise findings into policy action. Cygnus (2016) predicted UK COVID failures exactly — zero of 22 recommendations implemented. Crimson Contagion (2019) found US federal response gaps 4 months before COVID — zero changes. Event 201 published 7 recommendations 3 months before COVID — none adopted. The exercises work. The implementation pipeline is broken.
Cygnus → COVID (2016 → 2020): 22 recommendations. 0 implemented. UK COVID-19 Inquiry concluded: “the UK was not prepared.”
Crimson Contagion → COVID (2019 → 2020): Found federal response gaps. Completed Aug 2019. COVID arrived Jan 2020. Zero changes made in the intervening months.
Event 201 → COVID (Oct 2019 → Jan 2020): 7 public recommendations. Coronavirus pandemic began 3 months later. None adopted in time.
Sources: UK COVID-19 Inquiry Module 1 (2024) • NYT Crimson Contagion Investigation (2020) • Reddin et al. BMC Public Health PMC8020603 (2021) • JHU/CHS Event 201 Recommendations (2019)
6

The Exercise Portfolio Has Structural Blind Spots

Only 7/20 exercises test bioterrorism. Only 2 test nuclear scenarios. Zero test AI-enhanced biological threats, synthetic biology attacks, or combined cyber-bio scenarios. The GHSI identifies “Prevention” as Category 1 — yet prevention-focused exercises are the rarest type. We are testing response when we should be testing detection and prevention.
Pandemic-focused (10 of 20): 50% of exercises test natural pandemic response. Overrepresented relative to threat landscape.
Bioterror-focused (7 of 20): 35% test deliberate biological attacks. Mostly smallpox scenarios, a narrow pathogen range.
Nuclear/Military (3 of 20): 15% test strategic/nuclear scenarios. Concentrated in 1983 Cold War era. None since 2002.
AI + Bio (0 of 20): Zero exercises test AI-enhanced bioweapons, CRISPR-enabled threats, or cyber-bio convergence.
Sources: PSEF-X ETEM Matrix Analysis • GHSI 2021 Methodology • RAND Bio-Domain Research (2024) • NTI Nuclear Threat Assessment (2024)
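The portfolio shares quoted in this finding follow directly from the counts. A minimal Python sketch, assuming the three tested categories are mutually exclusive (their counts sum to 20, which is consistent with that reading but not stated outright); the function and variable names are illustrative only:

```python
# Counts as quoted in this finding; the keys are the category labels used above.
PORTFOLIO = {
    "Pandemic-focused": 10,
    "Bioterror-focused": 7,
    "Nuclear/Military": 3,
}
TOTAL_EXERCISES = 20


def portfolio_shares(counts, total):
    """Return each category's share of the portfolio as a whole-number percentage."""
    # Assumption: categories do not overlap, so the counts must add up to the total.
    if sum(counts.values()) != total:
        raise ValueError("category counts do not add up to the portfolio size")
    return {name: round(100 * n / total) for name, n in counts.items()}


shares = portfolio_shares(PORTFOLIO, TOTAL_EXERCISES)
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

The sum check is there so that an exercise counted under two categories, or a missing category, surfaces as an error rather than as a silently skewed percentage.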
7

Exercise Rigour Is Declining, Not Improving

Cold War exercises (Able Archer, Proud Prophet) featured genuine adversarial play with real consequences. Proud Prophet changed Reagan’s nuclear policy. Millennium Challenge had a genuine red team that defeated the blue force. Modern pandemic exercises trend toward scripted TTXs with predetermined conclusions. ETEM data shows average Red Team Quality declining from 5.0 (1983) to 2.1 (2020s). We are testing less rigorously while facing more complex threats.
Proud Prophet (1983): Red Team quality: 5/5. Genuine adversarial play with Soviet doctrine. Changed Reagan’s nuclear policy within months.
MC 2002 (2002): Red Team quality: 5/5. Lt Gen Van Riper defeated Blue force. Then the exercise was reset: the system rejected the lesson.
Modern TTXs (2017–25): Average Red Team quality: 2.1/5. Most exercises use “virus as opponent” without structured adversarial analysis.
Sources: Shappell, ISR 26(1) (2024) • NATO ACT Wargaming Handbook (2023) • UK MOD Red Teaming Handbook 3rd Ed (2021) • PSEF-X ETEM Matrix scores

What This Means for Biological Response

These seven findings are not abstract: they are the operational reality that BioR exists to address. The PSEF Benchmark evaluates whether biosurveillance platforms can detect threats early enough to break the pattern. The Regulatory KB maps whether compliance frameworks are strong enough to enforce action. PSEF-X makes the case that without these systems, the same failures will repeat in the next pandemic.

Recommended Next Directions

1

Exercise AI + Bio Convergence Scenarios

No exercise has tested AI-enhanced bioweapons, CRISPR-based threats, or automated pathogen design. This is the most critical gap in the exercise portfolio.

2

Mandate Implementation Tracking

Every exercise should include a funded implementation plan with timeline, responsible parties, and public audit trail. The current model of “recommendations without accountability” has failed for 40 years.
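One way to picture the tracking record this recommendation calls for is sketched below in Python. The class and field names (`Recommendation`, `owner`, `due`, `audit_log`), the placeholder owner, and the placeholder due date are all hypothetical and not drawn from any exercise report; only the figures of 22 recommendations with none implemented mirror the Cygnus finding cited earlier in this synthesis.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Recommendation:
    """One exercise recommendation with the accountability fields named above."""
    text: str
    owner: str            # responsible party
    due: date             # timeline
    implemented: bool = False
    audit_log: list = field(default_factory=list)  # public audit trail

    def mark_implemented(self, on: date, evidence: str) -> None:
        """Record completion together with the evidence that supports it."""
        self.implemented = True
        self.audit_log.append((on, evidence))


def implementation_rate(recs):
    """Fraction of recommendations marked implemented (0.0 for an empty list)."""
    if not recs:
        return 0.0
    return sum(r.implemented for r in recs) / len(recs)


def overdue(recs, today):
    """Recommendations past their due date that are still not implemented."""
    return [r for r in recs if not r.implemented and r.due < today]


# Placeholder data: 22 unimplemented items with a hypothetical owner and due date.
recs = [
    Recommendation(text=f"Recommendation {i}", owner="(unassigned)", due=date(2018, 1, 1))
    for i in range(1, 23)
]
rate = implementation_rate(recs)
late = overdue(recs, today=date(2020, 1, 1))
print(f"{rate:.0%} implemented, {len(late)} overdue")  # prints: 0% implemented, 22 overdue
```

Keeping the audit log on each recommendation, rather than in a separate report, means the implementation rate and the evidence behind it are published from the same record.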

3

Restore Adversarial Rigour

Return to genuine red teaming. Modern TTXs must include structured adversarial analysis (UK MOD methodology), not just “virus as opponent.” Exercises should be allowed to fail.

4

Connect Exercises to Surveillance Systems

Exercises should directly test biosurveillance platforms (PSEF-benchmarked systems) and regulatory compliance (RSKB frameworks). Currently, exercises and surveillance systems exist in separate silos.

5

Build the Evidence-to-Action Pipeline

PSEF-X documents the evidence. BioR provides the platform. The missing piece is a systematic mechanism to convert exercise findings into operational change, something that has been absent for 40 years.

Explore the ETEM Matrix — 20 × 12 Scoring