Synthesis — What 40 Years of Exercises Tell Us

Surge Capacity Collapses Within Hours, Not Days

Every exercise that tested hospital surge found catastrophic failure: ICU capacity was exhausted within 24–48 hours, ventilator supply chains collapsed, and staff ratios became untenable. COVID-19 validated this across nations, including Italy (March 2020), the UK (April 2020), and the US (winter 2020–21).

Dark Winter

2001

Hospitals overwhelmed within first scenario turn. Smallpox cases exceeded all available isolation beds.

TOPOFF 1–4

2000–07

Real hospital surge tested. Denver hospitals hit capacity in TOPOFF 1 within hours of simulated plague release.

Crimson Contagion

2019

Draft AAR found “federal government lacks the capacity to surge” — 4 months before COVID.

Exercise Cygnus

2016

UK NHS surge failed in simulation. 22 recommendations made. Zero implemented before COVID.

Sources: CSIS Dark Winter AAR (2001) • FEMA TOPOFF 1 AAR (2000) • HHS Crimson Contagion Draft AAR (2019, leaked) • UK Cygnus AAR (2017, FOI 2020) • UK COVID-19 Inquiry Module 1 (2024)

Vaccine Nationalism Overrides International Cooperation Every Time

When vaccine supply is limited, every exercise shows nations prioritising domestic populations over global equity. Atlantic Storm (2005) predicted this 15 years before COVAX struggled. Event 201 (2019) explicitly warned about equitable distribution. Lock Step (2010) described authoritarian pandemic governance. The pattern is: cooperation collapses at the point of scarcity.

Atlantic Storm

2005

10 role-played heads of state chose national stockpiling over WHO allocation. Vaccine inequality predicted 15 years before COVID.

Event 201

2019

Recommendation 6: equitable global distribution. Failed in COVID — wealthy nations secured 4B+ doses before LMIC access.

Lock Step

2010

Scenario predicted authoritarian pandemic governance and national-first responses. Validated by export bans in 2020–21.

NTI Bio

2021

Munich exercise found no mechanism for equitable MCM distribution. Recommended international stockpile — still unbuilt.

Sources: Smith et al. “Navigating the Storm” (2005) • JHU/CHS Event 201 Recommendations (2019) • Rockefeller Foundation “Scenarios for the Future” (2010) • NTI Munich AAR (2022)

Communication Fragments Under Pressure

Interoperable communications fail technically (TOPOFF), strategically (SPARS predicted social media misinformation), and politically (Cygnus showed Whitehall messaging chaos). COVID-19 saw contradictory guidance between agencies, nations, and levels of government simultaneously.

TOPOFF 1

2000

Radio interoperability failed between first responder agencies in Denver. Same failure occurred on 9/11.

SPARS

2017

Predicted vaccine misinformation, social media amplification, and public trust erosion. COVID anti-vax movement validated.

Winter Willow

2007

UK cross-government communication failed under simulated pandemic load. Messaging coordination absent.

Sources: FEMA TOPOFF AAR (2000) • JHU SPARS Pandemic Scenario (2017) • UK Winter Willow AAR (2007, partial release)

International Coordination Exists on Paper Only

The IHR (2005) was adopted partly because of Atlantic Storm. The JEE process was designed to verify compliance. But exercises consistently show that multilateral response mechanisms are untested and non-operational. WHO authority is advisory, not directive. Regional cooperation depends on individual relationships, not systems.

Atlantic Storm

2005

WHO budget for bioterror: $6.3M. “Like a mid-sized hospital.” Directly influenced IHR 2005 adoption.

Global Mercury

2003

WHO-led smallpox TTX found zero mechanism for coordinated international bioterror response.

Mataika

2023

Pacific regional exercise showed small island states lack basic surveillance capacity. Regional coordination depends on Australia/NZ.

Catastrophic Contagion

2022

Post-COVID TTX still found no operational mechanism for coordinated pandemic response at WHO level.

Sources: Atlantic Storm Report (2005) • IHR Review Committee (2005) • WHO/SPC Mataika AAR (2023) • JHU/WHO Catastrophic Contagion Summary (2023)

The System Fails to Learn From Its Own Tests

This is the meta-finding. The most damning pattern is not any individual failure — it is the systematic inability to convert exercise findings into policy action. Cygnus (2016) predicted UK COVID failures exactly — zero of 22 recommendations implemented. Crimson Contagion (2019) found US federal response gaps 4 months before COVID — zero changes. Event 201 published 7 recommendations 3 months before COVID — none adopted. The exercises work. The implementation pipeline is broken.

Cygnus → COVID

2016 → 2020

22 recommendations. 0 implemented. UK COVID-19 Inquiry concluded: “the UK was not prepared.”

Crimson Contagion → COVID

2019 → 2020

Found federal response gaps. Completed Aug 2019. COVID arrived Jan 2020. Zero changes made in the months between.

Event 201 → COVID

Oct 2019 → Jan 2020

7 public recommendations. Coronavirus pandemic began 3 months later. None adopted in time.

Sources: UK COVID-19 Inquiry Module 1 (2024) • NYT Crimson Contagion Investigation (2020) • Reddin et al. BMC Public Health PMC8020603 (2021) • JHU/CHS Event 201 Recommendations (2019)

The Exercise Portfolio Has Structural Blind Spots

Only 7 of 20 exercises test bioterrorism. Only 2 test nuclear scenarios. Zero test AI-enhanced biological threats, synthetic biology attacks, or combined cyber-bio scenarios. The Global Health Security Index (GHSI) lists “Prevention” as Category 1, yet prevention-focused exercises are the rarest type. We are testing response when we should be testing detection and prevention.

Pandemic-focused

10 of 20

50% of exercises test natural pandemic response. Overrepresented relative to threat landscape.

Bioterror-focused

7 of 20

35% test deliberate biological attacks. Mostly smallpox scenarios — narrow pathogen range.

Nuclear/Military

3 of 20

15% test strategic/nuclear scenarios. Concentrated in 1983 Cold War era. None since 2002.

AI + Bio

0 of 20

Zero exercises test AI-enhanced bioweapons, CRISPR-enabled threats, or cyber-bio convergence.

Sources: PSEF-X ETEM Matrix Analysis • GHSI 2021 Methodology • RAND Bio-Domain Research (2024) • NTI Nuclear Threat Assessment (2024)
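
The portfolio shares quoted above are simple proportions of the 20 exercises, and can be re-derived as a quick check. The following is a minimal sketch, assuming the four categories are mutually exclusive (the stated counts do sum to 20); the names `PORTFOLIO`, `portfolio_shares`, and `untested` are illustrative and not part of PSEF-X or the ETEM Matrix.

```python
# Category counts as stated in this section (20 exercises in total).
# Assumption: each exercise is counted in exactly one category.
PORTFOLIO = {
    "Pandemic-focused": 10,
    "Bioterror-focused": 7,
    "Nuclear/Military": 3,
    "AI + Bio": 0,
}


def portfolio_shares(counts):
    """Return each category's share of the portfolio as a whole-number percentage."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("portfolio is empty")
    return {name: round(100 * n / total) for name, n in counts.items()}


def untested(counts):
    """Return the categories that no exercise in the portfolio covers."""
    return [name for name, n in counts.items() if n == 0]


shares = portfolio_shares(PORTFOLIO)
total = sum(PORTFOLIO.values())
for name, pct in shares.items():
    print(f"{name}: {PORTFOLIO[name]} of {total} ({pct}%)")
print("Untested categories:", untested(PORTFOLIO))
```

Keeping the counts in one place makes the blind spot explicit: any category with a zero count surfaces directly in the `untested` list rather than being implied by omission.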

Exercise Rigour Is Declining, Not Improving

Cold War exercises (Able Archer, Proud Prophet) featured genuine adversarial play with real consequences. Proud Prophet changed Reagan’s nuclear policy. Millennium Challenge had a genuine red team that defeated the blue force. Modern pandemic exercises trend toward scripted TTXs with predetermined conclusions. ETEM data show average Red Team Quality declining from 5.0 (1983) to 2.1 for modern TTXs (2017–25). We are testing less rigorously while facing more complex threats.

Proud Prophet

1983

Red Team quality: 5/5. Genuine adversarial play with Soviet doctrine. Changed Reagan’s nuclear policy within months.

MC 2002

2002

Red Team quality: 5/5. Lt Gen Van Riper defeated Blue force. Then the exercise was reset — the system rejected the lesson.

Modern TTXs

2017–25

Average Red Team quality: 2.1/5. Most exercises use “virus as opponent” without structured adversarial analysis.

Sources: Shappell, ISR 26(1) (2024) • NATO ACT Wargaming Handbook (2023) • UK MOD Red Teaming Handbook 3rd Ed (2021) • PSEF-X ETEM Matrix scores

What This Means for Biological Response

These seven findings are not abstract: they are the operational reality that BioR exists to address. The PSEF Benchmark evaluates whether biosurveillance platforms can detect threats early enough to break the pattern. The Regulatory KB maps whether compliance frameworks are strong enough to enforce action. PSEF-X documents why, without these systems, the same failures are likely to repeat in the next pandemic.

Recommended Next Directions

Exercise AI + Bio Convergence Scenarios

None of the 20 exercises reviewed has tested AI-enhanced bioweapons, CRISPR-based threats, or automated pathogen design. This is the most critical gap in the exercise portfolio.

Mandate Implementation Tracking

Every exercise should include a funded implementation plan with timeline, responsible parties, and public audit trail. The current model of “recommendations without accountability” has failed for 40 years.
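
One way to picture such a plan is as a small record per recommendation, carrying its owner, deadline, funding status, and an append-only audit trail. The sketch below is a hypothetical illustration, not an existing BioR or PSEF-X schema: the `Recommendation` class, its field names, and the placeholder owner and due date are all assumptions. Only the Cygnus figures (22 recommendations, 0 implemented) come from this document.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Recommendation:
    """One exercise recommendation with the accountability fields named above."""
    text: str
    owner: str                      # responsible party
    due: date                       # timeline commitment
    funded: bool = False            # whether budget has been allocated
    implemented_on: Optional[date] = None
    audit_log: list = field(default_factory=list)  # public audit trail entries

    def mark_implemented(self, when: date, evidence: str) -> None:
        """Record completion together with the evidence supporting it."""
        self.implemented_on = when
        self.audit_log.append((when, evidence))

    def is_overdue(self, today: date) -> bool:
        """A recommendation is overdue if it is past due and not implemented."""
        return self.implemented_on is None and today > self.due


def implementation_rate(recs) -> float:
    """Fraction of recommendations implemented (0.0 for an empty list)."""
    if not recs:
        return 0.0
    done = sum(1 for r in recs if r.implemented_on is not None)
    return done / len(recs)


# Cygnus as described in this document: 22 recommendations, none implemented.
# The text, owner, and due date are hypothetical placeholders, not real data.
cygnus = [
    Recommendation(
        text=f"Recommendation {i}",
        owner="unassigned (hypothetical)",
        due=date(2018, 12, 31),
    )
    for i in range(1, 23)
]

print(f"Implemented: {implementation_rate(cygnus):.0%}")
print("Overdue on 2020-01-01:", sum(r.is_overdue(date(2020, 1, 1)) for r in cygnus))
```

The point of the structure is that an unowned, unfunded, or overdue recommendation becomes a queryable state rather than a line in an after-action report.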

Restore Adversarial Rigour

Return to genuine red teaming. Modern TTXs must include structured adversarial analysis (UK MOD methodology), not just “virus as opponent.” Exercises should be allowed to fail.

Connect Exercises to Surveillance Systems

Exercises should directly test biosurveillance platforms (PSEF-benchmarked systems) and regulatory compliance (RSKB frameworks). Currently, exercises and surveillance systems exist in separate silos.

Build the Evidence-to-Action Pipeline

PSEF-X documents the evidence. BioR provides the platform. The missing piece is a systematic mechanism to convert exercise findings into operational change — the thing that has been missing for 40 years.

Explore the ETEM Matrix — 20 × 12 Scoring