Exercise Typology & Effectiveness Matrix

Evidence-Based Comparative Analysis of 20 Pandemic & Strategic Exercises (1983–2025)

20 exercises × 12 dimensions (8 frameworks + 4 implementation metrics) = 240 data points

F1 HSEEP • F2 WHO-EPPP • F3 GHSI • F4 NATO-WG • F5 UKMOD-WG • F6 UKMOD-RT • F7 SCENARIO • F8 5MM

Scoring Methodology & Framework Citations

Each exercise is scored 1–5 across 8 framework dimensions plus 4 implementation metrics. Scores are derived from published after-action reports (AARs), peer-reviewed literature, declassified government reports, and FOI documents.

F1 — HSEEP Exercise Type
FEMA / DHS, 2020 Revision
Homeland Security Exercise and Evaluation Program. Seven-type taxonomy: Seminar → Workshop → Tabletop Exercise (TTX) → Game → Drill → Functional Exercise → Full-Scale Exercise. Progressive complexity scale maps to operational realism. Score: 1 = Seminar/Workshop, 2 = TTX, 3 = Game/War Game, 4 = Functional Exercise, 5 = Full-Scale Exercise.
FEMA (2020). Homeland Security Exercise and Evaluation Program Doctrine. Rev 2-2-25. fema.gov/emergency-managers/national-preparedness/exercises/hseep
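The F1 banding above is a direct lookup from exercise type to score. A minimal sketch (the lowercase type keys are illustrative normalisations, not part of HSEEP doctrine):

```python
# Map HSEEP exercise types to the 1-5 F1 score bands described above.
HSEEP_SCORE = {
    "seminar": 1, "workshop": 1,
    "tabletop": 2, "ttx": 2,
    "game": 3, "war game": 3,
    "functional": 4,
    "full-scale": 5,
}

def f1_score(exercise_type: str) -> int:
    """Return the F1 (HSEEP) score for an exercise type, case-insensitively."""
    return HSEEP_SCORE[exercise_type.strip().lower()]
```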
F2 — WHO Process Quality
WHO, 2017 (WHO-WHE-CPI-2017.10)
WHO Simulation Exercise Manual — 7-component lifecycle: Selection, Planning, Scenario Development, Pandemic Description, Evaluation Planning, Staging, Post-Exercise Actions. Score: number of WHO-EPPP components fully evidenced (1–5 scale, where 5 = all 7 components documented, 4 = 5–6, 3 = 3–4, 2 = 1–2, 1 = none).
WHO (2017). Simulation Exercise Manual. WHO-WHE-CPI-2017.10. who.int/publications/i/item/WHO-WHE-CPI-2017.10 • Reddin et al. (2021). PMC8020603.
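The F2 score bands a component count into the 1–5 scale; the same count-to-band pattern recurs in F3 and F7. A sketch of the F2 banding exactly as stated above:

```python
def f2_score(components_evidenced: int) -> int:
    """Band the number of fully evidenced WHO-EPPP components (0-7) into
    the 1-5 F2 score: 5 = all 7, 4 = 5-6, 3 = 3-4, 2 = 1-2, 1 = none."""
    if not 0 <= components_evidenced <= 7:
        raise ValueError("expected a count between 0 and 7")
    if components_evidenced == 7:
        return 5
    if components_evidenced >= 5:
        return 4
    if components_evidenced >= 3:
        return 3
    if components_evidenced >= 1:
        return 2
    return 1
```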
F3 — GHS Capability Coverage
NTI / Johns Hopkins / Economist Impact, 2021
Global Health Security Index — 6 categories: Prevention, Detection & Reporting, Rapid Response, Health System, Compliance with International Norms, Risk Environment. 37 indicators, 96 sub-indicators, 171 questions. Score: number of GHS categories meaningfully tested (1–5 scale, 5 = all 6 categories).
NTI/JHU/EIU (2021). GHS Index Methodology. ghsindex.org/wp-content/uploads/2021/11/2021_GHSindex_Methodology_FINAL.pdf
F4 — NATO Wargaming Type
NATO ACT, 2023
NATO Wargaming Handbook — 3-type taxonomy (Educational, Experiential, Analytical) with 5-phase lifecycle (Initiate, Design, Develop, Execute, Analyse). Includes deductive/inductive/abductive analysis modes. Score: 1 = Educational only, 2 = Experiential, 3 = Analytical (single method), 4 = Analytical (mixed methods), 5 = Full analytical with multi-phase lifecycle.
NATO ACT (2023). Wargaming Handbook. act.nato.int • NATO WIN24. act.nato.int/wp-content/uploads/2024/08/WIN24_Booklet.pdf
F5 — UK MOD Wargame Rigour
UK MOD DCDC, 2017
UK MOD Wargaming Handbook — Classifies: Seminar, COA Wargame, Matrix Game, Kriegsspiel, Business Wargame. 5-step lifecycle linked to Defence Cycle of Research. Score: 1 = Seminar-level, 2 = COA, 3 = Matrix game, 4 = Kriegsspiel/complex, 5 = Full defence research cycle integrated.
UK MOD DCDC (2017). Wargaming Handbook. assets.publishing.service.gov.uk • professionalwargaming.co.uk
F6 — Red Team Quality
UK MOD DCDC, 2021 (3rd Ed.)
UK MOD Red Teaming Handbook — Structured adversarial thinking: Alternative Analysis, Pre-Mortem, Devil's Advocacy, Team A/B. Score: 1 = No adversarial element, 2 = Basic contrarian role, 3 = Structured red cell, 4 = Multi-method red teaming, 5 = Independent red team with full alternative analysis.
UK MOD DCDC (2021). Red Teaming Handbook, 3rd Ed. gov.uk/government/publications/a-guide-to-red-teaming
F7 — Scenario Planning Rigour
Shell/GBN, Schwartz, 1991+
Shell/GBN Intuitive Logics — 8-step process: Focal Issue → Key Forces → Driving Forces → Rank by Importance/Uncertainty → Scenario Logics (2×2) → Flesh Out Scenarios → Implications → Early Indicators. Score: number of IL steps evidenced (1–5 scale).
Schwartz, P. (1991). The Art of the Long View. Currency/Doubleday. • Bradfield, R. et al. (2005). Calif. Mgmt. Rev. 48(1). • Shell (2013). Scenarios 40-Year Report.
F8 — Epistemological Quality (5MM)
Shappell, Oxford ISR, 2024
Five Methodological Machineries: Representation, Consequential Decision-Making, Adjudication, Immersion, Bespoke Design. Academic framework assessing epistemological quality of wargames as knowledge-production instruments. Score: number of machineries robustly present (1–5).
Shappell (2024). The Methodological Machinery of Wargaming. International Studies Review 26(1). doi:10.1093/isr/viae002

Implementation Metrics (M1–M4)

Derived from After-Action Reports, government inquiries, FOI documents, and retrospective COVID-19 validation.

M1 — Findings Transparency
Evidence: Published AARs, FOI disclosures
Degree to which exercise findings, recommendations, and AARs are publicly available. 1 = Classified/no disclosure, 2 = Partial leak/FOI, 3 = Summary published, 4 = Full AAR published, 5 = Full AAR + data + methodology published.
Reddin et al. (2021). Evaluating simulations as preparation for health crises. BMC Public Health. PMC8020603.
M2 — Implementation Rate
Evidence: COVID-19 inquiries, government audits
Percentage of exercise recommendations implemented before the next relevant real-world event. 1 = <10% implemented, 2 = 10–25%, 3 = 25–50%, 4 = 50–75%, 5 = >75% implemented.
UK COVID-19 Inquiry (2024). Module 1 Report. • US GAO Reports on pandemic preparedness gaps.
M3 — Predictive Accuracy
Evidence: Retrospective COVID-19 validation
How accurately the exercise scenario predicted real-world outcomes (validated against COVID-19 and other events). 1 = No predictive value, 2 = Marginal, 3 = Moderate overlap, 4 = Strong prediction, 5 = Remarkably prescient.
Johns Hopkins CHS (2019). Event 201 Recommendations vs COVID-19 Reality. • UK Cygnus findings vs COVID-19 outcomes.
M4 — Time-to-Implementation
Evidence: Post-exercise audit trails
Speed of converting findings to policy/operational changes, scored inversely so that faster conversion earns a higher score. 1 = Never implemented, 2 = >5 years, 3 = 2–5 years, 4 = 6–24 months, 5 = <6 months.
FEMA (2020). HSEEP Improvement Planning Guide. • WHO (2017). Post-Exercise Action tracking.

Universal Scoring Scale

1 = Minimal / Absent • 2 = Basic / Partial • 3 = Moderate • 4 = Strong / Robust • 5 = Exemplary / Full

Full Scoring Matrix — 20 × 12

Click a column header to sort; hover a score for its evidence citation. Colour bands: 4–5 (high), 3 (moderate), 1–2 (low).

AVG = row average across all 12 dimensions • Scores sourced from published AARs, FOI documents, peer-reviewed literature, and government inquiry reports
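The AVG column is a plain arithmetic mean over one exercise's 12 scores. A minimal sketch (the input list is illustrative, not actual matrix values):

```python
def row_average(scores: list[int]) -> float:
    """Mean of one exercise's 12 dimension scores (F1-F8 + M1-M4),
    rounded to two decimal places."""
    if len(scores) != 12:
        raise ValueError("expected 12 scores per exercise")
    return round(sum(scores) / 12, 2)
```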

Pairwise Similarity Heatmap

Cosine similarity across all 12 dimensions. Darker green = more similar exercise profiles. Diagonal = 1.00 (self-similarity).

Scale: 0.00 (dissimilar) → 1.00 (identical)
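Cosine similarity compares the direction of two score vectors, ignoring overall magnitude. A self-contained sketch of the pairwise calculation behind the heatmap (note that because all scores are positive 1–5 values, real exercise pairs will sit well above 0):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two 12-dimension score vectors:
    dot(a, b) / (|a| * |b|). 1.0 = identical direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

The diagonal of the heatmap is each exercise compared with itself, which is why it is always 1.00.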

GHS Capability Radar Overlay

Select up to 4 exercises to compare their 12-dimension profiles on a spider chart.

Implementation Gap Analysis

X-axis: Average Framework Score (design quality) | Y-axis: Implementation Rate (M2). Quadrants reveal which well-designed exercises failed to drive change.

Q1 — High Design, High Implementation
Gold standard exercises
Q2 — High Design, Low Implementation
Wasted potential — systemic failure
Q3 — Low Design, Low Implementation
Expected — poor design, poor follow-through
Q4 — Low Design, High Implementation
Simple but effective exercises
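The quadrant assignment follows directly from two thresholds on the axes. A sketch, where the cut-points (midpoint 3 on each 1–5 axis) are illustrative assumptions rather than values stated in the analysis:

```python
def gap_quadrant(avg_framework: float, m2: int,
                 design_cut: float = 3.0, impl_cut: int = 3) -> str:
    """Classify an exercise into the implementation-gap quadrants from its
    average framework score (X) and Implementation Rate M2 (Y).
    Cut-points are assumed midpoints of the 1-5 scale."""
    high_design = avg_framework >= design_cut
    high_impl = m2 >= impl_cut
    if high_design and high_impl:
        return "Q1: high design, high implementation"
    if high_design:
        return "Q2: high design, low implementation"
    if high_impl:
        return "Q4: low design, high implementation"
    return "Q3: low design, low implementation"
```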

Exercise Design Decision Tool

Select your criteria weights to find the most relevant historical exercise for your planning needs.

Criteria weight sliders (three): Low → High, on the 1–5 scale.
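One straightforward way to turn user-chosen weights into a ranking is a weighted sum over each exercise's dimension scores. A sketch under that assumption (the exercise names and score vectors below are illustrative placeholders, not values from the matrix):

```python
def rank_exercises(scores: dict[str, list[float]],
                   weights: list[float]) -> list[tuple[str, float]]:
    """Rank exercises by the weighted sum of their dimension scores.
    `weights` holds one user-chosen weight (e.g. 1-5) per dimension."""
    ranked = [
        (name, sum(w * s for w, s in zip(weights, dims)))
        for name, dims in scores.items()
    ]
    # Highest weighted total first.
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

For example, a planner who weights the first criterion heavily will surface exercises that score highest on that dimension, regardless of weaker dimensions elsewhere.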