Exercise Typology & Effectiveness Matrix

Evidence-Based Comparative Analysis of 20 Pandemic & Strategic Exercises (1983–2025)

20 exercises × 12 dimensions (8 frameworks + 4 implementation metrics) = 240 data points

F1 HSEEP • F2 WHO-EPPP • F3 GHSI • F4 NATO-WG • F5 UKMOD-WG • F6 UKMOD-RT • F7 SCENARIO • F8 5MM

Scoring Methodology & Framework Citations

Each exercise is scored 1–5 across 8 framework dimensions plus 4 implementation metrics. Scores are derived from published after-action reports (AARs), peer-reviewed literature, declassified government reports, and FOI documents.

F1 — HSEEP Exercise Type
FEMA / DHS, 2020 Revision
Homeland Security Exercise and Evaluation Program. Seven-type taxonomy: Seminar → Workshop → Tabletop Exercise (TTX) → Game → Drill → Functional Exercise → Full-Scale Exercise. Progressive complexity scale maps to operational realism. Score: 1 = Seminar/Workshop, 2 = TTX, 3 = Game/War Game, 4 = Functional Exercise, 5 = Full-Scale Exercise.
FEMA (2020). Homeland Security Exercise and Evaluation Program Doctrine. Rev 2-2-25. fema.gov/emergency-managers/national-preparedness/exercises/hseep
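The F1 banding above is a direct lookup from exercise type to score. A minimal sketch (the lowercase type keys are illustrative normalisations, not part of HSEEP doctrine):

```python
# Map HSEEP exercise types to the 1-5 F1 score bands described above.
HSEEP_SCORE = {
    "seminar": 1, "workshop": 1,
    "tabletop": 2, "ttx": 2,
    "game": 3, "war game": 3,
    "functional": 4,
    "full-scale": 5,
}

def f1_score(exercise_type: str) -> int:
    """Return the F1 (HSEEP) score for an exercise type, case-insensitively."""
    return HSEEP_SCORE[exercise_type.strip().lower()]
```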
F2 — WHO Process Quality
WHO, 2017 (WHO-WHE-CPI-2017.10)
WHO Simulation Exercise Manual — 7-component lifecycle: Selection, Planning, Scenario Development, Pandemic Description, Evaluation Planning, Staging, Post-Exercise Actions. Score: number of WHO-EPPP components fully evidenced (1–5 scale, where 5 = all 7 components documented, 4 = 5–6, 3 = 3–4, 2 = 1–2, 1 = none).
WHO (2017). Simulation Exercise Manual. WHO-WHE-CPI-2017.10. who.int/publications/i/item/WHO-WHE-CPI-2017.10 • Reddin et al. (2021). PMC8020603.
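The F2 score bands a component count into the 1–5 scale; the same count-to-band pattern recurs in F3 and F7. A sketch of the F2 banding exactly as stated above:

```python
def f2_score(components_evidenced: int) -> int:
    """Band the number of fully evidenced WHO-EPPP components (0-7) into
    the 1-5 F2 score: 5 = all 7, 4 = 5-6, 3 = 3-4, 2 = 1-2, 1 = none."""
    if not 0 <= components_evidenced <= 7:
        raise ValueError("expected a count between 0 and 7")
    if components_evidenced == 7:
        return 5
    if components_evidenced >= 5:
        return 4
    if components_evidenced >= 3:
        return 3
    if components_evidenced >= 1:
        return 2
    return 1
```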
F3 — GHS Capability Coverage
NTI / Johns Hopkins / Economist Impact, 2021
Global Health Security Index — 6 categories: Prevention, Detection & Reporting, Rapid Response, Health System, Compliance with International Norms, Risk Environment. 37 indicators, 96 sub-indicators, 171 questions. Score: number of GHS categories meaningfully tested (1–5 scale, 5 = all 6 categories).
NTI/JHU/EIU (2021). GHS Index Methodology. ghsindex.org/wp-content/uploads/2021/11/2021_GHSindex_Methodology_FINAL.pdf
F4 — NATO Wargaming Type
NATO ACT, 2023
NATO Wargaming Handbook — 3-type taxonomy (Educational, Experiential, Analytical) with 5-phase lifecycle (Initiate, Design, Develop, Execute, Analyse). Includes deductive/inductive/abductive analysis modes. Score: 1 = Educational only, 2 = Experiential, 3 = Analytical (single method), 4 = Analytical (mixed methods), 5 = Full analytical with multi-phase lifecycle.
NATO ACT (2023). Wargaming Handbook. act.nato.int • NATO WIN24. act.nato.int/wp-content/uploads/2024/08/WIN24_Booklet.pdf
F5 — UK MOD Wargame Rigour
UK MOD DCDC, 2017
UK MOD Wargaming Handbook — Classifies: Seminar, COA Wargame, Matrix Game, Kriegsspiel, Business Wargame. 5-step lifecycle linked to Defence Cycle of Research. Score: 1 = Seminar-level, 2 = COA, 3 = Matrix game, 4 = Kriegsspiel/complex, 5 = Full defence research cycle integrated.
UK MOD DCDC (2017). Wargaming Handbook. assets.publishing.service.gov.uk • professionalwargaming.co.uk
F6 — Red Team Quality
UK MOD DCDC, 2021 (3rd Ed.)
UK MOD Red Teaming Handbook — Structured adversarial thinking: Alternative Analysis, Pre-Mortem, Devil's Advocacy, Team A/B. Score: 1 = No adversarial element, 2 = Basic contrarian role, 3 = Structured red cell, 4 = Multi-method red teaming, 5 = Independent red team with full alternative analysis.
UK MOD DCDC (2021). Red Teaming Handbook, 3rd Ed. gov.uk/government/publications/a-guide-to-red-teaming
F7 — Scenario Planning Rigour
Shell/GBN, Schwartz, 1991+
Shell/GBN Intuitive Logics — 8-step process: Focal Issue → Key Forces → Driving Forces → Rank by Importance/Uncertainty → Scenario Logics (2×2) → Flesh Out Scenarios → Implications → Early Indicators. Score: number of IL steps evidenced (1–5 scale).
Schwartz, P. (1991). The Art of the Long View. Currency/Doubleday. • Bradfield, R. et al. (2005). Calif. Mgmt. Rev. 48(1). • Shell (2013). Scenarios 40-Year Report.
F8 — Epistemological Quality (5MM)
Shappell, Oxford ISR, 2024
Five Methodological Machineries: Representation, Consequential Decision-Making, Adjudication, Immersion, Bespoke Design. Academic framework assessing epistemological quality of wargames as knowledge-production instruments. Score: number of machineries robustly present (1–5).
Shappell (2024). The Methodological Machinery of Wargaming. International Studies Review 26(1). doi:10.1093/isr/viae002

Implementation Metrics (M1–M4)

Derived from After-Action Reports, government inquiries, FOI documents, and retrospective COVID-19 validation.

M1 — Findings Transparency
Evidence: Published AARs, FOI disclosures
Degree to which exercise findings, recommendations, and AARs are publicly available. 1 = Classified/no disclosure, 2 = Partial leak/FOI, 3 = Summary published, 4 = Full AAR published, 5 = Full AAR + data + methodology published.
Reddin et al. (2021). Evaluating simulations as preparation for health crises. BMC Public Health. PMC8020603.
M2 — Implementation Rate
Evidence: COVID-19 inquiries, government audits
Percentage of exercise recommendations implemented before the next relevant real-world event. 1 = <10% implemented, 2 = 10–25%, 3 = 25–50%, 4 = 50–75%, 5 = >75% implemented.
UK COVID-19 Inquiry (2024). Module 1 Report. • US GAO Reports on pandemic preparedness gaps.
M3 — Predictive Accuracy
Evidence: Retrospective COVID-19 validation
How accurately the exercise scenario predicted real-world outcomes (validated against COVID-19 and other events). 1 = No predictive value, 2 = Marginal, 3 = Moderate overlap, 4 = Strong prediction, 5 = Remarkably prescient.
Johns Hopkins CHS (2019). Event 201 Recommendations vs COVID-19 Reality. • UK Cygnus findings vs COVID-19 outcomes.
M4 — Time-to-Implementation
Evidence: Post-exercise audit trails
Speed of converting findings to policy/operational changes, scored inversely so that faster conversion earns a higher score. 1 = Never implemented, 2 = >5 years, 3 = 2–5 years, 4 = 6–24 months, 5 = <6 months.
FEMA (2020). HSEEP Improvement Planning Guide. • WHO (2017). Post-Exercise Action tracking.

Universal Scoring Scale

1 = Minimal / Absent • 2 = Basic / Partial • 3 = Moderate • 4 = Strong / Robust • 5 = Exemplary / Full

Full Scoring Matrix — 20 × 12

Click a column header to sort; hover a score for its evidence citation. Colour bands: 4–5 (high), 3 (moderate), 1–2 (low).

AVG = row average across all 12 dimensions • Scores sourced from published AARs, FOI documents, peer-reviewed literature, and government inquiry reports
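The AVG column is a plain arithmetic mean over one exercise's 12 scores. A minimal sketch (the input list is illustrative, not actual matrix values):

```python
def row_average(scores: list[int]) -> float:
    """Mean of one exercise's 12 dimension scores (F1-F8 + M1-M4),
    rounded to two decimal places."""
    if len(scores) != 12:
        raise ValueError("expected 12 scores per exercise")
    return round(sum(scores) / 12, 2)
```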

Pairwise Similarity Heatmap

Cosine similarity across all 12 dimensions. Darker green = more similar exercise profiles. Diagonal = 1.00 (self-similarity).

Scale: 0.00 (dissimilar) → 1.00 (identical)
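Cosine similarity compares the direction of two score vectors, ignoring overall magnitude. A self-contained sketch of the pairwise calculation behind the heatmap (note that because all scores are positive 1–5 values, real exercise pairs will sit well above 0):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two 12-dimension score vectors:
    dot(a, b) / (|a| * |b|). 1.0 = identical direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

The diagonal of the heatmap is each exercise compared with itself, which is why it is always 1.00.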

GHS Capability Radar Overlay

Select up to 4 exercises to compare their 12-dimension profiles on a spider chart.

Implementation Gap Analysis

X-axis: Average Framework Score (design quality) | Y-axis: Implementation Rate (M2). Quadrants reveal which well-designed exercises failed to drive change.

Q1 — High Design, High Implementation
Gold standard exercises
Q2 — High Design, Low Implementation
Wasted potential — systemic failure
Q3 — Low Design, Low Implementation
Expected — poor design, poor follow-through
Q4 — Low Design, High Implementation
Simple but effective exercises
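The quadrant assignment follows directly from two thresholds on the axes. A sketch, where the cut-points (midpoint 3 on each 1–5 axis) are illustrative assumptions rather than values stated in the analysis:

```python
def gap_quadrant(avg_framework: float, m2: int,
                 design_cut: float = 3.0, impl_cut: int = 3) -> str:
    """Classify an exercise into the implementation-gap quadrants from its
    average framework score (X) and Implementation Rate M2 (Y).
    Cut-points are assumed midpoints of the 1-5 scale."""
    high_design = avg_framework >= design_cut
    high_impl = m2 >= impl_cut
    if high_design and high_impl:
        return "Q1: high design, high implementation"
    if high_design:
        return "Q2: high design, low implementation"
    if high_impl:
        return "Q4: low design, high implementation"
    return "Q3: low design, low implementation"
```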

Exercise Design Decision Tool

Select your criteria weights to find the most relevant historical exercise for your planning needs.

Criteria weight sliders (three): Low → High, on the 1–5 scale.
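One straightforward way to turn user-chosen weights into a ranking is a weighted sum over each exercise's dimension scores. A sketch under that assumption (the exercise names and score vectors below are illustrative placeholders, not values from the matrix):

```python
def rank_exercises(scores: dict[str, list[float]],
                   weights: list[float]) -> list[tuple[str, float]]:
    """Rank exercises by the weighted sum of their dimension scores.
    `weights` holds one user-chosen weight (e.g. 1-5) per dimension."""
    ranked = [
        (name, sum(w * s for w, s in zip(weights, dims)))
        for name, dims in scores.items()
    ]
    # Highest weighted total first.
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

For example, a planner who weights the first criterion heavily will surface exercises that score highest on that dimension, regardless of weaker dimensions elsewhere.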