Causal Investigation

Overview

The Causal Investigation Framework is the territory framework for T4 — the operations that take an outcome, symptom, or pattern of events and trace backward to causes, mechanisms, or generative structures. The territory exists because “why did this happen” is a structurally different question from “how does this currently work” (T17 — process analysis) and from “how do the parts produce the whole’s behavior” (T16 — mechanism understanding). T4 answers questions that begin with something having gone wrong and the user wanting to know why — not how to do it differently, not how the system normally operates, but why this specific failure happened, why it keeps happening despite intervention, or what structure is generating the recurring pattern.

The framework hosts four modes arranged along a complexity axis.

Root-cause-analysis is the lightest path, a backward-chain trace from observed symptom to root cause using Ishikawa fishbone categorization (6M for manufacturing, 4P for service, 4S for organizational, 8P for process) and the five-whys protocol with one operational rule: one more “why” on each candidate root before stopping. No chain terminates at human error without a process or incentive sub-cause beneath it (Dekker’s Just Culture discipline).
Systems-dynamics-causal handles recurring symptoms driven by feedback loops. It follows the Forrester/Senge tradition, with an explicit system boundary, variables, feedback loops with polarity (R for reinforcing or B for balancing, with the polarity-parity rule as a verification check), delays, named archetypes, and leverage points ranked by Meadows depth.
Causal-dag is the formal graph analysis using Pearl’s framework: locking the question at a Pearl-ladder rung (observation, intervention, or counterfactual), classifying variables by role (cause / effect / confounder / mediator / collider / instrument), and applying the back-door or front-door criterion to check identifiability.
Process-tracing is the historical-event-specific pass using Bennett-Checkel test classification (hoop = necessary not sufficient; smoking-gun = sufficient not necessary; doubly-decisive = both; straw-in-the-wind = neither) with explicit Bayesian updating per evidence piece.
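The polarity-parity rule named for systems-dynamics-causal can be sketched in a few lines. This is a minimal illustration, not the mode's specification: the function name `loop_polarity` and the example link signs are assumptions made here. The rule itself is the standard one: a closed loop with an even number of negative links is reinforcing (R), and a loop with an odd number is balancing (B).

```python
from typing import Sequence


def loop_polarity(link_signs: Sequence[int]) -> str:
    """Classify a closed feedback loop from the signs of its causal links.

    Each sign is +1 (effect moves in the same direction as the cause)
    or -1 (effect moves in the opposite direction).
    Even count of negative links -> "R" (reinforcing).
    Odd count of negative links  -> "B" (balancing).
    """
    if not link_signs:
        raise ValueError("a loop needs at least one link")
    if any(sign not in (1, -1) for sign in link_signs):
        raise ValueError("link signs must be +1 or -1")
    negatives = sum(1 for sign in link_signs if sign == -1)
    return "R" if negatives % 2 == 0 else "B"


# Hypothetical loop with four links (signs are illustrative assumptions):
# pivots -> (+) credibility debt -> (-) commitment
#        -> (+) validation quality -> (-) pivots
pivot_loop = [+1, -1, +1, -1]
print(loop_polarity(pivot_loop))  # R
```

Used as a verification check, the label drawn on a loop diagram is compared against the label the parity count produces; a mismatch means at least one link sign or the loop label is wrong.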

The framework’s load-bearing intellectual content is the complexity-axis disambiguation that selects among the four modes and the process-not-people discipline that keeps root-cause-analysis from terminating at human error. “One thing went wrong, chain back to root” routes to root-cause-analysis; “feedback or things reinforcing each other” routes to systems-dynamics-causal; “formal causal model with arrows” routes to causal-dag; “specific historical event, step by step” routes to process-tracing. The default when ambiguous is root-cause-analysis with an escalation hook to systems-dynamics-causal — root-cause surfaces feedback signals naturally during execution and the escalation lands cleanly when the analysis reveals the symptom is structural rather than chain-shaped.
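The routing described above amounts to a lookup with a default. The sketch below is illustrative only: the answer labels, the `Mode` enum, and the `dispatch` function are hypothetical names introduced here, not part of the framework specification. Only the four mode names and the ambiguous-case default come from the text.

```python
from enum import Enum
from typing import Optional, Tuple


class Mode(Enum):
    ROOT_CAUSE_ANALYSIS = "root-cause-analysis"
    SYSTEMS_DYNAMICS_CAUSAL = "systems-dynamics-causal"
    CAUSAL_DAG = "causal-dag"
    PROCESS_TRACING = "process-tracing"


# Hypothetical labels for the four plain-English answers.
ROUTES = {
    "one_thing_went_wrong": Mode.ROOT_CAUSE_ANALYSIS,
    "feedback_or_reinforcing": Mode.SYSTEMS_DYNAMICS_CAUSAL,
    "formal_model_with_arrows": Mode.CAUSAL_DAG,
    "specific_historical_event": Mode.PROCESS_TRACING,
}


def dispatch(answer: Optional[str]) -> Tuple[Mode, Optional[Mode]]:
    """Return (selected mode, escalation target or None).

    An unrecognized or missing answer is treated as ambiguous: the default
    is root-cause-analysis with an escalation hook to systems-dynamics-causal.
    """
    if answer in ROUTES:
        return ROUTES[answer], None
    return Mode.ROOT_CAUSE_ANALYSIS, Mode.SYSTEMS_DYNAMICS_CAUSAL


print(dispatch("feedback_or_reinforcing"))
print(dispatch(None))
```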

The process-not-people discipline defends against the most common root-cause failure mode: chains that terminate at “the engineer ran the wrong command” without naming the process that permitted or incentivized the behavior. The five-whys protocol’s one more “why” rule catches this. Premature termination at human error produces actionable-feeling but ineffective interventions (more training, more attention) instead of the structural changes (runbook confirmation steps, interface affordances, incentive realignment) that actually prevent recurrence.
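The termination rule can be expressed as a check over a five-whys chain. This is a sketch under assumptions: the `WhyStep` record, its `kind` labels, and the `unresolved_human_error` function are hypothetical names chosen for illustration. It flags every human-error step that has no process or incentive sub-cause deeper in the chain.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WhyStep:
    answer: str
    kind: str  # assumed labels: "human_error", "process", "incentive", "other"


def unresolved_human_error(chain: List[WhyStep]) -> List[str]:
    """Return human-error answers with no process or incentive cause beneath them.

    An empty result means the chain satisfies the process-not-people rule.
    """
    flagged = []
    for index, step in enumerate(chain):
        if step.kind != "human_error":
            continue
        deeper = chain[index + 1:]
        if not any(s.kind in ("process", "incentive") for s in deeper):
            flagged.append(step.answer)
    return flagged


chain = [
    WhyStep("The deploy overwrote the production config", "other"),
    WhyStep("The engineer ran the wrong command", "human_error"),
]
print(unresolved_human_error(chain))  # ['The engineer ran the wrong command']

# One more "why" surfaces the process that permitted the behavior.
chain.append(WhyStep("The runbook has no confirmation step", "process"))
print(unresolved_human_error(chain))  # []
```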

The framework is honest about what T4 does not do. It does not produce process maps of how the system normally works (that is T17). It does not explain how the parts produce the whole’s behavior at the principle level (that is T16). It does not adjudicate which paradigm is correct (that is T9 — frame-as-cause is the boundary case). When formal Bayesian diagnosticity is needed to weigh competing causal hypotheses against evidence, the framework escalates sideways to T5 (competing-hypotheses or bayesian-hypothesis-network) rather than stretching root-cause-analysis beyond its lane.

Systemic context

The Causal Investigation Framework is the territory framework for T4 in Ora’s territory map. It composes downward into four mode files (Modes/root-cause-analysis.md, Modes/systems-dynamics-causal.md, Modes/causal-dag.md, Modes/process-tracing.md). It composes outward through four cross-territory adjacencies — T4↔T9 (within-frame causes vs. frame-as-cause); T4↔T16 (backward-to-causes vs. how-the-parts-produce-the-whole); T4↔T17 (why-it-keeps-happening vs. how-it-currently-works); T4↔T5 (causal-investigation vs. formal-Bayesian-hypothesis-evaluation when multiple causal hypotheses must be adjudicated). The four-stage pre-routing pipeline classifies prompts into T4 when the input is an outcome or pattern and the question is why did this happen. The complexity disambiguation question then dispatches to the right mode along the complexity axis.

Ora’s output

Example question: Our customer-onboarding flow has a 35% drop-off rate at step 4 of 7. We have tried three different fixes over the past year — better copy, a progress indicator, removing one field — and the drop-off rate has not moved. What is the right way to figure out what is actually going on?

Ora’s response (running the territory framework’s disambiguation):

The fact that three interventions have failed to move the metric is itself diagnostically important: it suggests that you are addressing symptoms rather than the root cause, that the structure producing the drop-off is a feedback loop the interventions did not target, or both. Let me walk through T4’s complexity-axis disambiguation and recommend a mode.

T4’s central disambiguation question. “Is the question more like ‘what one thing went wrong here,’ or more like ‘what set of things keep producing this,’ or do you want a formal causal model with arrows you can reason over, or are you tracing how a specific historical event actually unfolded?” Four plain-English answers route differently.

Root-cause-analysis (Tier-2). If you read the situation as “one thing is causing the drop-off and we need to find it and fix it,” that is root-cause territory. The mode declares the categorization framework first (8P for process, which fits an onboarding flow well: Product, Price, Place, Promotion, People, Process, Physical evidence, Productivity), populates each category with candidate causes, and applies five-whys to deepen each branch under one operational rule: one more “why” on each candidate root before stopping. The mode would specifically refuse to terminate at “users do not understand the field” without surfacing the process that permitted the field to be unclear (interface design assumptions, copy review process, user-research feedback loop).

Systems-dynamics-causal (Tier-2). If you read the situation as “the drop-off is being held in place by something structural we have not surfaced — interventions targeting individual aspects of step 4 keep failing,” that is feedback-loop territory. The mode states the system boundary explicitly (the onboarding flow plus its surrounding context — support, refunds, customer-success follow-up), identifies variables (drop-off rate, support-ticket volume, refund rate, customer-success time-to-touch), draws feedback loops with polarity, surfaces delays between intervention and observed effect, names system archetypes (Fixes That Fail is the canonical archetype for “we keep trying interventions that look reasonable and the metric does not move”), and ranks leverage points by Meadows depth.

Causal-dag (Tier-3). If you want a formal causal model with arrows you can reason over — confounders, mediators, colliders all named, identifiability checked, and an answer to “what would happen if we intervened on X” produced under explicit assumptions — that is Pearl-ladder territory. Useful when the data exists to populate the DAG and the question warrants the formal lift; less useful for quick-turn investigation.

Process-tracing (Tier-3). If the question is really about a specific historical onboarding instance (a specific user’s drop-off, traced step by step through their session data and support transcript), that is Bennett-Checkel territory. The mode locks the case, names competing causal hypotheses, classifies each evidence piece by test type (hoop, smoking-gun, doubly-decisive, straw-in-the-wind), and reconstructs the causal chain in temporal sequence.

For your onboarding drop-off specifically. Two diagnostic signals point me toward systems-dynamics-causal. First, three interventions targeting different aspects of step 4 have all failed — that is the canonical Fixes-That-Fail signature. Second, you described the situation as a recurring pattern resistant to intervention, not as a one-off failure with an identifiable root. The systems-dynamics analysis would surface whether the drop-off is held in place by a balancing loop (something in the system actively counter-acts your interventions), a delayed reinforcing loop (the interventions reduce drop-off temporarily but trigger a downstream effect that brings it back), or shifting-the-burden (interventions addressing surface symptoms while the underlying capability erodes).

Sequencing. Start with systems-dynamics-causal to surface the structural picture. If the analysis reveals a single cause-chain rather than a feedback structure, the escalation hook routes back to root-cause-analysis. If the problem turns out to be about how the onboarding flow is being framed rather than about the flow itself, that is T9 territory. If multiple causal hypotheses must be weighed against evidence with formal probabilistic rigor, the sideways escalation to T5 (Bayesian Hypothesis Network Analysis) applies.

One caution. The framework will refuse to let any chain terminate at “users are at fault” or “the support team did not flag this earlier” without a process or incentive sub-cause beneath. The process-not-people discipline is load-bearing for producing root causes whose removal actually prevents recurrence.

Commercial AI comparison

Comparison content auto-populates when the comparison-refresh framework runs against this question. Drafters do not author this section.

Brief comparison commentary

Auto-populates with the comparison content above.

How to use this framework

You can run the Causal Investigation pattern with any AI of your choice. The composition is a single pass: the complexity disambiguation runs first, then dispatches to the selected mode.

The prompt:

[Paste the framework specification]

Run T4 disambiguation on this question.

Outcome or pattern: [What has gone wrong, or what keeps going wrong.]

History (optional): [What you have tried and what has happened — useful for surfacing Fixes-That-Fail archetype signatures and for distinguishing one-off failures from structural patterns.]

Domain (optional): [Manufacturing / service / organizational / process / historical-event — informs the categorization framework selection in root-cause-analysis.]

Available data (optional): [What evidence you have access to — relevant for causal-dag identifiability and for process-tracing evidence inventory.]

The AI runs T4’s complexity question, identifies the right mode, and dispatches. The output follows the mode-specific structure:

Root-cause-analysis: presented problem, framework, category analysis, root causes vs. contributing factors, evidence assessment, recommendations, confidence.
Systems-dynamics-causal: system boundary, variables, feedback loops with polarity, delays, archetypes, leverage points, counterintuitive behaviors.
Causal-dag: question locked at Pearl rung, variable inventory with roles, DAG specification, identifiability verdict, intervention or counterfactual answer, assumption inventory.
Process-tracing: case and question, competing hypotheses, evidence inventory with provenance, test classification per evidence piece, hypothesis status, causal chain reconstruction, residual uncertainty.
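If you run the prompt repeatedly, it can be assembled programmatically. The helper below is a hypothetical convenience, not part of the framework: the function name `build_t4_prompt` and its parameters are assumptions, and it simply fills the template fields shown earlier, skipping optional fields that are left empty.

```python
from typing import Optional


def build_t4_prompt(
    spec: str,
    outcome: str,
    history: Optional[str] = None,
    domain: Optional[str] = None,
    available_data: Optional[str] = None,
) -> str:
    """Assemble the T4 disambiguation prompt from its template fields."""
    if not outcome.strip():
        raise ValueError("the outcome or pattern field is required")
    parts = [
        spec.strip(),
        "Run T4 disambiguation on this question.",
        f"Outcome or pattern: {outcome.strip()}",
    ]
    optional_fields = [
        ("History", history),
        ("Domain", domain),
        ("Available data", available_data),
    ]
    for label, value in optional_fields:
        if value and value.strip():
            parts.append(f"{label}: {value.strip()}")
    return "\n\n".join(parts)


prompt = build_t4_prompt(
    spec="[Paste the framework specification]",
    outcome="35% drop-off at step 4 of 7 in onboarding",
    history="Three fixes over the past year; the rate has not moved",
)
print(prompt)
```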

For best results:

Be honest about what you have already tried. The history of failed interventions is diagnostic. Three interventions targeting symptom aspects with no movement of the metric is the Fixes-That-Fail signature; one intervention targeting a candidate root cause with no movement is a different kind of signal.
Do not pre-select the mode. The complexity disambiguation is the framework’s value. If you walk in saying “do a five-whys” when your situation is feedback-driven, you will get a five-whys that bottoms out in “the system is the way it is” without surfacing the structure.
Accept the process-not-people discipline. When the framework refuses to let a chain terminate at human error without a process sub-cause, that is the framework working as designed. Pushing back against the discipline produces actionable-feeling recommendations that do not prevent recurrence.
Take the leverage-point ranking seriously in systems-dynamics-causal. Parameters (Meadows leverage 12) are the lowest-leverage interventions; paradigm changes (Meadows 2) are near the top, exceeded only by the power to transcend paradigms (Meadows 1). Recommendations that look like “tweak the parameter” presented as “systemic intervention” are exactly the named failure mode the mode is defending against.
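The leverage-point advice can be made concrete with a small ranking helper. This is a sketch, assuming each candidate intervention has already been assigned a Meadows number from 1 to 12, where lower numbers mean deeper leverage. The function names and the example assignments are illustrative assumptions.

```python
from typing import List, Tuple

PARAMETER_LEVEL = 12  # Meadows 12: constants, parameters, numbers

Intervention = Tuple[str, int]  # (description, Meadows number)


def rank_by_leverage(interventions: List[Intervention]) -> List[Intervention]:
    """Sort interventions from deepest leverage (lowest number) to shallowest."""
    for name, level in interventions:
        if not 1 <= level <= 12:
            raise ValueError(f"Meadows number out of range for {name!r}: {level}")
    return sorted(interventions, key=lambda item: item[1])


def parameter_tweaks(interventions: List[Intervention]) -> List[str]:
    """Names of interventions that sit at parameter level."""
    return [name for name, level in interventions if level == PARAMETER_LEVEL]


# Illustrative assignments (assumptions, not framework output).
candidates = [
    ("tighten validation discipline", 12),
    ("founder-team contracting about pivot conditions", 5),
]
print(rank_by_leverage(candidates)[0][0])  # founder-team contracting about pivot conditions
print(parameter_tweaks(candidates))        # ['tighten validation discipline']
```

Listing the parameter-level items separately makes it harder for a parameter tweak to be presented as a systemic intervention.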

The framework is deliberately tool-agnostic. The four-mode taxonomy, the complexity-axis disambiguation, the process-not-people discipline, the polarity-parity rule for feedback loops, the Pearl ladder for causal-DAG, and the Bennett-Checkel test classification for process-tracing all survive the lift to any environment.
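As one illustration of a piece that carries over to any environment, the difference between the observation and intervention rungs of the Pearl ladder can be shown with exact arithmetic. The sketch assumes a three-variable graph (Z causes X, Z causes Y, X causes Y) in which Z is the only confounder, so adjusting for Z satisfies the back-door criterion. All probabilities are invented for the example.

```python
from fractions import Fraction as F

# Assumed model: binary confounder Z, binary treatment X, binary outcome Y.
p_z = {1: F(1, 2), 0: F(1, 2)}               # P(Z = z)
p_x1_given_z = {1: F(4, 5), 0: F(1, 5)}      # P(X = 1 | Z = z)
p_y1_given_x1_z = {1: F(9, 10), 0: F(1, 2)}  # P(Y = 1 | X = 1, Z = z)


def observed(p_z, p_x1_given_z, p_y1_given_x1_z):
    """Rung 1: P(Y=1 | X=1), weighting Z by its distribution among X=1 cases."""
    p_x1 = sum(p_x1_given_z[z] * p_z[z] for z in p_z)
    return sum(
        p_y1_given_x1_z[z] * p_x1_given_z[z] * p_z[z] / p_x1 for z in p_z
    )


def intervened(p_z, p_y1_given_x1_z):
    """Rung 2: P(Y=1 | do(X=1)) via back-door adjustment over Z."""
    return sum(p_y1_given_x1_z[z] * p_z[z] for z in p_z)


print(observed(p_z, p_x1_given_z, p_y1_given_x1_z))  # 41/50
print(intervened(p_z, p_y1_given_x1_z))              # 7/10
```

The observational figure overstates the effect of setting X because cases with X = 1 are disproportionately cases with Z = 1. The adjustment formula removes that imbalance, but only under the stated assumption that the graph is correct.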

Other examples

A manufacturing-line defect rate that has crept up over six months. The disambiguation routes to root-cause-analysis with the 6M framework. Five-whys surfaces a Materials root cause (supplier-quality drift the procurement process did not catch) plus a Methods contributing factor (QA sampling rate reduced months earlier). Demonstrates root-cause-analysis on a domain where the canonical categorization is well-fit.
A startup’s repeated-pivot pattern over three years. The disambiguation routes to systems-dynamics-causal because the pattern is recurring and prior interventions have not changed it. The mode surfaces a reinforcing loop: each pivot accumulates founder-credibility debt with the team, making the next pivot’s commitment shakier, making early validation harder, producing another pivot. Leverage-point analysis identifies parameter tweaks (validation discipline) at Meadows 12 while structural interventions (founder-team contracting about pivot conditions) sit at Meadows 5–7. Demonstrates systems-dynamics-causal on patterns invisible to parameter-level interventions.
A historical investigation tracing why a specific government policy was adopted at a specific moment. The disambiguation routes to process-tracing. The mode locks the case, names three competing hypotheses (electoral-pressure, bureaucratic-momentum, ideological-shift), inventories evidence (cabinet papers, memos, press coverage, oral histories), classifies by test type, and reconstructs the causal chain. A failed hoop test on one piece eliminates the bureaucratic-momentum hypothesis; a passed smoking-gun test on another strongly confirms electoral-pressure. Demonstrates process-tracing on the historical-event case for which Bennett-Checkel was designed.
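The test classification and per-evidence updating used in the process-tracing example can be sketched as follows. This is a simplified illustration: it updates one hypothesis against its negation rather than across all three competitors, and every probability is an invented assumption. The names `classify_test` and `update` are hypothetical.

```python
from fractions import Fraction as F


def classify_test(necessary: bool, sufficient: bool) -> str:
    """Map the necessary/sufficient properties of evidence to a test type."""
    if necessary and sufficient:
        return "doubly-decisive"
    if necessary:
        return "hoop"
    if sufficient:
        return "smoking-gun"
    return "straw-in-the-wind"


def update(prior, p_obs_given_h, p_obs_given_not_h):
    """Bayes' rule: posterior P(H | observation) for H against not-H."""
    numerator = p_obs_given_h * prior
    return numerator / (numerator + p_obs_given_not_h * (1 - prior))


prior = F(1, 3)  # three hypotheses, assumed equally likely at the start

# Passed smoking-gun test: the evidence is rare unless the hypothesis is true.
print(classify_test(necessary=False, sufficient=True))  # smoking-gun
print(update(prior, F(3, 10), F(1, 100)))               # 15/16

# Failed hoop test: evidence the hypothesis requires was not found.
# The observation is the absence of that evidence.
print(classify_test(necessary=True, sufficient=False))  # hoop
print(update(prior, F(1, 20), F(1, 2)))                 # 1/21
```

The asymmetry is the point: passing a smoking-gun test raises the posterior sharply, while failing a hoop test nearly eliminates the hypothesis.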

Citations

The Causal Investigation Framework draws on four converging traditions.

Root-cause and quality engineering: Ishikawa’s Guide to Quality Control (1968) for the fishbone diagram; Ohno’s Toyota Production System (1988) for five-whys via Sakichi Toyoda; Reason’s Human Error (1990) for the Swiss-cheese model; Dekker’s Just Culture (2007) for the process-not-people reframing that the framework enforces against chains terminating at human error.
Systems dynamics: Forrester’s Industrial Dynamics (1961); Senge’s The Fifth Discipline (1990) for system archetypes; Sterman’s Business Dynamics (2000); Meadows’ Thinking in Systems (2008) for the twelve leverage points.
Causal inference (Pearl tradition): Pearl’s Causality (2009) and Pearl and Mackenzie’s The Book of Why (2018) for do-calculus and the ladder of causation; Hernán and Robins’ Causal Inference: What If (2020).
Process tracing: Bennett and Checkel’s Process Tracing: From Metaphor to Analytic Tool (2015); Van Evera’s Guide to Methods for Students of Political Science (1997) for the test-classification origin.

The complexity-axis disambiguation and four-mode dispatch are internal to Ora. They emerged from the observation that conventional “root cause analysis” treats all backward-causal questions as single-chain failures, which produces poorly fitting five-whys against feedback patterns and informal cause-chains against historical questions where Bennett-Checkel would be more rigorous. The framework was compiled on 2026-05-01; v1.0 is the current version.

Downloads

Framework specification (PDF) — link to ora-ai.org canonical artifact when published
Framework specification (plain text) — link to ora-ai.org canonical artifact when published
Full white paper (PDF) — link when published