Bayesian Hypothesis Network Analysis

Overview

The Bayesian Hypothesis Network Analysis Framework (BHN) is the heaviest analytical operation in T5 (hypothesis evaluation). It exists for the case where a flat ranking of competing hypotheses is not enough — where the user has prior probabilities anchored in base rates, where hypotheses are interdependent (one’s truth shifts another’s prior), and where the goal is a calibrated probabilistic posterior with sensitivity analysis identifying which evidence items most shift the answer. BHN composes T5’s two atomic siblings (differential-diagnosis and competing-hypotheses) into a Bayesian network with explicit priors, likelihood structure, conditional dependencies, and posterior update.

The framework runs the two component operations in different modes. Stage 1 (differential-diagnosis, fragment) generates a wide candidate hypothesis set: the dominant-narrative hypothesis, at least one orthogonal hypothesis (different mechanism), at least one combination hypothesis (multiple mechanisms together), and the null hypothesis (the phenomenon is noise or self-resolving). Each carries a rough prior probability anchored in base rate or domain knowledge — or is explicitly flagged as a flat-prior assumption when no anchor exists. The stage runs in fragment mode (hypothesis-list-with-priors-only); the full diagnostic triage is Stage 2’s work. Stage 2 (competing-hypotheses, full) produces the full Heuer Analysis of Competing Hypotheses (ACH) matrix: evidence inventory with credibility and relevance ratings, consistency matrix where every cell is populated using Heuer vocabulary (CC very consistent, C consistent, N neutral, I inconsistent, II very inconsistent, NA not applicable), diagnosticity assessment per evidence row, tentative conclusions via elimination (the surviving hypothesis is the one with the fewest I+II cells), and sensitivity analysis identifying which evidence reversals would change the ranking. The work is performed across the matrix (one evidence item against all hypotheses), not down it (collecting evidence for a favored hypothesis).

The framework’s load-bearing intellectual content lives in the three synthesis stages. Synthesis Stage 1 (prior elicitation) consolidates the hypothesis set with explicit priors, flags fabricated priors (round numbers with no anchor), reanchors them or downgrades them to flat-prior assumptions where appropriate, and performs the MECE (mutually exclusive, collectively exhaustive) check, stating explicitly when the hypothesis set is non-MECE (hypotheses overlap; conjunction hypotheses exist). Synthesis Stage 2 (Bayesian network construction) builds the network: hypotheses as nodes with priors P(H), evidence items as nodes with likelihoods P(E|H) derived from Stage 2’s consistency matrix (CC ≈ 0.8, C ≈ 0.6, N ≈ base rate, I ≈ 0.2, II ≈ 0.05, NA excluded), and conditional dependencies surfaced explicitly when one hypothesis’s truth shifts another’s prior. Independence is the default; when hypotheses share an underlying mechanism, independence must be named as an explicit assumption rather than left implicit. Synthesis Stage 3 (posterior update) computes P(H|E) ∝ P(H) × ∏ P(E|H), normalized over the hypothesis set, expresses posteriors as ranges (false precision is a failure mode), performs sensitivity analysis identifying dominant evidence items, and names the leading hypothesis with residual uncertainty made explicit.
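Written out in full, the posterior update can be sketched as the formula below. The indices i, k (hypotheses), j (evidence items) and the counts n and m are notation introduced here for illustration. The product form assumes the evidence items are conditionally independent given each hypothesis, and the normalization treats the hypothesis set as if it were exclusive and exhaustive; when the set is non-MECE, the result is better read as relative weights than as true probabilities.

```latex
P(H_i \mid E_1, \dots, E_m)
  = \frac{P(H_i)\,\prod_{j=1}^{m} P(E_j \mid H_i)}
         {\sum_{k=1}^{n} P(H_k)\,\prod_{j=1}^{m} P(E_j \mid H_k)}
```

Cells rated NA drop out of the product for that hypothesis, matching the "NA excluded" rule in the likelihood mapping.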

The framework’s epistemological discipline is honesty about uncertainty at every stage. Priors that cannot be anchored to base rates are flagged as flat-prior assumptions rather than fabricated as point estimates; conditional dependencies are named or independence is named as the explicit assumption; posteriors are expressed as ranges with sensitivity analysis; non-MECE hypothesis structures are named explicitly. The four named failure modes — prior-fabrication, independence-assumption-collapse, sensitivity-omission, mece-violation-unnamed — are the framework’s defenses against common failures of probabilistic reasoning under real-world conditions where inputs are imperfect.

The framework is honest about what it does not do. It operates within a frame, weighing within-frame hypotheses against evidence; when the dispute is really about which paradigm is correct (different mechanisms invoking different theoretical frames), the right move is to escalate sideways to T9 frame-comparison or worldview-cartography rather than to weigh incommensurable hypotheses against each other in a Bayesian network. The framework also does not fabricate the prior anchors when they are absent — the honest move when base rates are unavailable is to document flat-prior assumption and let the posterior reflect the uncertainty.

Systemic context

BHN sits at the top of T5’s depth ladder, above the two atomic component modes (differential-diagnosis for medical-tradition triage; competing-hypotheses for full Heuer ACH). It is invoked three ways: directly by the user when calibrated probability output is the goal; by the four-stage pre-routing pipeline when a prompt classifies as T5-depth-molecular; or by handoff from competing-hypotheses when an ACH analysis surfaces conditional dependencies the qualitative matrix cannot capture. BHN’s downstream handoffs route to T9 (frame-comparison or worldview-cartography) when the dispute is paradigmatic rather than within-frame. Lens dependencies span Heuer’s ACH diagnosticity (required), Pearl’s do-calculus (optional, when the network has a causal interpretation), Tetlock’s superforecasting calibration discipline (optional, for long-horizon hypotheses), Kahneman-Tversky’s bias catalog (foundational), and Knightian risk-uncertainty-ambiguity (foundational).

Ora’s output

Example question: Our service’s p99 latency has degraded from 80ms to 220ms over the past six weeks. We have changed code, infrastructure, and traffic mix in that window. Help me figure out what is most likely causing it — I want a probabilistic read, not just a list.

Ora’s response (running BHN on the latency regression):

Three changes in the same window means competing causal hypotheses are all in play and the question is which to weight most heavily for investigation. BHN is the right operation here because you have priors anchorable in your team’s history, the hypotheses are interdependent (the combination hypothesis matters), and you want a calibrated posterior rather than a flat ranking.

Stage 1 (differential-diagnosis fragment). Six candidate hypotheses with prior anchors.

H1 (dominant): recent code change introduced inefficient query; prior ~0.45 (anchored: this team’s last 12 latency regressions, 5 traced to query changes).
H2 (orthogonal): infrastructure change (Kubernetes node pool migration) caused noisy-neighbor; prior ~0.20 (anchored: industry surveys show ~20% of latency regressions trace to infra).
H3 (orthogonal): traffic mix shift (new customer segment with different query shape); prior ~0.15 (weakly anchored: traffic shifts caused 2 of last 12 regressions).
H4 (combination): query inefficiency × infra noisy-neighbor compounding; prior ~0.10 (no direct anchor, derived from H1×H2 joint probability).
H5 (null): no real regression, observed degradation is monitoring artifact; prior ~0.05 (anchored: 1 of last 12 reported regressions was monitoring artifact).
H6 (zebra): downstream dependency degraded silently; prior ~0.05 (rare; surfaced by analyst).

Stage 2 (competing-hypotheses, full ACH). Six evidence items inventoried; consistency matrix populated with Heuer vocabulary.
E1 (latency regression coincides with code-change-X deployment, high credibility and relevance): H1=CC, H4=CC, H5=I, others=N.
E2 (regression isolated to query type Q4): H1=CC, H4=CC, H2=I, H5=I.
E3 (K8s migration completed two weeks before): H2=C, H4=C, others=N.
E4 (new customer segment onboarded mid-window): H3=CC, others=N.
E5 (monitoring instrumentation unchanged): H5=II, others=N.
E6 (downstream service A’s p99 stable): H6=II, others=N.
Inconsistency tally (I+II cells as listed): H1 0, H3 0, H4 0, H2 1 (E2), H6 1 (E6), H5 3 (E1, E2, E5). Surviving hypotheses by elimination: H1, H3, and H4, each with 0 inconsistencies. High-diagnosticity items: E1, E2, E5, E6.
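The elimination step can be checked mechanically. The sketch below tallies I and II cells per hypothesis from the ratings as listed, and also computes a crude diagnosticity proxy. Several things here are assumptions rather than part of the framework: cells the text does not list are filled with N, N is mapped to 0.5 as a stand-in for "base rate", and diagnosticity is approximated as the spread between the highest and lowest likelihood in a row with an arbitrary 0.4 cutoff. Heuer's diagnosticity assessment is a qualitative judgment, so this is only one possible proxy.

```python
# Ratings as listed in the worked example; unlisted cells default to "N"
# (an assumption, since the text leaves some cells unstated).
HYPOTHESES = ["H1", "H2", "H3", "H4", "H5", "H6"]
LISTED = {
    "E1": {"H1": "CC", "H4": "CC", "H5": "I"},
    "E2": {"H1": "CC", "H4": "CC", "H2": "I", "H5": "I"},
    "E3": {"H2": "C", "H4": "C"},
    "E4": {"H3": "CC"},
    "E5": {"H5": "II"},
    "E6": {"H6": "II"},
}
# Likelihood per rating; the 0.5 for "N" is a placeholder assumption.
LIKELIHOOD = {"CC": 0.8, "C": 0.6, "N": 0.5, "I": 0.2, "II": 0.05}


def full_matrix(listed, hypotheses, default="N"):
    """Fill every evidence row so each hypothesis has a rating."""
    return {e: {h: row.get(h, default) for h in hypotheses}
            for e, row in listed.items()}


def inconsistency_counts(matrix, hypotheses):
    """Count I and II cells down each hypothesis column."""
    return {h: sum(1 for row in matrix.values() if row[h] in ("I", "II"))
            for h in hypotheses}


def likelihood_spread(row):
    """Hypothetical diagnosticity proxy: max minus min likelihood in a row."""
    values = [LIKELIHOOD[r] for r in row.values()]
    return max(values) - min(values)


matrix = full_matrix(LISTED, HYPOTHESES)
counts = inconsistency_counts(matrix, HYPOTHESES)
high_diagnosticity = [e for e, row in matrix.items()
                      if likelihood_spread(row) > 0.4]  # cutoff is arbitrary

for h in sorted(HYPOTHESES, key=counts.get):
    print(h, counts[h])
print("high diagnosticity:", high_diagnosticity)
```

Working across each evidence row before summing down the columns mirrors the across-the-matrix discipline described for Stage 2. Under these assumed values the proxy flags E1, E2, E5, and E6.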

Synthesis Stage 1 (prior elicitation). All priors carry anchors except H4 (derived from joint) and H6 (analyst-generated); both flagged weakly-anchored. MECE check fails explicitly — hypotheses are not mutually exclusive (H4 = H1∧H2). Posterior interpretation must account for non-MECE structure.

Synthesis Stage 2 (Bayesian network construction). Nodes: H1–H6 (hypotheses with priors), E1–E6 (evidence with likelihoods derived from consistency matrix). Conditional dependencies: H1 ∧ H2 → H4 (H4’s prior is a function of H1 and H2; if H1 and H2 are independent, P(H4) ≈ P(H1) × P(H2) = 0.45 × 0.20 = 0.09; the assigned prior of 0.10 is approximately consistent). Independence assumption named explicitly: H3 (traffic mix) and H1 (code change) are assumed independent because they trace to different mechanisms.
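The consistency check on H4’s prior is simple arithmetic, and a short sketch makes it repeatable when priors are revised. The tolerance of 0.02 is a hypothetical threshold chosen for illustration; the framework does not specify one.

```python
import math

# Priors from Stage 1 of the worked example.
priors = {"H1": 0.45, "H2": 0.20, "H3": 0.15,
          "H4": 0.10, "H5": 0.05, "H6": 0.05}

# Under independence of H1 and H2, the joint is the product of marginals.
joint_if_independent = priors["H1"] * priors["H2"]
gap = abs(priors["H4"] - joint_if_independent)

TOLERANCE = 0.02  # hypothetical threshold, not part of the framework
print(f"P(H1) * P(H2) = {joint_if_independent:.2f}")
print(f"assigned P(H4) = {priors['H4']:.2f}, gap = {gap:.2f}")
print("consistent with independence:", gap <= TOLERANCE)

# The priors sum to 1 even though H4 overlaps H1 and H2.
print("priors sum to 1:", math.isclose(sum(priors.values()), 1.0))
```

That the priors sum to 1 while H4 is a conjunction of H1 and H2 is the non-MECE structure flagged in Synthesis Stage 1: some probability mass is counted under more than one label.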

Synthesis Stage 3 (posterior update). Computing P(H|E) ∝ P(H) × ∏ P(E|H): H1 (code change) posterior 0.40–0.55, leading; H4 (combination) 0.20–0.30; H2 (infra alone) 0.10–0.18; H3 (traffic alone) 0.08–0.15; H5 and H6 each below 0.02 (ruled out by E5 and E6).
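The update rule can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the framework’s own computation: N is mapped to 0.5 as a placeholder for "base rate", cells the text does not list are treated as N, evidence items are treated as conditionally independent given each hypothesis, and the H1 ∧ H2 → H4 dependency is not modeled. Because of those choices, the point values it prints should not be expected to reproduce the ranges quoted above; rerunning it with different values for N is one way to see how a range arises.

```python
# Likelihood per Heuer rating; 0.5 for "N" is a placeholder assumption.
LIKELIHOOD = {"CC": 0.8, "C": 0.6, "N": 0.5, "I": 0.2, "II": 0.05}

PRIORS = {"H1": 0.45, "H2": 0.20, "H3": 0.15,
          "H4": 0.10, "H5": 0.05, "H6": 0.05}

# Ratings as listed in Stage 2; unlisted cells default to "N" (assumption).
RATINGS = {
    "E1": {"H1": "CC", "H4": "CC", "H5": "I"},
    "E2": {"H1": "CC", "H4": "CC", "H2": "I", "H5": "I"},
    "E3": {"H2": "C", "H4": "C"},
    "E4": {"H3": "CC"},
    "E5": {"H5": "II"},
    "E6": {"H6": "II"},
}


def posterior(priors, ratings, likelihood):
    """Return normalized P(H|E) from P(H) times the product of P(E_j|H)."""
    unnormalized = {}
    for h, prior in priors.items():
        score = prior
        for row in ratings.values():
            rating = row.get(h, "N")
            if rating != "NA":  # NA cells are excluded from the product
                score *= likelihood[rating]
        unnormalized[h] = score
    total = sum(unnormalized.values())
    return {h: score / total for h, score in unnormalized.items()}


post = posterior(PRIORS, RATINGS, LIKELIHOOD)
for h, p in sorted(post.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{h}: {p:.3f}")
```

Normalizing over all six hypotheses treats them as exclusive, which the MECE check has already flagged as untrue for H4, so the output is best read as relative weight rather than calibrated probability.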

Sensitivity analysis. If E1 is reversed, H1 drops to ~0.10 and H4 drops to ~0.10, while H2 and H3 rise to ~0.30 each. E1 is the dominant evidence item.
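A reversal test of this kind can be sketched as follows. The reversal rule (CC swaps with II, C swaps with I, N stays N) is an assumption made for illustration, as are the 0.5 likelihood for N and the treatment of unlisted cells as N. The sketch shows the direction of each shift; its magnitudes depend on those assumptions and will differ from the approximate figures quoted above.

```python
import math

LIKELIHOOD = {"CC": 0.8, "C": 0.6, "N": 0.5, "I": 0.2, "II": 0.05}  # N assumed
PRIORS = {"H1": 0.45, "H2": 0.20, "H3": 0.15,
          "H4": 0.10, "H5": 0.05, "H6": 0.05}
RATINGS = {
    "E1": {"H1": "CC", "H4": "CC", "H5": "I"},
    "E2": {"H1": "CC", "H4": "CC", "H2": "I", "H5": "I"},
    "E3": {"H2": "C", "H4": "C"},
    "E4": {"H3": "CC"},
    "E5": {"H5": "II"},
    "E6": {"H6": "II"},
}
# Hypothetical reversal rule: mirror each rating around neutral.
REVERSE = {"CC": "II", "C": "I", "N": "N", "I": "C", "II": "CC"}


def posterior(ratings):
    """Normalized P(H|E); unlisted cells are treated as "N"."""
    scores = {
        h: prior * math.prod(LIKELIHOOD[row.get(h, "N")]
                             for row in ratings.values())
        for h, prior in PRIORS.items()
    }
    total = sum(scores.values())
    return {h: s / total for h, s in scores.items()}


def reverse_item(ratings, item):
    """Return a copy of the ratings with one evidence row mirrored."""
    flipped = dict(ratings)
    flipped[item] = {h: REVERSE[r] for h, r in ratings[item].items()}
    return flipped


baseline = posterior(RATINGS)
reversed_e1 = posterior(reverse_item(RATINGS, "E1"))
for h in PRIORS:
    print(f"{h}: {baseline[h]:.3f} -> {reversed_e1[h]:.3f}")
```

Calling `reverse_item` on each evidence item in turn and comparing the shifts is one way to look for dominant items, though which item dominates can change with the assumed likelihood values.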

The integrated finding. H1 is the leading hypothesis but the posterior is not above 0.55 because H4 (combination) is non-trivial at 0.20–0.30. The right action is probably to investigate both H1 and H2 in parallel rather than to commit only to H1 — fixing only the query inefficiency while the infra noisy-neighbor compounds it would leave residual regression. The highest-leverage next step is verifying E1 with greater rigor (re-checking deployment logs against latency timestamps); if E1 is wrong, the ranking flips meaningfully. The framework does not tell you what to do; it tells you the probability landscape and the evidence sensitivities. The judgment is yours.

Commercial AI comparison

Comparison content auto-populates when the comparison-refresh framework runs against this question. Drafters do not author this section.

Brief comparison commentary

Auto-populates with the comparison content above.

How to use this framework

You can run the Bayesian Hypothesis Network Analysis pattern with any AI of your choice. The composition is two component passes followed by three synthesis stages.

The prompt:

[Paste the framework specification]

Run BHN on this phenomenon.

Phenomenon: [The observable pattern, event, or anomaly the hypotheses are competing to explain.]

Hypothesis set (optional): [Candidate explanations if you have named them; the framework will test breadth and add a null hypothesis.]

Evidence inventory (optional): [Observations bearing on the hypotheses with credibility and relevance ratings if available.]

Prior estimates (optional): [Prior probabilities anchored in base rates; if absent, the framework will document flat-prior assumption rather than fabricate.]

Conditional dependency map (optional): [Which hypotheses’ truths affect which others’ priors; if absent, the framework surfaces dependencies during Stage 2.]

The AI runs Stage 1 (differential-diagnosis fragment) → Stage 2 (competing-hypotheses full ACH) → Synthesis Stage 1 (prior elicitation) → Synthesis Stage 2 (Bayesian network construction) → Synthesis Stage 3 (posterior update). The output follows the eight-section template: hypothesis set with priors, evidence inventory with likelihoods, conditional dependencies, Bayesian network diagram or table, posterior distribution, sensitivity analysis, leading hypothesis with residual uncertainty, confidence map.

For best results:

Anchor priors in real base rates if you can. The framework’s most fragile stage is prior elicitation. When base rates are available (your team’s historical regression data, industry surveys, domain knowledge), name them explicitly. When they are not available, accept the flat-prior assumption rather than fabricating point estimates that look like rigor.
Do not insist on MECE structure. Real-world hypothesis sets are often non-MECE — combination hypotheses exist; hypotheses overlap. The framework names the non-MECE structure rather than forcing the hypotheses into orthogonality. Posterior interpretation must account for the structure (do not double-count evidence across overlapping hypotheses).
Take the sensitivity analysis seriously. The dominant evidence items are the items to monitor or to verify with greater rigor. Sensitivity analysis is the framework’s defense against false precision; if a single evidence reversal would change the ranking, that evidence is doing too much work and deserves verification.
Escalate sideways when the dispute is paradigmatic. BHN operates within a frame. If the candidate hypotheses invoke incommensurable theoretical frames, the right operation is T9 frame-comparison or worldview-cartography, not BHN. The framework will flag this when it detects the pattern.

The framework is deliberately tool-agnostic. The five-stage protocol, the Heuer ACH consistency matrix, the Bayesian update arithmetic, the sensitivity analysis discipline, and the four named failure modes (prior-fabrication, independence-assumption-collapse, sensitivity-omission, mece-violation-unnamed) all survive the lift to any environment.

Other examples

A medical case with three plausible diagnoses and conflicting test results. BHN runs differential-diagnosis fragment generating six candidates including null and zebra; competing-hypotheses full ACH populates the consistency matrix across six tests; synthesis stages produce a posterior distribution with one diagnosis at 0.55, two at 0.15 each, and two ruled out. Sensitivity analysis identifies one specific test’s result as dominant — re-running it with greater rigor is the highest-leverage next step. Demonstrates BHN’s role in clinical-style probabilistic reasoning where the competing hypotheses are formally similar.
An intelligence analysis weighing three competing explanations for an observed adversary behavior. BHN runs with explicit prior anchors (base rates from historical adversary behavior), full ACH matrix across multiple intelligence sources rated for credibility, conditional dependencies (one hypothesis’s truth would shift another’s prior because they share an organizational driver), and posterior with sensitivity. The deception assessment is required (CQ5) when adversarial actors are plausible. Demonstrates BHN as the natural operationalization of the Heuer ACH tradition that the framework’s Stage 2 inherits.
A market-research scenario weighing competing explanations for a customer churn pattern. BHN runs with priors anchored in segment-level historical data, ACH matrix across customer interview evidence and product-usage telemetry, conditional dependencies surfaced explicitly when two hypotheses (price sensitivity and feature dissatisfaction) share an underlying customer-segment driver. Posterior identifies the leading hypothesis with residual uncertainty; sensitivity analysis surfaces which evidence (a specific interview cohort’s responses) is dominant. Demonstrates BHN’s transferability to commercial decision-making contexts where calibrated probability beats flat ranking.

Citations

The Bayesian Hypothesis Network Analysis Framework draws on three converging traditions. Probabilistic reasoning under uncertainty: Bayes’ theorem itself (Reverend Thomas Bayes, 1763) and the modern Bayesian network apparatus from Pearl’s Probabilistic Reasoning in Intelligent Systems (1988); Pearl’s Causality (2009) and The Book of Why (2018) for the do-calculus and conditional-dependency apparatus when the network has causal interpretation. Hypothesis evaluation methodology: Richards Heuer Jr.’s Psychology of Intelligence Analysis (1999) for the Analysis of Competing Hypotheses (ACH) consistency matrix, the diagnosticity assessment, and the elimination-not-confirmation discipline that the framework’s Stage 2 inherits directly. Calibration discipline: Tetlock and Gardner’s Superforecasting (2015) for the range-not-point-estimate convention and the sensitivity-analysis-as-honest-uncertainty discipline.

The five-stage architecture (differential-diagnosis fragment → competing-hypotheses full → prior elicitation → Bayesian network construction → posterior update with sensitivity) is internal to Ora. It emerged from two observations: the qualitative ACH matrix that competing-hypotheses produces cannot capture conditional dependencies, and standard Bayesian network construction, which relies on prior fabrication, cannot survive honest engagement with input quality. The framework’s signature defenses (prior-anchor-or-flat-prior, named-independence-or-named-dependency, range-posteriors-with-sensitivity, named-non-MECE) are the operational implementation of “honest uncertainty over false precision.” The framework is single-author and originated 2026-05-01; v1.0 is the current version.

Downloads

Framework specification (PDF) — link to ora-ai.org canonical artifact when published
Framework specification (plain text) — link to ora-ai.org canonical artifact when published
Full white paper (PDF) — link when published