Preamble: Governing Principles

Principle 1 — Visual output is a peer cognitive modality

Visual output is not decoration tacked onto prose. It is a parallel analytical product with its own schema, its own adversarial review, its own accessibility layer, and its own failure semantics. The foundational research — Larkin & Simon (1987), Paivio (1986), Baddeley (1974/2000), Mayer (2001/2009) — converges on a single design claim: well-designed visuals engage nonverbal representational codes, visuospatial working-memory resources, and preattentive perceptual grouping that prose alone does not engage. This is not a style preference. It is a representational commitment with measurable cognitive consequences.

Principle 2 — The protocol is an active inversion of LLM priors

LLMs have been trained on millions of slides, blog screenshots, and BI dashboards. The modal chart in their training distribution is a low-density, decorated, truncated-axis, default-binned bar chart from marketing collateral. Absent explicit counter-pressure, an LLM asked to visualize will regress to this mean. Every protocol rule, every adversarial check, and every default style operates as a deliberate inversion of that prior. This is not a one-time fix but a standing architectural commitment. Mode prompts, system prompts, and post-generation linting all encode this inversion. The protocol exists to produce Cleveland dot plots, not PowerPoint bar charts.

Principle 3 — Dishonest visuals are worse than no visuals

Tufte’s honesty constraint is the protocol’s cardinal rule. Readers apply skepticism to prose but credulity to pictures. A poorly drawn chart replaces the reader’s first-principles reasoning with a false conclusion the reader is unlikely to interrogate. If a visual cannot satisfy the integrity rules, silence is preferable. This applies with particular force to automated visualization, where failures compound unseen.

Principle 4 — Duplicative visuals are actively harmful

Mayer’s redundancy principle (1999/2003) is not a design guideline but an empirical finding: visuals that merely restate what prose already says make comprehension worse, not better. The protocol treats relation_to_prose: redundant as the least-preferred state. When a visual would only restate what the prose says, the correct output is no_visual, not a courtesy chart.

Principle 5 — Toward spatial-native intelligence

The long-term architectural trajectory is not “prose system with visual outputs” but “spatial-native intelligence system with prose as one rendering format among others.” Tversky’s research establishes that spatial structure is the foundational representational system for abstract thought. Prose is a lossy translation of spatial insight. This protocol is the first step toward restoring cognitive fidelity: letting insights be represented in formats closer to their native spatial structure. The visual output protocol, the visual input processing layer (sketches, diagrams, whiteboard photos), and the direct annotation interface are components of a single spatial-native architecture. This version specifies the output protocol; the input and annotation layers are documented in the companion Spatial-Native Architecture specification and will converge into a unified bidirectional protocol.


1. In-scope visual techniques

The protocol supports 22 diagram types organized into eight schema families (QUANT, CAUSAL, DECISION, RISK, ARGUMENT, RELATIONAL, PROCESS, SPATIAL). This is an increase from v0.1’s 19 types: v0.2 adds tornado/sensitivity diagrams, influence diagrams, and bow-tie risk diagrams, all identified by the conceptual research as high-tractability, visually native techniques serving modes that are among the most degraded by prose-only execution.

1.1 Technique table

| # | Technique | Family | Rendering target | Tier | Rationale |
|---|---|---|---|---|---|
| 1 | Comparison chart (bar, column, dot plot, slope graph) | QUANT | Vega-Lite | Low | Cleveland–McGill position/length ranking; grammar-of-graphics deterministic |
| 2 | Time series with uncertainty (line + band, fan, sparkline) | QUANT | Vega-Lite | Low | Small multiples, fan bands, error layers; banking-to-45° via aspect field |
| 3 | Distribution plot (box, violin, strip, histogram) | QUANT | Vega-Lite | Low | Reveals shape hidden by bar+error-bar; Weissgerber applies |
| 4 | Scatter with annotation | QUANT | Vega-Lite | Low | Bivariate relation; supports facet/repeat for small multiples |
| 5 | Heatmap / small-multiple heatmap | QUANT | Vega-Lite | Low | Also the ACH rendering target |
| 6 | Tornado / sensitivity diagram | QUANT | Vega-Lite | Low | Sorted horizontal bar with center axis; sensitivity ranking is perceptual. Added v0.2 |
| 7 | Causal loop diagram (CLD) | CAUSAL | Semantic JSON → SVG/DOT | High | No DSL carries polarity or R/B loop semantics |
| 8 | Stock-and-flow diagram (XMILE-aligned) | CAUSAL | Semantic JSON → SVG | High | Stocks/flows/clouds/auxiliaries have conservation semantics |
| 9 | Causal DAG (Pearl/Hernán) | CAUSAL | DAGitty DSL | Low | Mature DSL; exposure/outcome/latent typing; acyclicity checkable |
| 10 | Fishbone / Ishikawa | CAUSAL | Semantic JSON → SVG | High | Typed tree with declared category framework |
| 11 | Decision tree / probability tree | DECISION | Semantic JSON → SVG | High | Probability-sum and payoff invariants demand validation |
| 12 | Influence diagram (Howard & Matheson) | DECISION | Semantic JSON → DOT | High | Encodes conditional independence structure trees cannot show. Added v0.2 |
| 13 | ACH matrix | DECISION | Semantic JSON → Vega-Lite heatmap | High | Typed table with C/I/N/NA cell vocabulary and diagnosticity scoring |
| 14 | 2×2 / scenario-planning matrix | DECISION | Semantic JSON → SVG | High | Axis independence is epistemic claim requiring explicit rationale |
| 15 | Bow-tie risk diagram | RISK | Semantic JSON → SVG | High | Threat/event/consequence symmetry is the method’s point; invisible in prose. Added v0.2 |
| 16 | IBIS argument diagram | ARGUMENT | Semantic JSON → DOT | High | Tiny typed graph with grammar constraints |
| 17 | Pro–con tree | ARGUMENT | Semantic JSON → SVG | High | Degenerate argument map |
| 18 | Concept map (Novak) | RELATIONAL | Semantic JSON (CXL-shaped) → DOT | High | Labeled edges carry propositional semantics |
| 19 | Sequence diagram | PROCESS | Mermaid / PlantUML | Low | Mature DSLs; LLM reliability high |
| 20 | Flowchart / swimlane | PROCESS | Mermaid (subgraphs) | Low | Subgraph-as-lane idiom works |
| 21 | State diagram (FSM) | PROCESS | Mermaid stateDiagram-v2 | Low | Harel statecharts excluded; FSM covers most needs |
| 22 | C4 architecture (Context + Container) | SPATIAL | Structurizr DSL | Low | Reference DSL by the C4 author |

1.2 Excluded techniques, with rationale

  • Mind maps (Buzan) — No propositional semantics. Use concept maps with linking_phrase optional. Explicit exclusion prevents confusion.
  • Venn diagrams (n > 3) — Cannot be area-proportional in general (proven). Automatic dishonesty.
  • Full Toulmin argument maps — IBIS + pro-con cover the common case. Full Toulmin has unresolved rebuttal-target ambiguity in the literature. Candidate for v0.3.
  • Harel statecharts — Layout benefits sharply from human judgment. FSM subset covers the vast majority of needs.
  • Gantt charts — Redundant with timeline + swimlane; the specific task/dependency/resource semantics rarely appear in analytical modes.
  • Parallel coordinates — Fold into QUANT family as a Vega-Lite repeat + fold variant rather than a separate type.
  • ER diagrams, org charts, tree diagrams — Subsumed as specializations of Mermaid class/flowchart or concept map.
  • Sparklines — Treat as compact: true variant on time-series schema, not a distinct type.
  • Sankey / alluvial — Deferred to v0.3. Neither Vega-Lite native nor cleanly semantic.
  • Evidence-weight diagrams, Wigmore charts, full AIF — Deferred as niche.
  • Pie charts — Not explicitly banned, but never selected by default. Cleveland-McGill ranks angle/area among the least accurate encodings. If the protocol’s encoding-selection logic ever proposes a pie chart, the adversarial reviewer should flag it for replacement with a bar or dot plot.

1.3 Rejection criteria

Three gates: (a) formal structure cannot be captured in structured text without losing the thing that makes it useful; (b) high dishonesty risk by construction; (c) rendering is aesthetically sensitive enough that automated output will reliably produce degraded results.


2. Protocol envelope structure

2.1 Format

Visual specification blocks are fenced JSON code blocks with a typed marker, embedded in the response Markdown:

```ora-visual
{
  "schema_version": "0.2",
  "id": "fig-1",
  "type": "causal_loop_diagram",
  "mode_context": "systems_dynamics",
  "relation_to_prose": "integrated",
  "title": "Feedback loops in team velocity",
  "spec": { … type-specific fields … },
  "semantic_description": { … four-level description … },
  "spatial_representation": { … optional, for spatial-native pipeline … },
  "render_hints": { … optional, all ignorable … },
  "integrity_declarations": { … optional honesty assertions … }
}
```

Why fenced JSON. XML closing-tag drift is a known LLM failure mode. YAML indentation is brittle when interleaved with prose. Sidecar files break the stateless pipeline invariant. The discriminated union on type combined with additionalProperties: false and enum-constrained vocabularies eliminates the majority of LLM schema drift.
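As a rough illustration of how a consumer might pull these blocks out of response Markdown, the sketch below assumes the fence info string is exactly `ora-visual` and the body is strict JSON. The function name `extract_visual_blocks` is hypothetical and not part of the protocol.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built here to keep this example readable

# A fenced block whose info string is exactly "ora-visual".
_BLOCK = re.compile(
    rf"^{FENCE}ora-visual[ \t]*\n(.*?)\n{FENCE}[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def extract_visual_blocks(markdown: str) -> list[dict]:
    """Return every ora-visual block in the Markdown, parsed as JSON.

    Raises json.JSONDecodeError if a block body is not valid JSON.
    """
    return [json.loads(match.group(1)) for match in _BLOCK.finditer(markdown)]
```

Because the marker is part of the fence, ordinary JSON or code fences in the same response are left alone.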

2.2 Required envelope fields

  • schema_version — string, semver. Consumers fail-closed on unknown major versions.
  • id — string, stable within a canonical document. Used for prose cross-references (“see fig-1”) and annotation targeting.
  • type — enum, one of the 22 in-scope types.
  • mode_context — string, the Ora mode that generated this visual (for adversarial routing and default configuration lookup).
  • relation_to_prose — enum: integrated | visually_native | redundant. See §5.
  • spec — type-dispatched object.
  • semantic_description — object per §8.
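A minimal sketch of a required-field check with fail-closed version handling might look like the following. It assumes the consumer understands only 0.x envelopes; `check_envelope` and `SUPPORTED_MAJOR` are illustrative names, and the normative check would be the JSON Schema itself.

```python
REQUIRED_FIELDS = (
    "schema_version", "id", "type", "mode_context",
    "relation_to_prose", "spec", "semantic_description",
)
RELATIONS = {"integrated", "visually_native", "redundant"}
SUPPORTED_MAJOR = 0  # assumption: this consumer understands 0.x envelopes only


def check_envelope(envelope: dict) -> list[str]:
    """Return a list of problems; an empty list means the envelope passes."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in envelope]
    if problems:
        return problems
    major = str(envelope["schema_version"]).split(".")[0]
    if not major.isdigit() or int(major) != SUPPORTED_MAJOR:
        # Fail closed: an unknown major version is rejected, never guessed at.
        problems.append(f"unsupported schema major version: {major}")
    if envelope["relation_to_prose"] not in RELATIONS:
        problems.append(f"unknown relation_to_prose: {envelope['relation_to_prose']}")
    return problems
```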

2.3 Optional envelope fields

  • title — short string (alt-text label and aria-labelledby).
  • caption — longer attribution string (source, period, n).
  • render_hints — object, all ignorable by any renderer: { preferred_engine, aspect_ratio, compact }.
  • integrity_declarations — object: { non_zero_baseline_justified, inverted_axis_justified, axes_independence_rationale, log_scale_base }. Populated only when the visual triggers a Tufte rule that allows a justified exception.
  • memorability_goal — boolean, default false. When true (set by user or mode configuration, never by the model), relaxes T4/T5 chartjunk rules to allow limited embellishment constrained by integrity rules. Based on Bateman et al. (CHI 2010) finding that embellished charts showed no worse comprehension and significantly better long-term recall.
  • fallback — alternative representation if render fails: { type: "table", data: [[…]] } or { type: "prose_only" }.
  • spatial_representation — optional spatial-native format for bidirectional communication pipeline. See §10.

2.4 Tiering

The spec field is polymorphic on type. For low-tier types, spec contains the declarative DSL directly or a validated Vega-Lite subset. For high-tier types, spec is a semantic JSON object that a deterministic compiler translates to a renderer format.

| Tier | Techniques | Why |
|---|---|---|
| Low (Vega-Lite) | Comparison, time series, distribution, scatter, heatmap, tornado | Grammar-of-graphics is already the semantic tier |
| Low (DAGitty) | Causal DAG | DSL trivially short; acyclicity checkable |
| Low (Mermaid) | Sequence, flowchart/swimlane, state (FSM) | LLM reliability empirically highest; repair loops converge in 1-2 retries |
| Low (Structurizr) | C4 architecture | Reference DSL; model/view separation matches mode context |
| High (semantic JSON) | CLD, stock-flow, fishbone, decision tree, influence diagram, ACH, 2×2/scenario, bow-tie, IBIS, pro-con, concept map | No mainstream DSL carries required semantics |

The wrong alternative, worth naming: emitting Mermaid for everything. A CLD drawn as a Mermaid flowchart has no polarity to validate. A decision tree drawn as a flowchart has no probability sum to check. The protocol must preserve the semantics that make the technique worth using.


3. Per-type schemas

Schemas are informal here; the normative form is JSON Schema 2020-12 with additionalProperties: false everywhere.

3.1 QUANT family (comparison, time series, distribution, scatter, heatmap)

A conservative Vega-Lite subset. Required: $schema, data, mark (from enumerated marks), encoding with typed channels. Required metadata: title, caption.source, caption.period, caption.n, caption.units. Uncertainty field mandatory when encoding.y.field.statistic is point-estimate or when mode_context involves forecast/projection.

Banned without integrity_declarations justification: non-zero baseline on bar/area, inverted y-scale on conventional quantities, log scale without base disclosure, dual y-axes with independent zero points, rainbow/jet colormaps for ordered data.

3.2 Tornado / sensitivity diagram (QUANT family)

{
  "base_case_label": "string",
  "base_case_value": number,
  "outcome_variable": "string",
  "outcome_units": "string",
  "parameters": [
    {
      "label": "string",
      "low_value": number,
      "high_value": number,
      "low_label?": "string",
      "high_label?": "string",
      "outcome_at_low": number,
      "outcome_at_high": number
    }
  ],
  "sort_by": "swing"  // enum: "swing" | "high_impact" | "custom"
}

Invariants: parameters sorted by swing (|outcome_at_high - outcome_at_low|) descending unless sort_by overrides; base_case_value rendered as vertical center line; each parameter renders as a horizontal bar spanning [outcome_at_low, outcome_at_high].
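The default ordering and bar extents can be sketched as below. The helper names `sort_by_swing` and `bar_extent` are hypothetical; only the field names come from the schema above.

```python
def sort_by_swing(parameters: list[dict]) -> list[dict]:
    """Order tornado parameters by |outcome_at_high - outcome_at_low|, largest first."""
    return sorted(
        parameters,
        key=lambda p: abs(p["outcome_at_high"] - p["outcome_at_low"]),
        reverse=True,
    )


def bar_extent(parameter: dict) -> tuple[float, float]:
    """Horizontal extent of one bar, left edge first.

    The low input value does not always produce the low outcome, so the
    extent is normalized with min/max rather than taken in field order.
    """
    lo, hi = parameter["outcome_at_low"], parameter["outcome_at_high"]
    return (min(lo, hi), max(lo, hi))
```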

3.3 Causal loop diagram (CAUSAL family)

{
  "variables": [{ "id", "label", "description?" }],
  "links": [{ "from", "to", "polarity": "+|-", "delay": false, "note?" }],
  "loops": [{ "id": "R1|B1|…", "type": "R|B", "members": ["varId", …], "label", "narrative?" }]
}

Invariants: every link has polarity; every declared loop is a genuine cycle in the graph; loop type matches sign-product of edge polarities (even count of “-” links → R; odd count → B); every variables.id unique; no orphan nodes unless allow_isolated: true.
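The loop-type invariant can be checked mechanically: a loop with an even number of negative links is reinforcing, and one with an odd number is balancing. The sketch below assumes `members` lists variable ids in traversal order, which the schema does not state explicitly; `loop_type` is an illustrative name.

```python
def loop_type(members: list[str], links: list[dict]) -> str:
    """Classify a declared loop as 'R' or 'B' from its link polarities.

    The loop is assumed to close from the last member back to the first.
    Raises ValueError if a required link is absent, meaning the declared
    loop is not a genuine cycle in the graph.
    """
    polarity = {(link["from"], link["to"]): link["polarity"] for link in links}
    negatives = 0
    for src, dst in zip(members, members[1:] + members[:1]):
        if (src, dst) not in polarity:
            raise ValueError(f"no link {src} -> {dst}")
        if polarity[(src, dst)] == "-":
            negatives += 1
    return "R" if negatives % 2 == 0 else "B"
```

A validator would compare the returned value against each loop's declared `type` and block on mismatch.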

3.4 Stock-and-flow (CAUSAL family, XMILE-aligned)

{
  "stocks": [{ "id", "label", "initial?", "unit?" }],
  "flows": [{ "id", "label", "from": "stockId|cloudId", "to": "stockId|cloudId", "rate?", "unit?" }],
  "clouds": [{ "id" }],
  "auxiliaries": [{ "id", "label", "expression?" }],
  "info_links": [{ "from": "stockId|auxId", "to": "flowId|auxId" }]
}

Invariants: each flow endpoint resolves to a stock or cloud; stocks have ≥ 1 flow; auxiliaries form a DAG over info_links; units dimensionally consistent if provided.

3.5 Causal DAG (CAUSAL family)

{
  "dsl": "dag { x [exposure]; y [outcome]; u [latent]; x -> y; u -> x; u -> y }",
  "focal_exposure": "x",
  "focal_outcome": "y"
}

Invariants: parser accepts dsl; graph acyclic; focal_exposure and focal_outcome present in dsl.
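Once a DAGitty parser has produced an edge list, the acyclicity invariant reduces to a standard topological-sort check. The sketch below uses Kahn's algorithm and assumes the edges have already been extracted as `(from, to)` pairs; it does not parse the DSL itself.

```python
from collections import defaultdict, deque


def is_acyclic(edges: list[tuple[str, str]]) -> bool:
    """Kahn's algorithm: True when the directed graph contains no cycle."""
    indegree: dict[str, int] = defaultdict(int)
    children: dict[str, list[str]] = defaultdict(list)
    nodes: set[str] = set()
    for src, dst in edges:
        children[src].append(dst)
        indegree[dst] += 1
        nodes.update((src, dst))
    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    # Nodes on a cycle never reach in-degree zero, so they are never visited.
    return visited == len(nodes)
```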

3.6 Fishbone / Ishikawa (CAUSAL family)

{
  "effect": "string",
  "framework": "6M|4P|4S|8P|custom",
  "categories": [
    { "name", "causes": [ { "text", "sub_causes?": [ { "text", "sub_causes?": […] } ] } ] }
  ]
}

Invariants: if framework ≠ custom, categories[].name drawn from framework’s canonical set; depth ≤ 3; effect stated as a problem, not a solution (soft lint).
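The depth limit can be checked with a short recursion. This sketch assumes depth counts cause levels beneath a category (cause = 1, sub-cause = 2, and so on), which matches the three levels shown in the schema; the function names are illustrative.

```python
def cause_depth(causes: list[dict]) -> int:
    """Depth of the deepest cause chain; an empty list has depth 0."""
    return max((1 + cause_depth(c.get("sub_causes", [])) for c in causes), default=0)


def fishbone_depth_ok(spec: dict, limit: int = 3) -> bool:
    """True when no category nests causes deeper than the limit."""
    return all(cause_depth(cat["causes"]) <= limit for cat in spec["categories"])
```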

3.7 Decision tree / probability tree (DECISION family)

{
  "mode": "decision|probability",
  "root": { node },
  "utility_units": "USD|QALY|utils|…"   // required if mode=decision
}
node := {
  "kind": "decision|chance|terminal",
  "label",
  "children?": [ { "edge_label", "probability?", "payoff?", "node": node } ]
}

Invariants: chance-node children’s probabilities sum to 1 ± 1e-6; probabilities in [0,1]; decision nodes have ≥ 1 child; terminals have payoff when mode=decision; no probabilities on decision-node edges. Compiler computes rollback EV.
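A minimal rollback sketch follows. It rests on two assumptions the schema leaves open: the payoff on an edge leading into a terminal is that terminal's value, and decision nodes maximize. `rollback_ev` is an illustrative name, not the compiler's actual interface.

```python
import math


def rollback_ev(node: dict, incoming_payoff: float = 0.0) -> float:
    """Expected value of a (sub)tree by backward induction."""
    if node["kind"] == "terminal":
        return incoming_payoff
    children = node.get("children", [])
    if not children:
        raise ValueError(f"non-terminal node '{node['label']}' has no children")
    values = [rollback_ev(c["node"], c.get("payoff", 0.0)) for c in children]
    if node["kind"] == "chance":
        probs = [c["probability"] for c in children]
        if any(p < 0 or p > 1 for p in probs):
            raise ValueError(f"probability outside [0, 1] at '{node['label']}'")
        if not math.isclose(sum(probs), 1.0, rel_tol=0.0, abs_tol=1e-6):
            raise ValueError(f"probabilities at '{node['label']}' sum to {sum(probs)}")
        return sum(p * v for p, v in zip(probs, values))
    # Decision node: pick the best available branch (assumes maximization).
    return max(values)
```

The probability checks run during the same traversal, so a tree that violates the sum-to-one invariant never yields an expected value.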

3.8 Influence diagram (DECISION family) — new in v0.2

{
  "nodes": [
    { "id", "label", "kind": "decision|chance|value|deterministic", "description?" }
  ],
  "arcs": [
    { "from", "to", "type": "informational|functional|relevance", "note?" }
  ],
  "temporal_order?": ["nodeId", …]  // decision sequence if relevant
}

Invariants: exactly one value node; no arcs into decision nodes from later-decided nodes (temporal consistency when temporal_order provided); the graph implied by functional arcs from chance/deterministic nodes into the value node forms a DAG; informational arcs represent information availability at decision time. Compiler checks d-separation readability.
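Two of these invariants lend themselves to a direct check: the single value node, and temporal consistency of arcs into decision nodes. The sketch below assumes `temporal_order` lists decision node ids from earliest to latest; `check_influence` is a hypothetical name and the DAG and d-separation checks are not covered.

```python
def check_influence(spec: dict) -> list[str]:
    """Return invariant violations for an influence-diagram spec."""
    problems = []
    kinds = {node["id"]: node["kind"] for node in spec["nodes"]}
    value_count = sum(1 for kind in kinds.values() if kind == "value")
    if value_count != 1:
        problems.append(f"expected exactly one value node, found {value_count}")
    # Position of each node in the declared decision sequence, if any.
    order = {node_id: pos for pos, node_id in enumerate(spec.get("temporal_order", []))}
    for arc in spec["arcs"]:
        src, dst = arc["from"], arc["to"]
        later = src in order and dst in order and order[src] > order[dst]
        if kinds.get(dst) == "decision" and later:
            problems.append(f"arc {src} -> {dst} runs from a later decision to an earlier one")
    return problems
```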

3.9 ACH matrix (DECISION family)

{
  "hypotheses": [{ "id", "label", "description?" }],
  "evidence": [{ "id", "text", "credibility": "H|M|L", "relevance": "H|M|L", "source?" }],
  "cells": { "<evidence_id>": { "<hypothesis_id>": "CC|C|N|I|II|NA" } },
  "scoring_method": "heuer_tally|bayesian|weighted"
}

Invariants: every (evidence × hypothesis) cell populated; cell values from enum; non-diagnostic evidence flagged.
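A completeness and diagnosticity check might be sketched as follows. It assumes that evidence rated identically against every hypothesis counts as non-diagnostic, which is the usual ACH reading but is not spelled out in the schema; `check_ach` is an illustrative name.

```python
CELL_VALUES = {"CC", "C", "N", "I", "II", "NA"}


def check_ach(spec: dict) -> tuple[list[str], list[str]]:
    """Return (problems, ids of non-diagnostic evidence)."""
    problems: list[str] = []
    non_diagnostic: list[str] = []
    hypothesis_ids = [h["id"] for h in spec["hypotheses"]]
    for evidence in spec["evidence"]:
        row = spec["cells"].get(evidence["id"], {})
        ratings = []
        for hid in hypothesis_ids:
            value = row.get(hid)
            if value not in CELL_VALUES:
                problems.append(f"cell ({evidence['id']}, {hid}) missing or invalid: {value!r}")
            else:
                ratings.append(value)
        # Assumption: the same rating across all hypotheses cannot discriminate.
        if len(ratings) == len(hypothesis_ids) and len(set(ratings)) == 1:
            non_diagnostic.append(evidence["id"])
    return problems, non_diagnostic
```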

3.10 2×2 / scenario quadrant (DECISION family)

{
  "subtype": "strategic_2x2|scenario_planning",
  "x_axis": { "label", "low_label", "high_label", "description?" },
  "y_axis": { "label", "low_label", "high_label", "description?" },
  "quadrants": {
    "TL": { "name", "narrative?", "action?", "indicators?": [] },
    "TR": { … }, "BL": { … }, "BR": { … }
  },
  "items?": [{ "label", "x": 0..1, "y": 0..1, "note?" }],
  "axes_independence_rationale": "string (required)"
}

Invariants: all four quadrants named; axes_independence_rationale non-empty; items in [0,1]; for scenario_planning, each quadrant narrative non-empty.

3.11 Bow-tie risk diagram (RISK family) — new in v0.2

{
  "hazard_event": { "label", "description?" },
  "threats": [
    {
      "id", "label",
      "pathway?": "string",
      "preventive_controls": [{ "id", "label", "type": "eliminate|reduce|detect", "effectiveness?": "H|M|L" }]
    }
  ],
  "consequences": [
    {
      "id", "label",
      "severity?": "H|M|L",
      "mitigative_controls": [{ "id", "label", "type": "reduce|recover|contain", "effectiveness?": "H|M|L" }]
    }
  ],
  "escalation_factors?": [{ "from_control_id", "label", "escalation_control?": { "id", "label" } }]
}

Invariants: hazard_event is the center node; threats render left of center, consequences right of center; preventive controls sit on threat-to-event pathways, mitigative controls sit on event-to-consequence pathways. The visual symmetry — the whole reason the form exists — is enforced by layout. At least one threat and one consequence required.

3.12 IBIS argument diagram (ARGUMENT family)

{
  "nodes": [ { "id", "type": "question|idea|pro|con", "text" } ],
  "edges": [ { "from", "to", "type": "responds_to|supports|objects_to|questions" } ]
}

Grammar invariants: idea.responds_to → question; pro.supports → idea; con.objects_to → idea; question.questions → any. Violations are blocking.
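The grammar is small enough to encode as a lookup table. The sketch below mirrors the four rules listed above; `IBIS_GRAMMAR` and `ibis_violations` are illustrative names.

```python
# (source node type, edge type) -> allowed target node types; None means any type.
IBIS_GRAMMAR = {
    ("idea", "responds_to"): {"question"},
    ("pro", "supports"): {"idea"},
    ("con", "objects_to"): {"idea"},
    ("question", "questions"): None,
}


def ibis_violations(spec: dict) -> list[str]:
    """Return one message per edge that breaks the IBIS grammar."""
    node_type = {node["id"]: node["type"] for node in spec["nodes"]}
    violations = []
    for edge in spec["edges"]:
        src, dst = node_type.get(edge["from"]), node_type.get(edge["to"])
        where = f"{edge['from']} -{edge['type']}-> {edge['to']}"
        if src is None or dst is None:
            violations.append(f"{where}: unknown node id")
            continue
        key = (src, edge["type"])
        if key not in IBIS_GRAMMAR:
            violations.append(f"{where}: {src} nodes cannot emit {edge['type']}")
            continue
        allowed = IBIS_GRAMMAR[key]
        if allowed is not None and dst not in allowed:
            violations.append(f"{where}: target must be one of {sorted(allowed)}, got {dst}")
    return violations
```

Since violations are blocking, a non-empty result would stop the render.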

3.13 Pro–con tree (ARGUMENT family)

{
  "claim": "string",
  "pros": [ { "text", "weight?": 1..5, "source?", "children?": [ … ] } ],
  "cons": [ … same shape … ],
  "decision?": "string"
}

3.14 Concept map (RELATIONAL family, CXL-shaped)

{
  "focus_question": "string",
  "concepts": [ { "id", "label", "hierarchy_level": 0..N } ],
  "linking_phrases": [ { "id", "text" } ],
  "propositions": [ { "from_concept", "via_phrase", "to_concept", "is_cross_link?": false } ]
}

Invariants: every proposition resolves to declared concept/phrase IDs; soft warning if no cross-links (cross-links are the Novak-specific insight).

3.15 Process family (sequence, flowchart/swimlane, state)

spec.dsl is a Mermaid string; spec.dialect names the diagram kind. Compiler runs a Mermaid parse; on failure, bounded repair loop (2 retries). Known-failure-prone tokens pre-scanned and escaped by the compiler, not the model.

3.16 C4 architecture (SPATIAL family)

spec.dsl is Structurizr DSL. Compiler rejects forward references and mixing of C4 levels within a single view. Level declared in spec.level ∈ {context, container}.


4. Mode-to-visual configuration table

This table maps Ora’s 19 modes (18 answer-seeking + 1 question-seeking) to their native modality classification, default visual types, default relation_to_prose, and adversarial strictness. The classifications are grounded in the Larkin-Simon computational-equivalence framework applied mode-by-mode in the conceptual research.

4.1 Visually native modes

These modes’ core inferences — loop polarity, diagnosticity, conditional independence, simultaneity, dominance ranking, spatial segmentation — are cheap in spatially indexed representations and expensive in sequential ones. Prose-only execution imposes measurable inferential cost.

| Mode | Default visual types | Default relation_to_prose | Adversarial strictness |
|---|---|---|---|
| Systems Dynamics | CLD, stock-and-flow | visually_native | Critical |
| Competing Hypotheses | ACH matrix | visually_native | Critical |
| Decision Under Uncertainty | Decision tree, influence diagram, tornado | integrated | Critical |
| Root Cause Analysis | Fishbone, CLD (when loops present) | integrated | Standard |
| Relationship Mapping | Concept map, causal DAG, network diagram | integrated | Standard |
| Consequences and Sequel | Causal DAG, flowchart | integrated | Standard |
| Constraint Mapping | 2×2 matrix, pro-con tree | integrated | Standard |
| Scenario Planning | 2×2 scenario matrix | integrated | Standard |
| Strategic Interaction | Decision tree (game tree), influence diagram | integrated | Critical |
| Benefits Analysis | Pro-con tree, tornado (for quantified benefits) | integrated | Standard |

4.2 Bimodal modes

These modes decompose into structure identification (visual) and structure interpretation (linguistic). Both representations carry essential, non-redundant information.

| Mode | Default visual types | Default relation_to_prose | Adversarial strictness |
|---|---|---|---|
| Synthesis | Concept map (structural parallels) | integrated | Standard |
| Dialectical Analysis | IBIS (thesis/antithesis structure) | integrated | Standard |
| Terrain Mapping | Concept map (known/unknown/open) | integrated | Standard |
| Passion Exploration | Concept map (exploration nodes, potential projects) | integrated (visual is for navigation, not argument) | Relaxed |
| Cui Bono | Flowchart (interest flows), concept map | integrated | Standard |

4.3 Linguistically native modes

Core inferences depend on operators graphics cannot express compactly: negation, counterfactual conditionals, modal qualifiers, normative predicates. Visual output is supplementary at best; forcing a diagram falsifies the task through Stenning-Oberlander over-specificity.

| Mode | Visual types (if any) | Default relation_to_prose | Notes |
|---|---|---|---|
| Steelman Construction | None by default | no_visual | The steelman is a piece of prose; reducing it to a node label destroys the steelman |
| Deep Clarification | Optional: flowchart (for mechanistic processes) | redundant → prefer no_visual | Visual only when the mechanism is itself spatial/procedural |
| Paradigm Suspension | None by default | no_visual | The questioning of assumptions is linguistic; diagram would force premature commitment |
| Project Mode | Varies by deliverable | Varies | Project Mode inherits visual configuration from the analytical mode it serves |

4.4 Usage rules

  • The table is user-editable per mode and stored in Ora’s vault as a canonical configuration document.
  • visually_native relation is permitted only for modes marked visually native in this table. Other modes may not claim it without user override.
  • When relation_to_prose defaults to redundant, the model should evaluate whether the visual adds anything the prose does not. If the answer is no, emit no_visual. The redundancy principle (Mayer 1999/2003) means a courtesy visual that merely restates prose carries real cognitive cost.
  • Changes to mode configuration propagate to subsequent analytical invocations without requiring per-output approval.

5. Coexistence with prose — the relation_to_prose field

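Four states, assigned per visual per mode. The assignment is prescriptive, not optional. Only the first three appear in the envelope enum (§2.2): no_visual emits no block, so it never occurs inside an envelope.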

  • visually_native — The visual is the primary artifact; prose is caption and context. Used only when the mode is inherently structural and the diagram does the cognitive work. The adversarial reviewer applies stricter integrity checks because dishonesty is more costly when the visual is primary. Assigned modes: Systems Dynamics, Competing Hypotheses.

  • integrated — Prose and visual are mutually dependent; prose references figure by id; visual carries information that prose summarizes but does not reproduce. Prose must remain interpretable without the visual via the semantic description. Assigned modes: most analytical modes.

  • redundant — Prose carries the full analytical content; visual reinforces. Health warning (Mayer): this state carries empirical cognitive cost. The protocol treats it as the least-preferred option. Before emitting a redundant visual, evaluate whether the visual genuinely adds pattern-recognition, spatial structure, or comparison capability. If it merely restates the prose in diagrammatic form, suppress it.

  • no_visual — No visual specification block emitted. The default for linguistically native modes. Not a failure state — it is the correct output when prose is the computationally superior representation.


6. Handoff architecture — rendering paths

Three paths with assignments. Paths A and B are active in v0.2; Path C is specified but deferred.

Path A — Direct declarative render

The model’s output is already consumable. Failure surface is parse error only.

Assigned: QUANT family (Vega-Lite), PROCESS family (Mermaid), SPATIAL family (Structurizr DSL), causal DAG (DAGitty DSL).

Path B — Specialized compiler

A compiler wraps a graph-layout library, applies Tufte-aligned styling defaults, computes derived quantities (rollback EV, loop polarity product, ACH diagnosticity, bow-tie symmetry), and emits SVG. This path carries the integrity logic that cannot be delegated to a generic renderer.

Assigned: CAUSAL non-DAG (CLD, stock-flow, fishbone), DECISION (all), RISK (bow-tie), ARGUMENT (IBIS, pro-con), RELATIONAL (concept map).

Path C — Small-model rendering judgment

Reserved for freeform or hybrid outputs the semantic tier cannot express. Not used in v0.2. When activated, operates only on rendered SVG from Path B to refine layout — never invents semantics. The adversarial reviewer still audits against the original specification.

Path C status: Deferred. Including Path C introduces a model call in the rendering pipeline that otherwise has none, affecting latency and stateless-pipeline discipline. Activate when v0.2 Path B outputs prove aesthetically inadequate in a way users notice.

Multi-path routing

Some techniques can flow through multiple paths. Causal DAGs: Path A (DAGitty → Graphviz) is default; Path B available when analytical model is uncertain about syntax. 2×2 matrices: Path B default; Path A Vega-Lite variant exists for scatter-style 2×2 without named quadrants. Protocol records the actual path used in the rendering manifest.


7. Adversarial review for visual output

Visual adversarial review is a distinct adversarial stage with its own prompt. It runs in two passes: a spec-level review after the analytical adversarial stage and before rendering, and a second, lighter artifact-level review after rendering.

7.1 Tufte integrity rules (T-rules)

Machine-checkable rules applied at spec level:

  • T1 Lie factor. Length/area encoding: ratio-of-pixels / ratio-of-values must be in [0.95, 1.05].
  • T2 Zero baseline. Bar/area/column: scale.domain[0]=0 required, or integrity_declarations.non_zero_baseline_justified populated with quantity type (index, z-score, temperature).
  • T3 Dimensional conformance. Visual dimensions ≤ data dimensions. Fail on 1D-to-2D area or 3D volume.
  • T4 Data-ink ratio proxy. Count decorative elements against data marks; fail on 3D extrusion, drop shadow, gradient fill on categorical mark, decorative image. Exception: relaxed when memorability_goal: true, but integrity rules (T1–T3) still hold.
  • T5 Chartjunk blacklist. Hard-fail on 3D bar/pie/cylinder/cone, moiré, non-data gradients. Same memorability exception as T4.
  • T6 Show the data. If n/marks > 20 and no distributional layer, require adding one or justifying aggregation.
  • T7 Labelling completeness. Axis titles, units, scale type, n, source, period — all required.
  • T8 Scale-type disclosure. Log/symlog/pow must be labelled with base.
  • T9 Axis orientation. Inverted y-scale on conventional quantities fails unless declared intentional.
  • T10 Banking to 45°. For line marks, aspect ratio within 2× of Cleveland-banked optimum.
  • T11 Small-multiples trigger. ≥ 7 categorical colors on one panel triggers a facet suggestion.
  • T12 Currency standardization. Nominal currency over > 3 years warns; require real/deflated unless declared.
  • T13 Event labelling. Long time series: warn if major-event metadata present but unlabelled.
  • T14 Tick consistency. Constant tick step on quantitative axes; log axes label powers of base.
  • T15 Caption-source-n present. Hard requirement.
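As an illustration of how T1 can be machine-checked, the sketch below computes the ratio as the rule defines it (ratio of rendered sizes over ratio of data values) for a pair of marks. The function names and the pixel figures in the usage note are made up for illustration.

```python
def lie_factor(pixels_a: float, pixels_b: float, value_a: float, value_b: float) -> float:
    """Ratio of rendered sizes divided by ratio of data values, per rule T1."""
    return (pixels_a / pixels_b) / (value_a / value_b)


def passes_t1(factor: float, tolerance: float = 0.05) -> bool:
    """True when the factor lies in [1 - tolerance, 1 + tolerance]."""
    return 1 - tolerance <= factor <= 1 + tolerance
```

For example, two bars of 200 px and 100 px encoding the values 20 and 10 give a factor of 1.0. Bars of 20 px and 10 px encoding 110 and 100 on an axis truncated at 90 give roughly 1.82, which fails T1 and is the same distortion T2 guards against.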

7.2 Structural integrity rules (per-family)

  • QUANT: Uncertainty required when quantity is inferential, forecast, model output, or drives a decision. Dual y-axes blocked unless mathematically linked. Tornado: parameters sorted by swing.
  • CAUSAL: CLD polarity on every edge; declared loop type matches edge-sign product. DAG acyclic. Stock-flow: stocks ≥ 1 flow; units consistent. Fishbone: categories from declared framework; depth ≤ 3.
  • DECISION: Tree chance-node probabilities sum to 1; terminals have payoffs. Influence diagram: exactly one value node; temporal consistency. 2×2: axes_independence_rationale non-empty. ACH: cells complete and from vocabulary; non-diagnostic evidence flagged.
  • RISK: Bow-tie: at least one threat, one consequence; preventive controls on left pathways, mitigative on right. Symmetry preserved in layout.
  • PROCESS: Flowchart decision nodes have ≥ 2 labelled, mutually exclusive, exhaustive outgoing edges. Sequence: every message has sender and receiver. State: initial state declared; unreachable states flagged.
  • SPATIAL: C4 level declared and not mixed.
  • ARGUMENT: IBIS grammar enforced. Warrant ≠ evidence.

7.3 Severity tiers

  • Critical (auto-block render): T1 beyond 2×; T3; T5; T9 undisclosed; log without label; Venn ≥ 4 sets; cherry-picked time range reversing trend sign; CLD missing polarity; decision tree missing probability/payoff; IBIS grammar violations; bow-tie with controls on wrong side.
  • Major (warn; require human sign-off if integrated or visually_native): aspect ratio > 2× off optimum; rainbow on ordered data; aggregation hiding distribution; missing uncertainty on inferential quantities; false precision; non-orthogonal 2×2 axes (|corr| > 0.7); chart-type mismatch to task per Mackinlay ranking.
  • Minor (informational log): missing legend title; inconsistent tick intervals; redundant data-ink; heavy gridlines; untested CVD palette.

7.4 Artifact-level review

After rendering, a lightweight adversarial pass checks: overlapping nodes, text truncation, illegible contrast (WCAG 1.4.11 ≥ 3:1), visual-chartjunk the spec layer cannot see. Does not re-litigate structural correctness.

7.5 LLM-prior-inversion checks

In addition to the T-rules and structural checks, the adversarial reviewer flags:

  • Template-trap regression: output that looks like the modal BI dashboard (default palette, generic layout, low density). Not a blocking violation but a prompt to consider higher-density alternatives.
  • Chart-type misselection: encoding selection must follow Bertin/Cleveland/Munzner decision procedure (data type × task × cardinality), not the first chart the model emits. If the model proposes a bar chart and a dot plot would be more accurate for the task, the reviewer flags it.
  • Default-settings passthrough: if the model emits a Vega-Lite spec where all optional fields are at library defaults (bin, scale, axis, legend), flag for explicit authorship. Every default is an authored choice.

8. Accessibility and semantic description

8.1 Four-level semantic description (mandatory)

Following Lundgard & Satyanarayan (MIT Vis Group, IEEE TVCG 2022):

"semantic_description": {
  "level_1_elemental": "string",        // required — chart type, encodings, axis ranges
  "level_2_statistical": "string",       // required — extrema, trends, correlations, counts
  "level_3_perceptual": "string",        // required — synthesized patterns, notable exceptions
  "level_4_contextual": "string|null",   // optional — domain interpretation
  "short_alt": "string",                 // required, ≤ 150 chars
  "data_table_fallback": { … } | null
}

Rules:

  • short_alt follows the Cesal formula: “[chart type] of [data], where [key takeaway].”
  • Level 4 is optional and should be omitted when in doubt. Lundgard-Satyanarayan found blind readers ranked Level 4 least useful (63% emphatically opposed it).
  • For quantitative visuals with ≤ 50 data points, data_table_fallback must be populated.
  • For non-quantitative diagrams, type-specific description fields augment Level 1 (loops for CLD, optimal path for decision trees, leading hypothesis for ACH, items-per-quadrant for 2×2, threat/consequence counts for bow-tie, actors/steps for sequence diagrams).
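
The mechanical parts of these rules (required fields, the 150-character cap, and the table fallback threshold) lend themselves to a lint pass. A minimal sketch, assuming the description arrives as a plain dict; `validate_semantic_description` and its parameters are hypothetical, and the Cesal formula itself is not checked:

```python
REQUIRED_FIELDS = (
    "level_1_elemental",
    "level_2_statistical",
    "level_3_perceptual",
    "short_alt",
)
SHORT_ALT_MAX_CHARS = 150
TABLE_FALLBACK_MAX_POINTS = 50


def validate_semantic_description(desc, quantitative, n_points=None):
    """Return a list of rule violations (an empty list means the block passes)."""
    errors = []
    for key in REQUIRED_FIELDS:
        value = desc.get(key)
        if not isinstance(value, str) or not value.strip():
            errors.append(f"missing required field: {key}")
    short_alt = desc.get("short_alt")
    if isinstance(short_alt, str) and len(short_alt) > SHORT_ALT_MAX_CHARS:
        errors.append("short_alt exceeds 150 characters")
    needs_table = (
        quantitative
        and n_points is not None
        and n_points <= TABLE_FALLBACK_MAX_POINTS
    )
    if needs_table and not desc.get("data_table_fallback"):
        errors.append("data_table_fallback must be populated")
    return errors
```

Level 4 is deliberately absent from the required list, matching its optional status in the schema.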

8.2 Redundancy guard

When relation_to_prose = redundant, short_alt is the Cesal one-liner and Levels 2-4 may be “See surrounding prose.” When relation_to_prose = integrated or visually_native, full four-level description required.

8.3 Rendering accessibility

  • SVG wrapped with role="img", aria-labelledby="<title-id> <desc-id>".
  • <title> from short_alt, <desc> from concatenated Levels 1-3.
  • Decorative shapes: aria-hidden="true".
  • Complex SVG: parallel navigable representation following Olli/ARIA TreeView pattern.
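
The first three rules can be illustrated with Python's standard `xml.etree.ElementTree`. This is a sketch only: the id scheme (`<fig_id>-title`, `<fig_id>-desc`) and the helper names are assumptions, and the Olli/ARIA TreeView representation is out of scope here:

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def make_accessible_svg(fig_id, short_alt, levels_1_to_3):
    """Build an <svg> root carrying role, aria-labelledby, <title>, and <desc>."""
    ET.register_namespace("", SVG_NS)  # serialize without an ns0: prefix
    title_id = f"{fig_id}-title"  # hypothetical id scheme
    desc_id = f"{fig_id}-desc"
    svg = ET.Element(
        f"{{{SVG_NS}}}svg",
        {"role": "img", "aria-labelledby": f"{title_id} {desc_id}"},
    )
    title = ET.SubElement(svg, f"{{{SVG_NS}}}title", {"id": title_id})
    title.text = short_alt
    desc = ET.SubElement(svg, f"{{{SVG_NS}}}desc", {"id": desc_id})
    desc.text = " ".join(levels_1_to_3)  # concatenated Levels 1-3
    return svg


def add_decorative_rect(svg, **attrs):
    """Append a purely decorative shape, hidden from assistive technology."""
    rect = ET.SubElement(svg, f"{{{SVG_NS}}}rect", {"aria-hidden": "true"})
    for name, value in attrs.items():
        rect.set(name, str(value))
    return rect
```

Calling `ET.tostring(svg, encoding="unicode")` on the result yields markup that can be handed to the renderer.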

8.4 Contrast and color rules (WCAG 2.2)

  • Text ≥ 4.5:1 (AA).
  • Graphical objects essential to meaning ≥ 3:1 (SC 1.4.11).
  • Never encode via color alone (SC 1.4.1).
  • Categorical: Okabe-Ito (≤ 8) or similar CVD-safe.
  • Sequential: viridis-family or ColorBrewer sequential.
  • Diverging: ColorBrewer RdBu/PuOr or Crameri vik/roma.
  • Interactive element target size ≥ 24×24 CSS px (SC 2.5.8).
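
The two contrast thresholds can be checked with the WCAG 2.x relative-luminance formula. A minimal sketch, assuming opaque six-digit sRGB hex colors; the function names and the `kind` labels are illustrative:

```python
def _linearize(channel_8bit):
    """Convert one 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel_8bit / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    """Relative luminance of a '#rrggbb' color."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(color_a, color_b):
    """WCAG contrast ratio, always >= 1 regardless of argument order."""
    l1, l2 = relative_luminance(color_a), relative_luminance(color_b)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


MINIMUM_RATIO = {"text": 4.5, "graphic": 3.0}  # AA text, SC 1.4.11 graphics


def meets_threshold(foreground, background, kind):
    """True when the pair meets the minimum ratio for 'text' or 'graphic'."""
    return contrast_ratio(foreground, background) >= MINIMUM_RATIO[kind]
```

For example, `#767676` on white passes the text threshold at roughly 4.54:1, while `#999999` on white falls short of even the 3:1 graphical minimum.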

8.5 Fallback when rendering fails

Three-tier graceful degradation:

  1. Render fails → display data_table_fallback if present, else full four-level description in bordered block labelled “Figure unavailable — description follows.”
  2. Artifact fails artifact-level adversarial review at Critical severity → same fallback.
  3. User opts for prose-only view → semantic_description serves as primary content.

Cardinal rule: never render a degraded visual just because a slot was allocated.
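
The three tiers reduce to a small decision function. The sketch below is one possible reading, with hypothetical names and return labels; it gives the user's prose-only preference priority over the failure paths, which the text above does not specify:

```python
def choose_display(render_ok, critical_artifact_failure, prose_only, semantic_description):
    """Pick what to show for one figure slot.

    Returns a (kind, payload) pair. Kinds are illustrative labels:
    "semantic_description", "data_table", "description_block", "artifact".
    """
    if prose_only:
        # Tier 3: the description becomes primary content.
        return ("semantic_description", semantic_description)
    if not render_ok or critical_artifact_failure:
        # Tiers 1 and 2 share the same fallback.
        table = semantic_description.get("data_table_fallback")
        if table:
            return ("data_table", table)
        return ("description_block", semantic_description)
    return ("artifact", None)
```

No branch returns a degraded rendering, which keeps the function consistent with the cardinal rule.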


9. Human-in-the-loop points

Four intervention surfaces, none requiring per-output approval. Design principle: the human is in authority but not in the loop for the common case.

9.1 Specification-stage edit

The canonical document exposes visual spec blocks as editable JSON. User modifies spec directly and triggers re-render without re-running the analytical model. Primary fast-path; seconds, not minutes.

9.2 Artifact review and regeneration

Compiled artifact displayed alongside spec. “Regenerate” re-runs compiler only. “Re-analyze” re-runs analytical model with directive to emit different visual.

9.3 Technique selection override

Mode configuration (permitted visual types, default relation_to_prose, adversarial strictness) is user-editable per mode, stored in Ora’s vault. Changes propagate to subsequent invocations.

9.4 Visual suppression

User can disable visuals globally, per mode, or per visual type. When suppressed, analytical model still emits semantic_description as prose (because the description often carries information the prose does not) but omits spec block.


10. Spatial-native extensions (v0.2 forward architecture)

This section documents the architectural direction toward spatial-native intelligence. These extensions are not implemented in v0.2 but their data structures are forward-compatible with the protocol envelope.

10.1 The spatial representation field

The optional spatial_representation field in the envelope captures Tversky’s correspondence principles in a format usable by the bidirectional pipeline:

"spatial_representation": {
  "entities": [
    { "id": "A", "position": [x, y], "label": "concept A", "spec_ref?": "node_id_in_spec" }
  ],
  "relationships": [
    { "source": "A", "target": "B", "type": "causal|associative|hierarchical|temporal",
      "strength?": 0.0..1.0, "spec_ref?": "edge_id_in_spec" }
  ],
  "clusters": [
    { "members": ["A", "B", "C"], "label": "core processes" }
  ],
  "hierarchy": [
    { "parent": "F", "children": ["A", "B", "C"], "type": "abstraction|containment|composition" }
  ]
}

The spec_ref fields link spatial entities to typed-diagram elements, so the spatial representation and the typed spec remain synchronized. This enables:

  • Extracting spatial structure from visual inputs (sketches, diagrams, whiteboard photos) and injecting it into the analytical pipeline
  • Maintaining spatial continuity across conversation turns — the spatial arrangement persists even when the typed diagram changes
  • Feeding spatial position data into the annotation interface so user markup targets specific spatial regions
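
Keeping the structure synchronized presupposes that every id it mentions resolves. A minimal consistency check, assuming (this is not stated in the protocol) that relationship endpoints, cluster members, and hierarchy parents and children must all be declared in `entities`; `dangling_references` is a hypothetical helper:

```python
def dangling_references(spatial_representation):
    """Return the sorted ids that are referenced but not declared as entities."""
    sr = spatial_representation
    declared = {entity["id"] for entity in sr.get("entities", [])}
    missing = set()
    for rel in sr.get("relationships", []):
        missing.update({rel["source"], rel["target"]} - declared)
    for cluster in sr.get("clusters", []):
        missing.update(set(cluster["members"]) - declared)
    for node in sr.get("hierarchy", []):
        missing.update(({node["parent"]} | set(node["children"])) - declared)
    return sorted(missing)


# Usage with illustrative data: "C" and "F" are never declared.
example = {
    "entities": [
        {"id": "A", "position": [0, 0], "label": "concept A"},
        {"id": "B", "position": [1, 0], "label": "concept B"},
    ],
    "relationships": [{"source": "A", "target": "B", "type": "causal"}],
    "clusters": [{"members": ["A", "B", "C"], "label": "core processes"}],
    "hierarchy": [{"parent": "F", "children": ["A", "B"], "type": "containment"}],
}
```

A similar pass could verify that each `spec_ref` names an element present in the typed spec, once the spec's id layout is fixed.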

10.2 Visual input processing (forward-looking)

Three levels of visual input, each feeding into the analytical pipeline:

Level 1 — Structure extraction. Vision-language model parses visual inputs (napkin sketches, Excalidraw exports, whiteboard photos, Obsidian Canvas files) and populates the spatial_representation format. Boxes become entities, lines become relationships, spatial clusters become clusters, vertical position maps to hierarchy level.

Level 2 — Spatial reasoning. The analytical model applies Tversky’s correspondence principles to the extracted structure: are entities positioned according to their conceptual relationships? Are there missing connections that spatial layout suggests? Are hierarchical relationships captured? This is fog-diagnosis applied to visual input — the model helps the user see what their spatial intuition was encoding.

Level 3 — Collaborative spatial refinement. The system proposes refinements to spatial structure with both spatial suggestions and prose explanation: “You’ve placed X near Y, suggesting a relationship, but there’s no connecting line. Should there be a causal connection?”

10.3 Direct annotation interface (forward-looking)

The companion Spatial-Native Architecture specification details the annotation interface. For protocol purposes, the key requirement is that every id in the spec and spatial_representation is a valid annotation target. User annotations are parsed into structured feedback keyed to spec elements:

{
  "annotation_type": "expand|connect|correct|insert|delete",
  "target_id": "fig-1.node-A",
  "content": "string",
  "spatial_position?": [x, y]
}

This structured feedback enters the analytical pipeline as input to the next invocation without requiring the user to translate their spatial markup into prose instructions.
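
Parsing an annotation into that structured form might look like the sketch below. It assumes the trailing `?` in `spatial_position?` marks an optional key rather than being part of the key name, and that the set of valid target ids is supplied by the caller; `Annotation` and `parse_annotation` are hypothetical names:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

ANNOTATION_TYPES = {"expand", "connect", "correct", "insert", "delete"}


@dataclass(frozen=True)
class Annotation:
    annotation_type: str
    target_id: str
    content: str
    spatial_position: Optional[Tuple[float, float]] = None


def parse_annotation(raw, valid_ids):
    """Validate one raw annotation dict against the known annotation targets."""
    kind = raw.get("annotation_type")
    if kind not in ANNOTATION_TYPES:
        raise ValueError(f"unknown annotation_type: {kind!r}")
    target = raw.get("target_id")
    if target not in valid_ids:
        raise ValueError(f"target_id is not a valid annotation target: {target!r}")
    position = raw.get("spatial_position")
    return Annotation(
        annotation_type=kind,
        target_id=target,
        content=str(raw.get("content", "")),
        spatial_position=tuple(position) if position is not None else None,
    )
```

Rejecting unknown targets at parse time keeps malformed markup out of the next analytical invocation.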

10.4 Visual conversation continuity

The spatial_representation field, when populated, persists across conversation turns. On subsequent invocations, the analytical model receives the prior turn’s spatial layout and either preserves it (maintaining the user’s spatial mental model) or explicitly declares spatial changes with rationale. This prevents the disorienting effect of spatial arrangements shifting silently between turns.
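
Silent shifts can be detected by diffing entity positions between turns and comparing the result with what the model declared. A sketch under the assumption that declared changes arrive as a collection of entity ids; both function names are hypothetical:

```python
def spatial_changes(previous, current):
    """Map entity id -> "added", "removed", or "moved" between two layouts."""
    before = {e["id"]: tuple(e["position"]) for e in previous.get("entities", [])}
    after = {e["id"]: tuple(e["position"]) for e in current.get("entities", [])}
    changes = {}
    for entity_id in before.keys() | after.keys():
        if entity_id not in after:
            changes[entity_id] = "removed"
        elif entity_id not in before:
            changes[entity_id] = "added"
        elif before[entity_id] != after[entity_id]:
            changes[entity_id] = "moved"
    return changes


def silent_changes(previous, current, declared_ids):
    """Ids whose layout changed without an accompanying declaration."""
    return sorted(set(spatial_changes(previous, current)) - set(declared_ids))
```

An empty result from `silent_changes` means every layout change was either absent or explicitly declared; checking that each declaration also carries a rationale would be a separate step.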


11. Unresolved forks

Fork A — Semantic JSON vs. direct Mermaid for process family

Currently routes sequence/flowchart/state through Path A (direct Mermaid). Alternative: semantic JSON tier compiling to Mermaid, adding validation at cost of reinventing a grammar. Decision rule: instrument v0.2 rollout; switch if Mermaid repair loops exceed 3 retries on > 10% of outputs.
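
The decision rule is simple arithmetic over rollout telemetry. A minimal sketch, assuming one retry count is recorded per Mermaid output; the function name and log shape are hypothetical:

```python
def should_switch_to_semantic_json(retry_counts, max_retries=3, share_threshold=0.10):
    """Apply the Fork A rule: switch if more than 10% of outputs needed > 3 retries."""
    if not retry_counts:
        return False  # no evidence yet
    over_limit = sum(1 for n in retry_counts if n > max_retries)
    return over_limit / len(retry_counts) > share_threshold
```

Both comparisons are strict, following the wording "exceed 3 retries on > 10% of outputs": exactly 3 retries does not count as exceeding, and exactly 10% does not trigger the switch.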

Fork B — Adversarial review: blocking vs. annotating

Currently blocks on Critical, warns on Major. Alternative: annotate-only with no blocking. Recommended resolution: per-mode configuration. Analytical modes default to blocking. Passion Exploration and Project Mode (depending on deliverable) default to annotate-only. User-configurable.

Fork C — Small-model rendering activation

Deferred from v0.1 and v0.2. Activate when Path B outputs prove aesthetically inadequate in user testing. Path C introduces a model call in the rendering pipeline; only justified by measurable aesthetic failure.

Fork D — Spatial representation: required or optional (new in v0.2)

The spatial_representation field is optional in v0.2. The question is whether it should become required when the mode is visually native. Making it required would enable visual-input and annotation features immediately for those modes but would increase spec size and model output cost. Decision rule: make required for visually native modes in v0.3 if the annotation interface enters implementation.


12. Vault integration and versioning

Spec blocks are canonically stored in the vault alongside the analytical document, with compiled artifacts as sidecar files referenced by id. Users can revise a spec and re-render against an updated compiler without losing provenance. The schema_version field ensures forward compatibility: a v0.1 spec remains renderable under v0.2 compilers via migration rules. New v0.2 types (tornado, influence diagram, bow-tie) are not available in v0.1 specs; the compiler rejects unknown types cleanly.
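
A clean rejection of unavailable types could be sketched as follows. Only `schema_version` comes from the protocol text; the `type` key, the version strings, and the type identifiers are hypothetical stand-ins for whatever the envelope actually uses:

```python
# Hypothetical identifiers for the types introduced in v0.2.
NEW_IN_V0_2 = {"tornado", "influence_diagram", "bow_tie"}


class UnsupportedVisualType(ValueError):
    """Raised when a spec requests a type its schema version does not define."""


def check_type_available(spec):
    """Reject v0.2-only types in a v0.1 spec before compilation starts."""
    version = spec.get("schema_version")
    visual_type = spec.get("type")  # assumed field name
    if version == "0.1" and visual_type in NEW_IN_V0_2:
        raise UnsupportedVisualType(
            f"type {visual_type!r} is not available in schema_version {version}"
        )
    return True
```

Raising a dedicated exception before compilation gives the caller a single, explicit failure to handle instead of a partial render.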

The mode-to-visual configuration table (§4) is itself a canonical document in the vault, editable and version-controlled. Changes to the table take effect on the next analytical invocation.


13. Changelog from v0.1

Change | Rationale
Added tornado/sensitivity diagram | Conceptual research: visually native for sensitivity analysis, high-tractability, serves a mode ranked among most degraded by prose-only
Added influence diagram | Encodes conditional independence structure decision trees cannot show; Howard & Matheson (1981); serves Decision Under Uncertainty and Strategic Interaction
Added bow-tie risk diagram | Symmetry of preventive vs. mitigative controls is invisible in prose; serves Risk Analysis mode if added, and interim serves Consequences and Sequel
Added Principle 2 (LLM prior inversion) | Conceptual research: the modal chart in LLM training data is the wrong chart; protocol must actively invert this prior, not just check violations post-hoc
Added Principle 4 (duplicative visuals harmful) | Mayer’s redundancy principle: courtesy visuals that restate prose carry measured cognitive cost
Added memorability_goal flag | Bateman et al. (CHI 2010): embellished charts show better long-term recall; T4/T5 relaxation bounded by integrity rules
Added no_visual as explicit relation_to_prose state | Linguistically native modes should default to no visual, not to a redundant one
Created mode-to-visual configuration table (§4) | Maps all 19 Ora modes to modality classification, visual types, prose relation, and adversarial strictness using actual mode names
Added spatial_representation field (§10) | Forward-compatible with spatial-native architecture; enables visual input, annotation, and cross-turn continuity
Added LLM-prior-inversion checks to adversarial layer (§7.5) | Template-trap regression, chart-type misselection, default-settings passthrough
Added Fork D (spatial representation required vs. optional) | New design fork arising from spatial-native integration