Artifact Evaluation by Stance

Overview

The Artifact Evaluation by Stance Framework is the territory framework for T15 — operations that take a plan, proposal, idea, design, or argument-as-proposal and evaluate it by adopting a defined stance. The territory exists because evaluation-with-a-stance is a distinct kind of work from evaluation-for-soundness (which lives in T1) and from structural-fragility audit-without-an-adversary (which lives in T7). T15 answers questions like “make the strongest case for this,” “stress-test this adversarially,” “give me a balanced read,” “what are the pros and cons,” and “argue against this for me.” The stance is the load-bearing variable: the same artifact gets meaningfully different evaluations depending on what stance is adopted.

The framework hosts five active modes arranged along a stance gradient:

Steelman Construction (constructive-strong) reconstructs a position at its logical best: surfacing hidden premises, filling gaps with the most charitable inferences, marshalling the best available evidence. The mirror test governs: would a thoughtful proponent endorse the reconstruction? Any subsequent critique addresses only the strongest version, with no retreat to the weaker original.
Benefits Analysis (constructive-balanced) runs de Bono’s Plus-Minus-Interesting on a single proposal with affected-parties asymmetry mapping. The recommendation field is empty by default: Benefits Analysis presents the envelope, and the user decides.
Balanced Critique (neutral) produces strengths and weaknesses with comparable rigor. Perspective-dependent findings are flagged with stakeholder vantage, and residual tensions are named in the net assessment rather than collapsed into a tidy verdict.
Red Team Assessment (adversarial-actor-modeling, own-decision) ranks vulnerabilities by severity for the user’s own fix-prioritisation, paired with actionable fix recommendations and feasibility notes. Its Attack-Failure Disclosure names attack classes that produced no findings, because honest attack failure beats manufactured findings.
Red Team Advocate (adversarial-actor-modeling, external-use) produces an argument brief providing ammunition against the artifact for an external audience, with attacks rated by persuasive force and paired with audience-fit phrasing. A concessions section preempts the strongest counter-moves.

The framework’s load-bearing intellectual content is the stance gradient discipline and the Decision D parse of red team into assessment and advocate. The stance gradient (constructive-strong → constructive-balanced → neutral → adversarial-actor-modeling) is the primary disambiguation axis. The framework’s default route when the user has not signaled a stance is balanced-critique (neutral); the asymmetric stances are deliberately opt-in because asymmetric evaluation against the user’s intent produces unhelpful output.

The Red Team parse separates two distinct output contracts that were previously bundled into one mode: assessment ranks vulnerabilities by severity with paired fix recommendations for the user’s own decision; advocate ranks attacks by persuasive force with paired suggested phrasing for an external audience. The two have different ranking criteria, different audience modeling, and different signature failure modes (pulled punches for assessment; cynical overreach for advocate).

The Steelman cross-territory case is the framework’s most-asked-about edge case. Steelman Construction’s home is T15 because its primary work is stance-bearing artifact evaluation; the T1 cross-reference activates when the artifact under steelmanning is itself an argument and argument-coherence considerations should inform the reconstruction.
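
The stance gradient and the default route can be sketched as a small lookup. This is a minimal illustration, not Ora's implementation: the `Stance` enum, the `select_mode` function, and the `external_use` flag are hypothetical names introduced here. Only the mode names, the gradient order, and the balanced-critique default come from the framework text.

```python
from enum import Enum
from typing import Optional


class Stance(Enum):
    """The four positions on the stance gradient, in gradient order."""
    CONSTRUCTIVE_STRONG = 1
    CONSTRUCTIVE_BALANCED = 2
    NEUTRAL = 3
    ADVERSARIAL_ACTOR_MODELING = 4


# Assumption: one mode per non-adversarial stance, named after the mode files.
NON_ADVERSARIAL_MODES = {
    Stance.CONSTRUCTIVE_STRONG: "steelman-construction",
    Stance.CONSTRUCTIVE_BALANCED: "benefits-analysis",
    Stance.NEUTRAL: "balanced-critique",
}

# The framework's stated default when no stance has been signaled.
DEFAULT_MODE = "balanced-critique"


def select_mode(stance: Optional[Stance] = None, external_use: bool = False) -> str:
    """Pick a mode from a signaled stance; asymmetric stances are opt-in."""
    if stance is None:
        return DEFAULT_MODE
    if stance is Stance.ADVERSARIAL_ACTOR_MODELING:
        # The adversarial stance splits by who the output is for.
        return "red-team-advocate" if external_use else "red-team-assessment"
    return NON_ADVERSARIAL_MODES[stance]
```

Calling `select_mode()` with no arguments returns the neutral default, which mirrors the rule that asymmetric stances are never chosen on the user's behalf.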

The framework is honest about what T15 does not do. It does not adjudicate the artifact’s truth (that is for the user, downstream of the evaluation). It does not perform structural-fragility audit independent of any adversary (that is T7). It does not surface frame-of-the-issue analysis (that is T9). The Input Sufficiency Protocol that runs in both red team modes refuses to attack when the artifact is too vague to attack productively — emitting a three-part redirect (What I see / What’s missing / Three options with override) instead of producing low-quality findings against a low-specificity input. The discipline of declining when conditions fail is what makes the findings credible when they are produced.
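
The Input Sufficiency Protocol's gate can be sketched as a check that returns either nothing (proceed) or a three-part redirect. This is a hedged illustration only: the word-count threshold, the `Redirect` type, the function name, and the wording of the three options are all assumptions made for the example. The framework text specifies only the three-part shape, the override, and the additional audience check for Red Team Advocate.

```python
from dataclasses import dataclass
from typing import List, Optional

RED_TEAM_MODES = {"red-team-assessment", "red-team-advocate"}


@dataclass
class Redirect:
    """The three-part redirect: What I see / What's missing / Three options."""
    what_i_see: str
    whats_missing: List[str]
    options: List[str]


def check_input_sufficiency(
    artifact: str,
    mode: str,
    audience: Optional[str] = None,
    min_words: int = 50,  # hypothetical stand-in for "too vague to attack"
    override: bool = False,
) -> Optional[Redirect]:
    """Return a Redirect when a red team mode should decline, else None."""
    if mode not in RED_TEAM_MODES or override:
        return None

    missing: List[str] = []
    word_count = len(artifact.split())
    if word_count < min_words:
        missing.append("enough artifact detail to attack productively")
    if mode == "red-team-advocate" and not (audience and audience.strip()):
        missing.append("a named audience")

    if not missing:
        return None
    return Redirect(
        what_i_see=f"An artifact of {word_count} words submitted for {mode}.",
        whats_missing=missing,
        options=[
            "Add the missing detail and rerun",        # placeholder wording
            "Switch to a different mode",              # placeholder wording
            "Override and run on the input as given",  # the override option
        ],
    )
```

A real check would judge specificity rather than length; the word count is used here only because it keeps the sketch runnable.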

Systemic context

The Artifact Evaluation by Stance Framework is the territory framework for T15 in Ora’s territory map. It composes downward into five mode files (Modes/steelman-construction.md, Modes/benefits-analysis.md, Modes/balanced-critique.md, Modes/red-team-assessment.md, Modes/red-team-advocate.md) plus the deferred devils-advocate-lite mode (per CR-6). It composes outward through two cross-territory adjacencies — T1↔T15 (argument soundness audit vs. stance-bearing proposal evaluation, with the Steelman cross-territory case handled explicitly: home T15, T1 cross-reference activates when the artifact is an argument); T7↔T15 (structural-fragility audit without adversary vs. red-team adversarial-actor stress test). The four-stage pre-routing pipeline classifies prompts into T15 when the user wants an artifact evaluated as a proposal with a particular stance; the mode selection question (“strongest case for, strongest case against, balanced look weighted toward positives, neutral look at both sides, or quick devil’s advocate?”) then dispatches to the right mode.

Ora’s output

Example question: I have written the first draft of a strategy memo arguing my company should pivot from selling to enterprises to selling directly to individual professionals. I want to know what is wrong with it before I send it to the executive team. What is the right kind of evaluation to run?

Ora’s response (running T15’s stance disambiguation and dispatching):

Before running any evaluation I will surface T15’s stance question and the five routing options, then make a recommendation for your specific case.

T15’s central disambiguation question. “Want the strongest case for it, the strongest case against it, a balanced look weighted toward positives, a neutral look at both sides, or a quick devil’s advocate?” Five plain-English answers route differently along the stance gradient.

Steelman Construction (constructive-strong). If you want me to make the strongest possible case for the pivot — surfacing hidden premises that strengthen it, filling logical gaps with charitable inferences, marshalling the best available evidence — that is steelman territory. Useful when you want to test whether the strongest version of your argument survives, or when you want material to refine your draft toward. The mirror test applies: a thoughtful proponent of the pivot must be able to endorse the reconstruction.

Benefits Analysis (constructive-balanced). If you want a structured Plus-Minus-Interesting view with affected-parties asymmetry — what benefits the pivot produces, what costs and risks come with it, what is interesting in the second-order effects (precedent, signaling, path-dependency), and which parties are Plus and which are Minus — that is benefits-analysis territory. The recommendation field stays empty; you see the envelope, you decide.

Balanced Critique (neutral). If you want a neutral evaluation with comparable rigor on strengths and weaknesses, perspective-dependent findings flagged with stakeholder vantage, and residual tensions named rather than collapsed into a tidy verdict, that is balanced-critique territory. This is the framework’s default when the user has not signaled a stance — useful when you want to know how the memo lands with a fresh, neutral reader.

Red Team Assessment (adversarial-actor-modeling, own-decision). If you want vulnerabilities ranked by severity for your own fix-prioritisation — paired with actionable fix recommendations and fix-feasibility notes — that is assessment territory. The output answers: what is wrong with the memo, ranked by severity, so you know what to fix before sending it.

Red Team Advocate (adversarial-actor-modeling, external-use). If you want ammunition against the memo for prepping yourself to defend it under hostile review — attacks ranked by persuasive force, paired with suggested phrasing in the audience’s idiom, plus concessions that preempt the strongest counter-moves — that is advocate territory.

For your strategy memo specifically. You said “I want to know what is wrong with it before I send it.” That is the textbook trigger for Red Team Assessment. The mode will produce vulnerabilities ranked by severity (Showstopper / Major / Caveat), each grounded in artifact-specific quotes, each paired with an actionable fix recommendation and a fix-feasibility note. The Attack-Failure Disclosure section names attack classes I attempted that produced no findings. If I find no Major or Showstopper vulnerabilities, I will declare a severity floor rather than inflating findings to feel productive.

Before I run. The Input Sufficiency Protocol checks that the memo identifies the strategy clearly enough to attack productively; if too vague, I will emit a three-part redirect rather than attacking. Also consider a paired Steelman pass before the Red Team — Steelman-then-Red-Team is a common composition for artifacts you intend to commit to.

If you want me to dispatch. Confirm “Red Team Assessment” (or “Steelman then Red Team Assessment”) and paste the memo. The output is a structured audit ranked worst-first with paired fix recommendations.

That is what the territory framework does: surface the five modes along the stance gradient, surface the disambiguation question, and route to the mode that matches what the user actually wants — including paired-mode compositions like Steelman-then-Red-Team when those serve the use better.
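
The Red Team Assessment output contract described in the response above (worst-first ranking, Attack-Failure Disclosure, and a severity floor when nothing reaches Major) could be assembled roughly as follows. The three severity labels come from the response; the `Vulnerability` fields, the `build_assessment` function, and the dictionary keys are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, List


class Severity(IntEnum):
    CAVEAT = 1
    MAJOR = 2
    SHOWSTOPPER = 3


@dataclass
class Vulnerability:
    attack_class: str      # which class of attack surfaced the finding
    quote: str             # artifact-specific quote grounding the finding
    severity: Severity
    fix: str               # actionable fix recommendation
    fix_feasibility: str   # fix-feasibility note


def build_assessment(
    findings: List[Vulnerability], attempted_classes: List[str]
) -> Dict[str, object]:
    """Rank findings worst-first and disclose attack classes that found nothing."""
    ranked = sorted(findings, key=lambda v: v.severity, reverse=True)
    classes_with_findings = {v.attack_class for v in findings}
    attack_failures = [c for c in attempted_classes if c not in classes_with_findings]
    # Assumption: the floor is declared when no finding is Major or Showstopper.
    severity_floor_declared = not any(v.severity >= Severity.MAJOR for v in findings)
    return {
        "ranked": ranked,
        "attack_failure_disclosure": attack_failures,
        "severity_floor_declared": severity_floor_declared,
    }
```

Keeping the disclosure as a plain difference between attempted and productive attack classes means an empty result is reported rather than padded with manufactured findings.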

Commercial AI comparison

Comparison content auto-populates when the comparison-refresh framework runs against this question. Drafters do not author this section.

Brief comparison commentary

Auto-populates with the comparison content above.

How to use this framework

You can run the Artifact Evaluation by Stance pattern with any AI of your choice. The composition is single-pass against the stance disambiguation followed by dispatch to the selected mode.

The prompt:

[Paste the framework specification]

Run T15 disambiguation on this evaluation request.

Artifact: [Paste or describe the plan, proposal, idea, design, or argument-as-proposal.]

What I want to know (in plain English): [Strongest case for, strongest case against, balanced look weighted toward positives, neutral look at both sides, or quick devil’s advocate.]

Audience (only required if Red Team Advocate is selected): [Who you are arguing against; their frame, priorities, persuasion pathways.]

The AI runs T15’s stance question, identifies the mode (steelman-construction, benefits-analysis, balanced-critique, red-team-assessment, red-team-advocate), runs the Input Sufficiency Protocol if a red team mode is selected, declares the stance, and produces the mode-appropriate output.
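
For anyone scripting this, the prompt above can be assembled programmatically. The sketch below is one possible way to do it: the function name `build_t15_prompt` and its parameters are hypothetical, and the field labels are copied from the prompt template. It builds a string only and makes no call to any AI service.

```python
from typing import Optional


def build_t15_prompt(
    framework_spec: str,
    artifact: str,
    want: str,
    audience: Optional[str] = None,
) -> str:
    """Assemble the T15 disambiguation prompt from its fields."""
    sections = [
        framework_spec.strip(),
        "Run T15 disambiguation on this evaluation request.",
        f"Artifact: {artifact.strip()}",
        f"What I want to know (in plain English): {want.strip()}",
    ]
    # The audience field is only required for Red Team Advocate, so it is
    # included only when the caller supplies one.
    if audience and audience.strip():
        sections.append(f"Audience: {audience.strip()}")
    return "\n\n".join(sections)
```

Usage: pass the framework specification text as `framework_spec`, then send the returned string to the AI of your choice.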

For best results:

Be honest about what stance you want. The framework’s value is the stance discipline. If you ask for a balanced critique when you actually want the strongest case for your position, the balanced-critique output will not satisfy you. Pre-naming the stance is acceptable; pre-naming the wrong stance produces unhelpful evaluation.
For Red Team Assessment, expect actionable fixes. Each vulnerability is paired with a fix recommendation and a fix-feasibility note. If a finding has no actionable fix, that is a finding worth pushing back on — manufactured vulnerabilities have no fixes.
For Red Team Advocate, name the audience. The mode requires an identifiable audience, per the Input Sufficiency Protocol’s additional check. Without a named audience, the suggested phrasing has no idiom to land in and the attacks lose calibration.
For Steelman Construction, apply the mirror test. If a thoughtful proponent of the position would not endorse the reconstruction, the steelman has drifted into a different argument. Push back; the framework is supposed to enforce the mirror test as CQ1.
Compose modes when it serves the use. Steelman-then-Red-Team is the most common composition for artifacts you intend to commit to. Benefits-then-Red-Team-Advocate is the composition for prepping a position you intend to defend under hostile review. The framework supports composition; the user makes the call.
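
A paired-mode composition such as Steelman-then-Red-Team can be sketched as an ordered run over mode names. Everything here is illustrative: `run_composition`, the `run_mode` callable, and the `chain` flag are hypothetical, and whether a later mode should evaluate the earlier mode's output or the original artifact is an assumption left to the caller.

```python
from typing import Callable, List

KNOWN_MODES = {
    "steelman-construction",
    "benefits-analysis",
    "balanced-critique",
    "red-team-assessment",
    "red-team-advocate",
}


def run_composition(
    artifact: str,
    modes: List[str],
    run_mode: Callable[[str, str], str],
    chain: bool = False,
) -> List[str]:
    """Run modes in order and return one output per mode.

    run_mode(mode, text) stands in for however a mode is actually executed.
    With chain=True each mode evaluates the previous mode's output;
    otherwise every mode evaluates the original artifact.
    """
    unknown = [m for m in modes if m not in KNOWN_MODES]
    if unknown:
        raise ValueError(f"Unknown mode(s): {', '.join(unknown)}")

    outputs: List[str] = []
    current = artifact
    for mode in modes:
        result = run_mode(mode, current)
        outputs.append(result)
        if chain:
            current = result
    return outputs
```

With `chain=True`, a Steelman-then-Red-Team run attacks the reconstructed version of the artifact instead of the original draft.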

The framework is deliberately tool-agnostic. The five-mode taxonomy, the stance gradient, the Decision D parse of red team into assessment and advocate, the Input Sufficiency Protocol, and the Steelman cross-territory case all survive the lift to any environment.

Other examples

A startup founder evaluating a product hypothesis before committing engineering to it. The disambiguation routes to Benefits Analysis first (envelope of plus-minus-interesting with affected-parties asymmetry), then optionally to Red Team Assessment if the user wants vulnerabilities ranked. Demonstrates the constructive-balanced-then-adversarial composition for early-stage decisions where the user wants the envelope before the attack.
A policy proposal heading to a public consultation period. The disambiguation routes to Red Team Advocate with named audiences (the public submission’s expected critics: industry groups, affected communities, regulators). The mode produces an attack brief per audience with audience-fit phrasing, concessions preempting the strongest counter-moves, and a strategic considerations section naming political and reputational dimensions. The user then refines the proposal against the brief before submission. Demonstrates Red Team Advocate’s external-use ranking by persuasive force rather than severity.
A philosophical position the user disagrees with but wants to engage seriously. The disambiguation routes to Steelman Construction (T15 home) with T1 cross-reference activated because the artifact under steelmanning is itself an argument. The steelman reconstructs the position at its logical best — surfacing hidden premises, filling gaps with charitable inferences. The mirror test applies. The user then has material to engage with rather than the strawman version they had absorbed. Demonstrates the Steelman cross-territory case where T1’s argument-coherence considerations inform the T15 reconstruction.

Citations

The Artifact Evaluation by Stance Framework draws on a converging set of evaluation traditions. Steelman discipline traces to Anatol Rapoport’s Fights, Games, and Debates (1960) for the four rules of engagement (the mirror test, points of agreement, acknowledged learning, and only-then permitted criticism); Daniel Dennett’s Intuition Pumps and Other Tools for Thinking (2013) operationalizes Rapoport’s rules into a four-step charity protocol. Benefits Analysis draws on Edward de Bono’s PMI (Plus-Minus-Interesting) tool from de Bono’s Thinking Course (1982) for the three-column structural separation that prevents Plus-vs-Minus advocacy collapse. Red Team draws on the CIA Tradecraft Primer (2009), Micah Zenko’s Red Team (2015), Bryce Hoffman’s Red Teaming (2017), and the Israeli intelligence Ipcha Mistabra tradition for the adversarial-vulnerability-assessment lineage; on Gary Klein’s pre-mortem technique (Harvard Business Review, 2007) for prospective hindsight; and on failure-mode literature and post-mortem analyses for the structural attack vocabulary.

The Decision D parse of red team into assessment and advocate is internal to Ora and resolves the operational problem where a single mode tried to serve two distinct output contracts (severity-ranked vulnerabilities with fix recommendations vs. persuasive-force-ranked attacks with audience phrasing) that turned out to need different ranking criteria, different audience modeling, and different signature failure modes (pulled punches vs. cynical overreach). The territory framework was compiled 2026-05-01 from the territory entry, member mode specs, lens dependencies, and open debates; v1.0 is the current version, with Devil’s Advocate Lite deferred per CR-6.

Downloads

Framework specification (PDF) — link to ora-ai.org canonical artifact when published
Framework specification (plain text) — link to ora-ai.org canonical artifact when published
Full white paper (PDF) — link when published