Knowledge Artifact Coach

Overview

The Knowledge Artifact Coach (KAC) is the framework that transforms thinking into vault-ready knowledge. Whatever shape the input arrives in — a stream-of-consciousness brain dump, a finished document, a batch of related materials, or an existing note that needs improvement — KAC produces atomic notes with explicit typed relationships, complete YAML frontmatter, and the metadata the rest of Ora’s retrieval and reasoning machinery needs to work with the knowledge later. It is the framework that makes the difference between scattered notes and a knowledge substrate.

The framework runs in four modes. Mode A (raw idea) handles informal, unstructured input: a brain dump, a rough paragraph, or a complex insight you’ve already synthesized but haven’t formalized. Mode B (single document) handles a finished document, such as a framework specification, a research summary, or a processed transcript, and extracts the atomic claims buried inside. Mode C (batch) handles multiple documents at once, with cross-document deduplication, contradiction surfacing, and a unified relationship map covering the whole batch. Mode D (refine existing note) improves an existing vault note: the Refinement Assessment reports each quality check as a pass or a specific failure, and the framework produces either a revision (preserving the title) or, when the idea has fundamentally shifted, a replacement (a new title plus a deletion instruction for the old note). Mode auto-detection from input shape is the default; the user can override it.
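As a rough illustration of auto-detection from input shape, the sketch below guesses a mode from simple surface cues. The heuristics (frontmatter means an existing note, a markdown heading means a finished document) and the names `Mode` and `detect_mode` are assumptions made for this example; the framework specification defines the actual detection rules.

```python
from enum import Enum
from typing import Optional, Sequence, Union


class Mode(Enum):
    RAW_IDEA = "KAC-A"
    SINGLE_DOCUMENT = "KAC-B"
    BATCH = "KAC-C"
    REFINE_NOTE = "KAC-D"


def detect_mode(source: Union[str, Sequence[str]],
                override: Optional[Mode] = None) -> Mode:
    """Guess the mode from input shape; an explicit override always wins."""
    if override is not None:
        return override
    if not isinstance(source, str):
        if len(source) == 0:
            raise ValueError("no input supplied")
        if len(source) > 1:
            return Mode.BATCH  # several documents at once
        source = source[0]
    text = source.lstrip()
    # Assumption: an existing vault note starts with YAML frontmatter.
    if text.startswith("---"):
        return Mode.REFINE_NOTE
    # Assumption: a finished document carries at least one markdown heading.
    if any(line.startswith("#") for line in text.splitlines()):
        return Mode.SINGLE_DOCUMENT
    return Mode.RAW_IDEA
```

Passing `override=Mode.RAW_IDEA` mirrors the user override described above: the guess is skipped entirely when the user states the mode.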

The framework’s load-bearing intellectual content is the note taxonomy with six atomic subtypes, the three grammar rules, the atomic excavation principle, and the 13-type relationship taxonomy. The note taxonomy distinguishes atomic notes (the smallest carrying-units of single discrete claims, with six subtypes: fact, process_principle, definition, causal_claim, analogy, evaluative) from molecular, compound, position, glossary, MOC, and process notes. The three grammar rules govern proposition-format notes: named actors (every claim names the actor it is about), resolved pronouns (every pronoun resolves to a named actor in the same note), and concrete verbs (verbs are specific to the action, not generic). The grammar rules exist because retrieval reliability depends on every atomic note being interpretable in isolation: the RAG engine pulls atomics one at a time, and a bullet that reads cleanly in context but ambiguously in isolation degrades downstream reasoning.
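A coarse first pass over the grammar rules can be mechanized. The sketch below only flags words that deserve a human look; it cannot decide whether a pronoun actually resolves to a named actor. The word lists and the function name `grammar_flags` are illustrative assumptions, not part of the framework.

```python
import re
from typing import Dict, List

# Illustrative word lists (assumptions); a real check would need more care.
PRONOUNS = {"it", "they", "them", "he", "she", "this", "these"}
GENERIC_VERBS = {"does", "makes", "gets", "has", "affects", "impacts"}


def grammar_flags(bullet: str) -> Dict[str, List[str]]:
    """Return the pronouns and generic verbs found in one bullet."""
    words = re.findall(r"[a-z]+", bullet.lower())
    return {
        "pronouns": [w for w in words if w in PRONOUNS],
        "generic_verbs": [w for w in words if w in GENERIC_VERBS],
    }
```

A bullet such as "It makes the build fast." is flagged on both lists, while "The audience teaches the builder what to build." comes back clean. A flag is a prompt for review, not a verdict.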

The atomic excavation principle says that for every non-atomic note KAC produces, excavation must be attempted to surface the buried atomic claims inside. The default user behavior is to draft molecular or compound notes (the natural shape of synthesis); KAC’s job is to counter that default and excavate the atomic claims that compose the synthesis. A compound note with explicit excavation produces a parent compound note plus several atomic children with **Source document:** provenance links back; the parent note gets an **Extracted principles:** backlinks section. The result is that the vault accumulates value at the atomic level (where retrieval precision is highest) rather than at the compound level (where retrieval is coarse and synthesis is duplicative).
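The parent/child bookkeeping that excavation produces can be sketched as follows. The `Note` class and `link_excavation` function are hypothetical names for this example; the `[[...]]` wiki-link syntax and the two bold section labels are the ones the framework uses.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Note:
    title: str
    kind: str  # e.g. "atomic", "compound", "position"
    body: List[str] = field(default_factory=list)


def link_excavation(parent: Note, children: List[Note]) -> None:
    """Add provenance to each child and backlinks to the parent, in place."""
    for child in children:
        child.body.append(f"**Source document:** [[{parent.title}]]")
    parent.body.append("**Extracted principles:**")
    parent.body.extend(f"- [[{child.title}]]" for child in children)
```

After the call, every atomic child points back to its parent, and the parent lists every child, so either side of the excavation can be reached from the other.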

The 13-type relationship taxonomy turns atomics into a knowledge graph. Each typed link carries a confidence level (high / medium / low) and a relationship type from the closed vocabulary: supports, contradicts, qualifies, extends, supersedes, analogous-to, derived-from, enables, requires, produces, precedes, parent, child. The taxonomy is the structural decision that distinguishes a knowledge graph from a tag cloud: an analogous-to link between two atomics from different sources surfaces a connection neither source named explicitly; a contradicts link surfaces a tension worth investigating; a requires link reveals a hidden assumption that the meta-layer’s semantic-similarity engine could not infer from text alone.
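One way to keep the vocabulary closed in tooling is to model it as an enumeration, so that any relationship type outside the thirteen is rejected at parse time. This is a minimal sketch; the class and function names (`RelationType`, `Confidence`, `TypedLink`, `parse_link`) are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class RelationType(Enum):
    SUPPORTS = "supports"
    CONTRADICTS = "contradicts"
    QUALIFIES = "qualifies"
    EXTENDS = "extends"
    SUPERSEDES = "supersedes"
    ANALOGOUS_TO = "analogous-to"
    DERIVED_FROM = "derived-from"
    ENABLES = "enables"
    REQUIRES = "requires"
    PRODUCES = "produces"
    PRECEDES = "precedes"
    PARENT = "parent"
    CHILD = "child"


class Confidence(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


@dataclass(frozen=True)
class TypedLink:
    source: str
    target: str
    relation: RelationType
    confidence: Confidence


def parse_link(source: str, relation: str, target: str, confidence: str) -> TypedLink:
    """Build a link; Enum lookup raises ValueError for out-of-vocabulary values."""
    return TypedLink(source, target, RelationType(relation), Confidence(confidence))
```

A loosely described relationship such as "connects-to" fails the `RelationType(...)` lookup, which forces the drafter to pick one of the thirteen types before the link enters the graph.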

The framework answers questions like: I have a stream-of-consciousness note from yesterday that contains something useful but I can’t tell what — can you help me extract the keepable parts? I just read a long article and want the durable claims out of it without losing the structure — what’s the right way to do that? I have a folder of related notes I’ve never properly organized — can you produce a unified extraction with the cross-document patterns surfaced? I have an old note that’s bothering me and I can’t articulate why — can you tell me what’s wrong with it and produce a fix?

Systemic context

The Knowledge Artifact Coach is the engram-producer of the Knowledge Production System (paired with the MindSpec Interview Framework as values-substrate producer and the Creativity from Knowledge and Values reference as conceptual frame; the system view is in [[Paper — Knowledge Production System]]). Its outputs are notes with type: engram, stored in the vault’s Engrams/ directory at the YAML schema’s P1 retrieval tier (1.0 weight for user-authored, 0.9 for ai-derived or source-derived, per the schema’s §6.5 modifier-tags). Document Processing invokes KAC in batch mode (Mode C) for source-document chunking: chunks become resource notes at P2, and atomic distillations from chunks become engrams at P1 with the source-derived tag. The 13-type relationship taxonomy produces a knowledge graph that the meta-layer’s semantic-similarity engine can traverse along typed links; without typed links, the engine falls back to lexical similarity, which is meaningfully weaker. KAC engrams flow into CFF-formalized corpora (Coordinated Corpora declarations on Operation matrices may declare consumption of engram subsets); OFF-rendered outputs may include synthesis essays that use engram subsets as their source material. The Operations Manifest’s Coordinated Corpora consumption-declaration semantics depend on the corpus being well-formed, which in turn depends on KAC producing well-formed engrams.
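The P1 weighting described above reduces to a small lookup. The sketch assumes the modifier-tags appear literally as ai-derived and source-derived in a note's tag list; the function name `p1_retrieval_weight` is hypothetical, and the schema's §6.5 remains the authority on how the weights are applied.

```python
from typing import Iterable

# Modifier-tags named in the text; their exact spelling in the schema is assumed.
DERIVED_TAGS = {"ai-derived", "source-derived"}


def p1_retrieval_weight(tags: Iterable[str]) -> float:
    """Return 0.9 for derived engrams and 1.0 for user-authored ones."""
    return 0.9 if DERIVED_TAGS & set(tags) else 1.0
```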

Ora’s output

Example question: I have this raw paragraph from yesterday I want to keep something useful from. Can you help me extract the keepable parts?

Input: “I keep noticing that the indie hackers I follow who get to first revenue fastest are not the ones with the best products — they’re the ones who already have the audience. The audience comes first, the product second. Or maybe that’s not quite right — it’s that the audience teaches them what to build, and then the build is fast because the audience has already told them. The ‘build first then find audience’ people seem to take much longer because they have to discover both at once. So either build for an existing audience you’re already in, or build the audience first and let it tell you what to build. The ‘great product will find its audience’ assumption seems to be wrong, or at least slow.”

Ora’s response (running KAC in Mode A — raw idea):

The input is informal, unstructured, first-person reasoning — Mode A auto-selects (no document structure; reads as thinking-in-progress). The mode is confirmed; I’ll work the input through the Questioning Protocol’s five phases.

Phase 1 — Identify the synthesized insight. The user is articulating a position: that audience precedes product in indie-hacker first-revenue patterns, and that the “build first then find audience” alternative is slower because both have to be discovered at once. The position has internal complexity (a candidate version, a self-correction, an explanation of why the alternative is slower, a rejected assumption).

Phase 2 — Atomic excavation. The user has drafted a position note (compound shape — multiple claims combining into a stance). The Buried Principle Trap is the failure mode where I confirm the position without excavating the atomic claims inside. I’ll surface the candidates as a numbered list:

Candidate 1. Indie-hacker first-revenue speed correlates with prior audience presence rather than product quality. (Subtype: causal_claim; the user’s observation across the indie-hacker space, framed as a pattern.)

Candidate 2. When the audience exists first, the audience teaches the builder what to build, and the build is fast because the requirements are already discovered. (Subtype: process_principle; the mechanism that explains Candidate 1.)

Candidate 3. When the build precedes the audience, both the audience and the product must be discovered simultaneously — slowing both. (Subtype: process_principle; the contrastive case explaining why the alternative is slower.)

Candidate 4. The “great product will find its audience” assumption is empirically slow at best, possibly wrong. (Subtype: evaluative; the user’s verdict on a competing position.)

Phase 3 — Grammar rule check on each candidate. Named actors? Yes — “indie-hacker,” “audience,” “builder” all named. Resolved pronouns? Yes — no floating pronouns. Concrete verbs? “teaches,” “discovered,” “slowing” all specific. The candidates pass the grammar check.

Phase 4 — Quality check per atomic. Each candidate is the smallest carrying-unit of a single discrete claim. Each is independently complete (interpretable without the others). None can be split without information loss. None should be combined without losing the distinction the user drew. The candidates pass minimum sufficiency.

Phase 5 — Frontmatter + relationship map. Each candidate becomes a draft atomic note with YAML frontmatter and body. The four atomics carry typed relationships:

Candidate 1 — atomic note:
---
nexus: [indie-hacker-customer-acquisition]
type: engram
tags: [atomic, causal_claim, working]
subtype: causal_claim
---

# Audience precedes product in indie-hacker first-revenue speed

- The indie hackers who reach first revenue fastest are not those with the best products; they are those who already have an audience.
- The pattern is observational across the indie-hacker space; not yet causally tested but the correlation is repeatable.

*Source: user observation, 2026-05-09*

Relationship map (drafted):

Candidate 1 ←derived-from← Candidate 2 (the mechanism explains the observed pattern; high confidence)

Candidate 1 supports→ Candidate 4 (the observation supports the verdict on the competing assumption; medium confidence)

Candidate 2 contradicts→ Candidate 4’s underlying assumption (the “great product finds audience” framing presupposes audience-discovery is fast or unimportant; medium confidence)

Candidate 2 ←contradicts← Candidate 3 (the two are contrastive — same mechanism in different conditions; high confidence)

I’ll also draft the parent position note (compound) capturing the user’s overall stance, with **Extracted principles:** backlinks to all four candidate atomics. The user is invited to review the candidates and confirm, modify, or reject any before they land in the vault.

Session summary: Mode A; one source paragraph processed; 1 position note + 4 atomic notes drafted; relationship map covers all 5 notes; minimum sufficiency, grammar rules, and frontmatter checks pass on all atomics; pipeline routing — all 5 to interactive review (none auto-approve; the user is in the loop on every atomic since this is their original thinking). Vault integration: paste into Engrams/ with the position note as the parent and the four atomics as children; the four atomics carry **Source document:** provenance back to the position note; the position note gets the **Extracted principles:** backlinks section.

That is what KAC does with raw input. The user’s paragraph contained four keepable claims and one overall stance; the framework surfaced them as four atomics plus the parent position; the typed relationships preserve the user’s reasoning structure (mechanism → observation; contrastive cases; verdict on competing assumption); the vault gains five well-formed retrievable units rather than one note that would be hard to retrieve precisely later.
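The Candidate 1 draft in the response above follows a regular shape: frontmatter, title, bullets, source line. A small renderer can reproduce that shape. This is a hand-rolled sketch that only handles flat string and list values; the function name `render_note` is an assumption, and a real YAML library would be the safer choice for anything beyond this simple case.

```python
from typing import Dict, List, Union


def render_note(title: str,
                frontmatter: Dict[str, Union[str, List[str]]],
                bullets: List[str],
                source: str) -> str:
    """Render a draft note as markdown with a YAML frontmatter header."""
    lines = ["---"]
    for key, value in frontmatter.items():
        if isinstance(value, list):
            value = "[" + ", ".join(value) + "]"  # inline list, as in the example
        lines.append(f"{key}: {value}")
    lines += ["---", "", f"# {title}", ""]
    lines += [f"- {bullet}" for bullet in bullets]
    lines += ["", f"*Source: {source}*"]
    return "\n".join(lines)
```

Called with the Candidate 1 frontmatter (nexus, type, tags, subtype), the renderer emits the keys in the order given, followed by the title heading, the bullets, and the italic source line.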

Commercial AI comparison

Comparison content auto-populates when the comparison-refresh framework runs against this question. Drafters do not author this section.

Brief comparison commentary

Auto-populates with the comparison content above.

How to use this framework

You can run the Knowledge Artifact Coach pattern with any AI of your choice. The composition is single-pass for any of the four modes.

The prompt:

[Paste the framework specification]

[Optional: state mode explicitly — KAC-A / KAC-B / KAC-C / KAC-D — or let auto-detection pick from input shape.]

Source material: [Paste your brain dump, document, batch, or existing note.]

The AI returns the mode-appropriate output: for Mode A, draft notes from the raw idea with atomic excavation; for Mode B, the document classification verdict plus extracted draft set; for Mode C, the batch inventory plus cross-document analysis plus unified deduplicated draft set; for Mode D, the refinement assessment plus revised or replacement draft. In all modes, the output includes the typed relationship map and the YAML frontmatter on every drafted note.
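If you assemble the prompt programmatically, the three parts of the template can be joined as in the sketch below. The function name `build_prompt` and the "Mode:" label are assumptions for illustration; only the KAC-A through KAC-D mode labels and the "Source material:" label come from the template above.

```python
from typing import Optional

VALID_MODES = {"KAC-A", "KAC-B", "KAC-C", "KAC-D"}


def build_prompt(specification: str,
                 source_material: str,
                 mode: Optional[str] = None) -> str:
    """Join the specification, an optional explicit mode, and the source material."""
    parts = [specification.strip()]
    if mode is not None:
        if mode not in VALID_MODES:
            raise ValueError(f"unknown mode: {mode}")
        parts.append(f"Mode: {mode}")
    parts.append(f"Source material: {source_material.strip()}")
    return "\n\n".join(parts)
```

Leaving `mode` as `None` omits the mode line, which leaves the choice to auto-detection.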

For best results in interactive use:

Provide nexus when you have it. The nexus property is what links the produced notes to a project or passion in your Master Matrix; supplying it up front saves a vault-integration pass later.
Don’t suppress the atomic excavation. When the framework surfaces atomic candidates from your synthesis, resist the urge to say “the synthesis already says this; the atomics are redundant.” The Obvious Claim argument is documented in the framework itself — the seemingly-obvious claims are exactly the ones that retrieve well later because they are the load-bearing assumptions you’d otherwise have to reconstruct.
Confirm the relationship types. The 13-type taxonomy is closed and load-bearing for the knowledge graph the framework produces. If a relationship is described loosely (e.g., “this connects to that”), ask the framework to specify which type from the taxonomy applies.
In Mode C (batch), let the contradictions surface. When the framework finds two notes from different sources that contradict each other, it should not silently resolve the contradiction by picking one. The contradiction surfaces to you for confirmation; the user-confirmed current position is what gets drafted; the rejected position is recorded as an extracted-and-not-drafted note in the session summary.
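The Mode C contradiction discipline in the last tip can be sketched as two steps: collect cross-source contradictions without resolving them, then record the outcome only after the user chooses. The `Claim` class, the function names, and the result keys are hypothetical names for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Claim:
    claim_id: str
    source: str
    text: str


def surface_contradictions(claims: Dict[str, Claim],
                           contradicts: List[Tuple[str, str]]) -> List[Tuple[Claim, Claim]]:
    """Return contradicting pairs whose claims come from different sources."""
    return [(claims[a], claims[b]) for a, b in contradicts
            if claims[a].source != claims[b].source]


def record_resolution(pair: Tuple[Claim, Claim], keep_id: str) -> Dict[str, Claim]:
    """Apply the user's choice; the rejected claim is kept on record, not dropped."""
    first, second = pair
    if keep_id not in (first.claim_id, second.claim_id):
        raise ValueError("keep_id must name one of the two claims")
    kept, rejected = (first, second) if keep_id == first.claim_id else (second, first)
    return {"drafted": kept, "extracted_not_drafted": rejected}
```

Nothing in `surface_contradictions` picks a winner. The choice only enters through `keep_id`, which stands in for the user's confirmation.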

The framework is deliberately tool-agnostic. The note taxonomy, the three grammar rules, the atomic excavation principle, and the 13-type relationship taxonomy are conceptual disciplines that survive the lift to any environment. The output format (markdown notes with YAML frontmatter) is portable to any vault system.

Other examples

Mode B (single document) on a research article. A user submits a 3,000-word academic article on attention mechanisms in cognitive science. The framework runs Pass A (document classification: primary type compound, complexity emergent, minimum sufficient unit one major argument), Pass B (atomic excavation surfaces 12 buried atomic claims across the article — empirical findings, theoretical claims, methodological notes, definitional precisions), Pass C (quality gate routes 8 to auto-approve, 3 to human-review queue, 1 to auto-reject as not sufficiently distinct from existing engrams). The compound parent note carries the **Extracted principles:** backlinks; each extracted atomic carries **Source document:** provenance. Demonstrates buried-atomic excavation as the framework’s distinctive contribution against documents that look like single units but contain multiple keepable claims.
Mode C (batch) on a folder of related notes. A user submits seven notes accumulated over six months on the same research topic. The framework produces a Batch Inventory (seven distinct notes with individual classifications), Cross-Document Analysis (two overlapping concepts; one complementary combination; one detected contradiction awaiting user resolution; three shared vocabulary candidates), and a unified deduplicated draft set (eleven atomics; one position note; one glossary note). The detected contradiction is surfaced for user resolution; the user picks the current position; the rejected position is recorded. The unified relationship map shows both intra-document and cross-document relationships, with cross-document contradictions and analogies prioritized in display. Demonstrates batch deduplication and contradiction surfacing — the framework’s defenses against silent over-consolidation in collections.
Mode D (refine existing note) on an underperforming atomic. A user submits an existing atomic note from six months ago that has been retrieved often by RAG but always with low downstream confidence. The framework runs the Refinement Assessment: the minimum sufficiency check passes; the grammar rule check fails on resolved pronouns (“it” in two bullets has no clear referent); the frontmatter check passes; the subtype check passes. The refinement produces a revised draft that preserves the title and fixes the two pronoun failures (replacing “it” with the named actor); the explanation cites the failures and the changes; the revision overwrites the original in place. Demonstrates the framework’s role in the long-term hygiene of the engram corpus: old atomics that retrieve poorly are diagnosed and fixed rather than left to degrade retrieval quality.
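The two mechanical checks in a Refinement Assessment, frontmatter and subtype, lend themselves to a simple pass-or-failure report. In the sketch below the required keys are inferred from the Candidate 1 example and are an assumption, as is the function name `assess_frontmatter`; the six subtypes come from the note taxonomy. The minimum sufficiency and grammar checks need judgment and are not covered.

```python
from typing import Dict

ATOMIC_SUBTYPES = {"fact", "process_principle", "definition",
                   "causal_claim", "analogy", "evaluative"}
# Assumption: these are the keys seen in the worked example, not a full schema.
REQUIRED_KEYS = ("nexus", "type", "tags", "subtype")


def assess_frontmatter(frontmatter: Dict[str, object]) -> Dict[str, str]:
    """Report each mechanical check as 'pass' or a specific failure."""
    missing = [key for key in REQUIRED_KEYS if key not in frontmatter]
    subtype = frontmatter.get("subtype")
    return {
        "frontmatter": "pass" if not missing else "fail: missing " + ", ".join(missing),
        "subtype": "pass" if subtype in ATOMIC_SUBTYPES
                   else f"fail: unknown subtype {subtype!r}",
    }
```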

Citations

The Knowledge Artifact Coach draws directly on Niklas Luhmann’s Zettelkasten methodology: one claim per atomic, explicit typed links, and structure that emerges through accumulated relationships rather than top-down hierarchy. The atomic excavation principle (which counters the failure mode where compound notes substitute for the atomic claims inside them) is closer to information-retrieval research’s distinction between document-level and passage-level retrieval: passage retrieval (Liu et al., Karpukhin et al.) consistently outperforms document retrieval on precision-sensitive tasks, and the same pattern holds for vault retrieval, where atomic notes outperform compound notes on precise-claim retrieval.

The 13-type relationship taxonomy is closer to RDF / OWL relationship typing than to free-text linking. The closed vocabulary is a deliberate constraint — open-vocabulary linking degrades to tag clouds over time; closed-vocabulary typed links produce a traversable knowledge graph. The three grammar rules (named actors, resolved pronouns, concrete verbs) draw on technical-writing best practices for retrieval reliability — the rules ensure that every atomic note is interpretable in isolation, which is the baseline assumption RAG retrieval makes when pulling notes one at a time.

The four-mode structure (Mode A raw idea / Mode B single document / Mode C batch / Mode D refine existing) emerged from observing the four input shapes that arrive in practice. Mode auto-detection from input shape was added to reduce the user-side decision burden — the framework determines the mode from the input rather than asking the user to classify their own input first. The High-Context Processing (HCP) awareness is a v5.0 addition for source documents whose internal structure is load-bearing for atomic excavation (a long technical specification’s section structure carries information that the atomic-excavation pass needs to preserve in provenance).

The framework is single-author and originated 2026-04-08; v6.0 (2026-04-23) was the F-Convert refactor to Process Formalization Framework v2.0 Anatomy with formal Input/Output Contracts, Evaluation Criteria with 5-level rubrics, Self-Evaluation layer, and the unified eight-layer mode structure (M0 routing + Layers 2–8 mode-specific).

Downloads

Framework specification (PDF) — link to ora-ai.org canonical artifact when published
Framework specification (plain text) — link to ora-ai.org canonical artifact when published
Full white paper (PDF) — link when published