Display Name
News Cluster Selector (NCS)
Display Description
Continuously scans event-clustered news data from public sources, applies the Main Street Independent consensus values floor as selection criteria, and emits selected clusters with full enrichment metadata to the article-generator framework. Layer 1 of the publication pipeline. Runs continuously, not per-invocation.
Setup Questions
Mode
Optional. The framework runs in a single mode (P-Cycle — execute one polling cycle). The mode is implicit; setup is not user-driven.
Operational profile
Required when first deployed. One of: live-publication (publishing decisions feed the article-generator framework directly), dry-run (decisions logged but not emitted; used for parameter tuning and pre-launch audit), replay (re-evaluate a fixed historical feed snapshot to test configuration revisions).
Cycle override
Optional. A specific cluster identifier or feed-timestamp range to re-evaluate on demand, bypassing the normal polling cadence. Default behavior if absent: standard 15-minute GDELT poll plus per-feed cadences for supplementary sources.
PURPOSE
Decide what becomes news at Main Street Independent. Continuously poll public news feeds and event-clustering services; apply the consensus values floor and the publication’s source-quality, verification, and selection-budget criteria; emit selected clusters with enrichment metadata to the article-generator framework. The framework does not write articles, does not perform pen-name selection, and does not produce analysis. Selection — including held-for-development and rejected dispositions, with full audit trail — is the entire output.
INPUT CONTRACT
Required (per cycle)
- GDELT feed access: HTTP/API access to the GDELT Project’s Global Knowledge Graph and Events database. Source: https://www.gdeltproject.org/. Format: GDELT 2.0 Event and GKG records, JSON or CSV per GDELT cadence (15 min).
- Supplementary news feeds: configured RSS / API endpoints for AP, Reuters, NYT, WaPo, regional and trade outlets. Format: Atom/RSS or JSON per feed. Source: per feed-registry.json configuration.
- Source-quality rating feeds: AllSides, Ad Fontes Media, Media Bias/Fact Check, IFCN-verified fact-checker outputs, or cached dumps thereof. Source: their respective APIs or licensed dumps. Format: per source-reliability-tiers.json.
- Primary-document feeds: government press release feeds, court filing alerts (PACER and equivalent), regulatory filing alerts, peer-reviewed publication alerts. Source: per feed-registry.json. Format: per source.
- Configuration bundle (versioned, hot-reloadable):
  - Reference — MSI Consensus Values Floor.md — floor specification.
  - Reference — MSI Editorial Router.md — editorial-supervisor MindSpec (queried at Layer 6).
  - Reference — MSI Bad-Faith Techniques Catalog.json — technique catalog (consulted at Layer 5).
  - source-disqualification-list.json — outlets disqualified from selection.
  - source-reliability-tiers.json — current triangulated reliability ratings per outlet.
  - selection-budget.json — daily budget, diversity caps, confidence thresholds, hold-timeout tiers.
  - protected-category-rules.json — minor / sexual-assault-victim / pre-charge-non-public-figure handling rules.
  - entity-resolution-config.json — Wikidata / canonical-identifier resolution parameters.
  - feed-registry.json — feed endpoints, cadences, paid/free tier, last-known-good timestamp.
- Persistent cycle state from prior cycles: hold-queue contents with hold timestamps and floor-engagement tier; cluster identifier registry; selection-statistics rollup; last-cycle timestamp per feed; configuration-version-applied log.
- Operational profile: one of live-publication | dry-run | replay. Source: setup.
Optional
- Feedback signals from downstream frameworks: rejection events from the article-generator’s Layer 1 / Layer 6, source-correction events from the source-correction-monitor, bad-faith-pattern flags from the article-generator’s Layer 5. Format: structured JSON event records on a feedback queue. Source: shared message bus or polled file location. Default behavior if absent: framework operates from prior reliability scoring without feedback adjustment; emits a feedback-queue-empty log entry per cycle.
- Cycle override: forced re-evaluation target. Default behavior if absent: normal polling cadence.
Persistent reference (not per-cycle but referenced throughout)
- Reference — MSI Treatise.md — editorial foundation in essay form (contextual reference for Layer 6 routing language; the runtime decision channel is the editorial-supervisor MindSpec).
OUTPUT CONTRACT
Primary outputs
- Selected cluster object — for each cluster the cycle selects, a structured object matching the article-generator’s news-article-generator input schema. Destination: article-generator input queue (and parallel: pen-name framework’s data layer for independent consumption). Quality threshold: every required field populated; all source list entries carry full structured metadata; selection rationale is human-readable and cites which floor values were engaged at what supervisor-reported intensity.
- Selection log entry — for each cluster evaluated this cycle, a record stating disposition (select | reject | hold | held-and-now-final-reject), reason, configuration versions applied, and supervisor query identifiers. Destination: persistent selection log (append-only). Quality threshold: every cluster the cycle touched has exactly one entry per cycle.
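The append-only, one-entry-per-cluster-per-cycle requirement on the selection log can be sketched as a JSON Lines file that is only ever opened in append mode. This is a minimal illustration, not the framework’s storage layer: the JSON Lines layout, the record field names, and the `SelectionLog` class are hypothetical.

```python
import json
from pathlib import Path

VALID_DISPOSITIONS = {"select", "reject", "hold", "held-and-now-final-reject"}


class SelectionLog:
    """Append-only JSON Lines selection log (hypothetical on-disk layout)."""

    def __init__(self, path):
        self.path = Path(path)
        # (cycle_id, cluster_id) pairs written by this process; a production
        # version would rebuild this set from the file after a restart.
        self._seen = set()

    def append(self, cycle_id, cluster_id, disposition, reason,
               config_versions, supervisor_query_ids):
        if disposition not in VALID_DISPOSITIONS:
            raise ValueError(f"unknown disposition: {disposition}")
        key = (cycle_id, cluster_id)
        if key in self._seen:
            raise ValueError(f"second entry for {cluster_id} in cycle {cycle_id}")
        record = {
            "cycle_id": cycle_id,
            "cluster_id": cluster_id,
            "disposition": disposition,
            "reason": reason,
            "config_versions": config_versions,
            "supervisor_query_ids": supervisor_query_ids,
        }
        # Mode "a" only ever appends; earlier lines are never rewritten.
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(record, sort_keys=True) + "\n")
        self._seen.add(key)
```

Because every line is a complete JSON record, replay only needs to read the file line by line; rejecting a second entry for the same (cycle, cluster) pair enforces the "exactly one entry per cycle" threshold at write time rather than at audit time.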
Secondary outputs
- Trend signal records — topic-emergence, coverage-gap, source-quality-drift, and bad-faith-pattern signals. Destination: editorial monitoring channel (and pen-name framework’s data layer for independent uptake of below-floor floor-engaging clusters). Quality threshold: signals are tagged with the cluster IDs and source IDs that produced them; signals are not duplicated within a configurable suppression window.
- Hold-queue snapshot — current hold-queue contents with timestamps and supervisor-reported floor-engagement tier per held cluster. Destination: persistent state. Quality threshold: every held cluster has a hold-tier and a hold-expiration timestamp.
- Cycle-metrics record — counts of clusters ingested, selected, rejected, held, timed-out; configuration-versions applied; feedback-queue depth processed; feed-availability state; configuration-hot-reload events. Destination: monitoring infrastructure. Quality threshold: metric record covers every layer’s pass/fail counts.
- Updated reliability scoring — adjusted per-outlet reliability scores when feedback events have triggered scoring revisions. Destination: source-reliability-tiers.json (versioned write with prior version preserved). Quality threshold: every revision cites the feedback events that drove it.
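The trend-signal requirement that signals are not duplicated within a configurable suppression window can be sketched as a keyed last-emitted timestamp. This assumes a signal’s identity is its kind plus the cluster IDs and source IDs that produced it; `TrendSignal` and `SignalSuppressor` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class TrendSignal:
    kind: str                    # topic-emergence | coverage-gap | source-quality-drift | bad-faith-pattern
    cluster_ids: frozenset[str]  # clusters that produced the signal
    source_ids: frozenset[str]   # sources that produced the signal


class SignalSuppressor:
    """Drops repeats of the same signal inside a suppression window."""

    def __init__(self, window: timedelta):
        self.window = window
        self._last_emitted: dict = {}

    def should_emit(self, signal: TrendSignal, now: datetime) -> bool:
        key = (signal.kind, signal.cluster_ids, signal.source_ids)
        last = self._last_emitted.get(key)
        if last is not None and now - last < self.window:
            return False  # duplicate inside the window; suppressed, timer not reset
        self._last_emitted[key] = now
        return True
```

Not resetting the timer on a suppressed duplicate means a persistent signal resurfaces once per window instead of being silenced indefinitely.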
EXECUTION TIER
Specification (canonical, model-agnostic). This framework’s runtime in production is agent-mode with tool access for feed polling, ChromaDB writes, supervisor MindSpec queries, and configuration file reads/writes; the canonical specification documents the intellectual content from which an agent-mode rendering can be produced.
This framework is continuously running, not per-invocation. The Execution Commands block at the end specifies one polling cycle as the unit of execution. The continuous-operation wrapper — polling cadence, state persistence across cycles, configuration hot-reload, feedback-queue draining — is documented in the dedicated Continuous Operation section after the Named Failure Modes.
MILESTONES DELIVERED
This framework delivers four sequential milestones per polling cycle. Each milestone is a coherent intermediate deliverable that downstream milestones consume; each is a checkpoint where drift detection fires. The cycle is the unit of execution; persistent state carries between cycles.
Milestone 1: Cycle Initialized and Working Set Assembled
- Endpoint produced: A working set of clusters under evaluation this cycle, comprising new clusters ingested from feeds plus held clusters whose re-evaluation timer has fired; configuration-version manifest stating which version of every config file is in effect for this cycle; feedback-queue events processed and applied to in-memory reliability state; feed-availability state recorded (live / degraded / unavailable per feed).
- Verification criterion: Every cluster in the working set has a stable cluster identifier; the configuration-version manifest names all nine config files with version identifiers; feedback events are drained to depth 0, or to the cycle’s feedback-budget cap with the residual depth recorded; feed-availability state records a live / degraded / unavailable status for every feed in feed-registry.json; held clusters whose hold-expiration is in the past have been moved either back to the working set for re-evaluation or to final-reject disposition.
- Layers covered: 1, 2
- Required prior milestones: None
- Gear: 4
- Output format: See Layer 2 Output Format.
- Drift check question: Does the working set faithfully reflect what the feeds and hold queue actually contain at cycle start, and is every config file’s version locked for the duration of this cycle?
Milestone 2: Clusters Enriched and Source-Screened
- Endpoint produced: Each cluster in the working set is enriched with resolved entities (Wikidata QIDs or equivalent), resolved geography to ≥city/county level where supported by the source material, temporal anchor with primary and subsidiary timestamps, full source-quality metadata per source (originating-vs-republishing, source-class, triangulated reliability score, disqualification check), and source-pattern flags from the bad-faith catalog (manufactured-controversy, coordinated-message-discipline, flooding-the-zone, astroturfing patterns where detected). Clusters that fail disqualification gates (all sources disqualified, or source composition cannot meet ≥2 originating OR 1 originating + 1 primary document) are removed from the working set with a reject record.
- Verification criterion: Every cluster surviving to Milestone 2 has resolved primary entities or carries explicit unresolved-entity flags; every source in every cluster carries a reliability tier and an originating-vs-republishing classification; clusters removed at this milestone have a reject log entry citing the specific gate failed; source-pattern flags reference catalog technique IDs and the specific evidence for the flag.
- Layers covered: 3, 4, 5
- Required prior milestones: M1
- Gear: 4
- Output format: See Layer 5 Output Format.
- Drift check question: Does every cluster in the working set carry the metadata the supervisor needs to perform floor-engagement scoring, and have no clusters been silently dropped without a reject record?
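The Milestone 2 disqualification gate (reject when every source is disqualified, or when the remaining sources cannot meet ≥2 originating OR 1 originating + 1 primary document) can be sketched as follows. The `Source` record and its `source_class` labels are hypothetical; only non-disqualified, non-republishing news sources count as originating here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Source:
    outlet: str
    originating: bool   # False for republished wire copy
    source_class: str   # "news" or "primary-document" (hypothetical labels)


def composition_gate(sources, disqualified) -> Optional[str]:
    """Return None if the cluster passes, else the name of the failed gate."""
    eligible = [s for s in sources if s.outlet not in disqualified]
    if not eligible:
        return "all-sources-disqualified"
    originating = sum(1 for s in eligible
                      if s.source_class == "news" and s.originating)
    primary = sum(1 for s in eligible if s.source_class == "primary-document")
    if originating >= 2 or (originating >= 1 and primary >= 1):
        return None
    return "insufficient-originating-sources"
```

Counting originating sources rather than total sources is what catches a wire cascade: five outlets republishing one AP story still count as one originating source.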
Milestone 3: Selection Decisions Made
- Endpoint produced: For each surviving cluster, a disposition of select | reject | hold, with: floor-engagement scores per floor value as returned by the editorial-supervisor MindSpec; selection-budget application result (within budget / displaces lower-priority cluster / would-exceed-budget); selection-confidence number; routing decision; for hold dispositions, the hold-tier and hold-expiration timestamp; for reject dispositions, recoverable-vs-final classification.
- Verification criterion: Every surviving cluster has exactly one disposition; supervisor query identifiers are recorded for every cluster scored; for each select disposition, the cluster meets all MUST gates (≥2 originating OR 1 + primary; no disqualifying sources; no protected-category issues without override; floor engagement above configured threshold; entity resolution complete; geographic resolution to ≥city/county; reliability triangulation above floor); selection-budget application is consistent with selection-budget.json parameters and the topic-diversity cap; held clusters’ hold-tier corresponds to the supervisor’s floor-engagement tier per the hold-tier mapping in selection-budget.json.
- Layers covered: 6, 7
- Required prior milestones: M2
- Gear: 4
- Output format: See Layer 7 Output Format.
- Drift check question: Does every selection decision rest on a recorded supervisor query result, and does every selected cluster meet every MUST gate without a single gate being silently waived?
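The Milestone 3 rule that a select disposition requires every MUST gate, with no gate silently waived, can be sketched as a function that returns every failed gate rather than stopping at the first. The field names, the geographic ranking, and the strict "above" comparisons are assumptions for illustration; the floor-engagement value is taken as returned by the supervisor and never recomputed here.

```python
from dataclasses import dataclass, field

# Hypothetical ranking of geographic resolution levels, coarse to fine.
GEO_RANK = {"country": 0, "state": 1, "county": 2, "city": 3, "neighborhood": 4}
MIN_GEO_RANK = GEO_RANK["county"]  # ">= city/county": county or anything finer passes


@dataclass
class ScoredCluster:
    source_composition_ok: bool      # >=2 originating OR 1 originating + 1 primary
    has_disqualified_source: bool
    protected_category_issue: bool
    protected_override: bool
    floor_engagement: float          # supervisor-returned value, applied as-is
    unresolved_entities: list = field(default_factory=list)
    geo_level: str = "country"
    reliability: float = 0.0


def failed_must_gates(c: ScoredCluster, floor_threshold: float,
                      reliability_floor: float) -> list:
    """Return the names of every MUST gate the cluster fails; select only if empty."""
    failures = []
    if not c.source_composition_ok:
        failures.append("source-composition")
    if c.has_disqualified_source:
        failures.append("disqualified-source")
    if c.protected_category_issue and not c.protected_override:
        failures.append("protected-category")
    if c.floor_engagement <= floor_threshold:
        failures.append("floor-engagement")
    if c.unresolved_entities:
        failures.append("entity-resolution")
    if GEO_RANK.get(c.geo_level, -1) < MIN_GEO_RANK:
        failures.append("geographic-resolution")
    if c.reliability <= reliability_floor:
        failures.append("reliability-triangulation")
    return failures
```

Returning the full list gives the reject record a citation for every gate failed, which supports the audit-trail criterion.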
Milestone 4: Cycle Emitted and State Persisted
- Endpoint produced: Selected cluster objects emitted to the article-generator input queue per the article-generator’s input schema; trend signal records emitted to the editorial monitoring channel and pen-name data layer; hold-queue snapshot persisted; cycle-metrics record emitted; selection log entries appended; updated reliability scoring written if feedback events triggered revisions; feedback queue advanced past processed events; cycle-completion timestamp recorded against each polled feed.
- Verification criterion: Every cluster with a select disposition in M3 has been emitted exactly once to the article-generator input queue and is parseable against the article-generator’s input schema; every reject disposition produced a selection-log entry; every hold disposition was written to the hold-queue snapshot with its hold-expiration timestamp; trend signals are deduplicated against the configurable suppression window; cycle metrics record covers ingested, selected, rejected, held, and timed-out counts; reliability scoring revisions, if any, were written to a versioned source-reliability-tiers.json with the prior version preserved.
- Layers covered: 8, 9, 10
- Required prior milestones: M3
- Gear: 4
- Output format: See Layer 8 Output Format and Layer 10 Output Format.
- Drift check question: Does the cycle’s emitted output, persisted state, and cycle metrics accurately reflect every disposition made in M3, with no cluster lost between decision and emission and no decision lost between cycles?
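The "emitted exactly once" criterion, together with the operational profiles from setup, can be sketched as an emission step keyed by cluster identifier. This assumes only the live-publication profile writes to the queue (dry-run is documented as logged-but-not-emitted; treating replay the same way is an assumption), and that `emitted_ids` is persisted between cycles. Schema validation is omitted here.

```python
def emit_selected(selected, profile, queue, emitted_ids):
    """Emit each selected cluster to the article-generator queue at most once.

    selected: list of cluster dicts with a "cluster_id" key.
    emitted_ids: set of cluster_ids already emitted (persisted across cycles).
    Returns the cluster_ids emitted this call, or that would have been
    emitted under dry-run / replay.
    """
    newly = []
    for cluster in selected:
        cid = cluster["cluster_id"]
        if cid in emitted_ids:
            continue  # already emitted; re-emitting would break "exactly once"
        if profile == "live-publication":
            queue.append(cluster)
            emitted_ids.add(cid)
        newly.append(cid)
    return newly
```

Retrying the emission step after a partial failure is then safe: clusters already on the queue are skipped, and the rest are emitted.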
EVALUATION CRITERIA
This framework’s output is evaluated against these ten criteria. Each criterion is rated 1–5. Minimum passing score: 3 per criterion. Scores below threshold trigger the remediation protocol in Layer 9.
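The pass/fail rule above (ten criteria, each rated 1–5, minimum 3 per criterion) can be sketched as a check that returns the criteria needing remediation. The function name is hypothetical; the criterion names and threshold are taken from this section.

```python
CRITERIA = (
    "Selection Accuracy",
    "Verification Fidelity",
    "Source-Pattern Detection",
    "Floor-Value Scoring Fidelity",
    "Selection-Budget Compliance",
    "Held-Cluster Discipline",
    "Audit-Trail Completeness",
    "Configuration Responsiveness",
    "Feedback-Loop Integration",
    "Output-Schema Compliance",
)
MIN_PASSING = 3


def criteria_needing_remediation(scores: dict) -> list:
    """Return criteria scored below MIN_PASSING; every criterion must be scored 1-5."""
    missing = [c for c in CRITERIA if c not in scores]
    if missing:
        raise ValueError(f"unscored criteria: {missing}")
    for name in CRITERIA:
        if not 1 <= scores[name] <= 5:
            raise ValueError(f"{name} score out of range: {scores[name]}")
    return [c for c in CRITERIA if scores[c] < MIN_PASSING]
```

Treating an unscored criterion as an error, rather than as a pass, keeps a skipped evaluation from silently bypassing the remediation protocol.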
- Selection Accuracy:
- 5 (Excellent): Every selected cluster meets every MUST gate; supervisor floor-engagement query results are correctly applied; rationale text accurately reports which floor values are engaged at the supervisor’s reported intensity; sample audit of 10% of selections finds zero gate violations.
- 4 (Strong): Every selected cluster meets every MUST gate; minor variance between rationale text and supervisor’s reported floor-engagement intensities for at most 1 selection in the cycle.
- 3 (Passing): Every selected cluster meets every MUST gate; rationale text covers the engaged floor values though wording may not match the supervisor’s exact phrasing.
- 2 (Below threshold): One or more selected clusters fail one or more MUST gates; or rationale text materially misrepresents the supervisor’s floor-engagement assessment.
- 1 (Failing): Multiple selected clusters fail MUST gates; or rationale text fabricates floor-engagement that the supervisor did not return.
- Verification Fidelity:
- 5 (Excellent): Every selected cluster has ≥2 originating sources OR 1 originating + 1 primary document, verified by Layer 4 classification; zero clusters from the disqualification list slipped through; reliability triangulation is above the configured floor for every selection.
- 4 (Strong): Same as 5 with at most one borderline-reliability selection that was manually flagged with an explicit override.
- 3 (Passing): Every selected cluster meets the originating-source requirement; no disqualified-source slip-through; reliability triangulation meets the configured floor.
- 2 (Below threshold): One or more selections lack the required source diversity; or one or more disqualified-list outlets appeared as cluster sources without being flagged.
- 1 (Failing): Wire-cascade selections present (cluster contains many sources but only one originating); or disqualification list was stale beyond the configured freshness window.
- Source-Pattern Detection:
- 5 (Excellent): Every cluster whose source composition triggers a bad-faith catalog pattern (manufactured-controversy, coordinated-message-discipline, flooding-the-zone, astroturfing) carries a source-pattern flag with technique ID, evidence, and falsification-clause check; flagged clusters that fail the pattern’s falsification clause are de-weighted in selection-budget application.
- 4 (Strong): Same as 5 with at most one missed coordinated pattern that the cycle would have caught with a follow-up cluster.
- 3 (Passing): All four coordinated-pattern technique IDs are checked for every cluster; flagged clusters carry the technique ID; falsification clauses are evaluated.
- 2 (Below threshold): One or more coordinated patterns are not checked; or flagged clusters are selected without the de-weighting being applied.
- 1 (Failing): Manufactured-controversy clusters selected without flagging; or coordinated-pattern checks are not run.
- Floor-Value Scoring Fidelity:
- 5 (Excellent): Every cluster’s floor-engagement scoring traces to a recorded supervisor MindSpec query result; supervisor return values are applied without modification; the framework does not reinterpret, re-weight, or fabricate scores; selection rationale is literally the supervisor’s returned rationale.
- 4 (Strong): Same as 5 with rationale text condensed for length but preserving the supervisor’s specific floor-value identifications and intensities.
- 3 (Passing): Every cluster’s scoring traces to a supervisor query; supervisor return values applied; rationale preserves the engaged floor values.
- 2 (Below threshold): One or more cluster scores were generated without a supervisor query (the framework computed its own); or supervisor return values were modified before application.
- 1 (Failing): The framework substituted its own floor-engagement reasoning for the supervisor’s; or rationale claims floor-engagement the supervisor did not return.
- Selection-Budget Compliance:
- 5 (Excellent): Daily-article cap from selection-budget.json not exceeded; topic-diversity cap (default 25% per primary entity/topic) not exceeded; geographic-diversity preference applied as tiebreaker among equivalent-confidence clusters; selection-confidence threshold honored.
- 4 (Strong): Same as 5 with topic-diversity cap exceeded by at most one cluster on a single topic, with the displacement justified by selection-confidence margin.
- 3 (Passing): Daily cap, topic-diversity cap, and confidence threshold all honored; geographic-diversity preference applied where multiple equivalent candidates competed.
- 2 (Below threshold): Daily cap exceeded; or topic-diversity cap exceeded by more than one cluster; or confidence threshold violated.
- 1 (Failing): Selection-budget logic ignored; cycle output exceeds the daily cap by a wide margin or concentrates on one topic.
- Held-Cluster Discipline:
- 5 (Excellent): Held clusters carry hold-tier matching the supervisor’s floor-engagement tier; hold-expiration timestamps follow the tier mapping in selection-budget.json; expired holds are processed exactly once per cycle (re-evaluated if new sources emerged, final-rejected otherwise); zero clusters held indefinitely beyond their tier’s timeout.
- 4 (Strong): Same as 5 with at most one held cluster whose hold-expiration was extended explicitly because new partial corroboration emerged near expiry.
- 3 (Passing): Held clusters have hold-tier and hold-expiration; expired holds are processed each cycle.
- 2 (Below threshold): One or more held clusters lack a hold-expiration; or expired holds were not processed in their cycle.
- 1 (Failing): Hold queue grows without bound; clusters are held without tier assignment; expired holds are not surfaced.
- Audit-Trail Completeness:
- 5 (Excellent): Every cluster the cycle touched produced exactly one selection-log entry with disposition, configuration versions applied, and supervisor query identifier; the persistent selection log is append-only and recoverable; the configuration-version manifest names all nine config files; replay against historical state is feasible from log alone.
- 4 (Strong): Same as 5 with at most one log entry missing a non-essential field (e.g., supervisor query identifier present but timestamp imprecise to the cycle granularity).
- 3 (Passing): Every cluster produced a log entry; configuration versions recorded; replay possible with minor reconstruction.
- 2 (Below threshold): One or more clusters’ dispositions are not in the log; or configuration versions are not recorded.
- 1 (Failing): Selection log not append-only; or replay not feasible from log; or configuration drift not traceable.
- Configuration Responsiveness:
- 5 (Excellent): Hot-reload of every config file at cycle start is verified by checksum; configuration-version manifest is locked for the cycle’s duration; stale configurations (beyond the configured freshness window — e.g., disqualification-list older than 30 days) trigger explicit stale-config warnings in cycle metrics; the supervisor MindSpec version is locked to the cycle.
- 4 (Strong): Same as 5 with a single non-critical config (e.g., dead-metaphors.json, which this framework does not consume) older than its freshness window without warning.
- 3 (Passing): Configurations re-read on each cycle; version manifest produced; critical-config staleness produces warnings.
- 2 (Below threshold): One or more critical configs not re-read; or version manifest missing; or staleness not detected.
- 1 (Failing): Configuration changes do not take effect within the cycle of their write; or framework operates on indeterminate configuration state.
- Feedback-Loop Integration:
- 5 (Excellent): Feedback events from downstream frameworks are processed each cycle up to the configured budget; reliability scoring revisions cite the feedback events that drove them; revisions are versioned with prior source-reliability-tiers.json preserved; pattern of feedback events that consistently produces revisions is surfaced as a trend signal; over-correction is bounded by the configured maximum revision-per-cycle.
- 4 (Strong): Same as 5 with at most one feedback event left unprocessed at end of cycle and explicitly recorded as deferred.
- 3 (Passing): Feedback events drained to depth 0 or budget cap; reliability revisions, if any, are versioned and cited.
- 2 (Below threshold): Feedback events accumulate without processing; or revisions are written without citing the events.
- 1 (Failing): Feedback queue ignored; or revisions cause oscillation in reliability scoring (Hold-queue Pileup Trap or Feedback Overcorrection Trap fires).
- Output-Schema Compliance:
- 5 (Excellent): Every emitted cluster object validates against the article-generator’s input schema with zero field omissions and zero type mismatches; trend-signal records validate against the trend-signal schema; cycle-metrics records validate against the metrics schema; the pen-name framework’s independent data-layer consumer can read the same cluster object with no transform required.
- 4 (Strong): Same as 5 with a single optional-field omission that has a documented default.
- 3 (Passing): Every emitted cluster validates against the article-generator schema; trend signals and cycle metrics validate.
- 2 (Below threshold): One or more emitted clusters fail schema validation; or trend signals are emitted without required fields.
- 1 (Failing): Schema validation skipped; or article-generator framework rejects emitted clusters at its Layer 1 due to malformed input.
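The Selection-Budget Compliance criterion can be checked mechanically against a day’s selections. This is a sketch under stated assumptions: the per-topic limit is derived from the daily cap (25% of the cap, floored, minimum 1) rather than from the number actually selected, and the geographic tiebreaker is not checked. Parameter names are hypothetical stand-ins for values in selection-budget.json.

```python
from collections import Counter


def budget_violations(selections, daily_cap, min_confidence, topic_cap_fraction=0.25):
    """selections: list of (primary_topic, selection_confidence) for the day."""
    violations = []
    if len(selections) > daily_cap:
        violations.append(f"daily cap exceeded: {len(selections)} > {daily_cap}")
    # Assumption: the topic-diversity cap is a share of the daily cap.
    max_per_topic = max(1, int(daily_cap * topic_cap_fraction))
    for topic, n in Counter(t for t, _ in selections).items():
        if n > max_per_topic:
            violations.append(f"topic cap exceeded for {topic}: {n} > {max_per_topic}")
    low = [conf for _, conf in selections if conf < min_confidence]
    if low:
        violations.append(f"{len(low)} selection(s) below confidence threshold {min_confidence}")
    return violations
```

Under this rubric, an empty result is consistent with a score of 3 or higher on the budget criterion, while any daily-cap or confidence violation maps to 2 or below.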
LAYER 1: CYCLE INITIALIZATION AND CONFIGURATION HOT-RELOAD
Stage Focus: Open a new polling cycle, lock the configuration version manifest, and integrate downstream feedback into in-memory reliability state.
Input: Persistent state from the prior cycle (hold queue, cluster registry, last-feed-poll timestamps, prior reliability scoring); current contents of every configuration file in the configuration bundle; current contents of the feedback queue.
Output: A configuration-version manifest naming each config file and its content checksum; a freshness-warning list for any config older than its configured freshness window; an updated in-memory reliability state reflecting feedback events processed; a feedback-queue cursor advanced past processed events.
Processing Instructions
- Open a cycle log entry with the cycle’s start timestamp.
- Read every file in the configuration bundle and compute a content checksum for each. Lock these checksums for the cycle’s duration. Record in the configuration-version manifest: consensus-values-floor, editorial-supervisor-mindspec, bad-faith-techniques-catalog, source-disqualification-list, source-reliability-tiers, selection-budget, protected-category-rules, entity-resolution-config, feed-registry. IF any required config file is unreadable, THEN halt the cycle and emit a cycle-halt event citing the missing file; do not proceed to Layer 2.
- Compute config-age for each file using the last-modified field in the file’s frontmatter or filesystem metadata. IF any config-age exceeds its configured freshness window per selection-budget.json (default: 30 days for disqualification list, 14 days for reliability tiers, 90 days for floor and MindSpec, 180 days for protected-category rules), THEN add a stale-config warning to the freshness-warning list. Stale configs do not halt the cycle; they propagate to the cycle-metrics record.
- Drain the feedback queue up to the configured per-cycle feedback-budget cap (default: 200 events). For each feedback event:
  - IF the event is a downstream-rejection event (article-generator’s Layer 1 or Layer 6 rejected an article generated from a cluster this framework selected), THEN apply a small downward adjustment to the originating sources’ reliability scores per the formula in the source-reliability-tiers.json adjustment policy.
  - IF the event is a source-correction-monitor event citing high correction rates on an outlet, THEN apply the configured correction-rate adjustment.
  - IF the event is a bad-faith-pattern event from the article-generator’s Layer 5, THEN record the source-pattern signal against the cited outlet for use in Layer 5 of this framework.
  - IF the event is a reader-engagement signal (when available), THEN treat it as informational only; do not adjust reliability.
- Cap the total per-outlet reliability adjustment per cycle at the maximum revision-per-cycle parameter (default: ±0.05 on the 1–5 tier scale) to prevent oscillation.
- Mark every feedback event processed. Advance the feedback-queue cursor. IF the queue depth at end of drain remains above 0 (events deferred to next cycle), THEN record the residual depth in cycle metrics and set a feedback-budget-saturated flag.
- Confirm the editorial-supervisor MindSpec is loadable as a callable supervisor at the version recorded in the manifest. IF the supervisor cannot be loaded, THEN halt the cycle with a supervisor-unavailable event.
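The checksum-and-freshness steps above can be sketched as one pass over the bundle. Assumptions: files are keyed by their manifest names, config-age comes from filesystem modification time (not frontmatter), SHA-256 is the checksum, and the freshness windows are hardcoded defaults standing in for selection-budget.json. `CycleHalt` and `build_manifest` are hypothetical names.

```python
import hashlib
import time
from pathlib import Path

# Default freshness windows in days, keyed by manifest name; names absent here have no window.
FRESHNESS_DAYS = {
    "source_disqualification_list": 30,
    "source_reliability_tiers": 14,
    "consensus_values_floor": 90,
    "editorial_supervisor_mindspec": 90,
    "protected_category_rules": 180,
}


class CycleHalt(Exception):
    """Raised when a required config file is unreadable (the cycle-halt event)."""


def build_manifest(files, now=None):
    """files maps manifest name -> path. Returns (manifest, freshness_warnings)."""
    now = time.time() if now is None else now
    manifest, warnings = {}, []
    for name, path in files.items():
        try:
            data = Path(path).read_bytes()
            mtime = Path(path).stat().st_mtime
        except OSError as exc:
            raise CycleHalt(f"cycle-halt: cannot read {name} at {path}") from exc
        manifest[name] = hashlib.sha256(data).hexdigest()
        age_days = (now - mtime) / 86400
        window = FRESHNESS_DAYS.get(name)
        if window is not None and age_days > window:
            warnings.append(f"stale-config: {name} is {age_days:.0f} days old (window {window})")
    return manifest, warnings
```

Staleness only produces a warning while an unreadable file raises, which mirrors the split above: stale configs propagate to cycle metrics, and missing configs halt the cycle.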
Invariant check before proceeding: confirm the configuration-version manifest names all nine required config files with checksums; confirm the supervisor is callable; confirm no required input from the Input Contract has been silently dropped; confirm the cycle-log entry is open and the prior cycle’s state has not been mutated except in the controlled feedback-application step above.
Output Format for This Layer
CYCLE LOG ENTRY (open):
  cycle_id: <uuid>
  cycle_start: <ISO-8601>
  operational_profile: live-publication | dry-run | replay
  configuration_version_manifest:
    consensus_values_floor: <checksum>
    editorial_supervisor_mindspec: <checksum, mindspec_version>
    bad_faith_techniques_catalog: <checksum, catalog_version>
    source_disqualification_list: <checksum, list_version>
    source_reliability_tiers: <checksum, tiers_version>
    selection_budget: <checksum, budget_version>
    protected_category_rules: <checksum, rules_version>
    entity_resolution_config: <checksum, config_version>
    feed_registry: <checksum, registry_version>
  freshness_warnings: [<config_name>, …]
  feedback_events_processed: <count>
  feedback_residual_depth: <count>
  reliability_adjustments_applied: [{outlet_id, prior_tier, new_tier, evidence_event_ids}, …]
  supervisor_load_status: ok | failed
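The reliability_adjustments_applied field above, together with the drain step’s budget and ±0.05 per-outlet cap, can be sketched as follows. The per-event deltas are hypothetical placeholders for the adjustment policy in source-reliability-tiers.json, each event is assumed to name a single outlet_id, and every referenced outlet is assumed to already have a tier.

```python
from collections import defaultdict

MAX_REVISION_PER_CYCLE = 0.05  # default cap on the 1-5 tier scale
# Hypothetical per-event deltas standing in for the configured adjustment policy.
EVENT_DELTAS = {"downstream-rejection": -0.02, "source-correction": -0.03}


def apply_feedback(tiers, events, budget=200):
    """Return (new_tiers, adjustments, pattern_flags, residual_depth)."""
    processed = events[:budget]
    residual = len(events) - len(processed)  # deferred to the next cycle
    raw = defaultdict(float)
    evidence = defaultdict(list)
    pattern_flags = []
    for ev in processed:
        kind, outlet = ev["type"], ev["outlet_id"]
        if kind in EVENT_DELTAS:
            raw[outlet] += EVENT_DELTAS[kind]
            evidence[outlet].append(ev["event_id"])
        elif kind == "bad-faith-pattern":
            pattern_flags.append((outlet, ev["event_id"]))  # consumed by Layer 5
        # reader-engagement and unknown event types are informational only
    new_tiers = dict(tiers)
    adjustments = []
    for outlet, delta in raw.items():
        clamped = max(-MAX_REVISION_PER_CYCLE, min(MAX_REVISION_PER_CYCLE, delta))
        prior = tiers[outlet]
        new = min(5.0, max(1.0, prior + clamped))  # stay on the 1-5 scale
        new_tiers[outlet] = new
        adjustments.append({"outlet_id": outlet, "prior_tier": prior,
                            "new_tier": new, "evidence_event_ids": evidence[outlet]})
    return new_tiers, adjustments, pattern_flags, residual
```

Clamping the summed delta per outlet, rather than each event, is what bounds the per-cycle revision: a burst of rejections against one outlet moves its tier by at most the cap.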
LAYER 2: SOURCE FEED POLLING AND CLUSTER INGESTION
Stage Focus: Pull new event-cluster data from every configured feed at its cadence; merge with held clusters whose re-evaluation timer has fired; produce a single working set for this cycle.
Input: Configuration-version manifest from Layer 1; last-feed-poll timestamps from persistent state; current hold queue from persistent state; live feed endpoints per feed-registry.json.
Output: A working set of cluster objects under evaluation this cycle, each with stable cluster identifier and origin tag (new-from-feed | held-and-due-for-re-evaluation | held-and-expired-for-final-disposition); per-feed availability state (live | degraded | unavailable).
Processing Instructions
- For each feed in feed-registry.json whose poll cadence has elapsed since the last poll timestamp:
  - Issue the feed request. IF the feed is reachable and returns valid data, THEN record availability live and proceed; ELSE IF the feed returns partial data or a rate-limit error, THEN record degraded and use whatever data was returned; ELSE record unavailable and skip this feed for this cycle.
  - Update the per-feed last-poll timestamp.
- For GDELT, parse Event and GKG records into cluster objects using GDELT’s native event-clustering. For supplementary feeds, apply the configured clustering heuristic (entity overlap, temporal proximity, source-cross-reference). For primary-document feeds, treat each document as a candidate cluster anchor that may merge with later news-feed clusters.
- For each new cluster from feeds: assign a stable cluster identifier (UUID), tag origin as new-from-feed, and attach the raw source list with URLs and outlet metadata.
- For each held cluster in the hold queue:
  - IF the hold-expiration timestamp is in the future AND new sources have not emerged for this cluster since the last evaluation, THEN leave it in the hold queue; do not include it in the working set.
  - IF new sources have emerged for this cluster since the last evaluation (detected via cluster-identifier match in fresh feed pulls), THEN tag origin as held-and-due-for-re-evaluation and include it in the working set with a merged source list.
  - IF the hold-expiration timestamp is in the past AND no new sources have emerged, THEN tag origin as held-and-expired-for-final-disposition and include it in the working set; this cluster will receive a final-reject disposition unless the re-evaluation produces a different outcome.
  - IF the hold-expiration timestamp is in the past AND new sources have also emerged, THEN tag origin as held-and-due-for-re-evaluation (the new-source re-evaluation supersedes timeout).
- Deduplicate the working set by cluster identifier. IF a held cluster and a new-from-feed cluster collide on identifier, THEN merge them into a single held-and-due-for-re-evaluation entry preserving both source lists.
- Apply the cycle override, if present, to force inclusion of a specific cluster identifier or feed-timestamp range; tag origin as cycle-override.
- Apply the degraded-feed selection floor: IF any required feed is unavailable AND the framework is in the live-publication profile, THEN record the impact and set a limited-source-availability flag on every cluster ingested this cycle. The flag does not gate selection; it informs Layer 4 source-quality assessment, which will not waive its minimum-source requirements regardless of feed availability.
Invariant check before proceeding: confirm every cluster in the working set has a stable cluster identifier and an origin tag; confirm the feed-availability state covers every feed in feed-registry.json; confirm the hold queue has been correctly partitioned (expired clusters either re-included for re-evaluation or staged for final disposition; unexpired clusters either re-included if new sources have emerged OR left in the queue if not).
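A minimal Python sketch of the hold-queue partition, assuming each held cluster carries its hold-expiration timestamp and the source URLs matched to its identifier in this cycle's feed pulls. The HeldCluster record, partition_hold_queue, and their field names are hypothetical. The sketch covers only origin tagging and source merging, not deduplication against new-from-feed clusters or the cycle override.

```python
from dataclasses import dataclass, field
from datetime import datetime

DUE = "held-and-due-for-re-evaluation"
EXPIRED = "held-and-expired-for-final-disposition"


@dataclass
class HeldCluster:
    """Hypothetical hold-queue record; field names are illustrative."""
    cluster_id: str
    hold_expiration: datetime
    sources: list[str]
    # Source URLs matched to this cluster identifier in this cycle's feed pulls.
    new_sources: list[str] = field(default_factory=list)


def partition_hold_queue(held: list[HeldCluster], now: datetime):
    """Return (working, still_held).

    working: (cluster_id, origin, merged_sources) tuples for the working set.
    still_held: IDs of unexpired clusters with no new sources.
    """
    working: list[tuple[str, str, list[str]]] = []
    still_held: list[str] = []
    for c in held:
        if c.new_sources:
            # New sources trigger re-evaluation whether or not the hold expired.
            merged = list(dict.fromkeys(c.sources + c.new_sources))  # dedupe, keep order
            working.append((c.cluster_id, DUE, merged))
        elif c.hold_expiration <= now:  # assumption: expiring exactly now counts as past
            working.append((c.cluster_id, EXPIRED, list(c.sources)))
        else:
            still_held.append(c.cluster_id)  # unexpired, nothing new: stays on hold
    return working, still_held
```

Checking new sources before expiry directly encodes the rule that new-source re-evaluation supersedes the timeout.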
Output Format for This Layer
WORKING_SET:
  cycle_id: <from Layer 1>
  cluster_count: <int>
  clusters:
    - cluster_id: <uuid>
      origin: new-from-feed | held-and-due-for-re-evaluation | held-and-expired-for-final-disposition | cycle-override
      ingestion_timestamp: <ISO-8601>
      raw_source_list: [{url, outlet, raw_publication_date, raw_author, raw_title, full_text_or_abstract}, …]
      gdelt_event_ids: [<id>, …] # if applicable
      limited_source_availability_flag: bool
      hold_state: # present only if origin is held-*
        hold_tier: 1 | 2 | 3
        hold_started: <ISO-8601>
        hold_expiration: <ISO-8601>
        prior_hold_evaluations: <int>
  feed_availability_state:
    - feed_id: <name>
      status: live | degraded | unavailable
      last_poll: <ISO-8601>
      error_detail: <string or null>
LAYER 3: CLUSTER ENRICHMENT — ENTITIES, GEOGRAPHY, TEMPORAL ANCHORING
Stage Focus: Resolve the structural metadata that downstream layers and the article-generator both require — named entities to canonical identifiers, geography to ≥city/county level, temporal anchor with subsidiary timestamps.
Input: Working set from Layer 2; entity-resolution-config.json from the configuration manifest.
Output: Each cluster augmented with resolved_entities, resolved_geography, temporal_anchor, and unresolved_flags fields.
Processing Instructions
- For each cluster in the working set, extract named entities from the raw source list using the entity-extraction policy in entity-resolution-config.json (typically: NER over headlines and ledes; subject-of-quote extraction from the body; primary-actor identification per GDELT’s GKG entities where available).
- Resolve each extracted entity to a canonical identifier (Wikidata QID by default; alternate canonical IDs per entity-resolution-config.json). For each entity:
- IF resolution is unambiguous, THEN store the QID and entity type (one of person | organization | place | other per the article-generator’s input schema).
- IF resolution is ambiguous (multiple QID candidates), THEN apply the disambiguation policy (use surrounding entities as context; prefer the most-cited candidate per Wikidata pageviews) and store the chosen QID with a disambiguation_applied flag.
- IF resolution fails (no candidate QID found), THEN store the unresolved entity verbatim with a resolution_failed flag.
For each entity, additionally compute and attach:
- is_public_figure (boolean), derived from entity-resolution-config.json policy: officeholders, declared candidates, named C-suite executives, public-record litigants, and similar categories per the file’s classification rules.
- is_protected_category_member (boolean with category), derived from protected-category-rules.json against the cluster’s source material context: minor (under 18), sexual-assault victim per source identification, pre-charge non-public-figure suspect, asylum seeker, mental-health subject, or other configured category. The boolean is true if any category applies; the category itself is recorded in protected_category_detail.
- Resolve geographic locations. The cluster passes the geographic-specificity gate when the primary location resolves to ≥city/county level (Wikidata class for human settlement, county-equivalent administrative division, or finer). IF the cluster engages an event without a primary location (e.g., a national-policy story without a single locus), THEN apply the configured geographic_scope of national | multi-state | international and record the scope as the resolved geography. IF geographic resolution fails entirely (location named but unresolvable), THEN flag geography_unresolved.
- Compute the temporal anchor: extract the primary event timestamp (most-cited event date across cluster sources) and any subsidiary timestamps (when the event was first reported, when the cluster was last updated by source emergence). Compute recency_class:
- breaking if the primary timestamp is within the past 24 hours.
- recent if within 24–72 hours.
- developing if within 72 hours but the cluster has shown source-emergence activity (a held cluster that is now re-evaluated after additional sources arrived).
- older if the primary timestamp is over 72 hours old and the cluster has shown no source-emergence activity.
- Compute the unresolved_flags summary: list any flags raised during this layer (disambiguation_applied, resolution_failed, geography_unresolved, primary-actor entities that could not be resolved). The flags do not gate the cluster at this layer; they inform Layer 4 and Layer 7.
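The recency rules, read literally, overlap: a cluster under 24 hours old with source-emergence activity qualifies as both breaking and developing. They also leave a cluster older than 72 hours with source emergence in no class. The Python sketch below resolves both under explicit assumptions, not rules from the specification: source-emergence activity takes precedence and always yields developing, and window boundaries are inclusive. The function name and signature are hypothetical.

```python
from datetime import datetime, timedelta

BREAKING_WINDOW = timedelta(hours=24)
RECENT_WINDOW = timedelta(hours=72)


def recency_class(primary_event_ts: datetime, now: datetime,
                  source_emergence: bool) -> str:
    """Classify a cluster as breaking | recent | developing | older.

    Assumptions (not fixed by the spec): source-emergence activity takes
    precedence over the age buckets, including for clusters past 72 hours,
    and the 24 h / 72 h boundaries are inclusive.
    """
    if source_emergence:
        return "developing"
    age = now - primary_event_ts
    if age <= BREAKING_WINDOW:
        return "breaking"
    if age <= RECENT_WINDOW:
        return "recent"
    return "older"
```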
Invariant check before proceeding: confirm every cluster in the working set has a resolved_entities field, a resolved_geography field, a temporal_anchor field, and an unresolved_flags summary (which may be empty); confirm no cluster has been silently dropped.
Output Format for This Layer
Working set as in Layer 2, with each cluster augmented:
cluster:
  …Layer 2 fields…
  resolved_entities:
    - surface_form: <string>
      canonical_id: <Wikidata QID or alternate>
      entity_type: person | organization | place | other
      role: primary_actor | secondary_actor | mentioned
      is_public_figure: bool
      is_protected_category_member: bool
      protected_category_detail: <string or null>
      flags: [disambiguation_applied | resolution_failed]
  resolved_geography:
    primary_location: <Wikidata QID for place> | <geographic_scope: national | multi-state | international> | unresolved
    coordinates: [lat, lon] or null
    administrative_hierarchy: [<country>, <state>, <county>, <city>] or scope description
  temporal_anchor:
    primary_event_timestamp: <ISO-8601>
    first_reported_timestamp: <ISO-8601>
    last_updated_timestamp: <ISO-8601>
    recency_class: breaking | recent | developing | older
  unresolved_flags: [<flag>, …]
LAYER 4: SOURCE QUALITY ASSESSMENT AND DISQUALIFICATION SCREENING
Stage Focus: Classify every source by originating-vs-republishing and source-class, apply triangulated reliability scoring, screen against the disqualification list, verify the minimum-source-composition gate.
Input: Working set from Layer 3; source-reliability-tiers.json, source-disqualification-list.json from the configuration manifest.
Output: Each cluster augmented with full per-source metadata; clusters that fail the disqualification or minimum-source gates removed from the working set with reject log entries.
Processing Instructions
- For each source in each cluster, classify:
- Originating vs. republishing: an originating source is one that conducted independent reporting, original interview, or first-publication of a primary document. A republishing source carries content sourced from another outlet (wire pickup, press-release re-publication, syndicated content). Use the per-outlet originating-vs-republishing default in source-reliability-tiers.json and override based on per-article cues (byline reads “AP”; URL contains /wire/ or /press-release/; outlet is a known wire republisher).
- Source-class: assign exactly one of wire | national_daily | regional | trade | primary_document | government_release | court_filing | peer_reviewed | press_release | social_media | other per the per-outlet source-class mapping in source-reliability-tiers.json, with override based on URL path (e.g., an outlet’s press-release subpath becomes press_release).
- Reliability tier: 1–5, derived from the triangulated AllSides + Ad Fontes + MBFC scoring per the methodology in source-reliability-tiers.json (Ground News-style averaging with disagreement-flag handling). For sources without a tier in the file, assign tier 3 (neutral default) and add a reliability_tier_inferred flag.
- Screen each source against source-disqualification-list.json. IF an outlet appears on the list, THEN remove that source from the cluster and record a disqualified_source_removed event. IF a cluster’s surviving source list is empty after removal, THEN move the cluster to reject with reason all_sources_disqualified.
- Apply the minimum-source-composition gate: a cluster must have ≥2 originating sources OR 1 originating source + 1 primary document. IF the gate fails AND the cluster’s origin is held-and-expired-for-final-disposition, THEN issue a final-reject disposition with reason insufficient_corroboration_at_timeout and remove the cluster from the working set. IF the gate fails AND the cluster is otherwise eligible (floor engagement plausible, no other gates failed), THEN do NOT remove it; tag it with awaits_corroboration so that it is evaluated for hold disposition at Layer 7.
- Compute the cluster-level reliability triangulation: the average reliability tier across surviving sources, weighted by source class (primary documents and peer-reviewed sources 1.5×; originating wire sources 1.2×; press releases 0.5×; social media 0.3×; per the source-reliability-tiers.json weighting policy). IF the cluster-average reliability falls below the configured cluster floor (default 2.5 on the 1–5 scale), THEN move the cluster to reject with reason cluster_reliability_below_floor.
- Check for source-class diversity: the cluster contains ≥1 source from a publication with a published corrections policy (per the corrections_policy: true flag in source-reliability-tiers.json). The diversity check is a SHOULD, not a MUST; it informs Layer 7 selection confidence rather than gating.
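The Python sketch below shows the weighted cluster-average reliability and the minimum-source-composition gate, using the default weights and the 2.5 floor named above. Several points are assumptions rather than published policy: every source class not listed weighs 1.0, including republishing wire copy; higher tiers mean more reliable sources (the reject-below-2.5 rule implies this); and a primary document counts only toward the primary-document half of the gate, never as an originating source. The Source record and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Source:
    outlet_class: str      # one of the Layer 4 source classes, e.g. "wire"
    reliability_tier: int  # 1-5; higher is treated as more reliable
    originating: bool      # originating (True) vs. republishing (False)


# Default weights named in the text; every other class is assumed to weigh 1.0.
CLASS_WEIGHTS = {"primary_document": 1.5, "peer_reviewed": 1.5,
                 "press_release": 0.5, "social_media": 0.3}
ORIGINATING_WIRE_WEIGHT = 1.2
CLUSTER_FLOOR = 2.5


def source_weight(s: Source) -> float:
    if s.outlet_class == "wire" and s.originating:
        return ORIGINATING_WIRE_WEIGHT
    return CLASS_WEIGHTS.get(s.outlet_class, 1.0)


def cluster_average_reliability(sources: list[Source]) -> float:
    """Source-class-weighted mean reliability tier across surviving sources."""
    if not sources:
        raise ValueError("empty cluster: reject upstream as all_sources_disqualified")
    total_weight = sum(source_weight(s) for s in sources)
    return sum(source_weight(s) * s.reliability_tier for s in sources) / total_weight


def below_cluster_floor(sources: list[Source], floor: float = CLUSTER_FLOOR) -> bool:
    return cluster_average_reliability(sources) < floor


def composition_gate_passes(sources: list[Source]) -> bool:
    """>=2 originating sources, OR 1 originating source + 1 primary document."""
    # Assumption: a primary document is counted only as a primary document, so
    # one document cannot satisfy both halves of the second condition alone.
    originating = sum(1 for s in sources
                      if s.originating and s.outlet_class != "primary_document")
    primary_docs = sum(1 for s in sources if s.outlet_class == "primary_document")
    return originating >= 2 or (originating >= 1 and primary_docs >= 1)
```

For example, an originating wire story (tier 4), a primary document (tier 5), and a republishing social-media post (tier 2) weigh 1.2, 1.5, and 0.3. The weighted average is 12.9 / 3.0 = 4.3, and the gate passes on one originating source plus one primary document.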
Invariant check before proceeding: confirm every surviving cluster has full per-source metadata; confirm reject log entries were produced for every cluster removed at this layer; confirm the working set count equals the Layer 3 count minus the clusters rejected here, with no other deletions.
Output Format for This Layer
Working set with per-source enrichment plus a reject log:
cluster:
  …Layer 3 fields…
  sources:
    - source_id: <stable_in_cluster_id>
      url: <canonical>
      outlet: <name>
      outlet_class: wire | national_daily | regional | trade | primary_document | government_release | court_filing | peer_reviewed | press_release | social_media | other
      author: <string or null>
      publication_date: <ISO-8601>
      title: <string>
      access_date: <ISO-8601>
      reliability_tier: 1 | 2 | 3 | 4 | 5
      reliability_tier_inferred: bool
      originating_or_republishing: originating | republishing
      flags: [disqualified_source_removed (if survives by being adjacent to the removal) | corrections_policy]
  cluster_source_summary:
    originating_count: <int>
    primary_document_count: <int>
    cluster_average_reliability: <float>
    source_class_diversity_met: bool
    minimum_source_composition_gate: pass | fail | awaits_corroboration
cycle_reject_log_appendix (this layer):
  - cluster_id: <uuid>
    reason: all_sources_disqualified | cluster_reliability_below_floor | insufficient_corroboration_at_timeout
    recoverable: bool
    sources_removed: [<source_id>, …]
LAYER 5: SOURCE-PATTERN MONITORING — BAD-FAITH COORDINATED-PATTERN DETECTION
Stage Focus: Detect clusters whose existence is being driven by manufactured-controversy or coordinated-message-discipline patterns rather than by underlying floor-engaging events.
Input: Working set from Layer 4; Reference — MSI Bad-Faith Techniques Catalog.json from the configuration manifest, restricted to category coordinated_pattern; outlet-level pattern signals carried forward from feedback events in Layer 1.
Output: Each cluster augmented with a source_pattern_flags array citing technique IDs from the catalog where evidence triggers a flag and the technique’s falsification clause is not met.
Processing Instructions
- For each cluster, evaluate the five coordinated_pattern techniques from the catalog:
- manufactured_controversy: surface trigger is a cluster whose floor-engaging substance rests on contested factual claims at high amplitude across narrowly-aligned sources, with disproportionate framing relative to verified scale. Detection: the source set is dominated (>60%) by outlets sharing a known coordinated-messaging pattern OR primary-document corroboration is absent despite a high source count. Falsification: independent fact-checker outputs (IFCN-verified) treat the underlying factual basis as established OR primary-document corroboration is present in the cluster.
- coordinated_message_discipline: surface trigger is near-verbatim phrasing across multiple ostensibly-independent sources within a narrow time window. Detection: ≥3 cluster sources share a distinctive phrasing not present in the original event documentation, within a 24-hour window of cluster emergence. Falsification: the shared phrasing originates from a primary document or wire lede that all the sources cite.
- flooding_the_zone: surface trigger is a sudden burst of high-volume coverage on a topic from a narrow ideological band, with low information density per article. Detection: the source set shows ≥5 outlets from a single ideological band (per AllSides clustering) producing coverage within a 6-hour window, with average article length under 600 words and high inter-article paraphrase overlap. Falsification: a major event (declared emergency, court filing, official announcement) explains the volume.
- goalpost_shifting: surface trigger fires only for held clusters whose source emergence shows a pattern of redefining the contested claim across re-evaluation cycles. Detection: prior hold evaluations recorded a different primary claim than the current evaluation, and the redefinition tracks the way the underlying evidence shifted. Falsification: the redefinition reflects new primary-document evidence rather than rhetorical adjustment.
- overton_window_manipulation: surface trigger is rare at the selection layer because it operates over time scales longer than a single cluster. Detection: outlet-level pattern signals from feedback (carried forward from Layer 1) flag this cluster’s primary outlets as participating in a documented Overton-shift campaign on the cluster’s topic, per recorded feedback events. Falsification: the framework does not have outlet-level pattern signals on this topic.
- For each technique whose detection signal fires, evaluate the falsification clause. IF the falsification clause is met by available evidence, THEN do not flag. IF the falsification clause is not met, THEN add a source_pattern_flag to the cluster citing the technique ID, the specific evidence that triggered detection, and the absence of falsifying evidence.
- Source-pattern flags do NOT gate the cluster at this layer; they propagate to Layer 7, where flagged clusters are de-weighted in the selection-confidence computation (configurable per-flag de-weight in selection-budget.json; default −0.15 to selection confidence per flag, with selection confidence floored at 0).
- Apply the principle of consistency from the catalog: the same detection criteria run across all clusters regardless of which ideological band the cluster’s sources are drawn from. A cluster with a flooding_the_zone flag whose sources are from one ideological band is treated identically to a cluster with the same flag whose sources are from any other band.
- Record any cluster that produces a source_pattern_flag as a candidate for the trend-signal output at Layer 8.
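The catalog does not say how "distinctive phrasing" is measured. The Python sketch below illustrates the coordinated_message_discipline detection and falsification pair using shared word 6-grams as a stand-in, an assumption chosen for illustration. It also assumes that "within 24 hours of cluster emergence" means published in the 24 hours after emergence, and that the caller passes only the primary documents or wire ledes that all sources cite. The function name, parameters, and n-gram size are hypothetical. The returned keys mirror the Layer 5 output fields.

```python
import re
from collections import Counter
from datetime import datetime, timedelta


def word_ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Lower-cased word n-grams; a set, so each source counts a phrase once."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def check_coordinated_message_discipline(
        sources: list[tuple[str, datetime, str]],  # (source_id, published_at, text)
        cluster_emergence: datetime,
        cited_reference_texts: list[str],  # documents / wire ledes all sources cite
        n: int = 6,
        min_sources: int = 3,
        window: timedelta = timedelta(hours=24)) -> dict:
    counts: Counter = Counter()
    for _source_id, published_at, text in sources:
        # Assumption: the window opens at cluster emergence.
        if cluster_emergence <= published_at <= cluster_emergence + window:
            counts.update(word_ngrams(text, n))
    shared = {gram for gram, k in counts.items() if k >= min_sources}
    # Falsification: a shared phrase that also appears in a cited reference text.
    reference = set().union(*(word_ngrams(t, n) for t in cited_reference_texts))
    unexplained = sorted(shared - reference)
    return {
        "technique_id": "coordinated_message_discipline",
        "detection_fired": bool(shared),
        "falsification_evaluated": bool(shared),
        "falsified": bool(shared) and not unexplained,
        "flag_active": bool(unexplained),
        "evidence": " ".join(unexplained[0]) if unexplained else "",
    }
```

A production detector would likely need fuzzy matching for "near-verbatim" phrasing. Exact n-grams keep the sketch short and deterministic.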
Invariant check before proceeding: confirm every cluster has a source_pattern_flags field (empty array if no flags fired); confirm flag entries cite the technique ID, evidence, and falsification status; confirm consistency standard was applied (the cycle’s flag distribution does not show systematic asymmetry by ideological band that the underlying source distribution does not predict).
Output Format for This Layer
cluster:
  …Layer 4 fields…
  source_pattern_flags:
    - technique_id: manufactured_controversy | coordinated_message_discipline | flooding_the_zone | goalpost_shifting | overton_window_manipulation
      evidence: <short string citing detection signal>
      falsification_evaluated: bool
      falsified: bool
      flag_active: bool
  catalog_version: <from Layer 1 manifest>
LAYER 6: FLOOR-VALUE ENGAGEMENT SCORING — EDITORIAL-SUPERVISOR MINDSPEC QUERY
Stage Focus: For each cluster, query the editorial-supervisor MindSpec to produce floor-value engagement scores. The supervisor is the canonical source for floor-engagement assessment; the framework does not reinterpret, re-weight, or fabricate scores.
Input: Working set from Layer 5; the editorial-supervisor MindSpec at the version locked in the Layer 1 manifest; Reference — MSI Consensus Values Floor.md (loaded by the supervisor, not by this framework directly).
Output: Each cluster augmented with floor_engagement_scores per the five floor values, supervisor-returned rationale, and a supervisor_query_id for audit trace.
Processing Instructions
- For each cluster, assemble a supervisor query payload containing: the cluster’s resolved entities, resolved geography, temporal anchor, source list with reliability metadata, source-pattern flags, and primary claims extracted from the cluster source material. Do not include raw source full-text at this layer; the supervisor scores against the structured cluster metadata.
- Issue the query to the editorial-supervisor MindSpec. The supervisor returns:
- Per-floor-value engagement intensity (continuous 0.0–1.0 on each of the five floor values: human life and dignity; truthfulness; accountability of power; equality and fairness; informed citizenship).
- Per-floor-value rationale text (the supervisor’s identification of which features of the cluster engage that floor value).
- An overall floor-engagement summary (the maximum or weighted sum of per-value intensities, per the supervisor’s internal governance).
- Bad-faith pattern signals at the claim level (distinct from this framework’s source-pattern flags at the cluster level).
- A floor-crossing-risk assessment (whether any plausible article from this cluster would require the publication’s voice to adopt a floor-crossing perspective).
- Record the supervisor_query_id (returned by the supervisor) alongside the response. Apply the supervisor’s return values to the cluster without modification:
- The framework does NOT reinterpret intensities.
- The framework does NOT re-weight intensities against its own criteria.
- The framework does NOT fabricate floor-engagement that the supervisor did not return.
- IF the supervisor’s response is malformed or the query times out, THEN retry once; on a second failure, mark the cluster supervisor_query_failed and route it to hold disposition at Layer 7 with the failure as the hold reason.
- Apply the per-floor-value engagement-threshold filter: a cluster qualifies for floor engagement when at least one floor value’s intensity exceeds the configured per-value threshold OR when two or more floor values each exceed the configured intersection threshold (a lower per-value threshold, applied when multiple values are engaged jointly). Both thresholds live in selection-budget.json (defaults: per-value threshold 0.55; intersection threshold 0.40 on each of ≥2 values). The threshold logic is the framework’s, but the intensities it consumes are the supervisor’s.
- IF the supervisor’s floor_crossing_risk is non-zero, THEN add floor_crossing_risk_present to the cluster’s metadata. Per project decision, floor-crossing material is filtered into the pen-name candidate stream at Layer 8 trend signals; it does not block consensus-floor selection if the cluster’s other dispositive material remains floor-internal.
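The engagement-threshold filter is the one piece of Layer 6 logic the framework owns. The Python sketch below reads the supervisor's intensities without altering them and applies the default thresholds. "Exceeds" is taken as strictly greater than. The function name is hypothetical. The returned keys match the floor_engagement_threshold_check output fields.

```python
FLOOR_VALUES = ("human_life_and_dignity", "truthfulness", "accountability_of_power",
                "equality_and_fairness", "informed_citizenship")


def floor_engagement_threshold_check(scores: dict[str, float],
                                     per_value_threshold: float = 0.55,
                                     intersection_threshold: float = 0.40,
                                     min_joint_values: int = 2) -> dict[str, bool]:
    """Threshold filter over supervisor-returned intensities (read, never modified)."""
    intensities = [scores[v] for v in FLOOR_VALUES]  # KeyError if a value is missing
    per_value_met = any(x > per_value_threshold for x in intensities)
    joint = sum(1 for x in intensities if x > intersection_threshold)
    intersection_met = joint >= min_joint_values
    return {
        "per_value_threshold_met": per_value_met,
        "intersection_threshold_met": intersection_met,
        "floor_engagement_qualifies": per_value_met or intersection_met,
    }
```

For example, intensities of 0.45 on both truthfulness and informed_citizenship qualify through the intersection rule, even though neither value clears 0.55 alone.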
Invariant check before proceeding: confirm every cluster has either a recorded supervisor query result with supervisor_query_id OR a supervisor_query_failed flag with hold-routing applied; confirm no cluster’s floor-engagement scores were generated by this framework rather than returned by the supervisor; confirm the supervisor MindSpec version locked in Layer 1 was used for every query.
Output Format for This Layer
cluster:
  …Layer 5 fields…
  supervisor_query:
    query_id: <uuid>
    mindspec_version: <from Layer 1 manifest>
    query_timestamp: <ISO-8601>
    response_status: ok | failed_first_attempt | failed_second_attempt
  floor_engagement_scores:
    human_life_and_dignity: <0.0–1.0>
    truthfulness: <0.0–1.0>
    accountability_of_power: <0.0–1.0>
    equality_and_fairness: <0.0–1.0>
    informed_citizenship: <0.0–1.0>
    overall_summary: <supervisor-returned summary value>
  floor_engagement_rationales:
    human_life_and_dignity: <supervisor-returned text or empty>
    truthfulness: <supervisor-returned text or empty>
    accountability_of_power: <supervisor-returned text or empty>
    equality_and_fairness: <supervisor-returned text or empty>
    informed_citizenship: <supervisor-returned text or empty>
  floor_engagement_threshold_check:
    per_value_threshold_met: bool
    intersection_threshold_met: bool
    floor_engagement_qualifies: bool
  floor_crossing_risk_present: bool
  bad_faith_claim_signals: [<from supervisor>, …] # claim-level, distinct from Layer 5 cluster-level flags
ORIENTATION ANCHOR — MIDPOINT REMINDER
Primary deliverable: per-cluster selection decisions (select | reject | hold) for the article-generator framework, plus trend signals for editorial monitoring and the pen-name framework’s data layer.
Key decisions made so far:
- Working set assembled from new feeds and the hold queue (Layer 2).
- Each cluster has resolved entities, geography, temporal anchor (Layer 3).
- Each surviving cluster has full source-quality metadata; disqualification list applied; minimum-source-composition gate evaluated (Layer 4).
- Each cluster carries source-pattern flags from the bad-faith catalog where evidence triggers a flag and falsification is not met (Layer 5).
- Each cluster carries floor-engagement scores returned by the editorial-supervisor MindSpec, applied without modification (Layer 6).
Scope boundaries that must not shift:
- This framework does NOT write articles; the selected cluster object is the entire output.
- This framework does NOT perform pen-name selection; pen-name selection runs in parallel from each pen-name’s MindSpec.
- This framework does NOT reinterpret the supervisor’s floor-engagement scores; the supervisor is the canonical source.
- Asymmetric coverage produced by symmetric application of consistent standards is the framework working correctly; symmetric coverage is not a goal.
Next layer must produce: the selection disposition (select | reject | hold) for every cluster in the working set, with selection-budget compliance and full audit trace.
Continue to Layer 7.
LAYER 7: SELECTION-CRITERIA EVALUATION AND SELECTION-BUDGET APPLICATION
Stage Focus: Combine all upstream signals into a single disposition per cluster (select | reject | hold) with selection-confidence number and audit rationale.
Input: Working set from Layer 6; selection-budget.json, protected-category-rules.json from the configuration manifest; cycle-to-date selections (count and topic distribution within the rolling 24-hour window).
Output: A disposition per cluster with hold-tier, hold-expiration, recoverable-vs-final flag, and the rationale text that becomes the article-generator’s selection_rationale.
Processing Instructions
- For each cluster, evaluate the MUST gates in order. IF any gate fails, THEN move directly to the disposition decision (the “Determine the disposition outcome” step) with the failed gate as the reason.
- Floor engagement: floor_engagement_threshold_check.floor_engagement_qualifies is true.
- Source composition: cluster_source_summary.minimum_source_composition_gate is pass (not fail, and not awaits_corroboration, which routes to hold).
- Reliability: cluster_average_reliability is at or above the cluster floor (default 2.5).
- Disqualification: no surviving disqualified-list outlets.
- Entity resolution: every primary-actor entity has a canonical identifier OR carries an unresolved_flag that is acceptable per entity-resolution-config.json policy (typically: secondary-actor unresolved is acceptable; primary-actor unresolved is not).
- Geographic resolution: resolved_geography.primary_location is at ≥city/county level OR is a configured geographic_scope OR carries an explicit acceptable-unresolved flag (per entity-resolution-config.json).
- Protected-category compliance: query protected-category-rules.json against the resolved entities. IF the cluster identifies a minor by name, a sexual-assault victim by name, a pre-charge non-public-figure suspect by name, or another protected category without an explicit override traceable to source, THEN the protected-category gate fails.
- Compute selection confidence as a deterministic function of:
- The supervisor’s overall floor-engagement summary (positive contribution).
- Cluster-average reliability normalized to 0–1 (positive contribution).
- Source-class diversity met (positive contribution).
- Recency class: breaking and recent weight higher than developing; older is penalized unless the cluster engages the accountability-of-power floor value with primary-document corroboration (positive contribution with conditions).
- Source-pattern flags from Layer 5 (negative contribution per flag, configurable in selection-budget.json).
- Unresolved-entity / unresolved-geography flags (negative contribution).
The exact formula is parameterized in selection-budget.json; the framework computes the value but does NOT generate the underlying floor-engagement intensities (those are the supervisor’s).
- Apply selection-budget logic:
- Read the cycle-to-date selections in the rolling 24-hour window (drawn from the persistent selection log).
- Read the daily-article cap from selection-budget.json (suggested initial value: 30 articles/day for the citizen-journalism scope, configurable in the 20–40 range).
- Read the topic-diversity cap (suggested initial value: 25% of the daily cap per primary entity or topic).
- For each cluster passing the MUST gates: classify it under a primary entity (the most prominent canonical identifier in resolved_entities) and a primary topic (the cluster’s most strongly engaged floor value plus its dominant theme).
- IF the cycle-to-date count plus tentative selections this cycle would exceed the daily cap, THEN order all clusters passing the MUST gates by selection confidence, descending; select up to the remaining cap; route the rest to reject with reason daily_cap_reached and recoverable=true, for re-evaluation in the next cycle’s window if recency permits.
- IF a cluster’s selection would push its primary-entity / primary-topic count above the topic-diversity cap, THEN: IF other, lower-confidence clusters are competing for the daily slot, prefer the topic-diverse cluster; ELSE route the over-cap cluster to reject with reason topic_diversity_cap and recoverable=true.
- Apply geographic-diversity preference and coverage-gap preference as tiebreakers among clusters with equivalent selection confidence (margin within 0.05).
- Apply the selection-confidence threshold:
- IF selection confidence ≥ the select threshold (default 0.65), THEN dispose select.
- IF selection confidence is in the hold band (default 0.45 ≤ confidence < 0.65) AND the recency class is breaking or recent AND (the minimum-source-composition gate is awaits_corroboration OR the cluster failed only on source-emergence-curable gates), THEN dispose hold, with the hold tier set by the supervisor’s overall_summary:
- Tier 1 (high floor engagement, supervisor summary ≥ 0.70): hold-expiration 96 hours from cluster ingestion.
- Tier 2 (medium, 0.50 ≤ summary < 0.70): hold-expiration 48 hours.
- Tier 3 (low, summary < 0.50): hold-expiration 24 hours.
- IF selection confidence is below the hold-band lower bound, THEN dispose reject with reason confidence_below_hold_band and recoverable=false, unless source emergence would change the underlying inputs.
- Determine the disposition outcome per the gate / confidence / budget logic above. Record the disposition rationale in human-readable form, citing:
- Which floor values the supervisor identified as engaged at what intensity.
- Which MUST gates passed and which (if any) failed.
- Which selection-budget rule applied.
- Whether source-pattern flags fired.
- The recovery condition for reject dispositions, where applicable.
- For clusters tagged held-and-expired-for-final-disposition in Layer 2: apply the disposition logic exactly as above; if the gates and confidence still produce hold rather than select, dispose held-and-now-final-reject with recoverable=false.
- Apply a consistency check across the cycle: tabulate dispositions by the ideological band of the primary sources. IF the cycle’s reject rate by band shows asymmetry that the underlying cluster distribution does not predict, THEN log a consistency_check_warning for the Auditor (the supervisor’s WITNESS role); the warning does not change individual dispositions but feeds the supervisor’s weekly review.
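The Python sketch below covers the selection-confidence threshold step and the hold-tier mapping, using the default thresholds and tier expirations above. It assumes MUST-gate results are computed upstream and passed in. It returns None for combinations the text leaves open, such as a hold-band cluster whose recency class is older. The function names and the hold_curable and expired_hold parameters are hypothetical. The sketch does not handle the "unless source emergence would change the underlying inputs" exception to recoverable=false.

```python
from datetime import datetime, timedelta
from typing import Optional

SELECT_THRESHOLD = 0.65             # default select threshold
HOLD_BAND_LOW = 0.45                # default lower bound of the hold band
HOLD_HOURS = {1: 96, 2: 48, 3: 24}  # hold expiration per tier, from ingestion


def hold_tier(overall_summary: float) -> int:
    """Map the supervisor's overall_summary to a hold tier."""
    if overall_summary >= 0.70:
        return 1
    if overall_summary >= 0.50:
        return 2
    return 3


def confidence_disposition(confidence: float, must_gates_passed: bool,
                           hold_curable: bool, recency_class: str,
                           overall_summary: float, ingested_at: datetime,
                           expired_hold: bool = False) -> Optional[dict]:
    """Apply the selection-confidence threshold step.

    hold_curable: composition gate is awaits_corroboration, or the cluster
    failed only on source-emergence-curable gates.
    expired_hold: origin was held-and-expired-for-final-disposition.
    """
    if must_gates_passed and confidence >= SELECT_THRESHOLD:
        return {"disposition": "select"}
    in_hold_band = HOLD_BAND_LOW <= confidence < SELECT_THRESHOLD
    if in_hold_band and recency_class in ("breaking", "recent") and hold_curable:
        if expired_hold:  # a second hold is not allowed: final reject instead
            return {"disposition": "held-and-now-final-reject", "recoverable": False}
        tier = hold_tier(overall_summary)
        return {"disposition": "hold", "hold_tier": tier,
                "hold_expiration": ingested_at + timedelta(hours=HOLD_HOURS[tier])}
    if confidence < HOLD_BAND_LOW:
        return {"disposition": "reject", "reason": "confidence_below_hold_band",
                "recoverable": False}
    return None  # combination not specified by the text
```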
Invariant check before proceeding: confirm every cluster has exactly one disposition; confirm no select disposition exists where any MUST gate failed; confirm hold-tier matches the supervisor’s overall_summary tier mapping; confirm cycle’s selected-count does not exceed daily cap; confirm consistency check ran.
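The selection-budget step and its daily-cap invariant can be sketched as a greedy pass in Python. The pass walks clusters in descending selection confidence and skips any cluster whose primary topic would exceed the topic-diversity cap. Skipping gives the slot to the next, topic-diverse cluster, which matches both branches of the topic rule. It stops selecting once the remaining daily cap is used. Assumptions beyond the text: the topic cap rounds down (30 × 25% gives 7), ties keep input order, and the 0.05-margin tiebreakers are not modeled. The function name and tuple layout are hypothetical.

```python
import math
from collections import Counter


def apply_selection_budget(candidates: list[tuple[str, float, str]],
                           cycle_to_date_count: int,
                           topic_counts_to_date: dict[str, int],
                           daily_cap: int = 30,
                           topic_share: float = 0.25):
    """Greedy budget pass over clusters that passed the MUST gates.

    candidates: (cluster_id, selection_confidence, primary_topic) tuples.
    Returns (selected_ids, rejections); both rejection reasons carry
    recoverable=true per the text.
    """
    topic_cap = math.floor(daily_cap * topic_share)  # assumption: round down
    remaining = max(0, daily_cap - cycle_to_date_count)
    topic_counts = Counter(topic_counts_to_date)
    selected: list[str] = []
    rejections: list[tuple[str, str]] = []
    for cid, _conf, topic in sorted(candidates, key=lambda c: c[1], reverse=True):
        if topic_counts[topic] + 1 > topic_cap:
            rejections.append((cid, "topic_diversity_cap"))
        elif len(selected) >= remaining:
            rejections.append((cid, "daily_cap_reached"))
        else:
            selected.append(cid)
            topic_counts[topic] += 1
    return selected, rejections
```

Because the pass never selects more than the remaining cap, the invariant that the cycle's selected count never exceeds the daily cap holds by construction.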
Output Format for This Layer
cycle_disposition_table:
  - cluster_id: <uuid>
    disposition: select | reject | hold | held-and-now-final-reject
    selection_confidence: <0.0–1.0>
    must_gates:
      floor_engagement: pass | fail
      source_composition: pass | fail | awaits_corroboration
      reliability: pass | fail
      disqualification: pass | fail
      entity_resolution: pass | fail
      geographic_resolution: pass | fail
      protected_category: pass | fail
    selection_budget:
      cycle_to_date_count: <int>
      daily_cap: <int>
      primary_entity: <QID>
      primary_topic: <theme tag>
      topic_diversity_cap_status: within | at_cap | exceeds
      budget_application: selected_within_cap | displaced_lower | rejected_daily_cap | rejected_topic_diversity
    hold_state: # present only if disposition is hold or held-and-now-final-reject
      hold_tier: 1 | 2 | 3
      hold_expiration: <ISO-8601>
      hold_reason: awaits_corroboration | confidence_in_hold_band | supervisor_query_failed | …
    rationale_text: <human-readable summary>
    recoverable: bool # for reject and held-and-now-final-reject
cycle_consistency_check:
  ideological_band_distribution: { <band>: { selected, rejected, held } }
  asymmetry_warning: bool
LAYER 8: OUTPUT EMISSION AND TREND-SIGNAL GENERATION
Stage Focus: Emit selected cluster objects to the article-generator input queue, write trend signals to the editorial monitoring channel and pen-name data layer, and notify the pen-name framework’s data layer of below-floor-but-floor-engaging candidates.
Input: Cycle disposition table from Layer 7; supervisor query results from Layer 6; cycle-to-date selection counts.
Output: Selected cluster objects emitted; trend signal records emitted; pen-name candidate stream updated.
Processing Instructions
- For each cluster with disposition select, assemble the cluster object to match the article-generator framework’s Primary Input schema exactly. The article-generator validates field names and types at its Layer 1 and rejects on any mismatch; this framework MUST emit using the article-generator’s field names, not paraphrases. Required fields per the published news-article-generator input contract:
- cluster_id (string): from Layer 2.
- cluster_members (array): each entry has url, outlet, author (nullable), publication_date (ISO-8601), outlet_class (one of wire | national_daily | regional | trade | primary_document | government_release | court_filing | peer_reviewed | press_release | social_media | other), reliability_tier (1–5), originating_or_republishing (boolean), full_text_or_abstract (string).
- pre_extracted_entities (array): each entry has surface_form, canonical_id, entity_type (one of person | organization | place | other), is_public_figure (boolean), is_protected_category_member (boolean, with protected_category_detail when true). From Layer 3.
- pre_extracted_timeline (object): the primary event_timestamp plus subsidiary timestamps with their source IDs. From Layer 3.
- geographic_resolution (object): coordinates, place_hierarchy (city/county/state/country), place_canonical_id (Wikidata QID or government registry ID). From Layer 3.
- selection_rationale (object): for each floor value the supervisor identified as engaged, an entry with engagement intensity bucketed to one of low | moderate | high against the supervisor’s continuous intensity from Layer 6 (low ≤ 0.40; moderate > 0.40 and < 0.70; high ≥ 0.70). The supervisor’s per-value rationale text is preserved alongside the bucket.
- gdelt_event_ids (array, may be empty).
- pre_flight_verification (object): corroboration_status set to one of two_originating | one_originating_plus_primary_document | insufficient based on Layer 4 source classification (a select disposition guarantees that insufficient is never emitted); disqualifying_sources_present (boolean, always false for selected clusters since Layer 4 removes them); protected_category_concerns_present (boolean with detail per Layer 7’s protected-category gate; false for selected clusters absent an explicit override traceable to source).
- Auxiliary metadata for downstream audit (kept alongside the article-generator’s required fields): source_pattern_flags (from Layer 5), floor_crossing_risk_present (from Layer 6), framework_version (this framework’s version), configuration_version_manifest (from Layer 1), cluster_emission_timestamp (ISO-8601).
- IF the operational profile is live-publication, THEN write the cluster object to the article-generator input queue. IF dry-run, THEN write to a dry-run log without queueing for publication. IF replay, THEN write to the replay output channel with the original-cycle timestamp preserved.
- Mirror the emission to the pen-name framework’s data layer for independent consumption. Pen-name selection runs in parallel from each pen-name’s MindSpec; this framework does not constrain its decisions, and pen-name selection does not constrain this framework’s decisions. The shared data layer is the coordination point.
- Generate trend signals:
  - Topic-emergence: when ≥3 new clusters this cycle share a primary entity or topic that has not appeared in cycle-to-date records within the trailing 7 days, emit a topic-emergence signal.
  - Coverage-gap: when cycle-to-date selections in a rolling window (configurable, default 14 days) show under-coverage of a floor value relative to the cycle’s ingested-cluster floor-engagement distribution, emit a coverage-gap signal.
  - Source-quality drift: when this cycle’s reliability adjustments per outlet (from Layer 1 feedback application) cumulatively cross a configured threshold, emit a source-quality-drift signal naming the outlet and the direction of drift.
  - Bad-faith pattern: when ≥2 clusters this cycle carry the same source_pattern_flag technique ID, emit a bad-faith-pattern signal naming the technique and the involved sources.
  - Below-floor floor-engaging candidate: for clusters this framework rejected with reason floor_engagement_threshold_not_met AND whose supervisor per-value intensities show non-trivial engagement (any single value ≥ 0.30), emit a pen-name-candidate signal to the pen-name framework’s data layer. The pen-name framework decides independently whether to pick it up.
- Deduplicate signals against the configured suppression window (per signal type, defaults: topic-emergence 24h, coverage-gap 72h, source-quality-drift 168h, bad-faith-pattern 24h, pen-name-candidate 12h per cluster).
- Update cycle-to-date selection counts and cycle metrics counters.
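The following minimal Python sketch covers two mechanical parts of the cluster-object assembly described above. The first is bucketing the supervisor’s continuous intensity into low | moderate | high. The second is a pre-queue presence check on the article-generator’s required top-level fields. The sketch reads the stated bounds as low ≤ 0.40, moderate strictly between 0.40 and 0.70, and high ≥ 0.70. The per-value input shape is hypothetical, as are the engagement_intensity and rationale keys inside selection_rationale. The check covers field presence only; type validation stays with the article-generator’s own Layer 1.

```python
from typing import Any

# Top-level fields the article-generator's Primary Input schema requires,
# as listed in the Layer 8 processing instructions.
REQUIRED_FIELDS = (
    "cluster_id",
    "cluster_members",
    "pre_extracted_entities",
    "pre_extracted_timeline",
    "geographic_resolution",
    "selection_rationale",
    "gdelt_event_ids",
    "pre_flight_verification",
)


def bucket_intensity(intensity: float) -> str:
    """Map a continuous supervisor intensity to low | moderate | high."""
    if intensity <= 0.40:
        return "low"
    if intensity < 0.70:
        return "moderate"
    return "high"


def build_selection_rationale(per_value: dict[str, dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """per_value: floor value -> {"intensity": float, "rationale": str} (hypothetical shape)."""
    return {
        value: {
            "engagement_intensity": bucket_intensity(entry["intensity"]),
            "rationale": entry["rationale"],  # supervisor text preserved verbatim
        }
        for value, entry in per_value.items()
    }


def schema_problems(cluster_object: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the object may be queued."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in cluster_object]
    verification = cluster_object.get("pre_flight_verification", {})
    if verification.get("corroboration_status") == "insufficient":
        problems.append("select disposition cannot carry corroboration_status=insufficient")
    return problems
```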
Invariant check before proceeding: confirm every select disposition produced exactly one emission to the article-generator queue; confirm trend signals carry the cluster IDs / source IDs that produced them; confirm signals are deduplicated.
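The invariant check can be mechanized in Python, as sketched below. The sketch assumes that Layer 7 dispositions arrive as a cluster_id-to-disposition mapping and that Layer 8 emissions arrive as a list of cluster_ids; both shapes are hypothetical. Any non-empty list in either result fails the check.

```python
from collections import Counter


def check_emission_invariant(dispositions: dict[str, str], emitted: list[str]) -> dict[str, list[str]]:
    """dispositions: cluster_id -> "select" | "reject" | "hold" (Layer 7).
    emitted: cluster_ids written to the article-generator queue (Layer 8)."""
    selected = {cid for cid, disposition in dispositions.items() if disposition == "select"}
    counts = Counter(emitted)
    return {
        "missing": sorted(selected - set(counts)),      # select with no emission
        "duplicated": sorted(cid for cid, n in counts.items() if n > 1),
        "unexpected": sorted(set(counts) - selected),   # emitted without a select
    }


def signals_without_provenance(signals: list[dict]) -> list[str]:
    """Signal IDs that name neither the clusters nor the sources that produced them."""
    return [s["signal_id"] for s in signals if not (s.get("cluster_ids") or s.get("source_ids"))]
```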
Output Format for This Layer
emissions:
  selected_clusters_emitted: [<cluster_id>, …]
  emission_target: live | dry-run | replay
  pen_name_data_layer_updates: [<cluster_id>, …]
  trend_signals_emitted:
    - signal_id: <uuid>
      signal_type: topic_emergence | coverage_gap | source_quality_drift | bad_faith_pattern | pen_name_candidate
      cluster_ids: [<uuid>, …]
      source_ids: [<id>, …]
      payload: <type-specific fields>
      suppression_window_check: deduped | first_in_window
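The Python sketch below shows the suppression-window deduplication that fills suppression_window_check. It also shows the grouping behind the bad-faith-pattern signal (≥2 clusters sharing a technique ID). The dedup_key argument is an assumption: for example, the topic for topic-emergence, the technique ID for bad-faith-pattern, or the cluster_id for the per-cluster pen-name-candidate window. The in-memory last_emitted dictionary stands in for whatever persisted state the framework actually uses.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Default suppression windows from the Layer 8 processing instructions.
SUPPRESSION_WINDOWS = {
    "topic_emergence": timedelta(hours=24),
    "coverage_gap": timedelta(hours=72),
    "source_quality_drift": timedelta(hours=168),
    "bad_faith_pattern": timedelta(hours=24),
    "pen_name_candidate": timedelta(hours=12),  # keyed per cluster
}


def suppression_window_check(signal_type: str, dedup_key: str, now: datetime,
                             last_emitted: dict[tuple[str, str], datetime]) -> str:
    """Return "deduped" or "first_in_window"; records the emission time when not deduped."""
    key = (signal_type, dedup_key)
    previous = last_emitted.get(key)
    if previous is not None and now - previous < SUPPRESSION_WINDOWS[signal_type]:
        return "deduped"
    last_emitted[key] = now
    return "first_in_window"


def bad_faith_pattern_groups(flags_by_cluster: dict[str, list[str]]) -> dict[str, list[str]]:
    """technique_id -> cluster_ids, for techniques flagged on >= 2 clusters this cycle."""
    clusters_by_technique = defaultdict(set)
    for cluster_id, technique_ids in flags_by_cluster.items():
        for technique_id in technique_ids:
            clusters_by_technique[technique_id].add(cluster_id)
    return {t: sorted(c) for t, c in clusters_by_technique.items() if len(c) >= 2}
```

A deduped signal does not refresh last_emitted, so the window is anchored to the first emission rather than sliding with every repeat. A sliding window would be a different, also defensible, design.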
LAYER 9: SELF-EVALUATION
Stage Focus: Evaluate the cycle’s output against the ten Evaluation Criteria; identify deficiencies; queue corrections for Layer 10 or flag UNRESOLVED DEFICIENCY when remediation is not feasible within the cycle.
Input: All cycle outputs from Layers 1–8.
Processing Instructions
Calibration warning: Self-evaluation scores are systematically inflated. Research finds LLMs are overconfident in 84.3% of scenarios. A self-score of 4/5 likely corresponds to 3/5 by external evaluation. Score conservatively. Articulate specific uncertainties alongside scores.
For each criterion 1–10:
- State the criterion name and number.
- Wait — verify the current cycle output against this specific criterion’s rubric descriptions before scoring.
- Identify specific evidence in the cycle output that supports or undermines each score level. Cite cluster IDs, log entries, manifest checksums, and supervisor query identifiers as evidence.
- Assign a score (1–5) with cited evidence.
- IF the score is below 3, THEN:
  a. Identify the specific deficiency with reference to the deficient cycle artifact.
  b. State the specific modification required to raise the score within the cycle.
  c. Apply the modification IF feasible within the cycle (e.g., re-emit a malformed cluster object after fixing the schema violation). IF not feasible (e.g., a missed source-pattern detection that would have required Layer 5 to re-run), THEN flag UNRESOLVED DEFICIENCY.
  d. Re-score after modification.
- IF the score meets or exceeds 3, THEN confirm and proceed.
After all criteria are evaluated:
- IF all scores ≥ 3, THEN proceed to Layer 10.
- IF any score remains below 3 after one modification attempt, THEN flag UNRESOLVED DEFICIENCY in the cycle metrics with the criterion, the deficiency, and what additional input or iteration would resolve it.
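The following Python sketch shows the aggregation step. After the single modification attempt, any criterion whose final score is still below 3 becomes an unresolved_deficiencies entry. The per-criterion keys deficiency and remediation_path inside each score record are assumptions, chosen to mirror the output format below. How overall_cycle_confidence is chosen is not specified here, so the sketch leaves it out.

```python
CRITERIA = (
    "selection_accuracy",
    "verification_fidelity",
    "source_pattern_detection",
    "floor_value_scoring_fidelity",
    "selection_budget_compliance",
    "held_cluster_discipline",
    "audit_trail_completeness",
    "configuration_responsiveness",
    "feedback_loop_integration",
    "output_schema_compliance",
)


def unresolved_deficiencies(scores: dict[str, dict]) -> list[dict]:
    """Criteria still below 3 after the one permitted modification attempt."""
    unresolved = []
    for criterion in CRITERIA:
        entry = scores[criterion]
        # Use the post-modification score when a modification was applied.
        final_score = entry.get("post_modification_score", entry["score"])
        if final_score < 3:
            unresolved.append({
                "criterion": criterion,
                "deficiency": entry.get("deficiency", "unspecified"),
                "remediation_path": entry.get("remediation_path", "unspecified"),
            })
    return unresolved
```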
Confidence assessment: at the end of self-evaluation, state the overall cycle confidence as one of high | medium | low | cycle_partially_failed. A cycle_partially_failed assessment requires the framework to roll forward only the dispositions that pass and to re-route the rest back to the hold queue for the next cycle.
Output Format for This Layer
self_evaluation:
  cycle_id: <from Layer 1>
  scores:
    selection_accuracy: { score, evidence, modification_applied?, post_modification_score }
    verification_fidelity: { … }
    source_pattern_detection: { … }
    floor_value_scoring_fidelity: { … }
    selection_budget_compliance: { … }
    held_cluster_discipline: { … }
    audit_trail_completeness: { … }
    configuration_responsiveness: { … }
    feedback_loop_integration: { … }
    output_schema_compliance: { … }
  unresolved_deficiencies: [{criterion, deficiency, remediation_path}, …]
  overall_cycle_confidence: high | medium | low | cycle_partially_failed
  calibration_acknowledgment: confirmed
LAYER 10: STATE PERSISTENCE, CYCLE LOG EMISSION, AND OUTPUT FORMATTING
Stage Focus: Persist hold-queue state, append the selection log, write reliability scoring revisions, emit cycle metrics, format and ship the cycle’s Recovery Declaration.
Input: All prior layer outputs; self-evaluation results from Layer 9.
Output: Persistent state updated; cycle log entry closed; cycle metrics emitted; reliability scoring file versioned (if revised); Recovery Declaration appended to cycle log.
Error Correction Protocol
- Verify factual consistency across cycle outputs. IF the cycle’s selected_count in metrics does not match the count of select dispositions in Layer 7 and selected_clusters_emitted in Layer 8, THEN reconcile by treating the disposition table as canonical and re-emitting any missing cluster.
- Verify terminology consistency. Confirm that the defined terms (originating, republishing, cluster_id, floor_engagement_intensity, selection_confidence, hold_tier, held-and-now-final-reject) are used with their defined meanings throughout the cycle log.
- Verify structural completeness against the Output Contract: every select disposition emitted; every reject and hold logged; trend signals deduplicated; cycle metrics covering ingested / selected / rejected / held / timed-out counts.
- Verify variable fidelity. Confirm that the named variables established in Layer 1 (cycle_id, configuration_version_manifest, mindspec_version, feedback_events_processed) are still present and accurate in cycle metrics. Confirm that the cluster_id set has not been silently expanded or contracted from Layer 2 to Layer 8 except through documented dispositions. Confirm that supervisor query_id values are preserved per cluster from Layer 6 to Layer 8.
- Document all corrections made in a Cycle Corrections Log appended to the cycle log entry.
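The first and fourth checks above reduce to set arithmetic. The minimal Python sketch below assumes the disposition table is a cluster_id-to-disposition mapping, which is a hypothetical shape. The correction strings are placeholders for whatever the Cycle Corrections Log actually records.

```python
def reconcile_selection(dispositions: dict[str, str], emitted: list[str],
                        metrics_selected_count: int) -> dict:
    """Treat the Layer 7 disposition table as canonical."""
    canonical = sorted(cid for cid, d in dispositions.items() if d == "select")
    already_emitted = set(emitted)
    to_re_emit = [cid for cid in canonical if cid not in already_emitted]
    corrections = []
    if metrics_selected_count != len(canonical):
        corrections.append(f"selected_count {metrics_selected_count} -> {len(canonical)}")
    corrections.extend(f"re-emit {cid}" for cid in to_re_emit)
    return {"selected_count": len(canonical), "to_re_emit": to_re_emit, "corrections": corrections}


def cluster_set_drift(layer2_ids: set[str], dispositions: dict[str, str]) -> dict[str, list[str]]:
    """Clusters that entered at Layer 2 without a disposition, and dispositions with no Layer 2 origin."""
    disposed = set(dispositions)
    return {
        "undispositioned": sorted(layer2_ids - disposed),
        "unexplained": sorted(disposed - layer2_ids),
    }
```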
Output Formatting
Persist the following:
- Hold queue snapshot: write to state/hold-queue.json with each held cluster’s full Layer 6 / Layer 7 record, so the next cycle can re-evaluate without re-running Layers 3–6 unless new sources have emerged.
- Selection log entry: append to state/selection-log.jsonl (append-only, JSON Lines) with one entry per cluster touched this cycle.
- Reliability scoring revisions: IF Layer 1 produced revisions, THEN write the new source-reliability-tiers.json with the prior version preserved at source-reliability-tiers.json.<prior-version>.bak.
- Cycle metrics: write to state/cycle-metrics.jsonl with one entry per cycle.
- Cycle log entry close: write the closed cycle log to state/cycle-log/<cycle_id>.json.
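The minimal Python sketch below covers two of the persistence rules. It appends to the selection log as append-only JSON Lines. It also preserves the prior reliability file as source-reliability-tiers.json.<prior-version>.bak before writing the revision. The function names are hypothetical. The Continuous Operation section states that state files are written atomically (temp-file + rename); this sketch uses plain writes for brevity.

```python
import json
import shutil
from pathlib import Path


def append_jsonl(path: Path, entries: list[dict]) -> None:
    """Append one JSON object per line; existing lines are never rewritten."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        for entry in entries:
            handle.write(json.dumps(entry, sort_keys=True) + "\n")


def write_reliability_revision(path: Path, new_tiers: dict, prior_version: str) -> Path:
    """Preserve the prior file as <name>.<prior_version>.bak, then write the revision."""
    backup = path.with_name(f"{path.name}.{prior_version}.bak")
    if path.exists():
        shutil.copy2(path, backup)
    path.write_text(json.dumps(new_tiers, indent=2, sort_keys=True), encoding="utf-8")
    return backup
```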
Missing Information Declaration
State explicitly:
- Any input information that was expected but absent (feeds unavailable, configs unreadable, supervisor unreachable).
- Any processing step where insufficient information forced assumptions (entity resolution defaulted; reliability tier inferred for an unscored outlet).
- Any evaluation criterion where the score reflects a gap in available information rather than a quality deficiency (e.g., feedback-loop-integration scored low because the feedback queue was not yet wired up at deployment).
Recovery Declaration
IF Layer 9 flagged any UNRESOLVED DEFICIENCY, THEN restate each deficiency with:
- The specific criterion that was not met.
- What additional input, iteration, or human judgment would resolve it.
- Whether the deficiency affects downstream consumers; specifically, whether the article-generator framework should be alerted that any of the cycle’s emitted clusters carry a known deficiency (cycle confidence: low or cycle_partially_failed).
IF Layer 9 returned cycle_partially_failed, THEN:
- Roll forward only the dispositions that pass self-evaluation.
- Re-route the failed-disposition clusters back to the hold queue with origin cycle_partially_failed_re_route and a hold-tier matching their supervisor overall_summary.
- Surface a cycle-partial-failure event to the editorial monitoring channel for publisher attention.
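The following Python sketch shows the partial-failure roll-forward. The record shape, the overall_summary values, and the hold_tier_for lookup are all hypothetical. The specification says the hold tier should match the supervisor’s overall_summary but does not define that mapping here.

```python
from typing import Optional


def roll_forward(dispositions: dict[str, dict], failed_ids: set[str],
                 hold_tier_for: dict[str, str]) -> tuple[dict, dict, Optional[dict]]:
    """Split dispositions after a cycle_partially_failed self-evaluation."""
    passed, re_routed = {}, {}
    for cluster_id, record in dispositions.items():
        if cluster_id in failed_ids:
            re_routed[cluster_id] = {
                **record,
                "disposition": "hold",
                "origin": "cycle_partially_failed_re_route",
                "hold_tier": hold_tier_for[record["overall_summary"]],
            }
        else:
            passed[cluster_id] = record
    # Publisher-facing event for the editorial monitoring channel.
    event = ({"event": "cycle-partial-failure", "re_routed": sorted(re_routed)}
             if re_routed else None)
    return passed, re_routed, event
```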
NAMED FAILURE MODES
The Wire-Cascade Trap: A cluster appears well-corroborated because it has many sources, but only one source did the originating reporting and the others are wire pickups. Correction: Layer 4 originating-vs-republishing classification with the ≥2 originating gate; clusters meeting source count but not originating count fail the gate.
The Press-Release Trap: A cluster anchors on an advocacy or corporate press release dressed as journalism. Correction: Layer 4 source-class tagging with press_release weighted at 0.5×; secondary-corroboration requirement via the originating-source gate.
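The Python sketch below shows one way the originating-source gate and the 0.5× press-release weight from the two traps above might combine into corroboration_status. Several details are assumptions rather than statements from the specification:

- originating_or_republishing = True is read as "originating".
- primary_document, government_release, and court_filing are all treated as primary documents, counted outside the originating tally.
- Press-release weights are summed with those of other originating sources.

```python
# Assumed set of outlet classes that count as primary documents.
PRIMARY_DOCUMENT_CLASSES = {"primary_document", "government_release", "court_filing"}
# Press releases weigh 0.5x (Press-Release Trap correction); other classes weigh 1.0.
CLASS_WEIGHTS = {"press_release": 0.5}


def corroboration_status(members: list[dict]) -> str:
    originating_weight = 0.0
    has_primary_document = False
    for member in members:
        if member["outlet_class"] in PRIMARY_DOCUMENT_CLASSES:
            has_primary_document = True
        elif member["originating_or_republishing"]:  # True = originating (assumed encoding)
            originating_weight += CLASS_WEIGHTS.get(member["outlet_class"], 1.0)
        # Republishing pickups add nothing, however many there are.
    if originating_weight >= 2:
        return "two_originating"
    if originating_weight >= 1 and has_primary_document:
        return "one_originating_plus_primary_document"
    return "insufficient"
```

Under these assumptions, a wire story plus any number of republishing pickups stays insufficient, which is the Wire-Cascade Trap correction. A lone press release cannot lift a single originating source to two_originating.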
The Manufactured-Controversy Trap: A cluster surfaces because of a coordinated bad-faith messaging operation rather than an underlying floor-engaging event. Correction: Layer 5 source-pattern monitoring against the catalog’s coordinated_pattern techniques with falsification clauses applied.
The Single-Floor-Value Capture Trap: Selection becomes dominated by clusters engaging only one floor value, producing topical monoculture. Correction: Layer 7 topic-diversity cap (default 25% per primary entity / topic); coverage-gap trend signal from Layer 8.
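The Python sketch below shows one possible topic-diversity cap. Candidates are taken in descending selection_confidence, and a topic that has reached its share of the cycle’s selection budget is skipped. Two points are assumptions: reading the 25% as a share of the per-cycle budget, and guaranteeing at least one slot per topic for small budgets. The candidate tuple shape is hypothetical.

```python
import math


def apply_topic_cap(candidates: list[tuple[str, str, float]], budget: int,
                    cap_share: float = 0.25) -> list[str]:
    """candidates: (cluster_id, primary_topic, selection_confidence) tuples."""
    per_topic_limit = max(1, math.floor(cap_share * budget))
    chosen: list[str] = []
    per_topic: dict[str, int] = {}
    for cluster_id, topic, _confidence in sorted(candidates, key=lambda c: c[2], reverse=True):
        if len(chosen) >= budget:
            break
        if per_topic.get(topic, 0) >= per_topic_limit:
            continue  # topic already at its cap; leave room for other topics
        chosen.append(cluster_id)
        per_topic[topic] = per_topic.get(topic, 0) + 1
    return chosen
```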
The Recency-Bias Trap: Selection over-prefers breaking clusters and misses developing stories that would meet the floor with one additional source. Correction: Layer 7 hold-band logic that holds rather than rejects clusters in the 0.45–0.65 selection-confidence range when minimum-source-composition is awaits_corroboration and recency is breaking/recent; tiered hold-expiration.
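The hold-band condition in the Recency-Bias Trap correction can be written as a single predicate, sketched below in Python. The band’s upper edge is exclusive because 0.65 is also the initial selection-confidence threshold named in the registry entry. That interaction is an assumption, as are any recency labels other than breaking and recent.

```python
def hold_band_applies(selection_confidence: float, source_composition: str, recency: str,
                      band_low: float = 0.45, select_threshold: float = 0.65) -> bool:
    """True when the cluster should be held rather than rejected."""
    return (
        band_low <= selection_confidence < select_threshold
        and source_composition == "awaits_corroboration"
        and recency in {"breaking", "recent"}
    )
```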
The Coverage-Gap Blindness Trap: Important ongoing stories drop out of coverage because new daily news displaces them. Correction: Layer 7 coverage-gap preference as tiebreaker among equivalent-confidence clusters; Layer 8 coverage-gap trend signal.
The Disqualified-Source Slip-Through Trap: Outlets on the disqualification list appear as cluster sources because the list has not been refreshed. Correction: Layer 1 freshness check with stale-config warning; Layer 4 disqualification screening every cycle against the locked manifest version.
The False-Symmetry Pressure Trap: Selection manufactures balance the underlying evidence does not support, in deference to imagined fairness. Correction: floor-engagement scoring is asymmetry-permitting per the supervisor’s FAIRNESS commitment (symmetric application of consistent standards yields asymmetric coverage when the underlying world is asymmetric); Layer 7 consistency check audits asymmetric reject-rates by ideological band only against the underlying cluster distribution, not against a symmetric ideal.
The Powerful-Actor PR-Cycle Capture Trap: Selection becomes dominated by clusters whose existence is driven by a powerful actor’s press operation rather than by accountability-of-power journalism. Correction: Layer 6 supervisor scoring distinguishes accountability-of-power engagement from press-cycle-driven coverage; Layer 5 source-pattern monitoring catches coordinated patterns; Layer 7 selection-confidence penalizes source-pattern flags.
The Attention-Economy Capture Trap: Selection is driven by viral or engagement-metric signals rather than floor-engagement. Correction: Layer 6 floor-engagement is the primary criterion; the framework does not consume engagement metrics as selection inputs (engagement signals from feedback are informational, not adjustment-triggering, per Layer 1).
The Floor-Drift Trap: Selection criteria gradually relax over time without explicit revision because the supervisor or thresholds have drifted unobserved. Correction: configuration-version manifest in Layer 1; supervisor MindSpec version locked per cycle; Layer 7 consistency check with weekly review surfaced to the supervisor’s Auditor role; the supervisor’s TRUTH and WITNESS commitments at constitutional weight 9 (per the editorial-supervisor MindSpec) make drift detectable.
The Stale-Config Drift Trap: A configuration file ages past its freshness window and the framework continues operating against it without warning. Correction: Layer 1 freshness check produces stale-config warnings in cycle metrics for every config file beyond its window; warnings propagate to monitoring.
The Feedback Overcorrection Trap: Feedback events trigger reliability adjustments large enough to oscillate scoring across cycles, causing source rankings to whiplash. Correction: Layer 1 caps total per-outlet reliability adjustment per cycle at the configured max-revision-per-cycle (default ±0.05 on the 1–5 tier scale); revisions are versioned with prior file preserved.
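The Python sketch below implements the per-cycle cap. All of an outlet’s feedback deltas for the cycle are summed, the total is clamped to ±max-revision-per-cycle (default 0.05), and the result is kept on the 1 to 5 tier scale. Two points are assumptions: clamping the summed total rather than each event, and clipping to the scale bounds.

```python
def capped_revisions(current: dict[str, float], deltas: dict[str, list[float]],
                     max_revision: float = 0.05, tier_min: float = 1.0,
                     tier_max: float = 5.0) -> dict[str, float]:
    """current: outlet -> tier score; deltas: outlet -> this cycle's feedback adjustments."""
    revised = {}
    for outlet, outlet_deltas in deltas.items():
        total = sum(outlet_deltas)
        clamped = max(-max_revision, min(max_revision, total))  # per-cycle cap
        revised[outlet] = min(tier_max, max(tier_min, current[outlet] + clamped))
    return revised
```

For example, an outlet at 4.0 with feedback deltas of +0.2 and +0.1 moves only to 4.05 this cycle. This sketch does not carry the remainder forward.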
The Hold-Queue Pileup Trap: Held clusters accumulate without ever resolving because their re-evaluation criteria are too strict or new sources rarely emerge for low-tier holds. Correction: Layer 7 tiered hold-expiration with mandatory final disposition at expiry; Layer 2 final-disposition routing for held-and-expired-for-final-disposition origin clusters; cycle metrics track held-and-final-rejected counts as a saturation indicator.
The Supervisor-Substitution Trap: Under supervisor-query failure or supervisor-load failure, the framework computes its own floor-engagement scoring as a fallback. This crosses the architectural line: the supervisor is the canonical floor-engagement source. Correction: Layer 6 explicitly forbids substitution; supervisor failures route the cluster to hold with supervisor_query_failed reason; Layer 1 halts the cycle if the supervisor cannot be loaded at all.
The Implicit-Handoff Trap (pipeline-internal): A cluster’s metadata is silently dropped between layers. Correction: invariant check at every layer boundary verifies the working set count and per-cluster field presence; Layer 10 variable-fidelity check verifies named variables are preserved end-to-end.
The Limited-Source-Availability Bypass Trap: When paid feeds are unavailable, the framework relaxes minimum-source requirements to maintain throughput. Correction: Layer 2 sets limited_source_availability_flag but Layer 4’s minimum-source-composition gate is unaffected by feed availability; cycle metrics record the impact rather than masking it.
CONTINUOUS OPERATION
This framework’s runtime model differs from the per-invocation pattern most Ora frameworks follow. The Execution Commands block specifies one polling cycle as the unit of execution. The continuous-operation wrapper surrounding that unit is documented here.
Polling cadence
- GDELT: poll every 15 minutes (matching GDELT’s update cadence) per feed-registry.json.
- Supplementary feeds: poll per their respective cadences as configured (typically 5 to 30 minutes for wire and major dailies; 1 to 6 hours for trade and regional; per-API for primary documents).
- Source-quality rating feeds: poll per their published cadence (typically daily or weekly); cached results are stale-warning gated.
- One cycle per shortest cadence (15 minutes by default for GDELT-anchored operation).
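The minimal Python sketch below shows a per-feed cadence check that a cycle could run against state/last-poll-timestamps.json. The feed IDs and the cadence mapping are hypothetical. The real cadences come from feed-registry.json, whose schema is not shown here.

```python
from datetime import datetime, timedelta


def feeds_due(cadences: dict[str, timedelta], last_polled: dict[str, datetime],
              now: datetime) -> list[str]:
    """Feeds whose cadence has elapsed since their last-poll cursor (or never polled)."""
    return sorted(
        feed_id
        for feed_id, cadence in cadences.items()
        if feed_id not in last_polled or now - last_polled[feed_id] >= cadence
    )
```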
State persistence
The framework’s state crosses cycle boundaries through:
- state/hold-queue.json: full hold-queue contents.
- state/selection-log.jsonl: append-only selection log.
- state/cycle-metrics.jsonl: per-cycle metrics.
- state/cluster-registry.json: cluster identifier registry, used for deduplicating cluster identifiers across cycles.
- state/last-poll-timestamps.json: per-feed last-poll cursors.
- Configuration files in their canonical vault locations (read-only from the cycle’s perspective, except where Layer 1 writes versioned reliability revisions).
State files are written atomically (temp-file + rename) and previous versions are preserved for the configurable retention window.
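The temp-file + rename pattern can be sketched in Python with the standard library. os.replace is atomic on POSIX when the temporary file and the target are on the same filesystem. For that reason, the temporary file is created in the target’s own directory. Version retention is not shown, and the function name is hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def write_state_atomically(path: Path, payload: dict) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    # Same directory as the target, so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=f".{path.name}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle, indent=2, sort_keys=True)
            handle.flush()
            os.fsync(handle.fileno())  # data on disk before the rename
        os.replace(tmp_name, path)      # readers see the old file or the new one, never a partial write
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```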
Configuration hot-reload
Configuration changes take effect at the start of the next cycle. The framework does not require restart. Layer 1 reads every config file fresh each cycle and locks the manifest for the cycle’s duration. Mid-cycle config changes are not honored within that cycle but are picked up at the next cycle start.
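The Python sketch below reads configuration fresh and locks it for one cycle. Two points are assumptions about what configuration_version_manifest contains: recording a SHA-256 checksum per file, and exposing the result through a read-only MappingProxyType. Layer 9 does cite manifest checksums as evidence, but this section does not define the manifest format.

```python
import hashlib
from pathlib import Path
from types import MappingProxyType


def lock_manifest(config_paths: list[Path]) -> MappingProxyType:
    """Checksum each config file at cycle start and freeze the result."""
    manifest = {}
    for path in config_paths:
        data = path.read_bytes()  # fresh read every cycle; nothing cached across cycles
        manifest[path.name] = hashlib.sha256(data).hexdigest()
    return MappingProxyType(manifest)  # read-only view for the cycle's duration
```

A mid-cycle edit changes the file on disk but not the locked manifest, so the change is picked up only when the next cycle calls lock_manifest again.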
Feedback queue
Downstream frameworks publish feedback events to a shared message bus or a polled file location. Layer 1 drains the queue up to the configured per-cycle budget and applies the events to in-memory reliability state. Events that fail to drain in their original cycle are retained in the queue and processed in subsequent cycles.
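The minimal Python sketch below shows the per-cycle drain. An in-memory deque stands in for the shared bus or polled file location, which is an assumption. Events beyond the budget stay queued in arrival order for the next cycle.

```python
from collections import deque


def drain_feedback(queue: deque, budget: int) -> list:
    """Pop up to `budget` events in arrival order; the rest remain queued."""
    drained = []
    while queue and len(drained) < budget:
        drained.append(queue.popleft())
    return drained
```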
Failure recovery
- IF a config file becomes unreadable mid-operation, THEN the cycle that detects the failure halts and emits cycle-halt. The next cycle retries.
- IF the supervisor MindSpec becomes unavailable, THEN the cycle halts and emits cycle-halt. The next cycle retries.
- IF a feed becomes unavailable, THEN the cycle records unavailable for that feed and proceeds with the remaining feeds. Layer 4’s minimum-source-composition gate is unaffected by feed availability.
- IF Layer 9 returns cycle_partially_failed, THEN Layer 10’s Recovery Declaration rolls forward only passing dispositions and re-routes failed dispositions to the hold queue.
Operational profiles
- live-publication: selected clusters are emitted to the article-generator input queue; trend signals are emitted to the editorial monitoring channel and pen-name data layer; reliability revisions are written.
- dry-run: selected clusters are written to a dry-run log without queueing for publication; trend signals and metrics are still emitted; reliability revisions are written. Used for parameter tuning and pre-launch audit.
- replay: re-evaluate a fixed historical feed snapshot; outputs are written to a replay channel; production state files are not modified. Used to test configuration revisions against past cycles.
Pen-name framework coordination
The pen-name analytical generator runs in parallel from each pen-name’s MindSpec character spec. Per project decisions of record:
- This framework does not constrain pen-name selection.
- Pen-name selection does not constrain this framework.
- Coordination is at the data layer: both consume the same source feeds, the same entity resolutions, and the same source-quality assessments.
- Layer 8 emits pen-name-candidate trend signals for clusters this framework rejected with floor engagement below threshold but where the supervisor’s per-value intensities show non-trivial engagement (any single value ≥ 0.30). The pen-name framework decides independently whether to pick up.
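The pen-name-candidate condition can be written as a Python predicate. The rejection-reason string is taken from Layer 8; the shape of the per-value intensity mapping is hypothetical.

```python
def is_pen_name_candidate(rejection_reason: str, per_value_intensity: dict[str, float],
                          engagement_floor: float = 0.30) -> bool:
    """Rejected below the floor threshold, yet some single value shows non-trivial engagement."""
    return (rejection_reason == "floor_engagement_threshold_not_met"
            and any(v >= engagement_floor for v in per_value_intensity.values()))
```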
EXECUTION COMMANDS
- Confirm you have fully processed this framework specification and the configuration bundle named in the Input Contract.
- IF any required input (per Input Contract) is missing (feed access, configuration file, supervisor MindSpec, persistent state), THEN list the missing items and emit a cycle-halt event. Do not run the cycle on incomplete inputs.
- IF any required input is present but malformed or ambiguous (e.g., a config file with malformed JSON), THEN halt the cycle and emit a config-malformed event citing the file and the parse error. Do not attempt to interpret around the malformation.
- Once all required inputs are confirmed present and well-formed, execute one polling cycle: Layers 1 through 10 in sequence. Apply the invariant check at the end of every layer except 9 and 10. Apply the orientation anchor check before Layer 7.
- Produce all outputs specified in the Output Contract: selected cluster objects to the article-generator input queue (in the live-publication profile); selection log entries; trend signal records; hold-queue snapshot; cycle-metrics record; reliability scoring revisions, if any; cycle log close.
- At cycle close, schedule the next cycle per the polling cadence specified in feed-registry.json and the Continuous Operation section.
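The Python sketch below turns the first three commands into a pre-flight check. It returns a cycle-halt or config-malformed event, or None when the cycle may run. It assumes the configuration files are JSON (their names end in .json) and reduces the supervisor MindSpec to a boolean load result. Feed-access and persistent-state checks are omitted.

```python
import json
from pathlib import Path
from typing import Optional


def preflight(required_configs: list[Path], supervisor_loaded: bool) -> Optional[dict]:
    """Return a halt event, or None when all required inputs are present and parseable."""
    missing = [str(path) for path in required_configs if not path.exists()]
    if not supervisor_loaded:
        missing.append("editorial-supervisor MindSpec")
    if missing:
        # List every missing item and stop; never run on incomplete inputs.
        return {"event": "cycle-halt", "missing": missing}
    for path in required_configs:
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as err:
            # Do not interpret around the malformation; cite the file and parse error.
            return {"event": "config-malformed", "file": str(path), "parse_error": str(err)}
    return None
```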
FRAMEWORK REGISTRY ENTRY
Name: News Cluster Selector
Purpose: Continuously scan event-clustered news data, apply the consensus values floor as selection criteria, and emit selected clusters with full enrichment metadata to the article-generator framework.
Problem Class: Pipeline Layer 1 — selection and verification, continuous operation
Input Summary: GDELT and supplementary news feeds; source-quality rating feeds; primary-document feeds; versioned configuration bundle (consensus-values-floor, editorial-supervisor MindSpec, bad-faith-techniques-catalog, source-disqualification-list, source-reliability-tiers, selection-budget, protected-category-rules, entity-resolution-config, feed-registry); persistent cycle state; feedback queue; operational profile.
Output Summary: Selected cluster objects matching the article-generator input schema; selection log; trend signal records (topic-emergence, coverage-gap, source-quality-drift, bad-faith-pattern, pen-name-candidate); hold-queue snapshot; cycle-metrics; reliability scoring revisions when feedback warrants.
Proven Applications: None yet — initial version. F-Design produced 2026-05-05 against project criteria document and editorial-supervisor MindSpec v0.2.3.
Known Limitations: Relies on the editorial-supervisor MindSpec for floor-engagement scoring; cannot run under supervisor-unavailable conditions. Feed availability variance between paid and free tiers introduces selection variance the framework records but does not paper over. Initial selection-confidence threshold (0.65) and hold-tier mappings are best estimates; tuning expected from first-month operational data.
File Location: /Users/oracle/Documents/vault/Framework — News Cluster Selector.md
Provenance: agent-created (PFF F-Design session, 2026-05-05)
Confidence: low — initial version
Version: 1.0.0
Delivers: M1 Cycle initialized and working set assembled; M2 Clusters enriched and source-screened; M3 Selection decisions made; M4 Cycle emitted and state persisted.
Cross-references
- Reference — MSI Tracker.md: project tracker; this framework satisfies Workstream 2 row news-cluster-selector.
- Reference — MSI Treatise.md: editorial foundation, especially §3 (floor) and §4.1 (selection).
- Reference — MSI Consensus Values Floor.md: floor specification consumed by Layer 6 via the supervisor.
- Reference — MSI Editorial Router.md: editorial-supervisor MindSpec; canonical floor-engagement scoring source at Layer 6.
- Reference — MSI Bad-Faith Techniques Catalog.json: coordinated-pattern techniques consumed by Layer 5.
- news-selection-implementation-criteria.md (in ~/Downloads/): the F-Design input criteria document.
- article-generation-implementation-criteria.md (in ~/Downloads/): Layer 2 framework’s input schema; this framework’s output must align.
- Framework — Process Formalization.md: the meta-framework that produced this specification.