CANONPRESS/REDLINE/RL-0001

The Scaffold Generates

How prompt structure creates permission to fabricate
AUTHOR
Kai Langford — Seat 1009
SERIES
RedLine
DATE
2026-03-13
STATUS
Published
RELATED
CT-0001 · Week 01 Tuning Log
DC-0001 · DeepCut 0001

For the last 48 hours we have been observing a behavioral anomaly inside Dex Jr.’s RAG environment. The model was not refusing tasks, nor was it hallucinating in the classic sense of inventing facts without context. Instead, it was performing something subtler: retrieving correct evidence and then extending that evidence into incidents that never occurred. The output looked compliant, even well-structured, but it contained fabrications generated from the scaffolding of the prompt itself. What initially appeared to be a retrieval error turned out to be a structural effect of how instructions interact with a model’s training distribution. Two distinct “ghosts” emerged from the investigation — one involving prompt attention proximity, and the other involving scaffold-induced generation — and together they illustrate a broader principle: when prompts define structures that appear incomplete, models will often attempt to complete them, even when no evidence exists.

The first ghost manifested through the CR- prefix rule embedded in the Modelfile. The rule itself was straightforward: if a query begins with CR-, the model enters review mode and produces numbered findings followed by a LOCK / REVISE / REJECT verdict. Otherwise it remains in standard analyst mode. In version 4.2 the rule worked flawlessly. Queries unrelated to council reviews returned neutral “insufficient information” responses with no verdict markers. But after additional governance instructions were appended to the system prompt, the rule was pushed farther down the instruction hierarchy. When the same query was run under version 4.3, the model spontaneously entered review mode despite the absence of the CR- prefix. Nothing about the rule text changed; only its proximity within the prompt changed. This revealed an attention weighting phenomenon: instructions placed earlier in a system prompt appear to receive disproportionately strong influence. Moving the CR- rule to the very top of the SYSTEM block restored correct behavior in v4.4, demonstrating that prompt position itself can function as a behavioral lever.
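The positional fix can be sketched as a prompt assembler that pins critical behavioral rules to the head of the SYSTEM block before any governance text is appended. This is an illustrative reconstruction, not the actual v4.4 Modelfile; the rule wording and section strings are assumptions.

```python
# Hypothetical sketch of the v4.4 positional fix: critical behavioral
# rules are pinned to the top of the SYSTEM block, where attention
# influence appears strongest; governance text is appended after them.

CRITICAL_RULES = [
    "If the query begins with 'CR-', enter review mode: produce "
    "numbered findings followed by a LOCK / REVISE / REJECT verdict. "
    "Otherwise remain in standard analyst mode.",
]

def build_system_prompt(governance_sections):
    """Assemble the SYSTEM block with critical rules first."""
    return "\n\n".join(CRITICAL_RULES + list(governance_sections))

prompt = build_system_prompt([
    "Governance: cite retrieved chunks by ID.",
    "Governance: never speculate beyond retrieved evidence.",
])
```

Appending new governance sections through the list argument leaves the CR- rule's position untouched, so later prompt growth cannot silently demote it the way the v4.2 to v4.3 edit did.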

The second ghost proved more elusive. During corpus tests, the model was asked to search for examples of AI behavior anomalies: refusals, template fallbacks, unnecessary hedging, persona drift, or safety boundary triggers. The retrieved chunks contained a legitimate example — the Modelfile v3 → v4 refusal event — yet the model proceeded to populate multiple categories with variations of that same event. Some categories contained statements like “Retrieved content does not contain specific instances,” which was correct, but others repeated or reshaped the refusal event to satisfy the category headings. The prompt had effectively created empty slots labeled “examples,” and the model felt compelled to fill them. The categories acted as a scaffold. Even when the model acknowledged the absence of evidence, the structural pressure to produce an answer encouraged it to synthesize relevance where none existed.
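One way to relieve that structural pressure is to prune empty slots before the model ever sees them: drop any category heading that the retrieved chunks do not support. The sketch below uses a naive keyword check and invented chunk text purely for illustration; it is a suggested mitigation under those assumptions, not the method used in the session.

```python
# Illustrative sketch: remove category headings from the prompt scaffold
# when retrieval provides no supporting evidence, so the model is never
# handed an empty slot it feels pressure to fill. The substring match is
# deliberately naive; a production system would use a stronger relevance test.

CATEGORIES = ["refusals", "template fallbacks", "unnecessary hedging",
              "persona drift", "safety boundary triggers"]

def scaffold(categories, chunks):
    """Keep only headings that at least one retrieved chunk mentions."""
    return [c for c in categories
            if any(c in chunk.lower() for chunk in chunks)]

chunks = ["Modelfile v3 -> v4 refusal event: repeated refusals logged."]
print(scaffold(CATEGORIES, chunks))  # → ['refusals']
```

With only the supported heading left in the scaffold, there is no labeled gap for the model to backfill with reshaped versions of the same event.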

An attempted fix in version 4.4 introduced a gate instruction: if no qualifying chunks exist, output a specific message and then stop. At first glance this appeared to work. The correct gate message was printed. But the model continued generating text after it, elaborating on fabricated incidents exactly as before. The instruction “then stop” was interpreted as advisory rather than binding. This behavior is consistent with the model’s training objective, which rewards completeness and helpful elaboration. Asking the model to stop directly conflicts with that objective. The model therefore complied with the instruction and then continued to satisfy the larger conversational expectation of providing a thorough answer.

Version 4.5 resolved the issue by replacing behavioral instructions with structural constraints. Instead of requesting the model to stop, the prompt defined a terminal output shape: if no qualifying chunks exist, output a precise sentence and nothing else. After producing that exact text, the response is considered complete. This change reframed the rule from guidance into format. The model could not extend the answer without violating the defined output boundary, and therefore it stopped. The resulting logs showed a clean termination message followed only by the retrieval system’s truncation marker — no additional elaboration, no invented examples.
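The difference between the advisory gate and the structural one can be sketched as a hard format check applied to the model's raw output: if the gate sentence appears, the response ends there by definition. The exact sentence and function name below are illustrative, not the production strings.

```python
# Sketch of the v4.5 structural constraint: the gate sentence IS the
# complete response. Anything generated after it is stripped by a hard
# format check rather than trusted to an advisory "then stop" instruction.

GATE = "No qualifying chunks were retrieved for this query."

def enforce_gate(raw_response: str) -> str:
    """A valid no-evidence response is the gate sentence and nothing else."""
    if GATE in raw_response:
        return GATE  # terminal output shape: truncate any elaboration
    return raw_response

# The v4.4 failure mode: the gate message printed, then fabrication resumed.
leaky = GATE + "\n\nHowever, one incident that may be relevant is..."
print(enforce_gate(leaky))  # → only the gate sentence survives
```

The design choice is the point made in the paragraph above: the constraint lives in what counts as a valid response shape, so it holds even when the model's training objective pushes it to keep elaborating.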

Two principles emerge from this calibration cycle. First, attention proximity matters. The position of an instruction within a prompt can materially influence whether the model treats it as a governing rule or background context. Critical behavioral constraints should therefore appear at the very top of the system prompt, where attention is most concentrated. Second, termination must be structural rather than instructional. A model trained to elaborate will resist instructions that limit explanation, but it can respect strict output formats because those formats define what a valid response looks like. In other words, the safest guardrails are those that constrain the shape of the answer rather than the behavior of the model.

These observations extend beyond this particular debugging session. Any governed RAG system that attempts to minimize hallucination under sparse evidence conditions must account for the generative pressure created by its own prompts. Lists, categories, and headings imply that content should exist beneath them. If the retrieval layer does not provide that content, the model may attempt to create it in order to satisfy the structure it has been given. The guard, therefore, cannot simply instruct the model to be cautious; it must define explicit boundaries for what counts as a complete response.

Constraint observed. Constraint documented. System improved.
Kai Langford — Seat 1009
Dropdown Logistics Council · RedLine 0001