ESSAY IV

The Orbit

The ORBIT methodology in practice — where theory meets execution

Previously: In Essay III, The Collapse, we established why simplicity is the central unlock: the threshold where complexity collapses into clarity. Now we turn to the methodology that puts this into daily practice.

Chapter 17: The ORBIT Methodology — Putting It All Together

The test of any methodology is not how elegant it sounds — it's what happens when real teams use it on real problems.

CHAPTER THESIS: ORBIT — Orchestrated Reliable Bounded Intent Tasks — is the integrated methodology that combines everything from Essays II and III into a working system. It's not a framework to study. It's a practice to adopt.

The Name Unpacked

Each word in ORBIT carries weight:

Component	Meaning	Why It Matters
Orchestrated	The AI coordinates complexity on behalf of the human	You direct; the system executes across parallel streams
Reliable	Glass Box transparency + audit trails + bounded autonomy	Enterprise-grade trust, not startup-grade hope
Bounded	Mission documents define the playing field	Maximum exploration within defined constraints
Intent	Natural language is the interface — state what you want	No translation layer between thought and action
Tasks	Everything decomposes into executable, measurable units	Progress is always visible, always traceable

ORBIT is what you get when the pilot model (Chapter 7), the Mission Cockpit (Chapter 8), the view system (Chapter 9), living documents (Chapter 10), the architecture (Chapter 11), and the simplicity thesis with its lens mechanism (Chapters 12–16) work together as a single integrated system. In the language of Anton Korinek's research on transformative AI, when complex cognitive tasks collapse into executable functions, entire economic sectors restructure around the new cost landscape. ORBIT is the methodology that enables this restructuring at the team and enterprise level.

The ORBIT Architecture: Solving the Three LLM Limitations

Large Language Models are extraordinarily capable — but they have three fundamental limitations that prevent naive use from producing enterprise-grade results. ORBIT's architecture is designed specifically to address each one.

THE THREE LIMITATIONS

Context Degradation: LLMs perform worse as context grows. Liu et al. (2023) showed that question-answering accuracy drops from roughly 75% to 55% when the relevant information moves from the start or end of a long context to its middle, the "lost in the middle" problem. In a standard transformer, the compute cost of attention also scales quadratically with sequence length, so long contexts are expensive as well as unreliable. On some long-context benchmarks, most models fall below 50% of their short-context performance by 32,000 tokens.

Hallucination: LLMs generate confident-sounding content that is factually wrong. One architectural root cause is that autoregressive generation must always emit a next token, even when the model is uncertain, and there is no built-in verification step. Reported hallucination rates range from around 15% on general tasks to over 80% in specialised domains, and the errors are often invisible because they are delivered with the same confidence as correct answers.

Loss of Coherence at Scale: Over extended interactions, LLMs drift from their original goals, a phenomenon documented as "goal drift" in agent research. Outputs become inconsistent, later architectural decisions contradict earlier ones, and the model loses track of what it was building and why. Research finds that all models tested show increasing drift over longer interactions but, critically, that this drift can be controlled through explicit architectural choices.

The ORBIT architecture addresses these limitations not through a single technique but through a layered system of complementary strategies — each targeting specific failure modes, and together creating a compound defence that produces reliable, coherent, enterprise-grade output.

Solving Context Degradation

Massive Decomposition is the foundational strategy. Rather than feeding a large, complex task into a single LLM context window, ORBIT decomposes it into small, focused subtasks, each of which fits comfortably within the model's effective attention range. The MDAP (Massively Decomposed Agentic Processes) framework (Meyerson et al., 2025) offers a striking demonstration: by decomposing a task into atomic subtasks and applying multi-agent voting at each step, their system MAKER solved a 20-disk Towers of Hanoi instance, more than one million sequential LLM steps, with zero errors, using relatively simple, non-reasoning models. The intelligence came from the system design, not the model. Smaller, well-defined tasks produce dramatically better results than large, ambiguous ones because the model can attend fully to the information that matters.
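To make the idea concrete, here is a minimal Python sketch of per-step voting over a long chain of atomic subtasks. It is not MAKER's implementation. The agent is a hypothetical stand-in (`make_flaky_agent`) whose correct answer to every subtask is simply `state + 1` and which, by assumption, returns a wrong answer on every third call. The voting rule keeps sampling until one answer leads every rival by `k` votes, which is our reading of the first-to-ahead-by-k idea described in the MDAP work.

```python
from collections import Counter
from typing import Callable, Hashable, TypeVar

T = TypeVar("T", bound=Hashable)


def first_to_ahead_by_k(sample: Callable[[], T], k: int = 2, max_samples: int = 50) -> T:
    """Sample answers until one leads every rival by at least k votes."""
    votes: Counter = Counter()
    for _ in range(max_samples):
        votes[sample()] += 1
        ranked = votes.most_common(2)
        leader, leader_votes = ranked[0]
        runner_up_votes = ranked[1][1] if len(ranked) > 1 else 0
        if leader_votes - runner_up_votes >= k:
            return leader
    raise RuntimeError("no consensus within the sample budget")


def make_flaky_agent(error_every: int = 3) -> Callable[[int], int]:
    """Hypothetical stand-in for an LLM solving one atomic subtask.

    The correct answer is state + 1, but every `error_every`-th call
    returns state + 2: a confident, wrong answer.
    """
    calls = 0

    def solve(state: int) -> int:
        nonlocal calls
        calls += 1
        correct = state + 1
        return correct + 1 if calls % error_every == 0 else correct

    return solve


def run_chain(steps: int, use_voting: bool) -> int:
    """Run a long task as a chain of single-step subtasks."""
    agent = make_flaky_agent()
    state = 0
    for _ in range(steps):
        if use_voting:
            state = first_to_ahead_by_k(lambda s=state: agent(s), k=2)
        else:
            state = agent(state)
    return state


print(run_chain(1000, use_voting=False))  # 1333: every third step adds an error
print(run_chain(1000, use_voting=True))   # 1000: voting recovers each step
```

Without voting, 333 errors accumulate silently; with voting, each step costs a few more calls but the chain ends exactly where it should. Trading more, cheaper calls for per-step reliability is the core of the decomposition argument.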

Progressive Disclosure ensures the model receives only the information it needs for the current subtask, not everything at once. This is the principle behind Retrieval-Augmented Generation (RAG): instead of loading an entire knowledge base into context, the system retrieves relevant fragments on demand. ORBIT extends this through a pull-on-demand architecture: agents request specific information when they need it, rather than having everything pushed upfront. This can cut context overhead dramatically while ensuring the model works with clean, relevant inputs. In December 2025, Anthropic donated the Model Context Protocol (MCP) to the Linux Foundation's new Agentic AI Foundation, co-founded with Block and OpenAI and backed by Google, Microsoft, and AWS, effectively making this pattern a vendor-neutral industry standard. With more than 10,000 active public MCP servers, the ecosystem suggests that pull-on-demand context management is the architecture the industry is converging on.
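A toy sketch of pull-on-demand context assembly follows. Word overlap stands in for a real retriever (production RAG systems typically use embeddings), and `KNOWLEDGE_BASE`, `pull`, and `build_prompt` are hypothetical names; this is not the MCP API. The point is structural: each subtask's prompt carries only the fragments it asked for.

```python
import re

# Hypothetical knowledge base: in a real system these fragments would live
# behind a retrieval service or an MCP server, not in a Python dict.
KNOWLEDGE_BASE = {
    "auth": "Authentication uses OAuth 2.0 with short-lived access tokens.",
    "billing": "Invoices are generated on the first business day of each month.",
    "schema": "The orders table has columns id, customer_id, total, created_at.",
    "logging": "All services log structured JSON with a request_id field.",
}


def tokenize(text: str) -> set[str]:
    """Lowercase word set; a crude stand-in for embeddings."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def pull(query: str, kb: dict[str, str], top_k: int = 1) -> list[str]:
    """Return keys of the top_k fragments sharing the most words with the query."""
    q = tokenize(query)
    ranked = sorted(kb, key=lambda key: len(q & tokenize(kb[key])), reverse=True)
    return [key for key in ranked[:top_k] if q & tokenize(kb[key])]


def build_prompt(subtask: str, kb: dict[str, str], top_k: int = 1) -> str:
    """Assemble a prompt containing only the fragments pulled for this subtask."""
    fragments = "\n".join(kb[key] for key in pull(subtask, kb, top_k))
    return f"Context:\n{fragments}\n\nTask: {subtask}"


subtask = "Add a created_at filter to the orders table query"
pushed = "\n".join(KNOWLEDGE_BASE.values()) + f"\n\nTask: {subtask}"
pulled = build_prompt(subtask, KNOWLEDGE_BASE)
print(pull(subtask, KNOWLEDGE_BASE))  # ['schema']
print(len(pulled), "<", len(pushed))
```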

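Server-side prompt caching eliminates much of the cost and latency of repeatedly sending the same foundational context (system prompts, mission documents, architectural standards). Anthropic reports that prompt caching reduces costs by up to 90% and latency by up to 85% for long prompts, which makes it economically viable to maintain rich, persistent context across thousands of agent interactions. Caching does not change what the model attends to; it changes what a stable, reusable context costs, which is why it pairs naturally with decomposition into many small calls.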
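The arithmetic behind caching savings is simple enough to sketch. The multipliers below (cache writes at 1.25× and cache reads at 0.1× the base input price) are assumptions chosen for illustration; check your provider's current pricing. `input_cost` is a hypothetical helper, not a vendor API.

```python
def input_cost(calls: int, prefix_tokens: int, suffix_tokens: int, cached: bool,
               write_multiplier: float = 1.25, read_multiplier: float = 0.10) -> float:
    """Input cost in base-price token units for calls sharing one identical prefix.

    Assumed pricing: the first call writes the prefix to the cache at
    write_multiplier x the base rate; later calls read it at read_multiplier x.
    """
    if calls <= 0:
        return 0.0
    if not cached:
        return float(calls * (prefix_tokens + suffix_tokens))
    first = prefix_tokens * write_multiplier + suffix_tokens
    later = (calls - 1) * (prefix_tokens * read_multiplier + suffix_tokens)
    return first + later


# 100 agent calls that share a 10,000-token mission context plus 500 fresh tokens each
uncached = input_cost(100, 10_000, 500, cached=False)
cached = input_cost(100, 10_000, 500, cached=True)
print(uncached, cached, f"saving {1 - cached / uncached:.0%}")
```

In this scenario the saving works out to about 85%; it approaches the 90% ceiling as the shared prefix dominates each call and the number of calls grows. Caching only pays off when the prefix is byte-for-byte identical, which is one reason to put stable mission documents first in every prompt.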

Solving Hallucination

Parallelisation and voting runs multiple agents on the same task and uses consensus mechanisms to select the output. Wang et al.'s research on self-consistency (2023) demonstrated that sampling multiple reasoning paths and returning the majority answer significantly reduces errors: correct answers tend to recur across independent samples, while hallucinated ones tend to scatter. ORBIT applies this principle through parallel agent execution with structured voting, so that confident-sounding errors are exposed by disagreement.
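A minimal sketch of the aggregation step, assuming the sampled answers have already been collected from independent runs. `majority_vote` and the 0.6 agreement threshold are illustrative choices, not values from Wang et al.; the point is that low agreement is itself a signal, so the function reports it rather than hiding it.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class VoteResult:
    answer: str
    agreement: float    # share of samples that gave the winning answer
    needs_review: bool  # True when agreement is too low to trust


def majority_vote(answers: list[str], min_agreement: float = 0.6) -> VoteResult:
    """Self-consistency style aggregation over independently sampled final answers."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    normalized = [a.strip().lower() for a in answers]  # compare answers, not formatting
    answer, count = Counter(normalized).most_common(1)[0]
    agreement = count / len(normalized)
    return VoteResult(answer, agreement, needs_review=agreement < min_agreement)


print(majority_vote(["42", "42", " 42", "41", "42"]))
print(majority_vote(["Paris", "Lyon", "Marseille", "Paris", "Nice"]))
```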

Specialist tools remove entire categories of hallucination by giving agents access to external verification: code execution engines that test whether generated code actually works, search tools that verify facts against sources, databases that confirm data against ground truth, and file system access that checks whether referenced files exist. Schick et al.'s Toolformer research (2023) showed that a 6.7-billion-parameter model (based on GPT-J) that learned to call tools achieves zero-shot performance often competitive with much larger models, because tools provide the factual grounding that generation alone cannot. Anthropic's 2025 release of advanced tool-use capabilities takes this further: Tool Search lets agents work with thousands of tools without loading every definition into the context window (addressing the tool-definition bloat problem), while Programmatic Tool Calling lets agents invoke tools from within a code execution environment, reducing context overhead while increasing precision. Both reflect the same principle: agents should discover and use tools on demand, not carry the weight of every possible tool in their context.
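The code-execution case is the easiest to sketch. The function below is a hypothetical helper, not part of any tool framework: it accepts a generated function only if it actually runs and passes known test cases, which catches a plausible-looking but wrong candidate that review by reading alone might miss.

```python
from typing import Any

# Two hypothetical candidates an agent might propose for the same request.
BUGGY = """
def median(xs):
    xs = sorted(xs)
    return xs[len(xs) // 2]
"""

FIXED = """
def median(xs):
    xs = sorted(xs)
    mid = len(xs) // 2
    return xs[mid] if len(xs) % 2 else (xs[mid - 1] + xs[mid]) / 2
"""

CASES: list[tuple[tuple[Any, ...], Any]] = [
    (([3, 1, 2],), 2),
    (([4, 1, 3, 2],), 2.5),
]


def passes_execution_check(source: str, func_name: str, cases) -> bool:
    """Ground a generated function by running it, instead of trusting how it reads.

    WARNING: exec() runs arbitrary code; a real system would do this in an
    isolated sandbox with time and resource limits.
    """
    namespace: dict[str, Any] = {}
    try:
        exec(source, namespace)
        func = namespace[func_name]
        return all(func(*args) == expected for args, expected in cases)
    except Exception:
        return False  # syntax errors, missing names, and crashes all count as failures


print(passes_execution_check(BUGGY, "median", CASES))  # False: wrong on even-length input
print(passes_execution_check(FIXED, "median", CASES))  # True
```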

A panel of AI agent experts, meaning multiple specialised agents with different roles and system prompts, provides domain-specific validation. A security-focused agent reviews for vulnerabilities. An architecture agent checks for consistency. A testing agent verifies functionality. When agents with different perspectives converge on the same answer, confidence is better warranted. When they disagree, the system flags the output for human review. A 2025 study on multi-agent LLM orchestration for incident response quantified the advantage: orchestrated multi-agent systems achieved a 100% actionable-recommendation rate, compared with 1.7% for single-agent systems, an 80× improvement in specificity and a 140× improvement in correctness, with zero quality variance across all trials. In that setting, specialised agents working in concert produced results that no single agent matched.

Mission-bound structured documents are the most fundamental hallucination defence. When the model works within explicit constraints (product specifications, architectural schemas, interface contracts, design standards), the space of valid outputs narrows dramatically, and every output can be checked against it. A database schema that contradicts the one defined in the mission document, or code that violates the architectural patterns specified in the bounded artifacts, becomes a detectable violation rather than a silent error. Research on controlled generation pipelines points the same way: explicit constraints are among the most reliable methods for reducing hallucination in production systems.
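One way to make a mission document enforceable is to express part of it as machine-checkable data. The sketch below assumes a hypothetical `MISSION_SCHEMA` dict and checks a generated row against it; a production system would more likely use JSON Schema or the database's own DDL, but the principle is the same.

```python
# Hypothetical excerpt of a mission document's data contract.
MISSION_SCHEMA: dict[str, dict[str, type]] = {
    "orders": {"id": int, "customer_id": int, "total": float},
}


def check_against_mission(table: str, row: dict) -> list[str]:
    """List every way a generated row departs from the mission schema (empty list = valid)."""
    if table not in MISSION_SCHEMA:
        return [f"unknown table '{table}'"]
    spec = MISSION_SCHEMA[table]
    problems = [f"unknown column '{col}'" for col in row if col not in spec]
    problems += [f"missing column '{col}'" for col in spec if col not in row]
    problems += [
        f"column '{col}' should be {typ.__name__}, got {type(row[col]).__name__}"
        for col, typ in spec.items()
        if col in row and not isinstance(row[col], typ)
    ]
    return problems


print(check_against_mission("orders", {"id": 1, "customer_id": 7, "total": 12.5}))
print(check_against_mission("orders", {"id": 1, "customer_id": 7, "total": "12.50", "discount": 0.1}))
```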

Solving Loss of Coherence at Scale

Goal-seeking autonomous loops, which ORBIT calls Ralph Loops, are agents that iterate toward a defined success criterion, checking their own output against explicit goals at each step. Ralph Loops build on the Reflexion framework (Shinn et al., NeurIPS 2023), which reported absolute gains of 22 percentage points on AlfWorld decision-making tasks and 20 points on HotPotQA reasoning, and they ensure that each iteration moves closer to the mission, not further from it. The agent doesn't just generate output; it evaluates whether that output serves the objective, and self-corrects when it doesn't. A 2025 paper published in Nature on dual-loop self-reflection, inspired by metacognition, provides further validation: LLMs that critique their own reasoning against reference responses through iterative reflection cycles show measurably improved accuracy and coherence. The same pattern is visible in Knuth's Claude Cycles case study: 31 explorations in which each failure was documented, analysed, and used to refine the next attempt. Self-reflection is not a luxury. It is the mechanism that turns iteration into learning.
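A minimal sketch of the loop structure, assuming the success criteria can be written as explicit checks. The generator is a hypothetical stand-in for an LLM that revises its output using accumulated feedback. Reflexion itself stores verbal self-reflections in memory; this simplified version reduces that memory to a list of failed criteria.

```python
from typing import Callable

Check = tuple[str, Callable[[str], bool]]


def goal_seeking_loop(generate: Callable[[list[str]], str], checks: list[Check],
                      max_iterations: int = 5) -> tuple[str, list[tuple[str, list[str]]]]:
    """Generate, test against explicit success criteria, feed failures back, repeat."""
    feedback: list[str] = []  # accumulated lessons from failed attempts
    history: list[tuple[str, list[str]]] = []
    for _ in range(max_iterations):
        candidate = generate(feedback)
        failures = [name for name, passes in checks if not passes(candidate)]
        history.append((candidate, failures))
        if not failures:
            return candidate, history  # success criterion met
        feedback += [f for f in failures if f not in feedback]
    raise RuntimeError(f"goal not met after {max_iterations} iterations: {history[-1]}")


# Explicit success criteria for a URL slug (hypothetical task).
CHECKS: list[Check] = [
    ("lowercase", lambda s: s == s.lower()),
    ("no_spaces", lambda s: " " not in s),
    ("max_32_chars", lambda s: len(s) <= 32),
]

TITLE = "Quarterly Revenue Report for the EMEA Region"


def toy_generator(feedback: list[str]) -> str:
    """Hypothetical stand-in for an LLM that revises its output using the feedback log."""
    text = TITLE
    if "lowercase" in feedback:
        text = text.lower()
    if "no_spaces" in feedback:
        text = text.replace(" ", "-")
    if "max_32_chars" in feedback:
        text = text[:32].rstrip("-")
    return text


slug, history = goal_seeking_loop(toy_generator, CHECKS)
print(slug, "after", len(history), "iterations")
```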

Structured planning requires agents to create an explicit plan before executing, then follow it step by step, checking progress against milestones. Wei et al.'s chain-of-thought research (2022) showed that prompting a model to reason through intermediate steps improves performance on multi-step problems. Yao et al.'s Tree of Thoughts (NeurIPS 2023), which lets a model explore and evaluate alternative reasoning paths before committing, went further, raising GPT-4's success rate on the Game of 24 puzzle from 4% with chain-of-thought prompting to 74%. ORBIT applies this at every level: mission-level plans decompose into sprint-level plans, which decompose into task-level plans, each with explicit success criteria.
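The mission, sprint, and task hierarchy can be sketched as a tree in which every node carries an explicit success criterion. All names here (`PlanNode`, `task`, `milestone`, `execute`) are hypothetical, and the executor is a placeholder; the sketch shows only the control flow of checking each milestone as soon as its children finish and stopping at the first unmet one.

```python
from dataclasses import dataclass, field
from typing import Callable

State = dict[str, bool]


@dataclass
class PlanNode:
    title: str
    done_when: Callable[[State], bool]  # explicit success criterion
    children: list["PlanNode"] = field(default_factory=list)


def task(title: str) -> PlanNode:
    """Leaf task: done once the executor has recorded it as complete."""
    return PlanNode(title, lambda s: s.get(title, False))


def milestone(title: str, children: list[PlanNode]) -> PlanNode:
    """Higher-level node: done when every child meets its own criterion."""
    return PlanNode(title, lambda s: all(c.done_when(s) for c in children), children)


def execute(node: PlanNode, do_task: Callable[[str, State], None], state: State) -> list[str]:
    """Depth-first execution, checking each node's criterion once its children are done."""
    log: list[str] = []
    for child in node.children:
        log += execute(child, do_task, state)
        if log[-1].startswith("FAIL"):
            return log  # stop at the first unmet milestone instead of drifting on
    if not node.children:
        do_task(node.title, state)
    log.append(("PASS: " if node.done_when(state) else "FAIL: ") + node.title)
    return log


plan = milestone("Mission: ship password login", [
    milestone("Sprint 1: backend", [task("create users table"), task("add /login endpoint")]),
    milestone("Sprint 2: frontend", [task("build login form")]),
])


def record(title: str, state: State) -> None:
    state[title] = True  # placeholder for real work by an agent


for line in execute(plan, record, {}):
    print(line)
```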

Agent delegation and orchestration maintains coherence across scale through hierarchical coordination. A coordinator agent holds the high-level mission context and delegates specific subtasks to specialist agents, each working within bounded scope but contributing to the unified objective. Research from Microsoft and IBM points to hierarchical architectures as the most robust pattern at scale (50+ agents), precisely because they preserve the goal alignment that flat architectures lose.
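A minimal sketch of the delegation pattern, with hypothetical names throughout. The design choice it illustrates is scoping: each specialist receives a `Brief` containing only the mission goal and its own subtask, while the coordinator alone sees the full plan and escalates any subtask it cannot route.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Brief:
    """Everything a specialist is allowed to see: the mission goal and its own subtask."""
    mission_goal: str
    subtask: str


Specialist = Callable[[Brief], str]


def coordinate(mission_goal: str, subtasks: list[tuple[str, str]],
               specialists: dict[str, Specialist]) -> tuple[dict[str, str], list[str]]:
    """Coordinator: hold the mission, delegate each subtask by role, escalate gaps."""
    results: dict[str, str] = {}
    escalations: list[str] = []
    for role, subtask in subtasks:
        specialist = specialists.get(role)
        if specialist is None:
            escalations.append(f"no specialist for role '{role}': {subtask}")
            continue
        results[subtask] = specialist(Brief(mission_goal, subtask))
    return results, escalations


# Hypothetical specialists; each would be an agent with its own prompt and tools.
specialists: dict[str, Specialist] = {
    "backend": lambda b: f"API design for '{b.subtask}' (serves: {b.mission_goal})",
    "frontend": lambda b: f"UI spec for '{b.subtask}' (serves: {b.mission_goal})",
}

results, escalations = coordinate(
    "Customers can reset their password",
    [("backend", "reset-token endpoint"), ("frontend", "reset form"), ("security", "rate limiting")],
    specialists,
)
print(list(results))  # ['reset-token endpoint', 'reset form']
print(escalations)    # ["no specialist for role 'security': rate limiting"]
```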

Agent swarms — large numbers of agents working in parallel on different aspects of a problem — extend this further. Each agent in the swarm processes a portion of the work, communicating through lightweight event-driven mechanisms. Distributed consensus across the swarm reduces individual agent errors, while the orchestration layer ensures all outputs converge toward the mission. The swarm doesn't replace human judgment — it amplifies it, executing at a scale and speed that no individual could achieve while remaining tightly bound to mission-defined objectives.
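A sketch of shard-level consensus, using Python threads as stand-in agents. The names are hypothetical and the per-shard job (a sum of squares) is deliberately trivial. What the sketch shows is the structure: work is split into shards, several independent replicas process each shard in parallel, and only majority answers are merged, so a single faulty member cannot corrupt the result.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Worker = Callable[[list[int]], int]


def make_worker(faulty: bool) -> Worker:
    """Hypothetical swarm member; the per-shard job is a stand-in (sum of squares)."""
    def work(shard: list[int]) -> int:
        total = sum(x * x for x in shard)
        return total + 1 if faulty else total  # a faulty member is off by one
    return work


def swarm_run(data: list[int], shard_size: int, replicas: list[Worker]) -> int:
    """Shard the work, run every replica on every shard in parallel, keep majority answers."""
    shards = [data[i:i + shard_size] for i in range(0, len(data), shard_size)]
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = [[pool.submit(w, shard) for w in replicas] for shard in shards]
        merged = 0
        for shard_futures in futures:
            answer, votes = Counter(f.result() for f in shard_futures).most_common(1)[0]
            if votes <= len(replicas) // 2:
                raise RuntimeError("no majority for a shard; escalate")
            merged += answer
    return merged


data = list(range(1, 101))
replicas = [make_worker(False), make_worker(False), make_worker(True)]
print(swarm_run(data, 25, replicas))             # 338350
print(swarm_run(data, 25, [make_worker(True)]))  # 338354: 4 uncorrected shard errors
```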

ARCHITECTURE SUMMARY

Limitation	ORBIT Technique	Research Basis
Context Degradation	Massive decomposition	MDAP/MAKER: 1M+ steps, zero errors (Meyerson et al., 2025)
	Progressive disclosure / pull-on-demand	RAG research; tool-use reduces context 99%
	Server-side prompt caching	Up to 90% cost reduction, up to 85% latency reduction
Hallucination	Parallelisation & voting	Wang et al. self-consistency (2023)
	Specialist tools	Schick et al. Toolformer (2023)
	Panel of agent experts	Multi-agent debate research
	Mission-bound structured documents	Controlled generation pipelines
Loss of Coherence	Ralph Loops (goal-seeking iteration)	Shinn et al. Reflexion, NeurIPS 2023: +22 points (AlfWorld)
	Structured planning	Yao et al. Tree of Thoughts: 4% → 74% (Game of 24)
	Agent delegation & orchestration	Microsoft/IBM hierarchical patterns
	Agent swarms	Distributed consensus reduces errors

No single technique solves all three limitations. The power of the ORBIT architecture is in the combination — a layered defence where each technique compensates for the weaknesses of others. Decomposition keeps context clean. Voting catches hallucinations. Planning maintains coherence. Tools provide grounding. And structured mission documents anchor everything to a clear, verifiable objective. The result is not a perfect system — no AI system is — but a system whose failure modes are visible, bounded, and correctable. That is the difference between enterprise-grade and prototype-grade AI: not the absence of errors, but the architecture to detect and recover from them.

A Day in the Life: Three Perspectives

The Engineering Team

A Day in the Engineering Cockpit