Bibliography

References & Sources

The research, data, and thinking that inform these essays.

The arguments in this series are grounded in decades of empirical research across project management, organisational design, team dynamics, AI capability, and enterprise technology. This page consolidates the key sources referenced throughout the essays, grouped by theme.


Project Failure & Complexity

The Standish Group. CHAOS Reports (1994–2020). Three decades of data covering tens of thousands of IT projects. Key findings: only 31% of projects succeed (on time, on budget, full scope); small projects succeed at roughly ten times the rate of large ones; projects in large companies succeed only 9% of the time. The single strongest predictor of failure is scale — when projects exceed a handful of people, a few months, and a modest budget, coordination costs consume the organisation's capacity to deliver.

Brooks, Frederick P. The Mythical Man-Month: Essays on Software Engineering. Addison-Wesley, 1975. Brooks' Law: adding people to a late project makes it later, because communication pathways scale quadratically — N(N-1)/2. A five-person team has 10 communication pathways; a fifteen-person team has 105; a hundred-person team has 4,950.
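The pathway counts quoted in the Brooks entry follow directly from the pairwise formula N(N-1)/2. A minimal sketch that reproduces them is given here; the function name `communication_pathways` is illustrative and does not come from Brooks or from the essays.

```python
def communication_pathways(team_size: int) -> int:
    """Count distinct pairwise links among team_size people: N(N-1)/2."""
    if team_size < 0:
        raise ValueError("team_size must be non-negative")
    # One factor of N * (N - 1) is always even, so integer division is exact.
    return team_size * (team_size - 1) // 2


# Team sizes taken from the entry above.
for n in (5, 15, 100):
    print(f"{n} people: {communication_pathways(n)} pathways")
```

Tripling a team from five to fifteen people multiplies the pathways by more than ten, which is the quadratic growth the entry describes.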

BCG, Bain & Company, McKinsey & Company. Consultancy studies consistently find that most digital transformations fall short of their original objectives — McKinsey (2018) and BCG (2020) put the figure near 70%; a 2016 Forbes Insights survey reported 84%. Two honesty notes: “fail” in these studies means “did not fully achieve stated goals”, not outright collapse; and scholars have shown the iconic “70% of change initiatives fail” claim has weak empirical pedigree (Hughes, Journal of Change Management, 2011). We cite the range as directional, not precise.

Bain & Company. Beyond the Hype: The Hard Work Behind Analytics and AI. 2024. Found that 88% of transformations fail to meet original ambitions.

IDC / Gartner (via WWT and others). Annual cost of failed digital transformations estimated at $2.3 trillion globally — a widely cited estimate derived from IDC transformation-spend projections and Gartner project-success data.

vFunction. Research estimating approximately $100 billion wasted on migration projects between 2021 and 2024.

Foster, Richard / Innosight. Corporate Longevity reports. Companies in the S&P 500 of 1958 remained on the index an average of 61 years; average tenure has fallen to roughly 15–20 years and continues to shrink. Foster (Yale School of Management) estimates that at current churn rates, three-quarters of the S&P 500 will be replaced within a decade. The companion Fortune 500 statistic — only 52 of the original 1955 companies remain — is from the American Enterprise Institute.


Team Size, Coordination & Organisational Design

Dunbar, Robin. How Many Friends Does One Person Need? Faber & Faber, 2010. Dunbar's Number (150) defines the approximate limit of stable social relationships. Within that, nested layers of 5 (core group), 15 (deep trust), and 50 (meaningful working relationships) define the thresholds for coordination quality.

Hackman, J. Richard. Leading Teams: Setting the Stage for Great Performances. Harvard Business School Press, 2002. Fifty years of team performance research concluding that four to six people is the optimal team unit, and no work team should exceed ten members.

Bezos, Jeff / Amazon. The "two-pizza team" rule: no team should be larger than can be fed by two pizzas (roughly five to seven people). Codified the insight that small, autonomous teams with clear ownership outperform large, coordinated ones.

Jones, Nate B. "Rethinking Team Size in the Age of Artificial Intelligence." Analysis of how AI amplifies the coordination cost of large teams: when per-person output increases by 5–10x, the penalty for a sixth team member is measured in millions of lost productivity. Introduces the "Scout" (solo exploration) and "Strike Team" (five-person execution) archetypes, and argues the scarce resource has shifted from volume to correctness.

Gore, W. L. (W. L. Gore & Associates). Famously capped factory size at 150 people based on the observation that beyond that number, community cohesion and coordination quality collapsed — an independent rediscovery of Dunbar's Number in an industrial context.


AI Capability & Productivity

Agrawal, Ajay; Gans, Joshua; and Goldfarb, Avi. Prediction Machines: The Simple Economics of Artificial Intelligence. Harvard Business Review Press, 2018. When AI collapses the cost of prediction, the elaborate decision architectures organisations built around expensive, scarce prediction become unnecessary overhead.

Microsoft / GitHub. GitHub Copilot research: 55% faster task completion, 40% of accepted code AI-generated, 75% of developers feeling more fulfilled.

Google. Reports that 30% of new code is now AI-generated (2024–2025).

Harvard Business School. 2025 study of Procter & Gamble professionals finding AI-using teams were three times more likely to produce ideas in the top 10% of quality.

McKinsey & Company. Various reports on AI impact: technical debt consuming 30–40% of IT budgets; the "Frontier Firm" vision of agent-directed work; AI agents as execution layer.

Gartner. Prediction (2024) that 33% of enterprise software applications will include agentic AI by 2028, up from less than 1% in 2024. Gartner has also predicted (June 2025) that over 40% of agentic AI projects will be cancelled by the end of 2027 — we cite both sides.

MIT NANDA. The GenAI Divide: State of AI in Business 2025. Based on 150 leadership interviews, 350 employee surveys, and 300 public deployments: only ~5% of AI pilot programmes achieve rapid revenue acceleration; the core barrier is a "learning gap" in enterprise integration, not model quality. Methodology has attracted public debate; cited here as directional evidence.

McKinsey & Company. The State of AI (2025). Roughly 80% of organisations report no significant enterprise-level EBIT impact from AI; only ~6% qualify as "AI high performers," and they share a pattern — transformative ambition, workflow redesign, and faster scaling.

IDC / Microsoft. The Business Opportunity of AI (2024). Microsoft-commissioned IDC research finding an average $3.70 return per dollar invested in generative AI, with top performers reaching ~$10. Not externally validated; returns concentrate in organisations deploying across multiple functions.

METR. Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity (2025, arXiv:2507.09089). Randomised controlled trial finding experienced developers were 19% slower with AI on mature codebases they knew deeply — while believing they were 20% faster. A 2026 follow-up with 57 developers found a roughly neutral effect. Critical counter-evidence: AI's multiplier is conditional on workflow design and task type.

Meyerson, Elliot, et al. Solving a Million-Step LLM Task with Zero Errors (2025, arXiv:2511.09030). Cognizant AI Lab / UT Austin. The MAKER system completed over one million sequential LLM steps with zero errors through maximal agentic decomposition, multi-agent voting, and red-flagging — the mathematical foundation for ORBIT-style architectures.

Knuth, Donald. Claude's Cycles (note dated 28 February 2026, revised 14 April 2026). Stanford Computer Science. Knuth's own account of the solution (for odd m) of an open Hamiltonian-cycle decomposition problem through 31 guided human-AI explorations with Claude Opus 4.6, posed and steered by Filip Stappers.

Dealroom / company disclosures. Revenue per employee at AI-native companies (2025–2026): Cursor (Anysphere) ~$3.3M, Lovable ~$2.2M, Midjourney ~$2M, OpenAI ~$1.5M — versus a traditional SaaS benchmark of $150–250k. Lovable reached $100M ARR in eight months; Midjourney reached $200M ARR with ten employees and no external funding. Caveats: these figures conflate AI leverage with product-market fit and carry survivorship bias; treated here as suggestive, not proof.

McKinsey & Company. Superagency in the Workplace (January 2025). 78% of organisations use AI in at least one business function, but only 1% of leaders describe their company's AI deployment as "mature" — fully integrated into workflows and driving substantial outcomes.

EY. Work Reimagined Survey (2025). 15,000 employees and 1,500 employers across 29 countries. Companies miss up to 40% of potential AI productivity gains due to weak talent foundations; 88% of employees use AI daily but only 5% in transformative ways; just 12% receive sufficient AI training.

PwC. Global AI Jobs Barometer (2025). Analysis of nearly one billion job ads across six continents. Skills sought by employers are changing 66% faster in AI-exposed occupations (up from 25% the prior year); jobs requiring AI skills carry an average 56% wage premium.

Brynjolfsson, Erik; Li, Danielle; and Raymond, Lindsey. "Generative AI at Work." Quarterly Journal of Economics (2025). Field study of customer-support agents: AI assistance raised productivity ~14% on average. Gains concentrated among novices: workers with two months' tenure performed like six-month veterans.

Vaccaro, Michelle; Almaatouq, Abdullah; and Malone, Thomas. "When combinations of humans and AI are useful." Nature Human Behaviour (2024). Meta-analysis of 106 experiments: on average, human-AI combinations performed worse than the best of human or AI alone, especially on decision tasks; synergy appeared mainly in content-creation tasks and where the human outperformed the AI. The strongest published complication of the simple "centaur" narrative — and the reason this series grounds human command in accountability, not performance.

Cui, Zheyuan (Kevin), et al. Field experiments on GitHub Copilot at Microsoft, Accenture, and a Fortune 100 company (2024, MIT Sloan-affiliated). ~26% average increase in completed tasks; junior developers gained 27–39%, senior developers less — consistent with AI as a skill leveller.

Nielsen Norman Group. Analysis of three generative-AI case studies (2023): 66% average productivity improvement across support agents, business professionals, and programmers.

Multi-Agent LLM Orchestration for Incident Response (2025, arXiv:2511.15755). 348 controlled trials comparing single-agent and orchestrated multi-agent systems on identical incident scenarios: 100% actionable recommendation rate vs. 1.7% for single agents — an 80× improvement in specificity, 140× in correctness, with zero quality variance. Domain-specific (incident response); cited for the architectural principle.

Anthropic. Internal usage reports and engineering accounts (2025–2026): ~90% of Claude Code's codebase written by Claude Code itself; company-wide AI-written code at 70–90%; Claude Cowork built in roughly ten days using Claude Code; documented non-engineering use across legal, marketing, finance, and design teams. See "How Anthropic teams use Claude Code."


Enterprise Technology & Software Complexity

MuleSoft (Salesforce). Connectivity Benchmark Report (2026 edition; earlier editions with Vanson Bourne and Deloitte Digital). The average organisation now manages 957 applications, only 27% of which are connected (2025 edition: 897 and 29%). Note: vendor-commissioned research by an integration vendor, surveying enterprise IT leaders; independent SaaS-management vendors report lower per-company app counts under narrower definitions.

Mark, Gloria. University of California, Irvine. Attention research finding it takes an average of 23 minutes and 15 seconds to fully refocus after an interruption.

Zylo. SaaS management research: 52.7% of SaaS licences go unused; large enterprises waste $127 million annually on unused licences.

Forrester. Research finding 72% of IT budgets spent on "keep-the-lights-on" maintenance rather than innovation.

Stripe / CISQ. Stripe, The Developer Coefficient (2018): developers spend roughly a third of their time dealing with technical debt and bad code. For the macro cost, the Consortium for Information & Software Quality (CISQ), Cost of Poor Software Quality in the US (2022), estimated accumulated US technical debt at approximately $1.5 trillion.

APMdigest. Software failures cost enterprises $61 billion annually.

Gallup. Engagement research: engaged teams are 17% more productive — but structured engagement (direction + velocity + learning + autonomy) compounds rather than merely adds.

Productiv. State of SaaS reports. 48% of enterprise applications are unmanaged — no oversight of renewals, licences, usage, security, or compliance; organisations adopt roughly seven new SaaS applications each month.


Decision-Making, Strategy & Experimentation

Iyengar, Sheena, and Lepper, Mark. "When Choice is Demotivating: Can One Desire Too Much of a Good Thing?" Journal of Personality and Social Psychology (2000). The jam study: 24 options attracted 60% of shoppers but converted 3%; 6 options attracted 40% and converted 30% — the foundational evidence for choice overload. Honesty note: a later meta-analysis (Scheibehenne, Greifeneder & Todd, 2010) found the choice-overload effect does not replicate reliably across contexts; we cite the jam study as an illustration of decision friction, not settled law.
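The jam-study figures are easier to compare as an end-to-end purchase rate: the share of all passers-by who bought, which is the stop rate multiplied by the conversion rate among those who stopped. The sketch below works through that arithmetic with the percentages as summarised in the entry. The function name `purchase_rate` is illustrative, and reading "converted" as the share of stoppers who bought is an assumption about the summary.

```python
def purchase_rate(stop_rate: float, conversion_rate: float) -> float:
    """Share of all passers-by who buy: P(stop) * P(buy | stop)."""
    return stop_rate * conversion_rate


# Percentages as quoted in the entry above, written as fractions.
large_display = purchase_rate(stop_rate=0.60, conversion_rate=0.03)  # 24 options
small_display = purchase_rate(stop_rate=0.40, conversion_rate=0.30)  # 6 options

print(f"24 options: {large_display:.1%} of passers-by bought")
print(f" 6 options: {small_display:.1%} of passers-by bought")
print(f"Ratio (small display / large display): {small_display / large_display:.1f}x")
```

On these figures the smaller display sold to roughly 12% of passers-by against about 1.8% for the larger one, despite drawing fewer people in.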

Kaplan, Robert, and Norton, David. "The Office of Strategy Management." Harvard Business Review (2005). Research finding only 5% of employees have a basic understanding of their company's strategy. A consulting-era figure from the Balanced Scorecard literature — dated and methodologically thin; cited as directional.

World Economic Forum. Commentary on "decision distress" and decision fatigue (2023). The widely circulated $400 billion annual cost estimate derives from secondary analyses of this work; treated here as an indicative estimate rather than a precise measurement.

Google. How Search Works (published methodology). In a single year Google reports running 13,000+ live traffic experiments and 800,000+ search quality tests, resulting in roughly 4,000 launched improvements — the benchmark for experimentation culture at scale.


Complexity Theory & Systems Thinking

Christensen, Clayton M. The Innovator's Dilemma: When New Technologies Cause Great Firms to Fail. Harvard Business Review Press, 1997. Resource dependence, process constraints, and identity attachment explain why successful organisations are structurally incapable of responding to disruptive change.

Mandelbrot, Benoit. The Fractal Geometry of Nature. W. H. Freeman, 1982. Self-similarity across scales as a fundamental organising principle of complex systems — the theoretical foundation for ORBIT's fractal scaling model.

Meadows, Donella. "Leverage Points: Places to Intervene in a System" (1997/1999). The twelve leverage points; the structure of information flows is point 6 — high-leverage and notably cheap relative to goal or paradigm change.

Kasparov, Garry. Deep Thinking: Where Machine Intelligence Ends and Human Creativity Begins. PublicAffairs, 2017. Kasparov's account of the freestyle chess tournaments (2005–2008) concludes that weak human + machine + better process was superior to a strong computer alone and, more remarkably, to strong human + machine + inferior process. This is the empirical basis for the Pilot model.

Nadella, Satya. "Every SaaS application is just a database with business logic baked into it. AI will collapse that." The structural argument for why AI eliminates entire categories of enterprise software rather than merely improving them.


Influences & Intellectual Lineage

For the broader ecosystem of thinkers across AI research, exponential technology, economics, and product strategy whose work informs these essays, see the Influences & Intellectual Lineage section on the About page.


This page is updated as the essay series evolves. Last updated March 2026.
