Why we never truly know — and why that is the engine, not the obstacle
"Doubt is not a pleasant condition, but certainty is absurd."
— Voltaire
CHAPTER THESIS: Faced with complexity, every instinct screams for certainty — the master plan, the single source of truth, the forecast we can finally stop questioning. That instinct is the trap. In the domains that decide an organisation's fate, certainty is not available, and pretending otherwise is how confident organisations walk off the burning platform.
The Crisis showed us giants falling. Watch closely and you notice they rarely fall from indecision. They fall from misplaced confidence — a strategy held too firmly for too long, a roadmap that outlived the world it was drawn for, a "single source of truth" that became a single source of confident, scaled error.
The evidence that we systematically overrate our own certainty is overwhelming. In his account of the planning fallacy, Daniel Kahneman (Thinking, Fast and Slow, 2011) cites a 2002 survey of American homeowners who had remodelled their kitchens: they expected the job to cost an average of $18,658 and ended up paying an average of $38,769. That is a miss of roughly 2x, driven not by bad luck but by confident optimism that ignored everything outside the plan. Kahneman named the underlying mechanism What You See Is All There Is: the mind builds a coherent, confident story from the evidence in front of it and goes blind to the evidence that isn't. Coherence feels like accuracy. It is not.
Philip Tetlock proved the same point at the level of expert judgement. Across a 20-year tournament of roughly 82,000 forecasts (Expert Political Judgment, 2005), the experts who reasoned from a single commanding idea — Tetlock's "hedgehogs" — were more confident and less accurate than the "foxes" who held many models loosely and updated often. The most certain experts were the least reliable. Confidence that cannot be moved is not a strength. It is the most expensive form of ignorance.
One essential caveat, because this principle is dangerous when over-applied. Not every problem lives on the frontier. Dave Snowden's Cynefin framework (Snowden and Boone, Harvard Business Review, 2007) distinguishes the clear and complicated domains, where cause and effect are knowable and the right move is to apply proven practice or expert analysis, from the complex domain, where cause and effect are visible only in hindsight. A surgical checklist is not a hypothesis to be revised mid-operation; Atul Gawande's Checklist Manifesto (2009) documented complication and death rates falling by a third or more when teams followed fixed procedure precisely. The discipline is knowing which world you are in.
The decisions that determine an organisation's survival — markets, customers, competitors, strategy — almost all live in the complex domain. That is exactly where false certainty is fatal, and exactly where the instinct to demand it is strongest.
"Our knowledge can only be finite, while our ignorance must necessarily be infinite."
— Karl Popper
CHAPTER THESIS: For everything empirical — every belief about how the contingent world actually behaves — certainty is not on the menu. We only grow more or less confident. Progress comes not from knowing, but from looping: observe, conjecture, test, revise. This is not cynicism. It is the settled conclusion of three centuries of careful thought, and it is the most productive posture an organisation can adopt.
This is old, hard-won ground. David Hume showed that no quantity of past observation can prove the next one — the sun having risen every prior day does not entail tomorrow's sunrise; it only earns our confidence in it. Charles Sanders Peirce gave the posture its name, fallibilism, and insisted it was the scientific attitude itself, not a retreat from it. Karl Popper made it an engine: a claim earns its standing not by being proved, but by surviving every honest attempt to break it (The Logic of Scientific Discovery, 1934). None of them counselled despair. They counselled calibration.
Be precise about the boundary. We are not talking about 2 + 2 = 4 or the truths of logic — those are certain because we defined them so. We are talking about empirical knowledge, and there the right posture is calibrated confidence: strong where the evidence is dense, humble where it is thin, and always open to revision. Frank Knight and, later, Nassim Taleb mapped the regions where the uncertainty runs so deep that no honest probability can be assigned at all — and there the wise response is not a bolder forecast but humility, optionality, and a refusal to bet the enterprise on a guess dressed as a fact.
So if we never reach the far side, how does anything advance? Not by knowing — by looping. Eric Ries built a movement on it (The Lean Startup, 2011): build, measure, learn, and pivot when the evidence demands. Colonel John Boyd built a theory of competition on it — the OODA loop, where whoever cycles through observe-orient-decide-act fastest disorients the opponent. Snowden's prescription for the complex domain is the same shape: probe, sense, respond — safe-to-fail experiments rather than grand plans.
And here we should be honest, because the honesty makes the case stronger. The deliberate scientific method is not the only engine of progress under uncertainty. Friedrich Hayek showed that markets advance with no one forming a hypothesis at all — prices quietly aggregate more knowledge than any planner could hold (The Use of Knowledge in Society, 1945). Michael Polanyi reminded us that craftspeople carry skill they cannot put into words: we know more than we can tell. Taleb argues that in opaque domains, disciplined tinkering beats grand theory. What unites the laboratory, the market, and the workshop is a single shape, and it is the one lesson this essay asks you to carry: fixed plans lose to adaptive loops. The organisation that wins is not the one with the best forecast. It is the one that turns beliefs into cheap experiments and learns faster than its rivals.
This is why, three essays from now, the methodology is called ORBIT — and why it goes around. You orbit because you can never leap straight to the truth. Iteration is not a concession to imperfection. It is the only known way to make progress on the frontier.
"The fox knows many things, but the hedgehog knows one big thing."
— Archilochus
CHAPTER THESIS: The most striking evidence that calibrated humility beats confident expertise comes from the place you would least expect it: a forecasting tournament run by the United States intelligence community, in which teams of ordinary volunteers, holding no secrets at all, reportedly out-predicted professional analysts with access to classified intelligence. They won not by knowing more, but by refusing to be certain.
In 2011, the research arm of the US intelligence community, IARPA, did something unusual. Rather than assume its own analysts were the best forecasters available, it put the question to the test. It ran a multi-year tournament, asking competing teams to assign probabilities to hundreds of real geopolitical questions: Will this regime fall within six months? Will these talks produce an agreement? Every forecast was scored mercilessly against what actually happened.
One of the teams was the Good Judgment Project, led by Philip Tetlock and Barbara Mellers at the University of Pennsylvania. Its forecasters were not spies or area experts. They were volunteers — pharmacists, retirees, hobbyists, an irrigation-systems specialist from Nebraska — people whose only qualification was a willingness to think carefully about questions they could not possibly know the answer to.
They did not just win. They won by a humiliating margin. The Good Judgment Project outperformed the tournament's control group by 60% in the first year and 78% in the second — so decisively that IARPA wound down the competing university teams. Its best forecasters — the "superforecasters" Tetlock identified and clustered (Superforecasting, Tetlock and Gardner, 2015) — reportedly outperformed professional intelligence analysts with access to classified intercepts by around 30%, a claim first reported by the Washington Post's David Ignatius in 2013 and never officially confirmed. Amateurs reading the open news, it seems, beat professionals reading the secret cables.
The natural question is: how? And the answer is the entire thesis of this essay, made concrete. The superforecasters were not smarter in any measurable way. What they shared was a method, and it was the method of the frontier:
This is the same contrast Tetlock had documented years earlier: the confident "hedgehogs" who explained the world through one commanding idea were reliably beaten by the self-questioning "foxes" who held many partial models and revised them often. The person in the room most certain about the future is, on the evidence, usually the least reliable guide to it. Certainty is not a signal of skill. Calibration is.
Look hard at what your organisation rewards. Most reward conviction — the executive who states the future with the most confidence wins the room, the budget, the promotion. The Good Judgment Project is a warning that you may be selecting for exactly the wrong trait. Build instead for calibration: reward the person who updates when the evidence turns, track whether confident predictions actually came true, and treat "I was wrong, here is my revised view" as a mark of competence rather than weakness. An organisation that punishes updating is training its best minds to defend yesterday's guess.
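Tracking "whether confident predictions actually came true" needs a scoring rule, and forecasting tournaments of this kind use the Brier score. What follows is a minimal sketch, not the tournament's own tooling: it uses the simple binary form of the score (the mean squared gap between a stated probability and a 0-or-1 outcome, lower is better), and the function names and both track records are hypothetical.

```python
from collections import defaultdict

def brier_score(forecasts):
    """Mean squared gap between stated probability and outcome (1 or 0).
    0.0 is perfect; always answering 0.5 scores 0.25."""
    return sum((p - outcome) ** 2 for p, outcome in forecasts) / len(forecasts)

def calibration_table(forecasts):
    """Group forecasts by stated probability and report the observed hit rate."""
    buckets = defaultdict(list)
    for p, outcome in forecasts:
        buckets[round(p, 1)].append(outcome)
    return {p: sum(hits) / len(hits) for p, hits in sorted(buckets.items())}

# Hypothetical track records: (stated probability, outcome as 1 or 0).
hedgehog = [(1.0, 1), (1.0, 0), (1.0, 1), (1.0, 0)]  # always certain, right half the time
fox = [(0.7, 1), (0.7, 1), (0.3, 0), (0.7, 0)]       # hedged, and roughly calibrated

hedgehog_score = brier_score(hedgehog)
fox_score = brier_score(fox)
print(f"hedgehog: {hedgehog_score:.2f}  fox: {fox_score:.2f}")
print(calibration_table(fox))
```

Both forecasters here miss some calls. The score punishes the one who was certain while missing far more heavily than the one who hedged, which is the incentive an organisation rewarding calibration would want to create.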
This is not a counsel of indecision — the superforecasters made sharper calls than anyone, and acted on them. It is a counsel of held-loosely conviction: bet, watch, and be the first to notice when the bet is going wrong. And that posture is not a personality trait you are born with. It is something an organisation can build into its tools, its documents, and its operating rhythm — starting with two practices.
The first practice is calibrated humility — the posture the superforecasters held individually, written into how an organisation holds its knowledge. The rule has two halves. Every belief about the world carries a confidence, and the confidence is proportional to the evidence — not to seniority, not to how long the belief has been around, not to how awkward revising it would be. And — the half organisations forget — that confidence can approach certainty, but it never arrives. Project managers already know this rule in another costume: a risk at 99% is still a risk. It becomes an issue by happening — never by crossing a threshold. The same boundary runs through all organisational knowledge. What has actually been observed, decided, or recorded can be certain: the payment cleared, the contract was signed, the decision was made. These are the inert facts an organisation reasons from. Everything concluded from evidence — the churn pattern, the market read, the architecture bet — is the provisional present it reasons about: forever revisable, however well it has aged. Organisations that blur this line, letting a strong conclusion quietly harden into a "fact," are manufacturing the next confident, scaled error.
Between those two poles, knowledge matures the way science says it should. A question becomes an observation; an observation suggests an idea; an idea sharpens into a hypothesis; a hypothesis earns an experiment; the experiment yields evidence; evidence accumulates into insight; insight settles into principle; and a principle that survives everything thrown at it begins to look like law. Confidence rises with every rung — that is what the rungs are for. But here is the discipline the ladder enforces: it runs entirely within the territory of the revisable. Even at the top, a law is the most-matured conclusion an organisation holds — not a graduated fact. A well-supported theory is still a theory, never an observation; Newton held for two centuries and was still revisable when Einstein arrived. Climbing the ladder earns trust. It never earns certainty.
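The boundary between recorded facts and matured conclusions can be written into the types an organisation stores its knowledge in. The sketch below is one hypothetical way to do that, not a prescribed schema: the class names, the `MAX_CONFIDENCE` ceiling of 0.999, and the example statements are all assumptions made for illustration. The rungs follow the ladder as listed in the text.

```python
from dataclasses import dataclass
from enum import IntEnum

class Rung(IntEnum):
    """The maturity ladder from the text, lowest to highest."""
    QUESTION = 1
    OBSERVATION = 2
    IDEA = 3
    HYPOTHESIS = 4
    EXPERIMENT = 5
    EVIDENCE = 6
    INSIGHT = 7
    PRINCIPLE = 8
    LAW = 9

MAX_CONFIDENCE = 0.999  # hypothetical ceiling: close to certain, never equal to it

@dataclass(frozen=True)
class RecordedFact:
    """Something observed, decided, or recorded. It carries no confidence field."""
    statement: str

@dataclass
class Conclusion:
    """Anything inferred from evidence. Always revisable, whatever its rung."""
    statement: str
    rung: Rung
    confidence: float

    def __post_init__(self):
        if not 0.0 <= self.confidence <= MAX_CONFIDENCE:
            raise ValueError("confidence must stay within [0, MAX_CONFIDENCE]")

    def promote(self, new_confidence: float) -> "Conclusion":
        """Climb one rung (stopping at LAW). The result is still a Conclusion."""
        next_rung = Rung(min(self.rung + 1, Rung.LAW))
        return Conclusion(self.statement, next_rung, new_confidence)

# Hypothetical examples of each kind.
paid = RecordedFact("The March invoice cleared")
churn = Conclusion("Churn is driven by onboarding friction", Rung.HYPOTHESIS, 0.6)
stronger = churn.promote(0.8)
```

The design choice worth noticing is that no method converts a `Conclusion` into a `RecordedFact`. A strong conclusion can climb and gain confidence, but hardening it into a "fact" would require someone to do so deliberately and visibly.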
One refinement separates the masters from the diligent: "uncertain" and "unchecked" are different problems. A contested belief needs more evidence. A stale one needs a re-check. Time alone does not make a belief false — but it does make it unexamined, and an organisation should know which of its load-bearing beliefs it has not looked at lately. The history matters too: a belief whose confidence has been climbing for a year and one that has been quietly sliding deserve different conversations, even if they sit at the same number today.
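The difference between "uncertain" and "unchecked" is easy to lose unless each belief keeps its history. A small sketch of the idea, with every threshold (the contested band of 0.35 to 0.65, the 180-day staleness limit) and every example belief chosen arbitrarily for illustration:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class TrackedBelief:
    statement: str
    history: list[tuple[date, float]]  # (date checked, confidence), oldest first

    @property
    def confidence(self) -> float:
        return self.history[-1][1]

    @property
    def last_checked(self) -> date:
        return self.history[-1][0]

    def trend(self) -> float:
        """Change in confidence from the first recorded check to the latest."""
        return self.history[-1][1] - self.history[0][1]

def triage(belief, today, contested_band=(0.35, 0.65), stale_after_days=180):
    """Return the actions a belief needs. Thresholds are illustrative assumptions."""
    actions = []
    low, high = contested_band
    if low <= belief.confidence <= high:
        actions.append("gather evidence")  # uncertain: the evidence is thin or split
    if (today - belief.last_checked).days > stale_after_days:
        actions.append("re-check")         # unchecked: not wrong, just unexamined
    return actions

today = date(2025, 1, 1)
rising = TrackedBelief("Segment A is growing",
                       [(date(2024, 1, 10), 0.4), (date(2024, 12, 1), 0.7)])
sliding = TrackedBelief("Segment B is loyal",
                        [(date(2024, 1, 10), 0.95), (date(2024, 3, 1), 0.7)])
contested = TrackedBelief("Segment C will pay more", [(date(2024, 12, 15), 0.5)])

for b in (rising, sliding, contested):
    print(b.statement, round(b.trend(), 2), triage(b, today))
```

The first two beliefs sit at the same confidence today, yet one has been climbing and was checked a month ago while the other has been sliding and has not been looked at since March. The number alone would hide both differences.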
The second practice is structured ignorance — making what you don't know as visible as what you do. Every organisation inventories its knowledge: the dashboards, the wikis, the reports, the accumulated record of everything it has ever looked at. Almost none inventories its ignorance. Yet the frontier — the boundary between known and unknown — is where every consequential decision actually lives. So map it, mission by mission: the open questions, the untested assumptions, the places where two confident beliefs quietly contradict each other, the strategic unknowns that nobody owns. A contradiction, on this view, is not an embarrassment to be smoothed over in the next deck. It is a signal flare marking exactly where the map is wrong — the most valuable real estate on the frontier. An organisation that manages its ignorance puts its attention on the highest-leverage unknown rather than the largest pile of the known. It optimises for decision quality, not information quantity.
Neither practice waits on technology — Tetlock's volunteers ran calibrated humility on spreadsheets. What AI changes is the scale. A confidence and a provenance can now travel with every belief the organisation holds, the frontier can be redrawn continuously rather than at the annual offsite, and the same body of knowledge can be viewed through whatever lens an honest question demands: beliefs ranked by confidence, by how long since anyone checked them, by where they contradict one another, by how close they sit to the edge of the known. How those lenses work is the subject of The Collapse; how they become a working rhythm is the subject of The Orbit.
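At small scale, three of the lenses named above (confidence, time since last check, contradiction) are nothing more than different orderings over the same records. The sketch below assumes a hypothetical `Belief` record with a provenance field and hand-labelled contradiction links, and it makes no attempt at the fourth lens, distance from the edge of the known. The beliefs themselves are invented.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Belief:
    key: str
    confidence: float
    last_checked: date
    source: str              # provenance: where the belief came from
    contradicts: tuple = ()  # keys of beliefs this one conflicts with

def by_confidence(beliefs):
    """Least confident first."""
    return sorted(beliefs, key=lambda b: b.confidence)

def by_staleness(beliefs, today):
    """Longest-unchecked first."""
    return sorted(beliefs, key=lambda b: (today - b.last_checked).days, reverse=True)

def contradictions(beliefs):
    """Pairs flagged as conflicting, each reported once."""
    known = {b.key for b in beliefs}
    pairs = set()
    for b in beliefs:
        for other in b.contradicts:
            if other in known:
                pairs.add(tuple(sorted((b.key, other))))
    return sorted(pairs)

today = date(2025, 1, 1)
beliefs = [
    Belief("churn-is-price-driven", 0.8, date(2024, 2, 1), "exit survey",
           ("churn-is-onboarding-driven",)),
    Belief("churn-is-onboarding-driven", 0.7, date(2024, 11, 1), "ticket analysis",
           ("churn-is-price-driven",)),
    Belief("enterprise-wants-sso", 0.4, date(2024, 9, 1), "three sales calls"),
]

print([b.key for b in by_confidence(beliefs)])
print([b.key for b in by_staleness(beliefs, today)])
print(contradictions(beliefs))
```

Each lens surfaces a different belief: the weakest one, the longest-unexamined one, and the pair of confident claims that cannot both be right.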
An organisation's intelligence is measured less by what it knows than by how honestly it knows it — and by whether it can see what it doesn't. Calibrated humility keeps every belief priced to its evidence; structured ignorance keeps the unknown on the map. Together they are nothing more — and nothing less — than the scientific method, run as an operating discipline.
"Program testing can be used to show the presence of bugs, but never to show their absence."
— Edsger W. Dijkstra
CHAPTER THESIS: There is exactly one exception — the things we author ourselves, and above all, our code. But even that certainty has edges: a program's identity is certain, its correctness is only ever relative to a specification, and its value becomes a hypothesis again the moment it meets the world.
A program's text is a formal object. We wrote every character; in principle we can reason about exactly what it says. This is real, and it is rare, and it is the still point the whole methodology turns on. But certainty here is layered, and software engineering has spent fifty years mapping the layers.
Its identity is certain: the text is the text. Its correctness, though, is only ever relative to a specification. The discipline names the gap precisely: Barry Boehm's distinction between verification ("are we building the product right?") and validation ("are we building the right product?"). You can verify a program flawlessly against a specification that is quietly wrong, and history is a graveyard of exactly that. The Ariane 5 rocket was lost on its maiden flight in 1996 to software that was correct, but for the Ariane 4. A specification assumption carried over unexamined met a faster trajectory, and a failure measured in hundreds of millions of dollars followed. The logic did not fail. The bet encoded in the spec failed.
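The gap between verification and validation can be shown in a few lines. The Ariane 5 failure was triggered by a conversion of a 64-bit floating-point value to a 16-bit signed integer that overflowed on the new trajectory. The sketch below is a toy illustration of that shape of failure, not the flight code: the function names and both numeric values are hypothetical stand-ins, not flight data.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def to_int16(value: float) -> int:
    """Convert to a 16-bit signed integer, raising if the value does not fit."""
    result = int(value)
    if not INT16_MIN <= result <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in 16 bits")
    return result

# The old specification promised inputs within this envelope
# (a hypothetical figure chosen for illustration).
OLD_SPEC_MAX = 30_000.0

def verify_against_old_spec() -> bool:
    """Verification: every value the old spec allows converts cleanly."""
    return all(to_int16(v) == int(v) for v in (0.0, 15_000.5, OLD_SPEC_MAX))

def validate_in_new_world(observed: float) -> bool:
    """Validation: does the code still hold for what the world actually sends?"""
    try:
        to_int16(observed)
        return True
    except OverflowError:
        return False

print(verify_against_old_spec())          # the code is right, relative to the spec
print(validate_in_new_world(45_000.0))    # the spec no longer describes reality
```

Nothing in `to_int16` changed between the two calls. Only the assumption about its inputs did, which is why no amount of re-verifying against the old envelope would have caught the problem.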
And even the running program rides a chain we did not author: Ken Thompson's Reflections on Trusting Trust (1984) showed that you cannot fully trust code you did not create entirely yourself. The compiler, the runtime, the silicon are someone else's authorship. As Dijkstra warned, testing can reveal the presence of bugs but never their absence. Certainty about behaviour is not something testing can buy.
Even our one island of certainty is bounded by the water's edge. The moment software leaves your hands and meets a user, a market, a need, its value becomes a hypothesis again. Does it do what people actually required? That question is never answered in the source. It is answered out on the frontier, where we are never certain — which is why we ship to learn, not merely to deliver.
"A computer can never be held accountable, therefore a computer must never make a management decision."
— IBM training manual, 1979
CHAPTER THESIS: If we are never certain, the temptation is to ask the machine to be certain for us. We will not — but the usual reason is wrong, and getting the reason right is what makes the answer durable. The human stays in command not because humans judge better, but because decisions carry values, and values require an owner who can answer for them.
The honest reason is not that human judgement is superior under uncertainty. Often it is not. Paul Meehl's Clinical versus Statistical Prediction (1954) launched a line of research that has held for seventy years: a 2000 meta-analysis by Grove and colleagues, across 136 studies, found simple statistical models beat or matched expert human judgement roughly 94% of the time. Even the original "centaur" — the human-plus-engine chess pairing that famously beat engines alone in 2005 — has since been overtaken: as the engines grew superhuman, the human hand became a drag rather than a lift, and the machine alone now wins (Krakowski, Luger and Raisch, Strategic Management Journal, 2023).
If we rested the human's authority on being the better forecaster, we would forfeit that authority the moment the machine improved. And it always improves. So we ground it elsewhere, on foundations that do not erode. Decisions carry values, and values require an owner. Someone must be answerable for a choice — to customers, to colleagues, to regulators, to the public. A machine can compute a recommendation; it cannot be accountable for one. It cannot stand up and answer when the choice goes wrong. This is the principle behind the "meaningful human control" doctrine (Santoni de Sio and van den Hoven, 2018) and behind human-oversight law such as Article 14 of the EU AI Act.
So the partnership at the heart of everything that follows — the Centaur — is not a performance architecture, a trick for squeezing better output from the pair. It is a moral architecture. The AI navigates the frontier at superhuman speed and scale; the human stakes something on the outcome and owns the result.
Transparency of process — the Glass Box, not the black box — and sovereignty over your data are not two commitments. They are one: the answerable human stays in command of what is theirs. You cannot delegate accountability to a system you cannot see, any more than you can delegate it to a vendor holding your crown jewels on a server you do not control. As the AI grows more capable, this matters more, not less.
Knowledge, then, is not a vault of settled facts to be guarded. It is a living frontier — confident where we have looked hard, provisional everywhere, shared so the whole organisation navigates by one chart, and redrawn the instant the evidence demands. To treat it as finished is to begin dying. To treat it as alive is to stay in the game.
This is the bedrock the rest of this work stands on, and it is why the methodology to come takes the shape it does. A Mission is a hypothesis worth betting on. Labs is where we run the experiment before we trust the result. The Orbit is the loop itself, turning belief into evidence and evidence into better belief. And the Centaur is the human who — precisely because nothing out here is certain — keeps a hand on the wheel. (The leadership half of this argument — perception, opportunity, and the mission as a living hypothesis — is developed in the companion series, The Positive Sum.)
We ask you to hold this essay the way we ask you to hold everything else. Confidently: we believe it is right, and we have tested it hard. Provisionally: if the evidence turns, so will we. That is not a weakness in the argument. It is the argument.
THE GROUND IS SET. Who acts on it?
ESSAY III: THE CENTAUR
This essay is the trilogy’s epistemic foundation — calibrated humility at the Amplifier scale. Companions: The Mirror (seeing your own mind clearly) · The Experiment (probing under uncertainty) · Glossary · Unified Framework