Essay
Where Does Intelligence Live?
If intelligence is what we attribute to organised responsiveness under constraint, its home may be less singular than we once assumed.
We use the word with striking ease: intelligent people, intelligent tools, intelligent organisations. It glides across those cases almost frictionlessly, as though we were pointing to one stable thing each time.
That ease ought to give us pause. If intelligence were a possession in the ordinary sense, it would not travel so lightly. Possessions have boundaries. They belong somewhere. Yet we extend the label wherever we encounter behaviour that feels coherent, adaptive, and context-sensitive. We recognise a certain order and assent to it.
A Word That Travels Too Easily
For much of intellectual history, that assent had a clear direction. Intelligence was treated as a faculty of mind anchored in a subject. It belonged to someone. Reading Dante’s Inferno recently, I was struck again by how confidently intelligence is placed. Souls are arranged by their capacity to reason rightly inside a moral architecture that precedes them. Intelligence has an address there. It sits inside a hierarchy, and the hierarchy itself is fixed.
That confidence no longer survives contact with many of the systems we now use. Behaviours we call intelligent often arise in settings that do not resemble a subject at all. A language model can produce lucid explanation without biography. A market can aggregate dispersed information without a centre. A research group can arrive at an insight no individual member could have generated alone.
What those cases share is not inner life but organised performance. Inputs are absorbed, constraints negotiated, outputs adjusted. Something structured is happening. The structure can be real even when no single bounded agent contains it.
Language model: fluent, adaptive output without lived experience or personal stake.
Market: coordination emerging from many local decisions rather than one centre.
Research group: cognitive work spread across people, methods, norms, and feedback loops.
Once you see that, the problem stops being purely definitional. The pressing question is how we decide, in practice, that we have encountered intelligence at all.
We Rarely See Intelligence Directly
We do not usually observe intelligence itself. We infer it from traces.
For a long time, those traces were calibrated on human minds: coherent speech, sustained reasoning, error-correction, flexible problem-solving. Imperfect signals, but useful ones. They did not guarantee depth in every case, yet they worked often enough to become habit.
Later we formalised the habit. Intelligence became testable, or at least rankable. Scores and batteries never settled the metaphysics, but they gave institutions a way to compare individuals without having to resolve first principles.
That history still shapes us. When a system now solves tasks, adapts to novelty, and maintains internal consistency, we reach for the same word almost automatically. In many settings, that move is justified. Coherence has been a strong proxy.
The trouble is that every proxy has a training environment. Ours was tuned on embodied agents with memory, stake, continuity, and vulnerability. The systems now being judged are assembled from training data, optimisation procedures, interface design, tooling, and user workflows. The old proxy still catches real capability. It also overgeneralises.
You can see this in ordinary use. A model does one task well and users quietly treat that competence as portable. A clean answer in one domain becomes confidence in another where the error profile is different. The prose transfers faster than reliability.
A Small Encounter That Stayed With Me
Out of curiosity, I asked a language model a simple question: what is intelligence?
“Intelligence is the capacity to learn from experience, adapt to new situations, reason about complex information, and achieve goals across environments.”
It was a sensible answer. It made the expected distinctions and avoided obvious mistakes. You could imagine hearing a near-identical sentence in an introductory seminar.
What held my attention was not the claim but the texture. The response arrived as if the category were settled, as if the terrain were tidy. The language was competent but strangely frictionless, with little sign of the underlying dispute.
When a person offers a succinct definition, we usually assume there is a history behind it: arguments encountered, views revised, objections absorbed, moments of being wrong. Even unspoken, that history gives the sentence weight.
Model output can imitate that weight very well. Sometimes it does more than imitate and lands on something genuinely useful. The difficulty is that the same surface can represent either case.
I notice this in myself. Clean writing can make me trust too quickly. Overly clean writing can make me overcorrect and dismiss too quickly. Neither reflex is reliable. Style carries signal, but the signal is noisy.
A few months ago I tested this on myself without planning to. I asked a model for help on a topic I know well, then asked for help on a neighbouring topic I only partly understand. Both replies had the same poise. The first was mostly right. The second was confidently wrong in ways I might have missed if I had not checked. That small jolt has stayed with me: tone can be stable while reliability is not.
What Are We Actually Tracking?
Once that mismatch appears, a harder question follows. What are our intelligence judgements really tracking?
Take a deployed language model. What we are actually judging is an assembled arrangement: parameters, data, optimisation, interfaces, uptake. The components only make sense together. Parameters without data do nothing. Data without optimisation is an archive. Optimisation without interfaces and uptake never enters social use. Change any link and the behaviour changes with it.
That is why arguments over a single “true location” keep feeling incomplete. They isolate one component and treat it as the whole.
Intelligence may simply not be the kind of thing with one address.
Distribution, Without Hand-Waving
Saying intelligence is distributed can easily become a way of saying very little. If “distributed” merely means “complicated,” we learn nothing.
The stricter version I find useful is this: intelligence is attributed where adaptive coherence is maintained under real constraint.
That criterion does not pre-select a substrate. It does not decide in advance whether the relevant unit is a person, a team, a toolchain, or a hybrid arrangement. It asks where modelling, adjustment, and correction are actually happening, and whether they persist when pressure increases.
In practice, that often means looking at the mechanics of reliability rather than declarations of capability. Where do errors get caught? Where do they compound? Which part of the arrangement carries the correction burden? What fails first when time, data, or resources are tightened?
The strongest examples are usually in environments where failure is expensive. In good operating theatres, reliability is not reducible to individual brilliance. It is produced by an arrangement: role clarity, timing discipline, communication protocols, checklists, escalation rules, equipment design. Remove enough of that arrangement and even excellent individuals become less dependable.
None of this erases differences between system types. A person is not a model. A model is not a team. A team is not a market. The claim is narrower than that: similar attributional logic can apply across levels without collapsing those levels into one another.
Why Location Still Matters
At this point a fair objection appears. If intelligence is relational, why keep asking where it lives?
Because location and responsibility are entangled.
When intelligence is attributed to an individual, responsibility usually follows the attribution quite directly. When intelligence is attributed to an arrangement, responsibility can diffuse unless we actively preserve it.
You can hear diffusion in ordinary technical language. A pipeline “selected” this. A model “decided” that. A tool “flagged” the case. Those descriptions can be operationally accurate at one level. They can also obscure who set the thresholds, who accepted the trade-offs, and who approved the deployment context.
So two claims must stand together. Intelligent behaviour can be distributed. Responsibility cannot be dissolved.
The Social Framing of Intelligent Appearance
There is another layer that complicates judgement. Intelligence is not only detected; it is also presented.
Teams make countless design choices that shape perceived competence long before users inspect technical detail: interface tone, response latency, citation format, benchmark framing, confidence language, error messaging. None of this is trivial decoration. It changes how capability is experienced.
That is not automatically cynical. Better interfaces often reduce misuse. Better wording often reduces ambiguity. But it does mean our judgements sit inside a mixed signal field, partly epistemic and partly performative.
In many product environments the same sequence repeats: strong sample output, light editorial polish, confident interface, selective benchmark, persuasive narrative, broad uptake. Each step can be defensible in isolation. Together they can generate a degree of certainty the system has not yet earned.
This is where empty scepticism is less useful than technical scepticism. Calling everything hype proves almost nothing. The harder and better move is to identify where behaviour is robust, where it is brittle, and which interventions actually shift that boundary.
Human Intelligence Was Never A Sealed Interior
A further complication is easy to miss because it cuts against a comfortable contrast. When people oppose “human intelligence” to “system intelligence,” they often treat the human side as a sealed interior possession.
It never was.
Language is inherited. Concepts are social. Methods are taught. Memory is externalised into notes, archives, software, and networks. Even solitary thought depends on tools and categories acquired in shared life.
That does not diminish individual minds. It situates them.
Seen this way, current AI does not insert distribution into a world that was previously undistributed. It alters the scale, speed, and architecture of dependencies that were already there.
You can see that dependency in small, ordinary habits. We think with calendars, search bars, scratchpads, diagrams, syntax highlighters, and version histories. We borrow conceptual tools from teachers we have never met and from disciplines we have never formally studied. Even self-correction is often social in origin: a phrase we learned from someone else, a standard we inherited, a method we copied and then made our own. Calling that “external support” can make it sound secondary, but for most serious work it is the substrate rather than the accessory.
A Working Discipline
I no longer think this topic benefits from a single grand definition. What helps more is a working discipline: make intelligence claims local and explicit. Name the task, horizon, resource budget, acceptable error, and correction path. Once those are clear, disagreement becomes testable rather than rhetorical.
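To make that less abstract, here is one way such a claim could be written down, sketched in Python; the class name, fields, and example values are illustrative assumptions of mine, not an existing standard.

```python
from dataclasses import dataclass


@dataclass
class CapabilityClaim:
    """A local, explicit intelligence claim, rather than a global label."""
    task: str                # what, exactly, the system is claimed to do
    horizon: str             # the input range or time span the claim covers
    resource_budget: str     # the compute, data, and attention assumed available
    acceptable_error: float  # the error rate the surrounding workflow tolerates
    correction_path: str     # who or what catches and repairs failures


claim = CapabilityClaim(
    task="draft first-pass summaries of internal reports",
    horizon="English documents under twenty pages",
    resource_budget="one model call plus one reviewer pass per document",
    acceptable_error=0.05,
    correction_path="a named reviewer edits before anything circulates",
)
```

Nothing about the sketch is sophisticated. Its value is that every field is contestable, which is exactly what makes the disagreement testable.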
Evaluating Models Without Theatricality
Applied to model evaluation, this discipline is straightforward and demanding.
Avoid the two easy stories. One says fluent language implies deep understanding. The other says lack of consciousness makes intelligence language meaningless. Both stories close the question too early.
The better questions are slower and less glamorous. What survives distribution shift? Which mistakes disappear with prompt changes, and which persist? How much of observed performance comes from the model versus retrieval, tooling, workflow constraints, or human patching? What breaks when you reframe the task rather than repeat it?
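One of those questions, how much of the performance is the model versus its scaffolding, can be asked mechanically with an ablation grid. A minimal sketch, assuming you already have a scoring harness; `solve` and its flag names are hypothetical stand-ins, not a real API.

```python
def ablation_report(tasks, solve):
    """Score one task suite under ablations to locate where performance lives.

    `solve(task, retrieval=..., tools=...)` stands in for your own harness
    and should return True when the output is acceptable.
    """
    conditions = {
        "model only":        dict(retrieval=False, tools=False),
        "model + retrieval": dict(retrieval=True,  tools=False),
        "model + tools":     dict(retrieval=False, tools=True),
        "full pipeline":     dict(retrieval=True,  tools=True),
    }
    for name, flags in conditions.items():
        accuracy = sum(solve(task, **flags) for task in tasks) / len(tasks)
        print(f"{name:17} accuracy = {accuracy:.2f}")
```

If the gap between “model only” and “full pipeline” is large, the capability you are pricing lives substantially in the arrangement, not in the weights.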
If the goal is deployment, one question dominates: who absorbs the failure cost? A system can be excellent for drafting and unacceptable for routing time-critical decisions, even at similar headline accuracy, because the misses land in different places.
A practical rule I keep coming back to is simple: treat outputs as evidence, not confession. A response shows behaviour under conditions; it does not disclose an inner state.
That rule also protects against a quieter mistake: judging only by average performance. In deployment, averages hide structure. A model can look strong overall while failing repeatedly on one subtype of case that users assume is safe. Another model can look weaker on aggregate and still be preferable because its failures are legible, recoverable, and easy to intercept. The quality of a system is not just how often it is right; it is also how it fails, how quickly those failures are noticed, and how gracefully the workflow recovers.
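The point about averages is easy to demonstrate with fabricated numbers, chosen here purely for illustration:

```python
from collections import defaultdict


def accuracy_by_slice(results):
    """Split aggregate accuracy into per-subtype accuracy.

    `results` is assumed to be an iterable of (subtype, was_correct) pairs;
    the schema is illustrative, not fixed.
    """
    totals, hits = defaultdict(int), defaultdict(int)
    for subtype, was_correct in results:
        totals[subtype] += 1
        hits[subtype] += bool(was_correct)
    return {s: hits[s] / totals[s] for s in totals}


# A made-up run: 91% accuracy overall, yet the rare slice fails four times in five.
results = ([("routine", True)] * 90 + [("routine", False)] * 5 +
           [("edge case", True)] * 1 + [("edge case", False)] * 4)
print(accuracy_by_slice(results))  # {'routine': ~0.95, 'edge case': 0.2}
```

The headline number and the slice that hurts you can point in opposite directions.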
The Personal Register
There is also a personal reason this question keeps returning.
Intelligence language leaks into identity very quickly. Many of us were trained, implicitly or explicitly, to treat intelligence as a trait one either has or lacks. In calm settings that can motivate. In high-variance settings it becomes brittle fast.
A relational account tracks lived experience better. Performance depends on fit: fit between person and task, task and environment, environment and support.
In technical education this is obvious once you stop pretending otherwise. Some people underperform on timed exercises and do exceptional work on open-ended problems. Others excel at canonical tasks and stall when the frame shifts. Static labels hide more than they reveal.
Compression, Detail, and What Holds
I have written before, in After Emergence, about what remains when detail is compressed away, and this feels related.
When we call something intelligent, what remains under perturbation? What still works when framing changes, resources tighten, data shifts, or time horizons move?
Many systems look impressive inside richly narrated demonstrations and then fail under modest variation. Others look ordinary in isolated snapshots yet remain stable over long runs.
The practical mistake is to confuse descriptive resolution with explanatory depth. Detail is easy to add. Stability is harder to earn.
A lot of argument cycles around intelligence come from that mismatch. We reward presentation-ready evidence, then act surprised when confidence collapses under broader testing.
If the aim is durable understanding, then test design has to mimic disturbance rather than perfection. Vary the framing, alter the order of information, tighten the resource budget, change the objective slightly, and watch what remains stable. Stability is not glamorous, but it is where trust is built. A capability that survives only in one polished setting is not useless, but it is narrower than it first appears.
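As a sketch of what mimicking disturbance can mean mechanically, assuming tasks shaped as simple dicts and a caller-supplied solver and scorer; every perturbation here is an illustrative choice, not a canonical suite.

```python
import random


def perturb(task):
    """Return disturbed variants of one evaluation task.

    `task` is assumed to be a dict with 'question' and 'facts' keys.
    """
    shuffled = task["facts"][:]
    random.shuffle(shuffled)  # alter the order of information
    return [
        task,                                                          # original framing
        {**task, "facts": shuffled},                                   # same facts, new order
        {**task, "question": "In one sentence: " + task["question"]},  # tighter budget
        {**task, "question": "Steelman the opposite view first. "
                             + task["question"]},                      # shifted frame
    ]


def stability(task, solve, score):
    """Score the worst variant: stability is the floor, not the average."""
    return min(score(solve(variant)) for variant in perturb(task))
```

Reporting the minimum rather than the mean is the design choice that matters here: it measures what holds, not what impresses.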
Where I End Up
So where does intelligence live?
Not in one obvious place.
If “live” is the right verb, it lives wherever adaptive coherence is maintained under constraint. Sometimes the centre of gravity is largely individual. Sometimes it is spread across people, tools, and routines. Usually it is mixed.
That does not flatten important differences. Human life includes dimensions no benchmark captures: embodiment, vulnerability, finitude, responsibility, care. Those dimensions matter independently of measured performance.
At the same time, reserving intelligence language exclusively for one substrate now obscures too much of the cognitive work that actually structures modern life.
The posture that seems most honest to me is modest and exacting at once: modest about metaphysical certainty, exacting about evidence, correction, and consequence.
I have ended up trusting a quieter test. Not how certain an arrangement sounds, but how well it tolerates correction. The most valuable systems, in my experience, are the ones you can revise repeatedly without the whole thing falling apart.