The Accountability Gap in Agentic AI

For most of the history of machine decision-making, governance had a simple shape: a machine proposed, and a human disposed. The model produced an output; a person decided what to do with it. We called this Human-in-the-Loop, and it served us well. It is now quietly failing, because the premise it rested on (that the human is the one who acts) no longer holds.

Terms, used precisely

Responsibility:: doing the work. Can be delegated, including to a system.
Accountability:: being answerable for the outcome. Sits with a named person and cannot be delegated to software.
Liability:: bearing the legal and financial consequences. Can be allocated by contract, indemnity, and insurance, but allocating it is itself an accountable decision, and regulatory accountability often cannot be transferred at all.

When HAL (Human Accountable for the Loop) says "accountability can never be delegated; execution can", it means exactly this: responsibility can be delegated and liability can be allocated, but accountability is fixed. Only the second of the three terms cannot move.

01 · Origins of Human-in-the-Loop

Human-in-the-Loop (HITL) did not begin with machine learning. It is an inheritance from control theory, aviation, and early automation, where a human operator remained the final authority over a machine that could otherwise act on its own. The principle was conservative and sound: where a machine might err in ways that matter, insert a human between its judgement and the world.

When predictive models entered high-stakes settings (credit, medicine, hiring), HITL was the natural control. The model scored; a human decided. Oversight and action lived in the same place: the person.

02 · Why HITL worked

HITL worked because of a structural fact about the systems it governed: they did not act. They produced outputs (scores, classifications, drafts, recommendations) and then stopped. The output was inert until a human picked it up. That gap between output and action was where governance lived.

It also worked because the volumes were human-scaled. A loan officer could review the applications in front of them. A clinician could weigh a model's reading against their own. The number of decisions was bounded by human capacity, and so the review was real.

03 · Why HITL breaks at scale

Two things break HITL: volume and action. When a system makes more decisions than a human can examine, "human-in-the-loop" becomes "human-rubber-stamping-the-loop". The control persists on paper while evaporating in practice. We have all seen the dashboard with an "Approve all" button.

A reviewer who cannot review is not a control. They are a liability with a job title.

The deeper break comes when systems begin to act. The moment a system can send the email, file the report, move the money, or delete the record, the gap between output and action closes. There is no longer a natural pause in which a human stands. To insert one artificially, by requiring approval of every action, is to throw away the very capability you deployed the system for.

04 · The rise of agentic systems

Agentic systems are defined by exactly this: they take actions in pursuit of goals, often chaining many steps, calling tools, and operating with minimal supervision. They do not wait to be asked. They proceed.

That shift changes what governance must cover. A faster, more accurate assistant remains an assistant, and its outputs can still be checked before anyone acts on them. An actor needs a different kind of governance. The governing question is no longer only whether the output was correct; it is whether the action was authorised, bounded, recorded, and owned.

05 · Delegated authority

The useful frame is delegation. When we let a system act on our behalf, we are delegating authority to it, exactly as we delegate authority to people. And we already know how to do that responsibly. We scope what a person may do, set the limits they must not cross, define when they must escalate, and keep records of what they did.

The same structure applies to an agent. Authority defines what it may do; limits define what it must never do; escalation defines when it must hand back to a human; evidence records what it did. These are not novel inventions. They are the ancient mechanics of delegation, applied to a new kind of agent.
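The four elements can be made concrete with a minimal sketch. Everything named here (the `Delegation` record, the action names, the monetary threshold used as the escalation trigger) is a hypothetical illustration, not part of HAL or any particular product; it shows one way that authority, limits, escalation, and evidence could sit together in a single check.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    ESCALATE = "escalate"
    DENY = "deny"


@dataclass
class Delegation:
    """Hypothetical record of the authority delegated to one agent."""
    owner: str                  # the named, accountable person
    authority: frozenset[str]   # actions the agent may take
    limits: frozenset[str]      # actions it must never take
    escalate_over: float        # assumed trigger: amounts above this go to a human
    evidence: list[dict] = field(default_factory=list)

    def check(self, action: str, amount: float = 0.0) -> Decision:
        # Limits are tested first so they override any grant of authority;
        # anything not explicitly authorised is also denied.
        if action in self.limits or action not in self.authority:
            decision = Decision.DENY
        elif amount > self.escalate_over:
            decision = Decision.ESCALATE
        else:
            decision = Decision.ALLOW
        # Evidence: every request is recorded, whatever the outcome.
        self.evidence.append(
            {"action": action, "amount": amount, "decision": decision.value}
        )
        return decision


# Illustrative policy; the owner and action names are invented.
policy = Delegation(
    owner="jane.doe",
    authority=frozenset({"send_email", "issue_refund"}),
    limits=frozenset({"delete_record"}),
    escalate_over=500.0,
)
```

Under these assumptions, `policy.check("issue_refund", 900.0)` hands the decision back to a human, while `policy.check("delete_record")` is refused outright. Both requests land in `policy.evidence` alongside the name of the owner who answers for them.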

06 · Organisational accountability

Here is the pivotal distinction. Execution can be delegated to a system. Accountability cannot. When a human employee acts within their authority, the organisation remains accountable for what they do. Nothing about substituting software for the employee changes that. The organisation, and a named person within it, remains answerable.

HAL fixes accountability to a person who owns the system. Software has no independent judgement to be held to account; it cannot be sanctioned, dismissed, or struck off. So accountability flows back, through the system, to the human who owns it. That is the entire substance of Human Accountable for the Loop.

07 · Legal parallels

The law of agency has governed delegated action for centuries: a principal is bound by the authorised acts of their agent, and an agent who exceeds their authority creates liability that does not simply vanish. Corporate law, fiduciary duty, and internal authority matrices all encode the same idea: someone is always accountable for the actions taken in an organisation's name.

One precision matters here, because lawyers will rightly insist on it. Software is not an agent in the legal sense. It has no legal personality, owes no fiduciary duty, and no agency relationship arises when you deploy it. In law, an AI system is a tool, and the consequences of a tool's operation land directly on the organisation that wields it. The parallel HAL draws is structural, not doctrinal: the mechanics of sound delegation (scoped authority, limits, escalation, records) transfer; the legal relationship does not. If anything, this strengthens the case for HAL: there is no agent to share the blame with.

Agentic AI puts these principles under pressure. A regulator faced with an erroneous automated filing will not accept "the model did it" as a defence. The organisation that deployed the system is accountable. HAL simply asks organisations to confront that reality before deployment, rather than discover it during an incident.

08 · Future governance models

The trajectory is clear. Organisations will run many agents at once, built in-house, embedded in vendor products, and increasingly calling one another. Governing this estate will resemble portfolio management more than decision-by-decision review: a registry of agents, each with an owner, an authority scope, a risk level, a HAL score, and a review date.
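A registry of this kind could be sketched as a simple record type plus a portfolio-level query. The field names, the example agents, and the integer scale used for `hal_score` are assumptions made for illustration; the text above names the attributes but does not define their formats.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AgentRecord:
    """Hypothetical registry entry for one deployed agent."""
    name: str
    owner: str                        # named accountable person ("" if unassigned)
    authority_scope: tuple[str, ...]  # actions the agent may take
    risk_level: str                   # assumed labels: "low", "medium", "high"
    hal_score: int                    # assumed integer scale, not defined in the text
    review_date: date                 # date by which the entry must be reviewed


def needs_attention(registry: list[AgentRecord], today: date) -> list[str]:
    """Return names of agents with no owner or an overdue review."""
    return [
        record.name
        for record in registry
        if not record.owner.strip() or record.review_date < today
    ]


# Invented entries, for illustration only.
registry = [
    AgentRecord("invoice-bot", "jane.doe", ("file_report",), "medium", 72, date(2025, 6, 30)),
    AgentRecord("mail-triage", "", ("send_email",), "low", 55, date(2026, 1, 15)),
    AgentRecord("refund-agent", "sam.lee", ("issue_refund",), "high", 81, date(2026, 3, 1)),
]

flagged = needs_attention(registry, today=date(2025, 9, 1))
```

Here `flagged` contains `invoice-bot` (review overdue) and `mail-triage` (no owner). The query runs over the whole estate rather than over individual decisions, which is the portfolio-style review the paragraph describes.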

Governance itself will become continuous: instrumented, monitored, and reviewed, in the way operations became continuous with DevOps. And throughout, one line will not be allowed to blur: however deep the stack of agents, accountability terminates at a human. The future of AI governance is not about placing a human in front of every decision. It is about ensuring that a named person remains accountable for the system making those decisions.
