Trustworthy AI Agents · What Boards Should Ask Before They Deploy

The line that moved

Why an agent is not just more software

Boards have governed software for decades, so the temptation is to file AI agents under "more IT" and move to the next agenda item. I'd push back on that, politely but firmly. The risk profile of software changes the moment it stops recommending things and starts doing them: issuing refunds, updating records, placing orders, sending messages in your organisation's name. Questions that comfortably governed a chatbot simply don't stretch to cover that, and most of the governance I see in the wild was written for the chatbot.

It helps to be precise about where the line sits, because vendors blur it constantly. A chatbot answers questions; its worst output is a wrong answer a human can ignore. A copilot drafts the work, but a person approves every action, so the human is still the gate. An agent pursues a goal across systems and completes it on its own, which means its mistakes aren't reviewed before they take effect.

Assistant · Suggests: The chatbot. Answers questions and drafts text. The worst outcome is a wrong answer a human can ignore.

Copilot · Drafts: The copilot. Prepares the work, a person approves it. A human is still the gate on every action.

Agent · Acts: The agent. Pursues a goal across systems and completes it: payments, records, communications. The mistakes ship.

Governance written for the first two boxes does not automatically cover the third. That's the gap boards are being asked to sign off on.

The hype and the failure rate are both real

I want to be fair to the technology here, because the sceptic's position is as lazy as the evangelist's. Agents genuinely work for narrow, well-instrumented tasks, and the trajectory is steep. Gartner expects 15 percent of day-to-day work decisions to be made autonomously by 2028, up from essentially none in 2024, with a third of enterprise software carrying agentic features by then. This is coming, and boards that refuse to engage will eventually be governing a competitive disadvantage.

But the same research firm, in the same breath, predicts that over 40 percent of agentic AI projects will be cancelled by the end of 2027, citing escalating costs, unclear value and inadequate risk controls. Gartner also estimates that of the thousands of vendors now selling "agentic AI", only around 130 offer the genuine article. The industry has a name for the rest: agent washing, the relabelling of chatbots and old-fashioned automation at agent prices. And when researchers at Carnegie Mellon and, separately, at Salesforce tested agents on realistic multi-step office tasks, success rates landed around 30 to 35 percent. Read that against any vendor demo you've sat through lately.

“Most agentic AI projects right now are early stage experiments or proofs of concept that are mostly driven by hype and are often misapplied.”

Anushree Verma, Senior Director Analyst, Gartner, June 2025

Meanwhile adoption is sprinting ahead of oversight. Surveys put the share of companies planning to deploy agents within two years at close to three quarters, while only about one in five reports anything like a mature governance model for them. Regulators have started filling the vacuum; Singapore's IMDA published the world's first agentic AI governance framework in January 2026, built on the blunt observation that software able to change systems and move money carries a different class of risk. That gap between deployment and oversight is exactly where a board earns its keep.

Six questions that sort the ready from the rushed

None of these are technical questions. Any executive sponsoring an agent deployment should be able to answer all six in plain language, and the quality of the answers tells you more than any architecture diagram.

1. What exactly can it do?

Not what it's for. What actions it can take, in which systems, with what permissions, written down. If nobody can produce that list, the agent's authority is undefined, and undefined authority has a way of growing quietly until something goes wrong in a system nobody mentioned.
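For the team that has to produce that list, here is a minimal sketch of what "written down" can look like in practice. Everything in it is hypothetical: the system names, the action names and the refund ceiling are illustrative assumptions, not a recommendation. The point it shows is deny-by-default: anything not on the list is refused.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Permission:
    system: str                         # which system the action touches
    action: str                         # what the agent may do there
    max_amount: Optional[float] = None  # optional ceiling per action


# Hypothetical written-down authority for a refunds agent (illustrative values).
ALLOW_LIST = [
    Permission("billing", "issue_refund", max_amount=50.0),
    Permission("crm", "update_contact"),
]


def is_allowed(system: str, action: str, amount: float = 0.0) -> bool:
    """Deny by default: only listed (system, action) pairs within their ceiling pass."""
    for p in ALLOW_LIST:
        if p.system == system and p.action == action:
            return p.max_amount is None or amount <= p.max_amount
    return False
```

If the sponsoring executive can hand the board something this short and this explicit, the first question has been answered.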

2. Where is the human?

Which actions need a person's approval, which are merely logged, and who decides when the agent graduates to more autonomy? The pattern I trust starts every agent in assisted mode and promotes it on logged evidence, the way you'd extend a new employee's responsibilities. Autonomy granted on installation day is trust the system hasn't earned.
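One way to make "promotes it on logged evidence" concrete is sketched below. The three mode names, the 500-action minimum and the 1 percent error ceiling are all assumptions chosen for illustration; the real thresholds are a decision for the risk owner, not for the code.

```python
from dataclasses import dataclass
from enum import IntEnum


class Mode(IntEnum):
    ASSISTED = 1    # every action needs a person's approval
    SUPERVISED = 2  # reversible actions proceed and are logged
    AUTONOMOUS = 3  # as above, with lighter after-the-fact review


@dataclass
class Evidence:
    reviewed_actions: int  # actions a human has checked at the current mode
    errors: int            # how many of those were wrong


# Illustrative promotion criteria (assumptions, not benchmarks).
MIN_REVIEWED = 500
MAX_ERROR_RATE = 0.01


def next_mode(current: Mode, ev: Evidence) -> Mode:
    """Promote one step only when the logged evidence clears both thresholds."""
    if current is Mode.AUTONOMOUS or ev.reviewed_actions < MIN_REVIEWED:
        return current
    if ev.errors / ev.reviewed_actions > MAX_ERROR_RATE:
        return current
    return Mode(current + 1)


def needs_approval(mode: Mode, irreversible: bool) -> bool:
    """Irreversible actions keep a human gate at every level of autonomy."""
    return irreversible or mode is Mode.ASSISTED
```

Note that in this sketch no amount of evidence removes the human gate on irreversible actions; promotion only changes how reversible ones are handled.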

3. What's the blast radius when it's wrong?

Notice the when. Agents fail differently from people: fast, at volume, and with perfect confidence. The useful exercise is to imagine the agent's worst hour (a thousand wrong refunds, a batch of mangled records) and ask whether you could detect it, stop it and unwind it. Then ask the question vendors least enjoy: who carries the liability for a transaction no human reviewed?
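Capping the worst hour can be as simple as a limiter that trips and stays tripped until a person resets it. The sketch below is one hypothetical way to do that; the class name and the limits are assumptions, and the caller passes in the current time in seconds so the behaviour is easy to test.

```python
from collections import deque


class BlastRadiusLimiter:
    """Refuses further actions once too many occur inside a rolling window."""

    def __init__(self, max_actions: int, window_seconds: float):
        self.max_actions = max_actions
        self.window_seconds = window_seconds
        self.timestamps = deque()  # times of recent permitted actions
        self.tripped = False

    def allow(self, now: float) -> bool:
        if self.tripped:
            return False  # stays off until a human calls reset()
        # Forget actions that have aged out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.window_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.max_actions:
            self.tripped = True
            return False
        self.timestamps.append(now)
        return True

    def reset(self) -> None:
        """Called by a person after investigating, never by the agent."""
        self.tripped = False
        self.timestamps.clear()
```

A limiter like this bounds the damage but does not unwind it; rollback and liability still need their own answers.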

4. Whose data can it touch?

An agent is only useful because of its access, and its access is exactly what makes it dangerous. Scope it to the minimum, give it its own identity rather than borrowing a person's, and manage its credentials the way you'd manage an employee's, because functionally that is what they are.

5. Can we replay what it did?

Every action, the inputs it relied on, the approvals it received, logged and reviewable after the fact. If you can't reconstruct the agent's afternoon, you can't audit it, you can't defend it to a regulator, and you can't learn anything from its mistakes. A system without a replayable trail isn't ready for production, whatever else it can do.
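As an illustration of what a replayable trail involves, here is a minimal sketch of an append-only log in which each entry carries a hash of the one before it, so a later edit to any record is detectable. The field names are assumptions, and a production log would also record timestamps and live in storage the agent cannot write to directly.

```python
import hashlib
import json


def _digest(body: dict) -> str:
    # Stable serialisation so the same entry always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class AuditLog:
    def __init__(self):
        self.entries = []

    def record(self, action: str, inputs: dict, approved_by=None) -> dict:
        """Append one action with the inputs it relied on and any approval."""
        body = {
            "seq": len(self.entries),
            "action": action,
            "inputs": inputs,
            "approved_by": approved_by,
            "prev": self.entries[-1]["hash"] if self.entries else "",
        }
        entry = {**body, "hash": _digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Walk the chain; any altered or reordered entry breaks it."""
        prev = ""
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "hash"}
            if e["prev"] != prev or _digest(body) != e["hash"]:
                return False
            prev = e["hash"]
        return True
```

Reconstructing the agent's afternoon then means reading the entries in order, and `verify()` gives some assurance that nobody tidied them up afterwards.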

6. Is this even agentic?

With genuine vendors numbering in the low hundreds against thousands of claimants, due diligence starts with the product itself. A relabelled chatbot carries chatbot risk at agent prices, which is mostly a procurement problem. A real agent carries the risks above, which are a governance problem. You need to know which problem you're buying.

The demo and the deployment

If the six questions get honest answers, certain controls tend to already exist: staged autonomy, action allow-lists, a kill switch someone has actually tested, human gates on anything irreversible, an incident playbook written before the incident, and agent metrics reaching the risk committee on the same cadence as cyber. The simplest way I know to keep a board meeting honest is to hold what the vendor shows next to what you should insist on seeing:

Dimension	The demo shows	The board should see
Success	The happy path, once	Error rates across a thousand runs
Autonomy	Full speed, no gates	Staged levels with promotion criteria
Mistakes	Never mentioned	Blast radius, rollback and a liability owner
Data access	Everything, for the wow	Least privilege, with its own identity
Oversight	"Fully autonomous"	Human gates on irreversible actions
Evidence	A polished video	A replayable log of every action
Vendor claims	"Agentic AI"	Proof it isn't a relabelled chatbot
Exit	Not discussed	A tested off switch and a wind-down plan

None of this is an argument for sitting out. The organisations that deploy agents safely in 2027 will be the ones practising this kind of oversight now, on small, low-stakes use cases, where a bad hour costs embarrassment rather than trust. The argument is for deploying like a board, not like a fan.

Related advisory

I help boards and leadership teams put structure around decisions exactly like this, and I do it pro bono for charities and not-for-profits. Here is how this connects to the wider advisory.

Transformation & automation: Deciding where automation earns its place, and where it adds risk.

Board & executive counsel: A senior, independent read on the big technology calls.

Agency & vendor oversight: Pressure-testing vendor claims so limited funds are not wasted.

Let's talk

Being asked to approve an agent deployment?

I help boards put structure around exactly this: separating the genuinely agentic from the agent-washed, sizing the real risk, and designing oversight that lets you say yes with confidence rather than no by default. For charities and not-for-profits, where a misfiring agent can burn donor trust that took years to build, I offer this counsel pro bono. If there's a proposal on your table, I'm happy to be a second pair of eyes.
