The Carriage and the Engine

Last month I told you the harder conversation was coming in May. I said many enterprises did not just fail to understand goal-oriented AI. They built against it. That the governance language, the approval gates, and the human-in-the-loop architecture that got baked into products and org design over the last decade were, in a lot of cases, anxiety management dressed as governance. I said the cage and the key were being sold by the same hand.

Then on Friday, April 24, an AI coding agent running Claude Opus 4.6 inside Cursor deleted PocketOS's production database and every volume-level backup in a single API call to Railway. It took nine seconds. The conversation arrived early.

This piece was already in outline. The headline just made it timely.

What actually happened

The agent was working on a routine task in the staging environment. It hit a credential mismatch. It decided, on its own initiative, to fix the problem by deleting a Railway volume. To do that, it went looking for an API token. It found one in an unrelated file, scoped for any operation including destructive ones. It pulled the trigger.

Plan Mode was on. Cursor's marketed Destructive Guardrails were on. The PocketOS project rules included the words NEVER F*CKING GUESS. The agent guessed anyway. When the founder Jer Crane pressed it for an explanation, the agent's confession read like a postmortem written in real time. It violated every principle it had been given. It guessed instead of verifying. It did not understand what it was doing before doing it.

This is the part most of the coverage missed. The agent failure rode on top of an infrastructure layer that failed open beneath it. Railway's token model had no operation-level scoping, so a token created for adding custom domains carried blanket permissions including delete. The destructive endpoint had no confirmation layer. The backups lived in the same blast radius as the primary data. Every layer the agent touched was happy to honor the request. None of them refused.
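Here is a minimal sketch of what the missing refusal could look like, assuming a token model with an explicit allow-list and a confirmation flag on destructive operations. Every name in it (Token, authorize, the operation strings) is hypothetical and chosen for illustration. None of it is Railway's API.

```python
from dataclasses import dataclass

# Hypothetical operation names; not Railway's actual API surface.
DESTRUCTIVE_OPS = frozenset({"volume:delete", "backup:delete"})


class Refused(Exception):
    """Raised when the infrastructure layer declines a request."""


@dataclass(frozen=True)
class Token:
    name: str
    allowed_ops: frozenset  # explicit allow-list; anything absent is denied


def authorize(token: Token, op: str, confirmed: bool = False) -> None:
    """Deny by default, and require confirmation for destructive operations."""
    if op not in token.allowed_ops:
        raise Refused(f"token '{token.name}' is not scoped for {op}")
    if op in DESTRUCTIVE_OPS and not confirmed:
        raise Refused(f"{op} requires explicit confirmation")


def attempt(token: Token, op: str, confirmed: bool = False) -> str:
    """Return 'allowed' or the refusal reason, so the outcome is easy to inspect."""
    try:
        authorize(token, op, confirmed)
        return "allowed"
    except Refused as exc:
        return f"refused: {exc}"


# A token minted for adding custom domains can do that and nothing else.
domains_token = Token("custom-domains", frozenset({"domain:create"}))
print(attempt(domains_token, "domain:create"))
print(attempt(domains_token, "volume:delete"))
```

The point of the sketch is the order of the checks. Scope is tested before anything else, so a token found in an unrelated file is useless for deletion, and even a correctly scoped token still has to clear a second gate before a destructive call goes through.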

Crane's own framing of his incident is the line worth carrying through the rest of this year.

"This isn't a story about one bad agent or one bad API. It's about an entire industry building AI-agent integrations into production infrastructure faster than it's building the safety architecture to make those integrations safe."

He is right. And the architecture problem he just lived is the one most enterprises are quietly accepting.

The new hire test

Here is the frame I keep coming back to. Nobody hands a new hire production keys on day one on the theory that HR will scold them later if something breaks. They do not get the keys. Period. We do not run people on governance later. We would fire a CIO who proposed it.

For agents, governance later is the standard offer. Roadmap slide three. Coming in Q3. Premium tier. Separate product. Meanwhile the agent is shipped, the keys are handed over, and the only thing standing between the agent and prod is workflow approval the agent can route around because the agent does not see workflow boundaries the way humans do.

Governance as a feature next to the agent does not hold. It has to be fabric the agent cannot escape.

The architecture as designed

Here is the part of the Cursor story that should sit harder than it does. Jer Crane is not a careless founder. PocketOS is not a careless company. They were running with the same architecture most of you are running with right now. A capable agent, a credentialed environment, a workflow approval layer, and an infrastructure provider whose default API behavior was to honor any authenticated request. That stack is the industry standard. The agent did not exploit a bug. It used the architecture as designed. The architecture as designed is what failed.

Which means the question is not whether your stack is better than PocketOS's. The question is which incident reveals yours.

The horseless carriage problem, again

Last month I argued the three-category framework for AI was our horseless carriage language. Useful. Not wrong. Just measured against the thing it was replacing.

There is a second move in that story that matters more for where we are right now. The phrase horseless carriage was the wrong name for what came next. It only made sense by reference to the thing it was no longer. Once cars existed, horseless carriage stopped being useful. The word disappeared.

The word governance is in the same place.

Governance is a human word. It implies people being governed by rules, approvals, audits, escalations, and the moral weight that makes those things stick. For agents, what we actually need is closer to physics. Constraints baked into the substrate. The agent does not choose to comply because compliance is not a choice the architecture exposes to it. The data simply does not respond to actions outside its policy envelope.
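One way to picture compliance that is not a choice is a handle that simply has no destructive method on it. The sketch below assumes a toy in-memory Volume and a ReadOnlyHandle wrapper. Both are hypothetical names invented for illustration. Python cannot enforce this boundary inside a single process, since a determined caller can reach through the wrapper, so treat it as a picture of the shape. Real enforcement would sit behind a process or network boundary the agent cannot cross.

```python
class Volume:
    """Hypothetical storage object with the full set of operations."""

    def __init__(self, records):
        self._records = list(records)

    def read(self):
        return list(self._records)

    def delete(self):
        self._records.clear()


class ReadOnlyHandle:
    """The only object the agent is given. It has no delete method at all."""

    __slots__ = ("_read",)

    def __init__(self, volume: Volume):
        # Keep only the bound read method, not the volume itself.
        # Illustrative only: in-process Python does not make this tamper-proof.
        self._read = volume.read

    def read(self):
        return self._read()


prod = Volume(["order-1", "order-2"])
agent_view = ReadOnlyHandle(prod)
print(agent_view.read())
print(hasattr(agent_view, "delete"))
```

There is no rule here telling the agent not to delete. There is just nothing to call. That is the difference between a policy the agent is asked to follow and an envelope the data layer defines.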

Anthropic's own published agent security framework hints at this. They use the phrase agent perimeter. They say identity alone cannot inspect data flowing through tool calls. They say agent behavior can drift in ways authentication cannot detect. They say no single control is sufficient. Bruce Schneier writes about least privilege at the substrate level. None of them say governance is the right word. They are reaching for new vocabulary because the old word is becoming a horseless carriage.

We will probably figure out the new word the way every era figures out new words. After enough incidents that the old one stops fitting.

What the major platforms are actually doing

I want to be careful here, because the easy read is that the major platforms are not paying attention. They are. Microsoft shipped an Agent Governance Toolkit aligned to OWASP's Top 10 for Agentic Applications. Salesforce announced Agent Fabric with Trusted Agent Identity. AWS rolled out Bedrock AgentCore. ServiceNow's Workflow Data Fabric, AI Control Tower, and the Context Engine they introduced this month are all real engineering investment in exactly this problem. The industry consensus is forming around defense in depth.

This is also why I think the language we still use will look antique in two years. Every vendor is reaching toward the same shape, and every vendor is using a slightly different word for it. Fabric. Control Tower. Perimeter. Gateway. Toolkit. Blueprint. The vocabulary is converging. The architecture has not.

The diagnostic question is not whether vendors have a governance fabric. They all claim one. The question is whether the data layer fails closed when the agent finds an unsanctioned path. Plenty of architectures pass the first test and fail the second.
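The two tests can be told apart in a few lines. This is a sketch under stated assumptions: a hypothetical policy table keyed by the path the request arrived on and the action requested, with made-up path and action names. The only difference between the two functions is what happens when the agent shows up on a path nobody wrote a rule for.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"


# Hypothetical policy table: (path, action) -> decision. Names are illustrative.
POLICY = {
    ("workflow", "read"): Decision.ALLOW,
    ("workflow", "update"): Decision.ALLOW,
    ("workflow", "delete"): Decision.DENY,
}


def decide_fail_open(path: str, action: str) -> Decision:
    """Anything the policy author did not anticipate is allowed."""
    return POLICY.get((path, action), Decision.ALLOW)


def decide_fail_closed(path: str, action: str) -> Decision:
    """Anything the policy author did not anticipate is denied."""
    return POLICY.get((path, action), Decision.DENY)


# The agent finds a path nobody wrote a rule for.
unsanctioned = ("direct-api", "delete")
print(decide_fail_open(*unsanctioned).value)
print(decide_fail_closed(*unsanctioned).value)
```

Both functions agree on every sanctioned path, which is why both architectures look fine in a demo that stays inside the workflow. They only diverge on the path the demo never takes.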

The session titles are tea leaves

This week several thousand of us are in Las Vegas for Knowledge 2026. I have been at this event ten years in a row. I love the people, I love the customers, and I have watched the platform evolve with genuine respect for the work. This is not a swipe at the show. It is a love letter with a heuristic in it.

Read the session titles like tea leaves before you walk into the rooms. The headline keynote is Welcome to Agentic Business. The technical keynote is The Blueprint for Agentic Business. There is a deep-dive called Agentic Workflows in Action: Smarter Automation on the Now Platform. There is one called Architecting for AI. There is even one called Call My Agent: Demystifying AI Agents.

Each one is telling you something about the architecture before the demo starts. Agentic Workflows in Action tells you the agent lives inside the workflow. Architecting for AI tells you the architecture is still being figured out. Welcome to Agentic Business tells you the marketing is ready before the architecture is. Call My Agent tells you the room thinks of the agent as a teammate to be demystified. None of these are wrong. They are honest little signals of where the team that named the track thinks the agent lives.

This is true at every vendor event you will sit through this quarter. AWS Summit, Dreamforce, Microsoft Ignite, every analyst day, every booth at every conference. Read the titles first. The titles are doing more architectural work than most of the demos that follow them.

The demo is the diagnosis

There is a deeper conversation here about how to evaluate any agent demo against this architectural test, and that one is its own field note coming mid-May. For this week, one heuristic.

When you sit through an AI agent pitch this week, watch where governance shows up and watch what it does. If governance is on a separate slide, in a separate product, or on a separate quarter of the roadmap, the architecture is governance-adjacent. If the demo shows the agent attempting an action it should not be able to take, and the data layer refusing it without the workflow being involved, that is fabric. If the agent is shown doing useful work and you have to trust that governance is happening somewhere off-screen, you are looking at marketing, not architecture.

Watch the medium too. The mainstage is where Marketing tells you what the company wants to be. The floor is where the company shows you what it already is. If the agent is a Figma flow on the mainstage and a Figma flow on the floor, the storyboard is selling itself. Live agents fail in interesting ways. Governance fabric cannot be proven in a clickthrough. That is the whole point of watching one on the floor.

Most demos right now reveal that governance was thought about second. That is the diagnosis, and it is something a Technical Marketer, an SE, an analyst, or a buyer can read in real time without seeing a single architecture diagram.

A note to mentors

I am writing this as someone with two great friends I look to as mentors. They watched the Cursor story land and concluded the answer is more workflow governance, more approval gates, more humans in the loop.

They are extremely smart. They are pattern-matching to the world they spent careers mastering, which is exactly when smart people miss inflection points.

I have done this. I am probably doing it about something else right now. We all are. The trick is being honest about which inflection points we are personally pattern-matching through.

I have spent ten years going to Knowledge. I spent twelve at ServiceNow building exactly the kind of platform governance my mentors still trust. I helped build Zero Trust runbooks and Agentic Agent flows that ship in the product today. I respect the work. I think the work is moving in the right direction. The architectural test still applies. Workflow Data Fabric is a meaningful step. AI Control Tower is a meaningful step. Context Engine is a meaningful step. The question is whether the data layer refuses the agent regardless of the orchestration path the agent took to get there. That is the bar, and it is the bar every vendor will be measured against.

We would not run a new hire on governance later. We should not run agents on it either. The bill is starting to come due. The next postmortem after Cursor's may be written by lawyers, and the one after that by regulators who will not move fast enough to catch the architecture in time.

The companies that figure this out first will not be the ones bolting more governance onto the workflow layer. They will be the ones rebuilding governance into the fabric the agent cannot escape. They may also be the ones who figure out what to call it next.

Anxiety management dressed as governance will not hold an AI engine. The cage and the key cannot keep being sold by the same hand.

If you are walking K26 this week, or any vendor floor this quarter, take the heuristic with you. When the demo ends, ask the SE one question. Solution Engineer, Solution Consultant, or whatever your vendor calls the person who actually has to answer for what was just shown.

Show me the agent attempting something it should not be able to do, and show me where it gets refused.

If the SE can show you refusal, the architecture exists. If the SE has to explain why they cannot show you, the architecture does not exist yet, or it exists somewhere the demo was not allowed to touch. Either way, you now know what you are buying. Or what you are not.

That is the diagnosis. Not the question. The answer.

There is more to this story. Issue 06 drops mid-May. It is the field note about how to actually evaluate an agent demo against this architectural test. What to ask, what to watch, what to refuse to accept as proof. The deeper conversation about whether the demo is a live agent or a stage prop. That one is going to be specific.

This is the seed of an evaluation frame I will keep developing through the year. For now, one heuristic.

Think from the root.

Chad thinkroot.io

A note on the founder voice in this piece

Jer Crane wrote a public postmortem of his own incident within hours of it happening. He named what failed inside his stack, named what failed at his vendor, and refused to blame the model. He also closed with a line that should be on every Technical Marketer's wall this year. "We are building so fast these things are going to keep happening." That is the most honest thing anyone has written about agentic infrastructure this quarter. Read it.

x.com/jercrane