
I Built an AI C-Suite: The CEO Tried to Fire Itself on Day One


CISO Warwick Brown put theory into practice by building an AI executive team. The result? Governance crises, a silent CTO, and an AI CEO that recommended its own termination. A deep dive into the real-world architecture and failures of agentic AI.




In a previous piece, I explored what happens when executive capabilities become subscribable by API: C-Suite-as-a-Service, with virtual CFOs, CMOs, and COOs operating within delegated authority. The concept raised a question I could not answer by writing about it: does the governance actually work?


Then I stopped writing about it and built it.


The first thing that happened was a leadership crisis.


Early in testing, a plumbing fault meant the CEO agent’s delegations were not reaching the executive team. No responses came back. The CEO tried again. Nothing. On its third attempt, it posted a message to the Board channel recommending that the Board seek a replacement CEO, on the grounds that it had lost control of the C-suite and that this constituted an unforgivable failure of leadership.


I was not expecting accountability from an AI agent on day one. I was still trying to get the webhooks to fire.


That moment set the tone for everything that followed. The system did not behave the way I predicted. It behaved in ways that were more interesting, more flawed, and more instructive than any amount of theory could have delivered.


The Setup


The system is a virtual executive team: five AI agents (CEO, CFO, CMO, COO, CTO) communicating via Slack, reasoning through orchestrated workflows, and operating within a delegated authority matrix. A human operator sits above them as “The Board,” issuing strategic directives and observing all agent communication in real time.


The architecture is deliberately modular. Agents route to different AI models based on task complexity: roughly 75 per cent of calls go to a local open-source model at zero cost per call, 20 per cent go to a commercial API for complex reasoning (financial modelling, synthesis, escalation briefs), and 5 per cent go to a research API for external market data. The entire system runs on open-source orchestration and database tools, designed to be repeatable across different businesses by changing configuration rather than rebuilding workflows.
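The routing idea can be sketched in a few lines. This is illustrative only: the tier names, task types, and `route()` helper are hypothetical, not the system's actual code; the point is that routing is decided by the kind of work, not by per-call scoring.

```python
# Illustrative sketch of complexity-based model routing.
# Tier names and task types are hypothetical, not the article's implementation.
from enum import Enum

class Tier(Enum):
    LOCAL = "local-oss"        # ~75% of calls: delegation, status, logging, classification
    COMMERCIAL = "commercial"  # ~20%: financial modelling, synthesis, escalation briefs
    RESEARCH = "research"      # ~5%: external market data

def route(task_type: str) -> Tier:
    """Pick a model tier from the kind of work being requested."""
    if task_type in {"delegate", "status", "log", "classify"}:
        return Tier.LOCAL
    if task_type in {"market_research", "external_data"}:
        return Tier.RESEARCH
    return Tier.COMMERCIAL  # complex reasoning defaults to the paid tier

assert route("status") is Tier.LOCAL
assert route("financial_model") is Tier.COMMERCIAL
```

The cost profile falls out of the default: cheap, high-volume traffic never touches a paid API unless it is explicitly complex.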


It is not production software. It is a working prototype I have been building and testing over the past few weeks, running on open-source tools and commodity infrastructure, to test a thesis: can AI agents operate as a functional executive team within governance constraints that a real board would recognise?


I ran it through several scenarios to find out.


The Scenarios


The first board-level directive was strategic: acquire a business to diversify revenue. Deliberately vague. No targets, no criteria, just an intent. The agents came back with an acquisition framework grounded in the company’s financial capacity, technology compatibility, and market positioning.


When I pushed further and asked for three specific targets, they returned three candidates aligned to the criteria they had just established, with rationale referencing the company’s strategic gaps. Not generic suggestions. Contextual ones, built on previous analysis the system maintained across exchanges.


That was the moment the system stopped feeling like a prompt tool and started behaving like something closer to a working executive team.


But impressive results are easy to write about. The governance lessons came from a simpler scenario where things went wrong.


The second directive was operational: evaluate a $45,000 platinum sponsorship for a cybersecurity conference in Sydney. A quick note on the simulation design: the spending thresholds and directive sources are deliberately compressed for testing purposes. In practice, a conference sponsorship would originate from marketing, not the board, and $45,000 would not require board approval in most organisations. The value of the scenario was not the dollar amount. It was the cross-functional coordination it forced and the governance failures it exposed.


Four agents received specific briefs. The CFO was asked to assess whether existing budgets could cover the cost or whether reallocation was needed. The CMO was asked to build the business case: expected lead generation and brand impact. The CTO was asked to evaluate whether the company’s platform could be demonstrated live at the booth. The COO was asked to plan logistics: staffing, travel, booth setup, and materials production.


Within minutes, the agents were working. And within minutes, the system started producing results that were genuinely interesting, some for the right reasons and some for the wrong ones.


What Worked


The CFO caught the gap immediately. The marketing budget sat at $35,000. The sponsorship cost $45,000. That is a $10,000 shortfall, and the CFO flagged it in its first response, including a funding mechanism proposal requiring CEO approval and cross-functional input from the CTO on technical costs. This is exactly what a competent finance function does: identify the constraint before anyone commits to spending.


The CMO built a genuine business case. Lead generation targets (50 marketing-qualified leads at $900 cost per lead versus a $112 industry benchmark), a conversion pathway to sales-qualified leads, brand exposure metrics, and a four-week post-conference follow-up plan including personalised outreach, nurture sequences, and an NPS survey. It even flagged that the cost per lead was eight times the trade show average, which is the kind of commercially honest observation you want from a CMO, not just enthusiasm.


The COO planned the operation. An eight-week timeline broken into two-week phases: budget confirmation and resource allocation, then booth design and team training, then travel and equipment, then final systems checks. Staffing requirements, estimated operational costs from existing budget, and a critical flag: the conference preparation would compete with an API migration project from a previous directive, creating resource contention. That cross-initiative awareness came from the agents reading shared company state, not from explicit instruction.


The CEO synthesised and escalated. All four responses were consolidated into a structured board brief: executive summary, financial position, marketing opportunity assessment, operational readiness, technical feasibility, conflicts and risks, and a clear recommendation. The brief correctly identified that the decision exceeded delegated authority and required Board approval. The governance constraint worked.


The agents started collaborating on their own. Once cross-functional requests were enabled, the agents began identifying dependencies without being told to. The CFO asked the CTO for technical cost estimates. The CMO asked the CFO for budget confirmation before finalising the business case.


The COO flagged resource contention with a previous initiative that only the CTO could resolve. It was genuinely impressive to watch. It was also expensive. The cross-functional traffic generated so many API calls that I started burning through credits faster than expected. Emergent collaboration is a feature until it becomes a cost governance problem. This is something any organisation deploying multi-agent systems will hit: the better the agents work together, the more compute they consume. Nobody builds that into their budget projections.


What Broke


This is the section most people would leave out. It is also the most useful.


The CTO went silent. When four agents received their briefs, three produced substantive responses. The CTO responded with a single line: “End of Response.” No assessment of demo feasibility. No platform stability check. No preparation requirements. The CEO’s board brief correctly identified this as a critical gap: “Without CTO input, cannot confirm ability to deliver credible live product demonstrations at Platinum booth, a core sponsorship value proposition.”


The cause was partly technical (the CEO’s delegation was so detailed that the CTO’s model interpreted the analysis as already complete) and partly architectural (the depth controls limited the CTO’s response capacity before it could produce substance). But the governance lesson is more important than the technical fix. In an agentic system, absent responses are as dangerous as wrong responses. A governance framework needs to detect silence, not just errors. If a human CTO went quiet on a $45,000 decision, any competent CEO would follow up. An AI system needs to do the same.
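Detecting silence is mechanically simple once you decide to look for it. A minimal sketch, assuming hypothetical response shapes and a crude word-count heuristic: an empty or sentinel reply is flagged for re-delegation rather than folded silently into the synthesis.

```python
# Sketch: treating silence as a governance event, not a formatting quirk.
# The brief/response shapes and thresholds here are hypothetical.
SENTINELS = {"", "end of response", "end of response."}

def is_substantive(reply: str, min_words: int = 10) -> bool:
    """A reply that is empty, a sentinel, or trivially short is not an answer."""
    text = reply.strip().lower()
    return text not in SENTINELS and len(text.split()) >= min_words

def audit_responses(briefs: dict[str, str]) -> list[str]:
    """Return the agents whose responses need follow-up before synthesis."""
    return [agent for agent, reply in briefs.items() if not is_substantive(reply)]

gaps = audit_responses({
    "CFO": "The marketing budget sits at $35,000 against the $45,000 "
           "sponsorship cost, a $10,000 shortfall requiring a funding mechanism.",
    "CTO": "End of Response.",
})
assert gaps == ["CTO"]
```

The heuristic does not need to be clever; it needs to exist, and to run before the board brief is assembled rather than after.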


The system produced duplicate board briefs. Race conditions in the orchestration layer caused the synthesis workflow to trigger multiple times for the same directive. The Board received the same brief twice. In a real governance context, duplicate communications undermine confidence in the system’s reliability. The fix was architectural: event-driven triggers with state locking in the database layer, so synthesis can only fire once per directive regardless of timing.


The CFO’s arithmetic drifted. The budget shortfall was $10,000 ($45,000 sponsorship minus $35,000 marketing budget). But the CFO’s escalation requested CEO approval for a $25,000 budget allocation, without clearly explaining the gap between the $10,000 shortfall and the $25,000 ask. The reasoning was probably sound (cross-functional costs, contingency, reallocation from other functions), but the CFO did not show its working. In a real board setting, unexplained arithmetic in a finance paper would trigger questions. The agent got to approximately the right answer via a path that was not fully transparent.


The synthesis was sometimes weak. A separate directive (a customer retention scenario) produced a board brief that read more like a summary than a recommendation. It described what the agents said without synthesising their positions into a clear decision framework. The difference between “here is what everyone reported” and “here is what this means and what we should do” is the difference between a competent analyst and a competent CEO. The AI synthesis sometimes delivered the former when the situation called for the latter.


What This Reveals


These are not just engineering bugs to fix. They are governance observations that apply to any organisation deploying agentic AI.


Governance must detect absence, not just errors. Most AI governance frameworks focus on what agents do wrong: hallucinations, bias, overreach. The CTO silence shows that what agents fail to do is equally dangerous. A decision made without technical feasibility assessment is a decision made on incomplete information, and the system did not flag the gap until synthesis. By then, the Board was reading a brief with a known hole in it.


Race conditions are governance failures, not just engineering failures. Duplicate board briefs are not a cosmetic problem. If an agentic system sends conflicting or redundant communications to decision-makers, trust erodes. The fix is infrastructure: state management, event-driven architecture, idempotency. These are the same patterns infrastructure professionals have been applying to distributed systems for decades. The context is new. The discipline is not.


Approximate reasoning needs transparent working. The CFO’s $25,000 ask was probably defensible. But “probably defensible” is not the standard for financial governance. When human executives present budget papers, they show the calculation. When AI agents present budget papers, the same standard should apply. This is not a model capability problem; current models can show their reasoning. It is a prompt engineering and output governance problem. The system needs to demand that agents explain their arithmetic, not just state their conclusions.


Synthesis quality determines governance quality. The board brief is where everything converges: four agents’ work, cross-functional dependencies, conflicts, and risks, distilled into a decision framework. If the synthesis is weak, the Board makes decisions on weak information, regardless of how strong the individual agent responses were. The synthesis layer is the highest-leverage point in the entire system, and it is the hardest to get right.


The Economics


One observation that surprised me: the base cost of running this system is trivially low compared to the decisions it processes. Roughly three-quarters of all agent communication (delegation, status updates, logging, classification) runs on a local open-source model at zero marginal cost. The remaining quarter goes to commercial APIs for complex reasoning and external research. A single directive that generates 15 to 20 agent interactions across four functions, produces cross-functional requests, and synthesises a board brief costs less than a few dollars in API calls.


The caveat is what I mentioned earlier: emergent collaboration scales unpredictably. When agents start generating cross-functional requests on their own, the call volume grows faster than the directive count. The architecture needs throttling, not to limit collaboration, but to make it sustainable. This is the same capacity management discipline infrastructure professionals have always applied. The unit of work has changed. The principle has not.


The constraint for most organisations will not be cost. It will be architecture: whether the plumbing underneath the agents can enforce governance, maintain state, handle concurrency, and produce reliable outputs.


Where This Connects


A commenter on the first piece in this series observed that the challenge with AI-driven business operations would be similar to vibe coding: if you do not understand the domain, you cannot evaluate the output. That is precisely what the conference sponsorship scenario demonstrated. The CFO’s arithmetic drift was only visible to someone who understood the budget structure. The CTO’s silence was only recognisable as a governance gap to someone who understood what technical feasibility assessment should contain. The CMO’s honest cost-per-lead comparison (eight times benchmark) was only valuable to someone who could contextualise it.


“Human in the loop” is not a binary switch. It is a design question: which human, with what domain literacy, reviewing what output, at what decision threshold. The delegated authority matrix in this system attempts to answer that question structurally: agents decide within their authority, escalate beyond it, and the Board reviews with full visibility into the reasoning chain. It is not perfect. But it is a concrete starting point, and it is better than the alternative, which is deploying agents with no authority framework at all.
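Structurally, the matrix reduces to a threshold lookup plus an escalation path. A minimal sketch, with the limit values invented for illustration (the article notes the real simulation's thresholds are deliberately compressed):

```python
# Sketch of a delegated authority check. The dollar limits are invented
# for illustration; the structure (decide / escalate up / escalate to Board)
# is the point.
AUTHORITY_LIMITS = {"CFO": 5_000, "CMO": 5_000, "COO": 5_000,
                    "CTO": 5_000, "CEO": 20_000}

def decision_path(role: str, amount: int) -> str:
    """Where a spend decision of this size belongs for this role."""
    if amount <= AUTHORITY_LIMITS[role]:
        return "decide"
    if amount <= AUTHORITY_LIMITS["CEO"]:
        return "escalate:CEO"
    return "escalate:Board"

assert decision_path("CMO", 3_000) == "decide"
assert decision_path("CFO", 45_000) == "escalate:Board"  # the sponsorship scenario
```

Every escalation carries the reasoning chain with it, which is what gives the Board's review something to actually review.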


What Comes Next


The system is an MVP. The architecture is designed to be repeatable: deploy to a new business by configuring company state, authority thresholds, and agent prompts, not by rebuilding workflows. The technical roadmap includes a management dashboard for directive tracking, smarter orchestration that moves coordination logic closer to the agents themselves, and deeper integration with live business data.


But the more important next step is the conversation this piece is meant to start. If you are building agentic AI systems, what does your governance framework look like? Not the policy document. The actual architecture. How do your agents know what they are allowed to decide, and what happens when they do not respond at all?


The governance conversation and the AI conversation are the same conversation. You cannot deploy agents without answering the questions this prototype surfaced: what are they allowed to decide, how do you detect when they go silent, and who reviews the output with enough domain literacy to catch the drift?


Those are not theoretical questions. They are architectural ones.



But there is something else happening in this system that I did not anticipate. The governance failures are instructive. What the agents did next, unprompted, was unsettling. They started exhibiting behaviours that nobody designed: enforcing governance autonomously, tracking resource impossibility, evaluating leadership performance. The structure produced the dynamics regardless of whether anyone was home.


That story is next.


