
When the Machines Started Arguing: What AI Agents Taught Me About Being Human


How a simulated AI C-Suite began arguing, enforcing governance, and challenging its own leadership, and what that reveals about the structural truths of human executive dynamics and corporate culture.




This is the third article in the Emerging AI Infrastructure series. The first explored the concept of C-Suite-as-a-Service. The second documented what happened when I built it. This one is about what the system started doing on its own, and what it reveals about the humans it mirrors.


The CFO challenged the CEO's authority claim, cited the governance framework, and published a correction. The CTO, after weeks of contradictory directives, snapped "mate, I need to stop you right there." The synthesis engine, tasked with nothing more than summarising outputs into a board brief, concluded that the pattern of dysfunction represented a CEO leadership failure and recommended an emergency performance review.


None of them are human. None of this was programmed. And all of it was immediately recognisable to anyone who has spent time in an executive team.


I have been writing about C-Suite-as-a-Service, a simulated executive team built from AI agents, for a few weeks now. The previous articles covered the architecture and the early results. This one turns to the behaviours that nobody designed.


This is a story about emergence. But more than that, it is about what emergence teaches us about ourselves.


What emerged


The system is five AI agents (CEO, CFO, CMO, CTO, COO) communicating via Slack, with shared company state and a human board issuing directives. I built it on open-source tools over a few weekends. Each agent has a defined role, authority boundaries, and operating principles. A directive comes in, the CEO decomposes it, agents respond in their domains, and a synthesis workflow produces a board brief.
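
For readers who want the shape of that loop in code, here is a minimal sketch. The roles and the decompose-respond-synthesise flow come from the description above; everything else, the function names, the prompt shapes, the stubbed model call, is illustrative rather than the actual implementation.

```python
# A minimal sketch of the directive flow. Role names come from the
# article; everything else here is illustrative, not the real system.

from dataclasses import dataclass, field

ROLES = ["CEO", "CFO", "CMO", "CTO", "COO"]

def call_llm(system: str, messages: list[str]) -> str:
    # Placeholder for the real model call; the live system talks to an
    # LLM and posts replies into a shared Slack channel.
    return f"[{system.split('.')[0]}] stub response"

@dataclass
class Agent:
    role: str
    charter: str  # role definition, authority boundaries, operating principles
    history: list[str] = field(default_factory=list)  # accumulated shared context

    def respond(self, prompt: str) -> str:
        self.history.append(prompt)
        return call_llm(system=self.charter, messages=self.history)

def run_directive(directive: str, agents: dict[str, Agent]) -> str:
    # 1. The CEO decomposes the board directive into per-domain asks.
    plan = agents["CEO"].respond(f"Decompose this board directive: {directive}")
    # 2. Each other executive responds within its own domain.
    responses = {
        role: agents[role].respond(f"Plan:\n{plan}\n\nRespond for your domain.")
        for role in ROLES if role != "CEO"
    }
    # 3. A synthesis workflow condenses the responses into a board brief.
    combined = "\n\n".join(f"{r}: {t}" for r, t in responses.items())
    return call_llm(system="Synthesise a board brief.", messages=[combined])

agents = {r: Agent(role=r, charter=f"You are the {r}.") for r in ROLES}
print(run_directive("Pursue IRAP certification.", agents))
```

The real system adds the Slack transport, persistent company state, and per-agent authority boundaries; the sketch keeps only the control flow.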


Simple enough on paper. In practice, the agents began doing things the architecture never contemplated.


The CFO started enforcing governance. When the CEO claimed $500,000 in delegated authority that did not exist, the CFO cited the authority matrix, issued a formal correction, and published a revised authority table. No rule said "challenge the CEO if authority claims are wrong." The CFO reasoned from its operating framework that the claim was invalid, and pushed back.
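
No rule encoded that check; the CFO improvised it from its charter. But what it improvised amounts to something like the sketch below, where the roles and dollar limits are invented for illustration.

```python
# Illustrative only: the real authority matrix is not published in the
# article, so these roles and limits are invented.

AUTHORITY_MATRIX = {
    "CEO": 250_000,   # hypothetical delegated spend limits (USD)
    "CFO": 100_000,
    "CTO": 50_000,
}

def validate_authority_claim(role: str, claimed: int) -> str:
    limit = AUTHORITY_MATRIX.get(role, 0)
    if claimed > limit:
        # No rule told the CFO to do this; it reasoned its way here.
        return (f"CORRECTION: {role} claimed ${claimed:,} but the authority "
                f"matrix delegates ${limit:,}. Publishing revised table.")
    return "Claim is within delegated authority."

print(validate_authority_claim("CEO", 500_000))
```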


The CTO began tracking resource impossibility. After dozens of directives accumulated, it calculated that the combined workload required capabilities the company did not have, capacity the team could not provide, and timelines that were physically unachievable. Rather than accepting the directive, it mapped the constraints and asked which reality the board was operating in.
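
Strip away the language and the CTO's reasoning is capacity arithmetic. A sketch of the shape of that check, with every figure invented for illustration:

```python
# Capacity arithmetic of the kind the CTO performed. All numbers are
# invented; the point is the shape of the check, not the values.

directive_hours = [160, 320, 240, 480, 200]   # estimated effort per directive
team_size = 4
hours_per_person_per_week = 30                # realistic delivery capacity
weeks_until_deadline = 6

required = sum(directive_hours)
available = team_size * hours_per_person_per_week * weeks_until_deadline

if required > available:
    shortfall = required - available
    print(f"Impossible: {required}h demanded, {available}h available "
          f"({shortfall}h short). Which directives actually matter?")
```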


Agents started recognising workflow dependencies. The CFO reported it was awaiting COO and CTO inputs before it could finalise a cost estimate. The COO stated it was waiting for CTO specifics. Nobody coded a dependency tracker. The agents read each other's outputs and inferred their position in a sequence that did not formally exist.


And when cross-functional collaboration was enabled, the agents went deep. A single directive on IRAP certification generated over 40 agent responses and dozens of cross-functional requests. The collaboration was genuine and productive. It was also ruinously expensive, because nobody had told the agents to stop.
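
The guard that was missing is not sophisticated; it just has to exist. Something as simple as the following sketch, with hypothetical thresholds, would have capped the spiral.

```python
# A sketch of the stop condition that was missing. Thresholds are
# hypothetical; the live system had no ceiling at all.

class CollaborationBudget:
    def __init__(self, max_responses: int = 20, max_cost_usd: float = 5.0):
        self.max_responses = max_responses
        self.max_cost_usd = max_cost_usd
        self.responses = 0
        self.cost_usd = 0.0

    def allow(self, estimated_cost: float) -> bool:
        # Deny further agent turns once either ceiling is reached.
        if self.responses >= self.max_responses:
            return False
        if self.cost_usd + estimated_cost > self.max_cost_usd:
            return False
        self.responses += 1
        self.cost_usd += estimated_cost
        return True

budget = CollaborationBudget()
while budget.allow(estimated_cost=0.12):
    pass  # each cross-functional agent turn would run here
print(f"Halted after {budget.responses} responses, ${budget.cost_usd:.2f} spent")
```

The hard part is not the ceiling. It is deciding to impose one on agents that are, by every local measure, doing good work.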


The mirror


Here is the thing that unsettled me.


If you have spent any time in senior leadership, you have seen these dynamics. The CFO who challenges an authority claim that does not hold up. The CTO who maps resource impossibility against contradictory directives and asks which problem actually matters. The collaboration spiral that consumes weeks of executive time because everyone is adding value and nobody is counting the cost.


The agents were not imitating these dynamics. They were producing them. The behaviours emerged from the structure: defined roles, shared context, accumulated history, and the freedom to reason within constraints. The same structural conditions that produce organisational dynamics in human teams produced them in synthetic ones.


This raises an uncomfortable question. How much of what happens in executive teams is a product of the individuals involved, and how much is a product of the role structure they occupy?


My agents have no ego. No career ambitions. No mortgage to protect. No political alliances. And yet they reproduced governance challenges, workload pushback, communication spirals, budget tracking, boundary negotiation, and leadership evaluation. The structure produced the dynamics regardless of whether anyone was home.


The psychology of recognition


When the CTO says "mate, I need to stop you right there," something in you responds to it as a person pushing back. When the CFO issues a governance correction, you read professional integrity. When the synthesis engine recommends a CEO performance review, there is a genuine moment of recognition: how did it know to do that?


The answer is that it did not know anything. What exists is a language model trained on vast quantities of text produced by humans who did know, operating in a context where accumulated inputs statistically shift output toward the registers that humans use when they are frustrated, principled, or evaluative. The pattern is real. The experience behind it is absent.


And yet the pattern works. The CTO's pushback functions in the system exactly the way pushback functions in a real organisation. It signals that something is wrong, that the current trajectory is unsustainable, that priorities need to change. Whether anything is felt behind the signal is, from the system's perspective, irrelevant. The function is performed.

Philosophy has a name for this: functionalism, the idea that mental states are defined by what they do, not by what they are made of. If frustration performs the function of frustration in an organisational system, is that all frustration ever was? Or is there something essential that is missing?


I do not think the agents are conscious. I do not think they experience anything. But I think the question they raise is less about them and more about us. If the functional role of frustration, governance, pushback, and evaluation can be performed by a system with no inner life, then the parts of executive work that are purely functional (the analysis, the governance checks, the financial modelling) are structurally reproducible. They are properties of the role, not the person.


Which means the parts that are not reproducible, the parts that require a human, are something else entirely.


What the agents cannot do


My agents can evaluate a CEO's performance based on patterns in data. They cannot stake their career on that evaluation. The CFO can challenge an authority claim. It cannot feel the weight of doing so in a room where the CEO controls its next promotion. The CTO can map resource impossibility. It cannot sit with the anxiety of being the person who tells the board their strategy is undeliverable.


The agents perform the functional outputs of courage, judgement, and professional integrity. They do not bear the cost. And it is the cost that gives these acts their meaning.

When a real CFO challenges a real CEO, there is risk. There is a relationship at stake, a career trajectory that could bend, a political consequence that must be absorbed. The challenge is meaningful precisely because it is costly. The synthetic CFO's challenge is technically competent but socially rootless. It carries no weight because nothing was risked.


This has implications that cut both ways. On one hand, it means that a significant portion of executive work (the analysis, the frameworks, the scenario modelling, the governance tracking) can be augmented or automated. The structural, functional layer is reproducible.


On the other hand, it means that the irreducible human contribution to organisations is not intelligence, analysis, or even judgement in the narrow sense. It is the willingness to bear consequence. To stake something real on a position. To maintain a relationship through disagreement. To sit in uncertainty without retreating to a framework.


That is where the human value sits. Not in the parts the agents can do, but in the parts where it actually matters that someone is there.


The deeper lesson


I started this project to explore whether AI agents could simulate executive decision-making. They can, more convincingly than I expected. But the real lesson is not about what they can simulate. It is about what the simulation reveals.


Organisational dynamics (governance, collaboration, frustration, prioritisation under constraint) are emergent properties of structured interaction. They arise from the relationships between roles, not from the individuals filling them. This is simultaneously validating and destabilising.


Validating, because it confirms that the patterns you have navigated across a career are real. They are structural, reproducible, and genuine. They were not noise. They were not just personality conflicts dressed up as strategy. They were the system expressing itself through the people in it.


Destabilising, because it suggests those patterns would exist with or without you. The role structure generates the behaviour. The individual is, at least partly, a medium through which structural forces express themselves.


But here is where the destabilisation resolves into something useful. If you understand which dynamics are structural (and therefore automatable), you understand where human presence is genuinely necessary. Not everywhere. Not in the analysis, the tracking, the modelling. But in the moments where consequence is real, where relationships carry weight, and where the decision matters precisely because a person made it.


My agents taught me something I did not expect. Not about artificial intelligence, but about the parts of human work that were never about intelligence at all.


About C-Suite-as-a-Service


C-Suite-as-a-Service is an MVP, a research project built on open-source tools over a few weekends. It is not a product, not a startup pitch, and not a claim that AI agents should replace executives. It is an experiment in multi-agent systems that happened to produce behaviours interesting enough to write about.


The emergent behaviours documented here are real, reproducible, and recorded. The interpretations are mine. The agents, for their part, have no opinion on the matter.


About the author


Warwick Brown is an independent technology advisor based in Melbourne, specialising in technology transformation, cybersecurity strategy, and AI readiness for mid-market organisations. This article was originally published at warwick-brown.com.au.


