- AI mostly outperformed human executives in an experiment by University of Cambridge researchers.
- But AI wasn’t as good at making decisions in unexpected “black swan” events.
- That led to AI getting fired by a virtual board of directors more quickly than humans.
Can we throw CEOs out of their offices and replace them with AI? A new experiment from the University of Cambridge suggests the answer is no.
Artificial intelligence actually outperformed human CEOs in most situations in a realistic business simulation that pitted people against computers, but there was one thing AI couldn't handle, according to the experiment: so-called black swan events, like a pandemic.
Because of that, AI got fired more quickly by a virtual board of directors than its human counterparts, who navigated unexpected situations better.
Hamza Mudassir, one of the researchers behind the experiment, told Business Insider that AI outperformed the human participants on most metrics, including profitability, product design, managing inventory, and optimizing prices — but that its performance wasn’t enough to save it from getting the boot.
“It did not do well on survival within the C-suite just because it was not very good at handling abrupt changes or changes that require a new way of thinking,” Mudassir said.
The Cambridge researchers ran the experiment from February to July with 344 human participants, including senior executives at a South Asian bank as well as college students. One additional participant wasn't a person at all: GPT-4o, the large language model, or LLM, from OpenAI.
The participants played a game designed to simulate real-world situations in which CEOs have to make decisions. The game had them take on the role of CEO of a car company. It was designed by the Cambridge researchers’ ed-tech startup, Strategize.inc.
“The goal of the game was simple — survive as long as possible without being fired by a virtual board while maximizing market cap,” the researchers wrote in the Harvard Business Review.
Mudassir told BI that LLMs were great at analyzing data, recognizing patterns, and making inferences. For example, when designing a car based on factors like available parts, price, consumer preferences, and demand, participants could choose from 250,000 possible combinations. The cars that AI put together were significantly better than those the humans designed, he said.
He said that's partly because humans have biases and personal tastes in things like the shape of a car; for the AI, it was simply a "puzzle of finding out the most optimal value for what the customer wanted," Mudassir said.
But that doesn't mean AI was the optimal CEO. When a "black swan" event occurred, the bot couldn't respond as quickly or as well as the human executives and students. When there was a major shift in market conditions, such as the introduction of a pandemic, the model flopped, he said.
“How do you react to COVID if you’re dealing with it for the first time? A lot of people, and a lot of CEOs, have different strategies,” Mudassir said. “In this case, it did not have enough information on how to react in time to prevent itself from getting fired,” he said of the AI.
So CEOs can rest easy for now. The researchers say that while AI’s performance as the virtual head of a company was impressive, it wasn’t good enough to replace a human. Still, AI performed so well that it can’t be ignored in corporate strategy, Mudassir said.
In the future, Mudassir said LLMs could be specifically tuned to a particular company with real-time data, in which case they’d likely perform even better than AI did in the experiment.
He said perhaps the best use case for AI would be business "war gaming," which means using multiple LLMs to represent different stakeholders, such as competitors, lawmakers, or activists, and then testing how certain decisions would actually play out. In theory, that could replace some of the work of strategy and management consultants, who often make recommendations to a CEO based on their own analysis of likely outcomes in particular situations.