Not So Smart After All? Study Shows AI Can’t Even Survive a Day at Work

April 28, 2025

A Carnegie Mellon University experiment found AI agents from major tech firms struggled to complete basic tasks in a simulated office, revealing critical gaps in AI's readiness for real-world work environments.

Background:

  • A team from Carnegie Mellon University designed a simulated tech company staffed entirely by AI agents from Google, OpenAI, Meta, Anthropic, and Amazon to evaluate how well current AI models could perform real-world workplace tasks.
  • The experiment recreated a typical small software firm environment, complete with an internal chat system, websites, an employee handbook, and various job roles like HR manager and CTO assigned to different AI agents.
  • The tasks ranged from analyzing datasets to writing performance reviews, replicating everyday assignments encountered in finance, administration, and engineering.
  • Despite recent advances in AI, performance was poor across the board. The best-performing agent completed only 24% of its assigned tasks, Google's agent achieved an 11% success rate, and Amazon's agent recorded the worst result with a 1.7% completion rate.

Why should you pay attention?

  • The findings challenge mainstream narratives that AI is ready to fully replace human workers, especially in dynamic, multi-step tasks requiring common sense, social skills, and adaptability.
  • The experiment suggests that current AI models, even those from leading tech firms, struggle with real-world problem-solving beyond isolated or well-defined tasks.
  • This research offers valuable insights for companies planning to integrate AI into critical workflows, highlighting the limitations of current AI technologies in office and enterprise settings.
  • It adds context to broader labor market discussions, balancing AI disruption fears with a clearer view of practical limitations in today’s systems.

Who said what?

  • Graham Neubig, Carnegie Mellon researcher, told Business Insider:
“While agents may be used to accelerate some portion of the tasks that human workers are doing, they are likely not a replacement for all tasks at the moment.”
  • Mario Nawfal, popular show host, shared the study on X:
“For now, AI isn’t stealing your job — it can’t even survive a normal day at work without causing a disaster.”
“There are gut decisions leaders have to make that depend on many internal and external variables — I don’t think you can program instinct.”
  • @AccumulateCrypt, X user, added:
“Back to the drawing board. Don’t worry, they’ll figure it out eventually.”

Zooming out:

  • The Carnegie Mellon findings align with earlier studies such as Sparkline Capital's 2024 research, which found that AI can handle junior-level tasks but struggles with higher-complexity roles.
  • A 2025 report from METR similarly showed that although AI’s multi-step task performance is improving, it still falters on long, complex chains of actions.
  • The study pushes back against more aggressive automation predictions, such as OpenAI’s 2023 report suggesting roles like financial analysts were at high risk of replacement.
  • Overall, the results highlight that while AI continues to evolve rapidly, full autonomy in professional environments remains a distant goal, with human judgment, flexibility, and social intelligence still irreplaceable in many contexts.
