A Carnegie Mellon University experiment found AI agents from major tech firms struggled to complete basic tasks in a simulated office, revealing critical gaps in AI's readiness for real-world work environments.
Background:
- A team from Carnegie Mellon University designed a simulated tech company staffed entirely by AI agents from Google, OpenAI, Meta, Anthropic, and Amazon to evaluate how well current AI models could perform real-world workplace tasks.
- The experiment recreated a typical small software firm environment, complete with an internal chat system, websites, an employee handbook, and various job roles like HR manager and CTO assigned to different AI agents.
- The tasks ranged from analyzing datasets to writing performance reviews, replicating everyday assignments encountered in finance, administration, and engineering.
- Despite AI's advancements, performance was poor across the board. The best-performing agent completed only 24% of its assigned tasks, Google's agent achieved an 11% success rate, and Amazon’s agent recorded the worst result with a 1.7% completion rate.
Why should you pay attention?
- The findings challenge mainstream narratives that AI is ready to fully replace human workers, especially in dynamic, multi-step tasks requiring common sense, social skills, and adaptability.
- The experiment suggests that current AI models, even those from leading tech firms, struggle with real-world problem-solving beyond isolated or well-defined tasks.
- This research offers valuable insights for companies planning to integrate AI into critical workflows, highlighting the limitations of current AI technologies in office and enterprise settings.
- It adds context to broader labor market discussions, balancing AI disruption fears with a clearer view of practical limitations in today’s systems.
Who said what?
- Graham Neubig, Carnegie Mellon researcher, told Business Insider:
“While agents may be used to accelerate some portion of the tasks that human workers are doing, they are likely not a replacement for all tasks at the moment.”
- Mario Nawfal, popular show host, shared the study on X:
“For now, AI isn’t stealing your job — it can’t even survive a normal day at work without causing a disaster.”
- @10ATexan, X user, commented:
“There are gut decisions leaders have to make that depend on many internal and external variables — I don’t think you can program instinct.”
- @AccumulateCrypt, X user, added:
“Back to the drawing board. Don’t worry, they’ll figure it out eventually.”
Zooming out:
- The Carnegie Mellon findings align with earlier studies like Sparkline Capital’s 2024 research, which found that AI handles junior-level tasks but struggles with higher-complexity roles.
- A 2025 report from METR similarly showed that although AI’s multi-step task performance is improving, it still falters on long, complex chains of actions.
- The study pushes back against more aggressive automation predictions, such as OpenAI’s 2023 report suggesting roles like financial analysts were at high risk of replacement.
- Overall, the results highlight that while AI continues to evolve rapidly, full autonomy in professional environments remains a distant goal, with human judgment, flexibility, and social intelligence still irreplaceable in many contexts.