Not So Smart After All? Study Shows AI Can’t Even Survive a Day at Work

Edyme

April 28, 2025

A Carnegie Mellon University experiment found AI agents from major tech firms struggled to complete basic tasks in a simulated office, revealing critical gaps in AI's readiness for real-world work environments.

Background:

A team from Carnegie Mellon University designed a simulated tech company staffed entirely by AI agents from Google, OpenAI, Meta, Anthropic, and Amazon to evaluate how well current AI models could perform real-world workplace tasks.
The experiment recreated a typical small software firm environment, complete with an internal chat system, websites, an employee handbook, and various job roles like HR manager and CTO assigned to different AI agents.
The tasks ranged from analyzing datasets to writing performance reviews, replicating everyday assignments encountered in finance, administration, and engineering.
Despite recent advances in AI, performance was poor across the board. The best-performing agent completed only 24% of its assigned tasks, while Google's agent achieved an 11% success rate. Amazon’s agent recorded the worst result, with a 1.7% completion rate.

Why should you pay attention?

The findings challenge mainstream narratives that AI is ready to fully replace human workers, especially in dynamic, multi-step tasks requiring common sense, social skills, and adaptability.
The experiment suggests that current AI models, even those from leading tech firms, struggle with real-world problem-solving beyond isolated or well-defined tasks.
This research offers valuable insights for companies planning to integrate AI into critical workflows, highlighting the limitations of current AI technologies in office and enterprise settings.
It adds context to broader labor market discussions, balancing AI disruption fears with a clearer view of practical limitations in today’s systems.

Who said what?

Graham Neubig, Carnegie Mellon researcher, told Business Insider:

“While agents may be used to accelerate some portion of the tasks that human workers are doing, they are likely not a replacement for all tasks at the moment.”

Mario Nawfal, popular show host, shared the study on X:

“For now, AI isn’t stealing your job — it can’t even survive a normal day at work without causing a disaster.”

@10ATexan, X user, commented:

“There are gut decisions leaders have to make that depend on many internal and external variables — I don’t think you can program instinct.”

@AccumulateCrypt, X user, added:

“Back to the drawing board. Don’t worry, they’ll figure it out eventually.”

@DDD41167410, X user, wrote on April 27, 2025:

“Obviously. It is still nearly impossible for a logic engine to manage human sentiments of consumption.”

Zooming out:

The Carnegie Mellon findings align with earlier studies like Sparkline Capital’s 2024 research, which found that AI handles junior-level tasks but struggles with higher-complexity roles.
A 2025 report from METR similarly showed that although AI’s multi-step task performance is improving, it still falters on long, complex chains of actions.
The study pushes back against more aggressive automation predictions, such as OpenAI’s 2023 report suggesting roles like financial analysts were at high risk of replacement.
Overall, the results highlight that while AI continues to evolve rapidly, full autonomy in professional environments remains a distant goal, with human judgment, flexibility, and social intelligence still irreplaceable in many contexts.

