How AI Agents Actually Work in Production

The most important finding from the largest systematic study of production AI agents is also the most counterintuitive: organizations succeeding with AI agents are not building the autonomous, self-directing systems that dominate research headlines. They are building deliberately constrained ones.

The MAP study, conducted by researchers at UC Berkeley, Stanford, UIUC, and IBM Research, surveyed 306 practitioners and conducted 20 in-depth case studies across 26 industries. Its conclusions should recalibrate how executives think about AI agent strategy and how investors assess the agent market.

The headline numbers tell a clear story. Sixty-eight percent of production agents execute at most 10 steps before requiring human intervention. Seventy percent rely on prompting off-the-shelf models rather than fine-tuning weights. Seventy-four percent depend primarily on human evaluation rather than automated testing. These are not the characteristics of organizations timidly dabbling in AI. These are deliberate architectural choices made by teams that have learned, often expensively, what actually works under real operational conditions.
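To make the step-limit pattern concrete, here is a minimal sketch of an agent loop that stops at a fixed step budget and hands control back to a person. It is an illustration under stated assumptions, not an implementation taken from the MAP study: the names `run_bounded_agent`, `RunResult`, and `make_scripted_policy` are hypothetical, and the `next_action` callable stands in for a call to a prompted off-the-shelf model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Illustrative step budget, chosen to mirror the "at most 10 steps" figure above.
MAX_STEPS = 10


@dataclass
class RunResult:
    status: str                 # "done" or "needs_human"
    steps_taken: int
    history: List[str] = field(default_factory=list)


def run_bounded_agent(
    next_action: Callable[[List[str]], Optional[str]],
    max_steps: int = MAX_STEPS,
) -> RunResult:
    """Run a hypothetical agent until it finishes or exhausts its step budget.

    `next_action` receives the actions taken so far and returns the next
    action, or None when it considers the task complete.
    """
    history: List[str] = []
    while True:
        action = next_action(history)
        if action is None:
            return RunResult("done", len(history), history)
        if len(history) >= max_steps:
            # Budget exhausted: stop and escalate to a person instead of continuing.
            return RunResult("needs_human", len(history), history)
        history.append(action)


def make_scripted_policy(total_actions: int) -> Callable[[List[str]], Optional[str]]:
    """Toy stand-in for a model: emits `total_actions` actions, then reports completion."""
    def policy(history: List[str]) -> Optional[str]:
        if len(history) >= total_actions:
            return None
        return f"action-{len(history) + 1}"
    return policy


short_run = run_bounded_agent(make_scripted_policy(3))    # finishes within budget
long_run = run_bounded_agent(make_scripted_policy(25))    # exceeds budget, escalates
```

The design choice worth noting is that the cap lives in the surrounding loop, not in the model's prompt, so the limit holds regardless of what the model proposes. That kind of control is easy to audit.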

The primary motivation for deployment is productivity: 73% of practitioners cite faster task completion as their core rationale. Reducing human task-hours follows at 64%, while risk mitigation and faster failure response rank near the bottom. Organizations are deploying agents as productivity amplifiers, not as risk management tools or operational safety nets. This suggests that the measurable ROI of agent deployments lies in throughput and labor savings rather than in risk reduction.

Finance and banking dominate adoption at 39%, followed by technology at 25% and corporate services at 23%. The concentration in finance is notable: these are environments with high transaction volumes, well-defined task structures, and strong compliance pressures—precisely the conditions where constrained, auditable agent behavior is not a limitation but a regulatory necessity.

Reliability remains the defining challenge, driven by the fundamental difficulty of verifying agent correctness at scale. This is where the research-to-production gap is most acute. Academic benchmarks optimize for capability; production environments demand consistency.

The strategic implication for organizations is direct: the architecture that wins in research demonstrations will not be the architecture that survives enterprise deployment. Controllability is not a compromise on ambition. It is the precondition for sustainable value creation.

Source: Raw/trigger-measuring-agents-in-production.md

