A landmark field experiment at Alibaba has produced what may be the most rigorous evidence yet on how generative AI reshapes human work — and the findings cut against the prevailing optimism in boardrooms eager to deploy these tools at scale.
The study, conducted across Alibaba’s e-commerce after-sales customer service operations, randomly assigned human agents access to a generative AI assistant capable of diagnosing customer issues and drafting response messages in real time. Agents retained full discretion over whether to use, modify, or ignore the AI’s suggestions. The results reveal a picture that is neither uniformly positive nor negative — it is stratified, and that stratification carries profound implications for how organizations should think about AI deployment.
The headline finding is that generative AI improved service speed and customer satisfaction ratings overall, while also reducing the communication burden on customers. But these average gains mask a critical divergence: low-performing agents captured the largest improvements in both speed and quality, while top performers saw little speed benefit and actually experienced declines in service quality on both subjective and objective measures.
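To make the logic of that divergence concrete, here is a minimal sketch of how a pooled treatment effect can hide opposite effects within performance tiers. It assumes a simple difference in means between agents with and without the AI assistant, computed once overall and once per baseline tier. The record layout, the tier labels, and every number are hypothetical and invented for illustration; they are not the study's data or its actual estimation method.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-agent records: (baseline_tier, has_ai, avg_response_seconds).
# All values are invented for illustration; they are NOT the study's data.
records = [
    ("low", False, 120.0), ("low", False, 110.0),
    ("low", True, 90.0), ("low", True, 80.0),
    ("high", False, 60.0), ("high", False, 62.0),
    ("high", True, 63.0), ("high", True, 65.0),
]


def pooled_effect(rows):
    """Difference in mean outcome, treated minus control, ignoring tier."""
    treated = [y for _, has_ai, y in rows if has_ai]
    control = [y for _, has_ai, y in rows if not has_ai]
    return mean(treated) - mean(control)


def effect_by_tier(rows):
    """Difference in mean outcome, treated minus control, within each tier."""
    arms = defaultdict(lambda: {True: [], False: []})
    for tier, has_ai, y in rows:
        arms[tier][has_ai].append(y)
    return {tier: mean(a[True]) - mean(a[False]) for tier, a in arms.items()}


overall = pooled_effect(records)
effects = effect_by_tier(records)

# A negative value means faster responses with AI; a positive value means slower.
print(f"pooled: {overall:+.1f}s")
for tier, delta in effects.items():
    print(f"{tier}: {delta:+.1f}s")
```

In this toy data the pooled effect is negative (faster on average) even though the high tier's effect is slightly positive (slower), which is the pattern a tier-blind summary would miss.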
What explains this counterintuitive result? The evidence points to behavioral adaptation. When given AI assistance, high-performing agents increased multitasking — handling more concurrent conversations and shifting attention across chats more frequently. The AI, in effect, created a false sense of capacity. The consequence was slower individual response times, higher customer abandonment rates, and more customers returning with unresolved issues. The AI did not make top agents worse at their jobs; it changed how they allocated their attention, and that change proved costly.
This finding reframes a core assumption in enterprise AI strategy. Most deployment frameworks treat AI as a uniform productivity lever — install it broadly and capture gains across the workforce. Alibaba’s data suggests that approach is operationally naïve. The same tool can be simultaneously a competency equalizer for junior talent and a behavioral distractor for expert practitioners.
For executives and investors evaluating AI-enabled workforce transformation, the practical implication is that deployment strategy should be segmented by skill tier. High performers likely require guardrails against the excess multitasking and distraction the study observed, not just access. The organizations that extract durable value from generative AI will be those disciplined enough to recognize that better tools do not automatically produce better outcomes; human behavioral responses remain the decisive variable.
Source: Raw/trigger-generative-ai-in-action-alibaba-customer-service.md