The article asks whether AI is really boosting productivity in knowledge work, noting that after 3.5 years of generative AI and 1 year of agentic AI there is still little hard data on actual gains. It revisits METR’s earlier finding that software engineers felt AI made them 20% faster, yet precise measurement showed they were 20% slower. METR then ran a new survey of 350 technical knowledge workers, including software engineers, researchers, and managers, to test whether asking about added value rather than speed produces a better estimate of AI’s contribution.
Instead of asking how quickly tasks were completed, METR asked how much value workers could deliver with and without AI, including questions about replacement hiring, the time needed to deliver equally valuable work without AI, and the fraction of current value possible if AI disappeared. The resulting estimates were lower than the speed-based answers: about 2x rather than 3x, with the most conservative question yielding 1.6x, or roughly a 60% increase in value. The researchers and the article both stress that these figures are likely upper bounds, because some AI-assisted output was judged unlikely to be as valuable as respondents claimed, and because coding-heavy roles are unusually exposed to AI compared with many other jobs.
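The arithmetic behind these figures can be made explicit with a minimal sketch. It assumes that a "fraction of current value still possible without AI" answer converts to a multiplier as 1 / fraction; that conversion and both function names are illustrative assumptions, not METR's documented method.

```python
def multiplier_from_fraction_without_ai(fraction: float) -> float:
    """Convert 'share of current value achievable without AI' to a value multiplier.

    Assumption (not from the survey itself): if a worker could deliver
    `fraction` of today's value without AI, AI multiplies value by 1 / fraction.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return 1.0 / fraction


def percent_increase(multiplier: float) -> float:
    """Express a multiplier such as 1.6x as a percentage increase over 1x."""
    return (multiplier - 1.0) * 100.0


if __name__ == "__main__":
    # The most conservative estimate quoted above is 1.6x.
    print(round(percent_increase(1.6)))  # roughly 60 (% more value)
    # Hypothetical respondent: 62.5% of current value is possible without AI.
    print(multiplier_from_fraction_without_ai(0.625))
```

Under this assumption, a 2x multiplier corresponds to a respondent saying that half of their current value would survive without AI, and 3x corresponds to one third.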
Sarah argues that individual-level productivity is not the right metric inside firms or across economies, because organizations are interdependent systems in which more output from one stage can create bottlenecks, rework, or lower quality downstream. A Faros report based on telemetry from 22,000 developers found more code written and more projects started, but slower progress through the review, testing, and completion stages, with time spent at each handoff rising substantially. The report also said that incidents had tripled relative to the low-AI-adoption baseline and that code entering production was not meeting former standards. The article contrasts this with Google Cloud’s DORA findings that high performers handled AI better, but argues the broader lesson is that gains at the individual level may not translate into organizational value, and can even reduce it if quality and workflow costs outweigh speed gains.
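The bottleneck argument can be illustrated with a toy model in which work flows through stages in order and each stage can only pass on what it has capacity for. The stage names, the capacity numbers, and both functions are hypothetical and chosen purely for illustration; none of them come from the Faros or DORA data.

```python
def pipeline_throughput(capacities: dict[str, float]) -> float:
    """Finished items per week: a serial pipeline is limited by its slowest stage."""
    return min(capacities.values())


def backlog_growth(capacities: dict[str, float]) -> dict[str, float]:
    """Items per week piling up in front of each stage.

    Assumes the first stage always runs at full capacity and every later
    stage receives whatever the previous stage managed to pass on.
    """
    growth = {}
    inflow = next(iter(capacities.values()))  # output of the first stage
    for stage, capacity in capacities.items():
        outflow = min(inflow, capacity)
        growth[stage] = inflow - outflow
        inflow = outflow
    return growth


if __name__ == "__main__":
    # Hypothetical weekly capacities (items per week).
    before = {"coding": 10, "review": 12, "testing": 12}
    after = {"coding": 20, "review": 12, "testing": 12}  # AI doubles coding only

    print(pipeline_throughput(before))  # 10
    print(pipeline_throughput(after))   # 12
    print(backlog_growth(after))        # review backlog grows by 8 per week
```

In this toy setup, coding output doubles while delivered output rises by only 20%, and unreviewed work accumulates at the review handoff. The model leaves out the quality effects described above (rework and incidents), which would lower delivered value further.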