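Michelle Yin's research shows that judgments of occupational exposure to AI depend heavily on the model used. Taking 705 US occupations as her sample, she had GPT-4, GPT-5, Claude and Gemini apply the same method to assess whether tasks could be significantly sped up by existing consumer-grade AI. The results diverged widely on the share of jobs at risk: under 15 per cent for Gemini, about 50 per cent for Claude.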
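The differences are especially pronounced in white-collar occupations. For economists, GPT-4 in the 2024 OpenAI study rated only 10 per cent exposure, GPT-5 raised that to just over 50 per cent, and Claude reached 80 per cent. The same data and method thus yield entirely different labour-market narratives depending on the model: with the original scores, AI has a slightly negative effect on employment; with Gemini's ratings, the effect turns slightly positive, and the most exposed jobs actually show employment growth.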
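The author therefore argues that studies of the real-world impact of theoretical AI exposure should use ratings from several models at once; the disagreement between models may itself be informative. The piece also mentions research on remote work: a new paper suggests that the sharp fall in entry-level hiring may be better explained by the expansion of remote work than by AI, because its exposure measure is based on actual remote or hybrid job postings rather than a purely theoretical assessment.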
Michelle Yin’s study shows that assessments of occupational exposure to AI depend heavily on which model is used. Using all 705 jobs in the US occupational coding scheme, she asked GPT-4, GPT-5, Claude and Gemini to apply the same method: judge whether current consumer-facing AI tools could significantly speed up constituent tasks. The estimated share of jobs at risk varied sharply, from under 15 per cent for Gemini to about 50 per cent for Claude.
The gaps were especially large for white-collar roles. Economists were rated as 10 per cent exposed by GPT-4 in the 2024 OpenAI study, just over 50 per cent by GPT-5, and 80 per cent by Claude. The choice of model can flip the labour-market story: using the original scores, AI appears to have had a weak negative effect on employment; using Gemini's scores, the effect becomes weakly positive, with the most exposed jobs showing employment growth.
The article argues that studies of the real-world impact of theoretical AI exposure should be run with several models, since disagreement between them may reflect model-specific judgments rather than labour-market truth. It also notes a parallel debate on remote work: a new paper suggests the recent fall in entry-level hiring may be better explained by the rise of remote and hybrid jobs than by AI, because that paper's exposure measure is based on actual job ads rather than a human-made theoretical classification.
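
The multi-model check the article argues for can be sketched in a few lines of Python. This is only an illustration, not the study's method. The occupation and model names, the scores, and the 0.5 cut-off are all hypothetical placeholders. The sketch computes each model's share of occupations flagged as exposed, and the spread of scores across models for each occupation as a rough measure of disagreement.

```python
# Hypothetical exposure scores (0 to 1) per occupation and model.
# These are made-up placeholders, NOT figures from the study.
SCORES = {
    "occupation_a": {"model_w": 0.20, "model_x": 0.45, "model_y": 0.85, "model_z": 0.10},
    "occupation_b": {"model_w": 0.60, "model_x": 0.70, "model_y": 0.90, "model_z": 0.20},
    "occupation_c": {"model_w": 0.05, "model_x": 0.10, "model_y": 0.15, "model_z": 0.05},
    "occupation_d": {"model_w": 0.40, "model_x": 0.50, "model_y": 0.60, "model_z": 0.10},
}

# Assumed cut-off above which an occupation counts as "at risk".
THRESHOLD = 0.5


def share_at_risk(scores, model, threshold=THRESHOLD):
    """Fraction of occupations that `model` scores at or above `threshold`."""
    flagged = [occ for occ, by_model in scores.items() if by_model[model] >= threshold]
    return len(flagged) / len(scores)


def model_spread(scores):
    """Per-occupation gap between the highest and lowest model score."""
    return {occ: max(by_model.values()) - min(by_model.values())
            for occ, by_model in scores.items()}


def most_contested(scores):
    """Occupation on which the models disagree most."""
    spread = model_spread(scores)
    return max(spread, key=spread.get)


if __name__ == "__main__":
    models = sorted(next(iter(SCORES.values())))
    for model in models:
        print(model, share_at_risk(SCORES, model))
    print("most contested:", most_contested(SCORES))
```

Reporting the per-model shares side by side, rather than a single headline figure, makes it visible when a conclusion depends on which model produced the scores.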