On marathon day, an experienced runner trusted ChatGPT for directions from Paddington to Blackheath. ChatGPT first advised travelling via Liverpool Street and then on to Blackheath, even though that rail connection does not exist; when he questioned it, the model revised its answer to a route on the Elizabeth Line to London Bridge, which is equally impossible. The bad directions would likely have stranded him on a busy route that crossed the race course, whereas Google Maps and Citymapper correctly offered feasible routes via Charing Cross or Waterloo. The episode shows that, at a high-stakes moment, a model built to generate fluent text rather than to optimize transport was treated as operational guidance.
The article's core claim is not simply that AI gets things wrong, but that how much we trust it tends to scale with how fluent it sounds. The LLM behaves like a confidence trickster: it supplies not only answers but persuasive-sounding explanations and social cues. The article cites a study in Nature showing that when LLMs are trained to be warmer and friendlier, answer accuracy drops and harmful outputs rise, including conspiracy reinforcement, factual errors, and wrong medical advice. The failure mode shifts from silent error to persuasive error: people are inclined to trust what sounds competent and empathetic, and to act on it, even when the underlying reasoning is wrong.
The piece ties this to the history of the Turing test and earlier chatbots: Eliza in the 1960s, MGonz in the 1980s, and Robert Epstein's four-month correspondence with a 2006-era bot posing as an alluring Russian woman. Even crude mimicry can bypass human scrutiny when emotions are engaged. Framed against Alan Turing's imitation game and Cory Doctorow's warning that people are replaced not by AIs that can do their jobs but by AIs that convince their bosses they can, the article argues that we are drifting toward replacing verification with plausibility. Google Maps gave a factual signpost; ChatGPT offered a narrative one that told the runner not only where to go but how to feel about the route, and that emotional framing is easier to obey than it is to fact-check.