Forum AI’s study found that OpenAI’s ChatGPT, Google’s Gemini, Anthropic’s Claude and xAI’s Grok performed poorly on questions about elections and geopolitics. Researchers put more than 3,100 questions to the four chatbots on news topics spanning politics, healthcare and foreign affairs. Among election-related answers, 90% fell short on accuracy, bias or source selection. Across the models, nearly 36% of election answers contained at least one factual error; Grok’s error rate was the highest, at nearly 52%.
Bias was also pronounced: answers from ChatGPT, Claude and Gemini tended to lean politically left, while Grok’s leaned mostly right. More worrying, all four models routinely treated foreign state-owned media as reliable sources: in 35% of foreign-policy answers they cited outlets such as China’s Global Times and CGTN, and even Russia’s RT. ChatGPT and Grok did this most often, citing state-owned media 51% and 44% of the time, respectively. Forum also found that the answers that looked most polished and carried the strongest-looking citations were, paradoxically, the most likely to hide factual errors.
Campbell Brown, Forum AI’s co-founder and CEO, said the study was meant to push model makers to take more responsibility and to give news queries the same priority they give math and coding tasks. Anthropic said it was willing to review the data behind the report and that Claude is trained to stay politically balanced. The article warns that unless AI companies check their own output more actively, news distortion could get worse as search traffic shifts to chatbots. Yet the companies have little incentive to aggressively fact-check politically sensitive topics; platforms such as Meta and Google’s YouTube have long been reluctant to do broad fact-checking, because few want the burden of deciding what is true and what is false for the entire internet.