The article describes how the Internet Archive’s Wayback Machine, long used to preserve web pages for public access, faces a growing backlash from major publishers even as journalists continue to rely on it for accountability reporting. A recent USA Today investigation into US Immigration and Customs Enforcement used archived pages to reconstruct detention statistics and show how disclosure practices changed under the Trump administration. The investigation highlights the tool’s value at a moment when USA Today Co. is itself blocking the Wayback Machine from archiving its own sites.
The main conflict is that more publishers are restricting access over fears that archived material could be scraped by AI companies. Originality AI says 23 major news sites now block ia_archiverbot, as does Reddit; The Guardian does not block the crawler but limits access through its API and interface. In response, advocacy groups including the Electronic Frontier Foundation and Fight for the Future gathered more than 100 journalists’ signatures on a letter of support. Signers include Rachel Maddow, Kat Tenbarge, Taylor Lorenz, Laura Flynn, and Micco Caporale, and they described uses ranging from fact-checking and audio sourcing to research on old fan sites and union organizing.
The broader stakes are high. The Internet Archive has operated for 30 years and has preserved more than a trillion web pages, and it has survived major legal fights since 2020, including a settlement with music publishers that had sought up to $700 million over the Great 78s project. No comparable public tool exists at its scale, so continued blocking could make early digital history harder to access, weaken watchdog reporting like the 2016 Bernie Sanders article revision case, and reduce the availability of archived pages that are frequently cited as evidence in US litigation. Mark Graham says the Archive is still in talks with The New York Times and others, but warns that growing limits on the public web are harming society’s ability to understand current events.