
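The article notes that large technology companies once traded "free" services for personal data and drew controversy for scraping public web data at scale. Now that generative AI is moving from text-only chat toward "agents" or "assistants" that can act autonomously, the next wave of data demand will reach deeper and be more private, because completing a task often requires being granted access to your systems and data.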

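The author stresses that agents often need operating-system-level or device-level permissions to browse the web, book flights, do research, and add items to a shopping cart on your behalf, or even to chain "dozens" of steps into a single task; to manage schedules and to-do lists, they must read calendars, messages, and email. Examples cited include enterprise agents that can read code, databases, Slack, and cloud drive files; features such as Windows Recall that take a screenshot "every few seconds" so users can search back through what they did on the device; and a dating platform feature that scans the photos on a phone to infer interests and personality.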

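Looking back at the data and trends: after deep learning showed in the early 2010s that "more data gives better results," industry competition intensified. Facial recognition companies scraped "millions" of face photos from the web, and Google once offered "$5" in exchange for face scans. As scrapeable public web data is gradually "exhausted," companies have turned to building training and product defaults on user data, mostly on an "opted in by default, opt out yourself" basis. A study commissioned by European data regulators lists agent-related risks: leakage of sensitive data, interception or misuse, unprotected transfer to external systems, and conflict with privacy regulations. In addition, prompt injection can induce leaks, and an agent's expanded permissions sweep in the non-consensual data of contacts and correspondents, creating new security and privacy pressure. The author calls for carefully weighing the "quid pro quo" of exchanging personal data.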

The article argues that Big Tech’s earlier controversy, scraping large portions of the public web to build LLMs, is giving way to a more private data grab as AI “agents” replace simple text chatbots. Over the past two years, systems like ChatGPT and Gemini have shifted toward assistants that act on users’ behalf, which typically requires broader access to personal systems and data than prior tools.

Agents are framed as partially autonomous LLM-based systems that may need operating-system level permissions to function. They can browse the web, book flights, conduct research, and add items to carts, sometimes across dozens of steps; to manage schedules and tasks they may need calendars, messages, and email. Examples include business-oriented agents that can read code, databases, Slack, and cloud files, and Windows Recall-style features that capture screenshots every few seconds for searchable device history; a dating app feature is cited that searches photos to infer interests and personality.

The piece ties today’s push to a long-running “more data, better results” trend that dates to early-2010s ML breakthroughs: facial recognition firms scraped millions of images, and Google reportedly paid $5 for facial scans. As public-web data is exhausted, companies increasingly default to using user data on an opt-out rather than opt-in basis. A European study commissioned by data regulators lists agent risks: leakage of sensitive data, interception, misuse, transfers to external systems without safeguards, and conflicts with privacy regulations. Prompt-injection attacks and deep device access extend the blast radius to third parties, such as a user’s contacts, whose data is exposed incidentally.

2025-12-25 (Thursday) · 280499c646fcbb2db5c0ed4befb7ac3ac8ee2a07