The article (published 2025-12-14 02:00) asks whether large language models have "metalinguistic" ability: not just producing fluent text, but analyzing language the way a human linguist does. It frames this against skepticism voiced in 2023, including the claim that correct linguistic explanations are too complex to be learned by "marinating in big data," which implies that models may use language without genuinely reasoning about it.
Beguš, Dąbkowski, and Rhodes built a four-part evaluation, three parts of which require syntactic tree analyses in the diagramming tradition dating to 1957, using specially constructed sentences to reduce the chance that models simply recall memorized answers. The recursion task used 30 crafted sentences featuring complex recursion, including difficult center embeddings. Most models failed, but OpenAI's o1 could diagram the trees at the level of a graduate student, resolve multiple ambiguities, and even extend a given sentence by adding one more recursive layer.
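For readers unfamiliar with center embedding, here is a minimal Python sketch of the construction; the vocabulary and sentences are invented for illustration and are not the study's 30 test items.

```python
# Illustrative sketch only: generates center-embedded sentences of
# increasing depth, the construction the recursion task probes.

def center_embed(depth: int) -> str:
    """Build a sentence with `depth` levels of center embedding,
    e.g. depth=2 -> 'the rat the cat the dog chased bit died'."""
    nouns = ["rat", "cat", "dog", "farmer"]
    verbs = ["bit", "chased", "scared"]  # one transitive verb per embedded clause
    assert 0 <= depth <= len(verbs)
    # Subjects stack up left-to-right: 'the rat the cat the dog ...'
    subjects = " ".join(f"the {nouns[i]}" for i in range(depth + 1))
    # Verbs unwind from the innermost clause outward.
    embedded = " ".join(verbs[depth - 1 - i] for i in range(depth))
    return " ".join(part for part in (subjects, embedded, "died") if part)

for d in range(3):
    print(center_embed(d))
# the rat died
# the rat the cat bit died
# the rat the cat the dog chased bit died
```

Adding "one more layer" in the article's sense corresponds to calling the generator with depth + 1, which is trivial to produce but requires tracking nested long-distance subject-verb dependencies when parsing.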
They also tested phonology with 30 newly invented "mini-languages," each containing 40 made-up words, asking models to infer the sound rules from scratch. o1 induced conditional rules from the data (e.g., vowels becoming breathy in particular consonant environments) despite having seen none of it in training. The authors argue that properties once treated as uniquely human are being steadily eroded, while noting that today's models still have not produced original linguistic discoveries; whether future gains will come from more compute and data, or stay capped by next-token training objectives, remains an open question.
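To make the task concrete, here is a toy sketch of the kind of inference involved, on an invented mini-language (not one of the paper's 30, and far smaller than their 40-word sets); "ah" stands in for a breathy vowel, and both the data and the rule are assumptions for illustration.

```python
# Toy sketch of phonological rule induction: tabulate which preceding
# consonants condition each surface variant of the vowel /a/.
from collections import defaultdict

words = ["paka", "taki", "bahku", "dahti", "kapa", "gahmo", "bahna"]

# Map each vowel variant to the set of consonants that precede it.
contexts = defaultdict(set)
for w in words:
    for i, ch in enumerate(w):
        if ch == "a" and i > 0:
            variant = "ah" if w[i:i + 2] == "ah" else "a"
            contexts[variant].add(w[i - 1])

print(dict(contexts))
# (set order may vary)
# {'a': {'p', 't', 'k', 'n'}, 'ah': {'b', 'd', 'g'}}
# The conditional rule falls out: /a/ surfaces breathy after the
# voiced stops {b, d, g}, plain elsewhere.
```

A model solving the real task must do this without being told which segments alternate or which contexts matter, which is what makes the from-scratch induction notable.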