Amazon says it has made a major breakthrough in data center network design and has been quietly deploying the technology across its own cloud infrastructure since late last year. The system is built around a “quasi-random” architecture that combines the advantages of traditional structured networks with those of more random topologies. Amazon claims the result moves data significantly faster and uses less energy, which it sees as an edge in the race to build faster cloud systems. The research team has worked on the problem since 2023 and also developed a new device called the ShuffleBox, which automatically organizes the cabling such a network requires.
The design is called RNG, short for resilient network graphs. It aims for a flat, scalable, and fault-resilient network rather than relying on the fat-tree topology that has dominated the industry since the mid-1980s. Amazon says RNG targets everyday data center efficiency rather than AI training, because AI training traffic is too coordinated and centralized to suit a random graph. The company says its global data centers are currently linked by 20 million kilometers of fiber-optic cable, which makes cabling extremely expensive. Compared with a traditional network, Amazon claims, the new design needs 69% fewer routers and switches, delivers 33% higher data throughput, cuts network power consumption by 40%, and lowers operating costs by 27%.
The approach echoes earlier research and industry experiments. In 2012, researchers at the University of Illinois Urbana-Champaign proposed Jellyfish, which uses a random-graph topology to improve scalability. Google later experimented with bringing optical circuit switching (OCS) into its networks so that optical links could be reconfigured in real time, but that added complexity and cost. Amazon argues that its quasi-random method strikes a balance among structure, randomness, and simpler cabling. The company says RNG first went live in a Dublin data center in 2024 and then expanded to Germany and Spain, and that most of its newly built data centers are now equipped with it.
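For readers who want a feel for why mixing random links into a structured network can help, the following is a minimal Python sketch. It is not Amazon's RNG design, and it is not the Jellyfish construction either; it is only a toy illustration of the general random-graph idea. All names (`ring_lattice`, `randomize`, `mean_hops`) and the sizes used are assumptions made for this example. The sketch starts from a regular ring of switches, rewires some links at random while keeping every switch's port count unchanged, and measures the average number of hops between switches.

```python
import random
from collections import deque


def ring_lattice(n, d):
    """Structured baseline: each switch links to its d/2 nearest neighbours per side."""
    if d % 2 or d >= n:
        raise ValueError("d must be even and smaller than n")
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for k in range(1, d // 2 + 1):
            w = (v + k) % n
            adj[v].add(w)
            adj[w].add(v)
    return adj


def randomize(adj, swaps, rng):
    """Apply degree-preserving double-edge swaps: (a-b, c-d) becomes (a-d, c-b)."""
    edges = [(a, b) for a in adj for b in adj[a] if a < b]
    done = attempts = 0
    while done < swaps and attempts < 100 * swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # pick one of the two possible rewirings at random
        # Skip swaps that would create a self-loop or a duplicate link.
        if len({a, b, c, d}) < 4 or d in adj[a] or b in adj[c]:
            continue
        adj[a].remove(b)
        adj[b].remove(a)
        adj[c].remove(d)
        adj[d].remove(c)
        adj[a].add(d)
        adj[d].add(a)
        adj[c].add(b)
        adj[b].add(c)
        edges[i] = (a, d)
        edges[j] = (c, b)
        done += 1
    return adj


def mean_hops(adj):
    """Average shortest-path length over all reachable ordered pairs (BFS per switch)."""
    total = pairs = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs match
    print("structured ring:   ", round(mean_hops(ring_lattice(64, 4)), 2))
    mixed = randomize(ring_lattice(64, 4), swaps=200, rng=rng)
    print("after random swaps:", round(mean_hops(mixed), 2))
```

In this toy setup, the rewired network typically needs far fewer hops between switches than the ring, even though each switch uses the same number of ports. That is the intuition behind random-graph topologies; it says nothing about the specific savings Amazon reports.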