
White Paper on the InfiniBand Leaf Spine Network Architecture for AI Computing Power


Release time: 2026-03-11


Abstract
As the computing demands of large-model training and high-performance computing (HPC) grow exponentially, the bandwidth, latency, and scalability bottlenecks of traditional network architectures have become increasingly prominent. This paper details a two-tier leaf-spine InfiniBand network architecture built on the NVIDIA Quantum-2 platform and analyzes its core advantages for AI computing centers: high performance, linear scalability, high reliability with easy maintenance, and future-oriented evolution, providing a reference for building modern, high-performance AI infrastructure.

I. Introduction: Network Challenges in the AI Era
When training models such as GPT-4 and LLaMA, thousands of GPUs must synchronize and exchange terabytes of data within milliseconds. A traditional three-tier network architecture not only introduces additional forwarding delay but also tends to form bandwidth bottlenecks at the core layer, leading to low GPU utilization and greatly prolonged training times. Building a high-speed network optimized specifically for AI computing has therefore become a core task of data-center upgrades.

II. Architecture Design: Two-Tier Leaf-Spine InfiniBand H200 Network

2.1 Core Components and Hierarchical Division

The topology adopts a clear two-tier switching design, dividing the network into a spine layer, a leaf layer, and a server access layer. Each layer has well-defined responsibilities and the layers work together.

| Layer | Core Device | Quantity | Optical Module Specification | Core Responsibilities |
|---|---|---|---|---|
| Spine layer | NVIDIA Quantum-2 MQM9790 | 32 | 800Gbps OSFP 2xFR4/DR4/SR4 | Network-wide core forwarding; non-blocking full interconnection between leaf switches |
| Leaf layer | NVIDIA Quantum-2 MQM9790 | 64 | Uplink: 800Gbps OSFP 2xFR4/DR4/SR4; downlink: 800Gbps OSFP 2xSR4 | Uplink to the spine layer; server access and traffic aggregation |
| Server layer | GPU server + ConnectX-7 | 256 | 400Gbps OSFP SR4 | Provide computing and storage; connect to the leaf layer |

To support modular expansion, the network is divided into 8 standard PODs (Points of Delivery). Each POD contains 8 leaf switches and 32 GPU servers, forming an independent computing and network unit.

Single POD size: 8 leaf switches + 32 GPU servers
Total network size: 8 PODs × 32 servers/POD = 256 GPU servers
Connection relationship: each server connects to the leaf switches in its POD through 8 × 400Gbps links, and each leaf switch is fully connected to all 32 spine switches in the network.
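The sizing above can be sanity-checked with a short script. This is a sketch, not part of the original design document: it assumes the MQM9790's 64 × 400Gbps NDR port radix (the 800Gbps OSFP modules each carry two 400Gbps ports) and one 400Gbps link from every leaf to every spine.

```python
# Sanity check of the fabric sizing described in the text.
# Assumptions: 64 x 400Gbps ports per Quantum-2 MQM9790 switch,
# one 400Gbps link between every leaf-spine pair.

SPINES = 32
PODS = 8
LEAVES_PER_POD = 8
SERVERS_PER_POD = 32
LINKS_PER_SERVER = 8          # 8 x 400Gbps per GPU server
LINK_GBPS = 400

servers = PODS * SERVERS_PER_POD                      # total GPU servers
leaves = PODS * LEAVES_PER_POD                        # total leaf switches

# Downlink ports per leaf: a POD's server links spread across its leaves.
down_ports_per_leaf = SERVERS_PER_POD * LINKS_PER_SERVER // LEAVES_PER_POD
# Uplink ports per leaf: one 400Gbps link to each spine.
up_ports_per_leaf = SPINES

oversubscription = down_ports_per_leaf / up_ports_per_leaf  # 1.0 => non-blocking
server_bw_tbps = LINKS_PER_SERVER * LINK_GBPS / 1000        # access bandwidth/server

print(servers, leaves, down_ports_per_leaf, oversubscription, server_bw_tbps)
```

With these numbers each leaf uses exactly 32 downlink and 32 uplink ports, a 1:1 ratio, which is what makes the fabric non-blocking.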

III. Core Advantages: Four Pillars Supporting AI Computing Power

3.1 Ultimate Performance: Breaking the Data Transmission Bottleneck
Ultra-high bandwidth: each server reaches a total access bandwidth of 3.2 Tbps through 8 × 400Gbps links, while the spine-leaf core is built from 800Gbps links, so data is ingested and forwarded at full rate.
Microsecond latency: InfiniBand keeps end-to-end communication latency in the microsecond range, sharply reducing GPU idle time and improving training efficiency.
Non-blocking forwarding: the fully interconnected design means communication between any two servers takes at most four link hops (server → leaf → spine → leaf → server), avoiding the detours and bottlenecks of traditional networks.
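The hop-count property can be made concrete with a tiny sketch. The leaf identifiers below are hypothetical labels: in a two-tier Clos fabric the path length depends only on whether two servers share a leaf, since any two distinct leaves reach each other through a single spine.

```python
# Link traversals between two servers in a two-tier leaf-spine fabric,
# counting each cable crossed. Leaf IDs are illustrative labels only.

def link_hops(leaf_a: int, leaf_b: int) -> int:
    """Hops between servers attached to leaf_a and leaf_b."""
    if leaf_a == leaf_b:
        return 2   # server -> leaf -> server
    return 4       # server -> leaf -> spine -> leaf -> server

print(link_hops(0, 0))   # same leaf: best case
print(link_hops(0, 63))  # opposite ends of the fabric: worst case
```

The worst case is constant regardless of which PODs the two servers sit in, which is why latency stays predictable as traffic patterns shift during training.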

3.2 Linear Expansion: Computing Power and Network Grow Together
Modular POD Design: adding computing power only requires deploying a new POD, with no changes to the existing architecture, so computing and network capacity grow linearly together.
Elastic Expansion Capability: by increasing the number of spine/leaf switches, the network can grow from hundreds of servers to thousands, meeting the needs of future ultra-large-scale AI clusters.
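As a rough illustration of the scaling claim, the capacity ceiling of a non-blocking two-tier fabric follows directly from the switch radix. The function below is a sketch under stated assumptions (equal-speed ports on every switch, leaf ports split evenly between servers and spines, one link per leaf-spine pair); it is not taken from the paper itself.

```python
# Capacity ceiling of a 1:1 (non-blocking) two-tier leaf-spine fabric.
# Assumes `radix` equal-speed ports per switch and one link per leaf-spine pair.

def max_two_tier_servers(radix: int, links_per_server: int) -> int:
    """Maximum servers supported by a non-blocking two-tier Clos fabric."""
    down = radix // 2            # leaf ports facing servers
    spines = radix - down        # one uplink per spine => radix/2 spines
    leaves = radix               # each spine (radix ports) reaches `radix` leaves
    server_ports = leaves * down # total server-facing ports in the fabric
    return server_ports // links_per_server

print(max_two_tier_servers(64, 8))  # 8 links/server, as in this design
print(max_two_tier_servers(64, 1))  # 1 link/server
```

With a 64-port radix and 8 links per server this yields 256 servers, matching the design above; with a single link per server the same two tiers reach 2048 servers, which is where the "thousands of servers" figure comes from before a third tier becomes necessary.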

3.3 High Reliability and Easy Maintenance: Ensuring Business Continuity
Multiple Link Redundancy: servers, leaf switches, and spine switches are all connected over multiple links, so a single point of failure does not affect overall service.
Simplified Operations and Maintenance: the two-tier architecture is clear and straightforward, making fault location efficient; standardized PODs and a unified hardware platform significantly reduce deployment and maintenance costs.

3.4 Future-oriented: Protecting Long-Term Investments
Forward-Looking Technology: the Quantum-2 platform and ConnectX-7 network cards support the InfiniBand NDR standard and can evolve smoothly to 1.6 Tbps and higher speeds.
Compatible with Next-Generation Hardware: the open architecture design accommodates future GPUs, DPUs, and other new computing hardware, ensuring the network infrastructure keeps pace with the rapid iteration of AI technology.
 

IV. Application Scenarios: Empowering AI and Supercomputing Fields
Large Model Training: Supporting the high-speed collaboration of thousands of GPUs, reducing the training period from months to weeks.
Scientific Computing: In fields such as weather forecasting and gene sequencing, real-time processing and analysis of TB-level data can be achieved.
Autonomous Driving Simulation: Providing low-latency and high-bandwidth network support for massive scene simulations, accelerating algorithm iterations.

V. Conclusion
The two-tier leaf-spine InfiniBand network architecture based on the NVIDIA Quantum-2 platform addresses the network challenges of the AI era through high performance, linear scalability, high reliability with easy maintenance, and future-oriented design. It is an ideal choice for building high-performance AI computing centers today, and a key piece of infrastructure for protecting long-term investment and supporting the next generation of AI technologies.
