
Meta and Arista Build AI at Scale

We are excited to share that Meta has deployed the Arista 7700R4 Distributed Etherlink Switch (DES) for its latest Ethernet-based AI cluster. It is worth reflecting on how we arrived at this point and on the strength of our partnership with Meta.

The AI market changed when ChatGPT burst onto the scene, creating an unprecedented stir about cognitive AI's power, impact, and benefits that resonated with the wider world. Arista's partnership with Meta on co-development dates back to the 7368X4 "Minipack" 100G system released in 2018, followed by successive iterations of OCP-inspired systems that are widely deployed.

Continued Evolution of Networking for AI

However, Arista's experience in HPC, AI, and machine learning goes back to the company's founding, when many of its first customers were building large compute networks to process workloads for oil and gas, research, medical, and financial (HFT) applications, among others. The networking requirements that characterized 2008 are not so different from those of 2024: non-blocking performance, high-speed interfaces, traffic management tools, monitoring, and visibility. What has changed is the scale. A typical HPC cluster in 2010 ran 10G Ethernet, with several hundred nodes connected to a network of modular 7500E series systems. In 2024, the de facto speed is 400G Ethernet, with interconnects running at 800G, and AI clusters have grown to many thousands of compute nodes, each containing multiple XPUs.
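To put that growth in perspective, here is a minimal back-of-envelope sketch in Python. The node counts and per-node link figures are illustrative assumptions, not numbers from any specific deployment.

```python
# Back-of-envelope scale comparison: a 2010 HPC cluster vs. a 2024 AI cluster.
# Node counts and links per node are illustrative assumptions only.

def aggregate_bandwidth_tbps(nodes: int, links_per_node: int, link_gbps: int) -> float:
    """Total access-layer bandwidth the network must carry, in Tbps."""
    return nodes * links_per_node * link_gbps / 1000.0

hpc_2010 = aggregate_bandwidth_tbps(nodes=500, links_per_node=1, link_gbps=10)
ai_2024 = aggregate_bandwidth_tbps(nodes=4000, links_per_node=8, link_gbps=400)

print(f"2010 HPC cluster: {hpc_2010:,.0f} Tbps aggregate")  # 5 Tbps
print(f"2024 AI cluster:  {ai_2024:,.0f} Tbps aggregate")   # 12,800 Tbps
print(f"Growth factor:    {ai_2024 / hpc_2010:,.0f}x")      # 2,560x
```

Even under these conservative assumptions, the network must carry roughly three orders of magnitude more traffic than its 2010 counterpart.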

As large language models (LLMs) expand, their higher bandwidth demands and ever-more challenging workloads are best served by Ethernet: the InfiniBand debate is resolved!

Demanding AI Applications Need the Best-of-Breed Networking

Accommodating the networking needs of an entire data center in a single system is not possible. Any single system is constrained by the physical and logical capacity of either a single networking packet processor or, in multi-chip systems, the size of a network rack and the distances between compute nodes. For this reason, we build multi-tier networks that scale out to address the total demand.

The Arista 7800R4 is a high-performance multi-chip system that scales to over 1,000 400G ports and is the backbone of many large-scale data center networks. For AI networks targeting tens of thousands of 400G-attached XPUs, we quickly hit the limits of a single 7800R4 and need multiple network tiers. Today, many large-scale AI designs deploy two-tier or even three-tier leaf-spine architectures for back-end networks, with choices of fixed and modular systems. In these designs, every platform is an independent node making forwarding decisions without automated or coordinated inter-node communication for lossless transport. While this provides maximum autonomy and broad multi-vendor interoperability, it also imposes complexity by forcing explicit configuration of AI-aware congestion management, performance tuning, and load balancing between nodes.
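To see why multiple tiers become unavoidable, consider the rough Clos port math below, sketched in Python. The radix values are illustrative assumptions; real designs also weigh oversubscription, resilience, and physical reach.

```python
# Rough port math for a non-blocking folded-Clos fabric with a uniform
# switch radix at every tier. Radix values are illustrative only.

def max_endpoints(radix: int, tiers: int) -> int:
    """Maximum endpoints in a non-blocking folded-Clos of `tiers` levels.
    Every switch below the top tier splits its radix evenly between
    downlinks and uplinks; the top tier faces entirely downward.
    """
    endpoints = radix  # a single switch: all ports face endpoints
    for _ in range(tiers - 1):
        endpoints = endpoints * radix // 2
    return endpoints

for radix in (64, 128):
    for tiers in (1, 2, 3):
        print(f"radix {radix:>3}, {tiers} tier(s): "
              f"{max_endpoints(radix, tiers):>9,} endpoints")
```

Even a 128-port switch tops out near 8,192 endpoints in a non-blocking two-tier design, which is why clusters with tens of thousands of XPUs are forced into a third tier.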

Vendors and customers are working collectively as part of the Ultra Ethernet Consortium (UEC) to propose enhancements that address some of the challenges of lossless transport, efficient packet distribution, congestion, and traffic management in large multi-tier networks running intensive AI workloads.

In a perfect world, a single system would scale up to deliver the capacity that avoids the need for two-tier networks, but commonly available modular data center switch systems are all designed around the capacity of a single rack, among other limitations.

Time for a Change with a Distributed AI Platform

The 7700R4 DES platform is very different. While it may physically look and be cabled like a two-tier leaf-spine network, the similarities end there. Rather than each node acting as a standalone, autonomous system with local forwarding lookups and independent path-selection decisions, DES provides single-hop forwarding through a highly efficient fabric spine layer.

The 7700R4 DES brings together the best of the Arista R-Series architecture: dedicated virtual output queues (VoQs) for buffering intense flows, internal load balancing that is 100% efficient and eliminates the need for tuning, and fast failover.
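To illustrate why cell-based load balancing needs no tuning while flow-based hashing does, here is a toy Python simulation. It sketches the general contrast between per-flow ECMP hashing and per-cell spraying, not Arista's actual implementation.

```python
# Toy contrast: per-flow ECMP hashing vs. per-cell spraying across eight
# fabric links. A sketch of the general technique, not a model of DES.
import random
from collections import Counter

LINKS = 8
FLOWS = 32             # a few large "elephant" flows, typical of AI training
CELLS_PER_FLOW = 1000

random.seed(42)

# Per-flow ECMP: every cell of a flow sticks to one pseudo-randomly hashed link.
ecmp = Counter({link: 0 for link in range(LINKS)})
for _ in range(FLOWS):
    ecmp[random.randrange(LINKS)] += CELLS_PER_FLOW

# Per-cell spraying: every cell is distributed round-robin over all links.
spray = Counter({link: 0 for link in range(LINKS)})
for cell in range(FLOWS * CELLS_PER_FLOW):
    spray[cell % LINKS] += 1

print("ECMP cells per link: ", sorted(ecmp.values()))   # uneven: hash collisions
print("Spray cells per link:", sorted(spray.values()))  # perfectly even
```

In the hashed case, some links end up carrying several large flows while others sit nearly idle; that hotspot problem is exactly what per-node tuning tries to mitigate, and what spraying at cell granularity avoids by construction.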

The Arista 7700R4 DES was developed with input from our long-standing customer Meta, who knew from their experience with the 7800R3 the benefits of the R-Series architecture for AI workloads, but who wanted a much larger-scale solution offering all the same benefits and a smooth path to 800G.


The 7700R4 behaves like a single system, with dedicated deep buffers that ensure system-wide lossless transport across the entire Ethernet-based AI network. DES is topology agnostic, UEC ready, and optimized for both training and inference workloads, with a 100% efficient architecture and the rich telemetry and smart features that the modern AI Center needs.

DES Key Advantages

| Advantage | Description | Impact |
| --- | --- | --- |
| Accelerator Agnostic | DES works with any XPU, workload, and vertical application. | Future-proof, flexible solution with no lock-in. |
| NIC Agnostic | DES works with all high-speed NICs and delivers a lossless, fully scheduled solution with packet spraying, without needing a dedicated smart NIC. | No special NICs are required, yielding substantial cost and power savings. |
| Topology Agnostic | DES accommodates commonly deployed two-tier ToR and rail designs simultaneously. | Maximizes performance and reduces the cost and power of optics and fibers. |
| Ultra Ethernet Ready | DES works with or without UEC enhancements. | Future-proof and flexible; no need to wait. |
| No Special Tuning Required | DES is 100% efficient out of the box, based on the R-Series VoQ and cell-based fabric architecture. | Saves time and maximizes XPU investment by accelerating deployment. |
| Fast Hardware Failover | DES provides 100ms link-failure detection and reroute. | No active protocol failovers, subnet manager, or controller needed. |
| Built for LPO | All DES ports support Linear Drive Pluggable Optics. | Allows a 50% or greater power reduction on leaf-spine links. |
| Smart Features for AI | DES provides native visibility, advanced traffic management, and NIC integration. | Deep insight into cluster performance and settings makes troubleshooting easy. |

Summary

The rise of the AI center has created greater demands on modern open networking. The Arista Etherlink portfolio delivers choices in form factor, from single-chip systems to modular multi-chip systems and multi-tier networks that scale out to thousands of XPU ports. The 7700R4 Distributed Etherlink Switch offers simplicity and scalability in a cost-effective, power-efficient solution for the AI Center. We are thrilled with the close engineering collaboration with Meta in this new era of AI.

