The AI industry has taken us by storm, bringing supercomputers, algorithms, data processing and training methods into the mainstream. The rapid rise of large language models, popularized by OpenAI's ChatGPT, has captured the interest and imagination of people worldwide. Generative AI applications promise benefits to just about every industry. New types of AI applications are expected to improve productivity on a wide range of tasks, be it marketing image creation for ads, video games or customer support. These generative large language models, with over 100 billion parameters, are advancing the power of AI applications and deployments. Furthermore, advances in silicon geometries are producing TPU/GPU processors that demand 100, 400 and now 800 gigabits per second of network throughput, with parallel processing and bandwidth capacity to match.
Data and Compute Intensive AI Workloads
Not only are AI/ML applications a huge driver of compute today, but the silicon industry is also keeping up with the demand by churning out scalable processors. These could be CPUs, GPUs, or TPUs optimized for parallel workloads, or specialized processors optimized for tensor and matrix computations, with memory and IO interfaces to match. A common characteristic of these workloads is that they are both data- and compute-intensive. A typical AI workload involves a large sparse matrix computation, so large that the parameters of the matrix are distributed across hundreds or thousands of processors. Each processor performs intense computations for a period of time, then shares its "parameters" with the other processors involved in the computation. Once the data from all peers is received, it is reduced (or merged) with the local data, and another round of processing begins. This compute-exchange-reduce cycle dramatically increases the volume of data exchanged. A slowdown due to a suboptimal network can critically impact application performance, creating wait states that idle expensive GPUs and reduce processor efficiency by 30% or more. A modern, scalable AI network is imperative.
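The compute-exchange-reduce cycle described above is commonly implemented as an all-reduce across the participating processors. The sketch below is a minimal, single-process simulation of a ring all-reduce over plain Python lists; the worker count, the sum as the reduce operation, and the ring schedule are illustrative assumptions, not a description of any specific training framework.

```python
def ring_allreduce(vectors):
    """Sum-reduce equal-length vectors across N simulated workers via a ring.

    Returns N identical vectors, each the element-wise sum of all inputs,
    mimicking how distributed training merges gradients/parameters.
    Assumes the vector length is divisible by N for simplicity.
    """
    n = len(vectors)
    dim = len(vectors[0])
    chunk = dim // n
    data = [list(v) for v in vectors]  # per-worker working copies

    def span(c):
        return slice(c * chunk, (c + 1) * chunk)

    # Phase 1: reduce-scatter. In n-1 steps, each worker forwards one chunk
    # to its ring neighbor, which merges (adds) it into its local copy.
    # Afterward, worker w holds the fully reduced chunk (w + 1) % n.
    for step in range(n - 1):
        sends = [(w, (w - step) % n, data[w][span((w - step) % n)])
                 for w in range(n)]  # snapshot payloads before applying
        for w, c, payload in sends:
            dst = data[(w + 1) % n]
            for i, val in enumerate(payload):
                dst[c * chunk + i] += val

    # Phase 2: all-gather. Completed chunks circulate around the ring,
    # overwriting each neighbor's stale copy.
    for step in range(n - 1):
        sends = [(w, (w + 1 - step) % n, data[w][span((w + 1 - step) % n)])
                 for w in range(n)]
        for w, c, payload in sends:
            data[(w + 1) % n][span(c)] = payload
    return data
```

Each exchange step moves data between every pair of ring neighbors simultaneously, which is why synchronized bursts across the whole fabric, rather than isolated flows, dominate AI network traffic.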
Ethernet for AI Networking Scale
Avoiding these idle states at massive processor density requires a specialized AI network with wire-rate delivery of large, synchronized bursts of data at speeds of 400/800G. One must rethink the network to scale to hundreds or thousands of racks of AI servers. High-performance, repetitive import and export of data is critical to these applications. In the past, this sort of performance existed only in the domain of specialized HPC networks such as InfiniBand. Today the combination of RDMA Ethernet NICs and RoCE (RDMA over Converged Ethernet) allows Ethernet and IP to be used as the transport fabric without overhead. The advantage of Ethernet for AI networking is obvious: the economics of standards, a massive installed base, industry-wide interoperability and merchant silicon support.
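To see why 400/800G link speeds matter at this scale, consider a back-of-the-envelope estimate of the wire time for one ring all-reduce. Each node transmits roughly 2(N-1)/N times the total payload, so link rate directly bounds completion time. The model sizes and GPU count below are illustrative assumptions, and the formula ignores latency, congestion and protocol overhead.

```python
def allreduce_wire_time(payload_bytes, num_gpus, link_gbps):
    """Lower-bound time (seconds) for one ring all-reduce.

    Each node sends about 2 * (N - 1) / N * S bytes over its link;
    dividing by the link rate gives a best-case completion time.
    Latency, congestion and protocol overhead are ignored.
    """
    bytes_per_node = 2 * (num_gpus - 1) / num_gpus * payload_bytes
    return bytes_per_node * 8 / (link_gbps * 1e9)

# Illustrative example: ~100B parameters in fp16 (~200 GB of gradients)
# reduced across 1024 GPUs, at 400G vs 800G per link.
grad_bytes = 200e9
t400 = allreduce_wire_time(grad_bytes, 1024, 400)  # ~8 seconds
t800 = allreduce_wire_time(grad_bytes, 1024, 800)  # ~4 seconds
```

Since this exchange repeats every training iteration, doubling link speed from 400G to 800G halves the best-case communication floor, which is time the GPUs would otherwise spend idle.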
Arista 7800 AI Spine
The Arista 7800 is the premier AI spine for any AI workload. It delivers an unmatched combination of high-bandwidth, lossless, high-radix fabric interconnecting hundreds or thousands of GPUs at speeds of 400/800G. Arista AI spines bring a balanced combination of low power, predictable performance/latency and reliability characteristics for the most demanding AI workloads.
Arista EOS for AI Networking
The Arista 7800 AI spine is based on the flagship software stack Arista EOS, which is critical to handling enormous workloads. We deliver our advanced customers the optimal AI network assurance for their mission-critical workloads. By infusing AI properties into the programmable EOS, we can construct a reliable AI network with customizable automation, visibility, resilience and dynamic controls.
AI Networking at an Inflection Point
It is an exciting time at Arista as we look forward to helping our customers with their AI networking strategies. We deliver high-scale bandwidth capacity with predictable workload performance for cloud networking. With Arista AI platforms, we continue to deliver the best combination of Ethernet versatility and IP protocol capabilities at petascale, with an unmatched, congestion-free, lossless fabric for our customers' AI strategies. The exponential growth of AI workloads and distributed AI processing is placing explosive demands on the network. Welcome to the new wave of petascale AI networking!