The Sun Rises on Scale-Up Ethernet
Co-Authored by Hugh Holbrook, Chief Development Officer
As the demands of AI and cloud networking push data center infrastructure to its limits, operators need networks that are not only high-performing and extremely reliable but also adaptable to the latest advancements in power, thermal management, and physical connectivity for dense clusters of tightly coupled AI accelerators. The explosive growth and rapid advancement of large language models (LLMs) have introduced complex training and inference workloads that generate massive, synchronized “scale-up” communication among hundreds to thousands of accelerators. This creates a need for tightly integrated scale-up networks that provide extremely high-bandwidth, low-latency connectivity. It is equally critical to embrace an open ecosystem that offers system, accelerator, and data center designers the flexibility to choose a transport layer optimized for their deployment and application.
Today there are many options, including PCIe, CXL, and NVLink, that create disparate islands for compute I/O. But existing solutions are not optimized for the open, interoperable needs of scale-up. The industry needs an ultra-high-speed, low-latency interconnect fabric that allows AI processing units or accelerators (XPUs), within one or more racks, to function as a unified compute system, while preserving the benefits of an open, standards-based solution. Once again, Ethernet is expected to be the consistent winner and equalizer for scale-up networking, just as it is today for scale-out and scale-across. Some common characteristics of a scale-up network enable specific optimizations:
These characteristics allow the network and transport layer to be optimized, resulting in smaller headers and a simpler protocol, enabling unified, low-overhead memory access among XPUs, to support many forms of collectives.
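To make the benefit of smaller headers concrete, here is a rough back-of-the-envelope sketch in Python. It compares payload efficiency for a small transfer under a conventional Ethernet + IPv4 + UDP encapsulation against a compact scale-up header. The Ethernet, IPv4, and UDP sizes are the standard ones; `ASSUMED_COMPACT_HEADER` is a purely hypothetical value chosen for illustration and is not taken from any ESUN or SUE-T specification. Preamble and inter-packet gap are ignored.

```python
# Standard header sizes in bytes.
ETH_HEADER = 14    # destination MAC + source MAC + EtherType
ETH_FCS = 4        # frame check sequence
IPV4_HEADER = 20   # without options
UDP_HEADER = 8

# Hypothetical compact scale-up header size (an assumption for illustration,
# not a value from any ESUN or SUE-T specification).
ASSUMED_COMPACT_HEADER = 8


def payload_efficiency(payload_bytes: int, overhead_bytes: int) -> float:
    """Fraction of bytes on the wire that carry payload."""
    return payload_bytes / (payload_bytes + overhead_bytes)


def compare(payload_bytes: int) -> tuple[float, float]:
    """Return (routed_efficiency, compact_efficiency) for one payload size."""
    routed_overhead = ETH_HEADER + IPV4_HEADER + UDP_HEADER + ETH_FCS
    compact_overhead = ETH_HEADER + ASSUMED_COMPACT_HEADER + ETH_FCS
    return (
        payload_efficiency(payload_bytes, routed_overhead),
        payload_efficiency(payload_bytes, compact_overhead),
    )


if __name__ == "__main__":
    for size in (64, 256, 1024):
        routed, compact = compare(size)
        print(f"{size:5d} B payload: routed {routed:.1%}, compact {compact:.1%}")
```

The gap between the two figures is largest for small payloads, which is why header size matters most for fine-grained memory transactions between XPUs.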
Recognizing the importance of addressing real-world AI use cases, an ecosystem of industry leaders consisting of AMD, Arista, ARM, Broadcom, Cisco, HPE, Marvell, Meta, Microsoft, Nvidia, OpenAI, and Oracle has joined together to jump-start the ESUN initiative within OCP. Unveiled at the OCP Global Summit in October 2025, Ethernet for Scale-Up Networking (ESUN) is an open OCP workstream committed to open, standards-based solutions for scale-up that are based on Ethernet and open to all. It will leverage the work of IEEE and UEC for Ethernet when possible, with the building blocks in three layers, as shown in the figure below.
Figure: At the heart of ESUN is a modular framework for Ethernet scale-up with three key building blocks: defined Ethernet headers, Ethernet data link layer functions, and well-understood Ethernet PHYs, supported by 12 industry experts.
ESUN is designed to support any upper layer transport, including one based on SUE-T. SUE-T (Scale-Up Ethernet Transport) is a new OCP workstream, seeded by Broadcom’s contribution of SUE (Scale-Up Ethernet) to OCP. SUE-T looks to define functionality that can be easily integrated into an ESUN-based XPU for reliability scheduling, load balancing, and transaction packing, which are critical performance enhancers for some AI workloads.
In essence, the ESUN framework enables a collection of individual accelerators to become a single, powerful AI supercomputer, where network performance directly correlates to the speed and efficiency of AI model development and execution. The layered approach of ESUN and SUE-T over Ethernet promotes innovation without fragmentation. XPU accelerator developers retain flexibility on host-side choices such as access models (push vs. pull, and memory vs. streaming semantics), transport reliability (hop-by-hop vs. end-to-end), ordering rules, and congestion control strategies, while retaining system design choices. The ESUN initiative takes a practical approach for iterative improvements. Initial candidate focus areas are:
By aligning with the initial ecosystem of twelve prestigious industry leaders, we help our community of customers, standards bodies, and vendors to converge quickly on specifications and implementations that matter most for practical use cases, enabling fast iteration as requirements evolve.
Welcome to the new era of ESUN – Ethernet for Scale-Up Networking!