
The Future of Cognitive Cloud Networking

by Jayshree Ullal on Feb 27, 2018 3:03:44 PM

Artificial intelligence, machine learning and deep learning have to be among the most popular buzzwords of the past few years, and I was hoping that I wouldn’t get swept away by them. But when I heard a panel on the state of AI networks at our customer event this month, I found it fascinating, and it piqued my curiosity! Let me start with a disclaimer for readers expecting a deep tutorial from me: there is a vast amount of research behind the models and algorithms in this field that I will not even attempt to cover. Instead, I will share some thoughts on the practical relevance of this promising field.

Behind the Buzzwords

AI and machine learning have been around for a long time. The difference now is that much more powerful compute and network infrastructure is available, along with exponentially more data to analyze. Efficient data movement is critical across the whole deep learning lifecycle of ingesting, processing and inferring from data, and improvements there are creating higher-layer abstractions that let data scientists develop and train models quickly.

As a result, problems that were previously in the realm of the impossible, such as real-time language translation, fraud detection and autonomous vehicle control, are being addressed with neural network models that detect patterns and behaviors across huge amounts of structured and unstructured data. For example, an AI program that learns from Van Gogh paintings can find new paintings that match his style. While the human brain may be better at detecting deeper meaning and “conscious thought”, AI is radically increasing the benefits of “raw intelligence.” The continuing goal is to minimize cycle time, both for developing new algorithms and models and for scaling AI applications to serve billions of devices in real time. Scientists can now reduce the time needed for trials and studies from years to hours.

Machine learning training is typically implemented in floating-point arithmetic, which is why NVIDIA GPUs have been so popular here. This is combined with inference, which is typically done in integer logic. Together, this combination delivers the most machine learning and inference performance at the lowest cost and power, and it allows these systems to be tuned for AI applications.
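The split between floating-point training and integer inference can be illustrated with a small sketch. It assumes symmetric per-tensor int8 quantization with a single scale factor per vector, which is one common technique but not something the post specifies; the helper names (`quantize`, `int_dot`) and the weight and input values are hypothetical. The point is that a dot product, the core operation of a neural network layer, can run entirely in integer multiply-accumulate and be rescaled once at the end, landing close to the floating-point answer.

```python
# Sketch: float weights (as produced by training) evaluated with integer math
# (as done by an integer inference engine). Assumes symmetric int8 quantization.

def quantize(values, num_bits=8):
    """Map floats to signed integers in [-qmax, qmax] using one scale factor."""
    qmax = 2 ** (num_bits - 1) - 1                 # 127 for int8
    max_abs = max(abs(v) for v in values) or 1.0   # avoid dividing by zero
    scale = max_abs / qmax                         # float value of one integer step
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale

def float_dot(w, x):
    """Reference dot product in floating point."""
    return sum(a * b for a, b in zip(w, x))

def int_dot(qw, qx):
    """Integer-only multiply-accumulate, as an integer inference unit would do."""
    return sum(a * b for a, b in zip(qw, qx))

weights = [0.42, -1.37, 0.05, 0.88]   # hypothetical trained float weights
inputs = [1.10, 0.25, -0.60, 0.73]    # hypothetical input activations

qw, sw = quantize(weights)
qx, sx = quantize(inputs)

exact = float_dot(weights, inputs)
approx = int_dot(qw, qx) * sw * sx    # rescale the integer accumulator once

print(f"float result:   {exact:.4f}")
print(f"integer result: {approx:.4f}")
```

The integer result differs from the float result only by a small rounding error, while every multiply and add in `int_dot` uses integers, which are cheaper in silicon area and power than floating-point units.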

The Network Relevance

Within a typical AI appliance, multiple GPUs are interconnected with very high-speed chip-to-chip interfaces. The NVIDIA DGX-1 with Volta interconnects eight GPU chips with NVLink in a hybrid cube-mesh topology, packaged together with general-purpose CPUs. 100G Ethernet and RDMA over Converged Ethernet (RoCE) can be used to let any GPU in the network access any other GPU’s memory. The same high-performance Ethernet network that links DGX systems also connects them to storage devices (such as Pure Storage FlashBlade), vastly simplifying AI system configuration and deployment. The NVIDIA DGX-1 system comes with four 100G networking ports that deliver a total of 400G, or 50 Gigabytes/sec, of throughput, which is four times the network bandwidth of general-purpose servers in cloud networks.
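The bandwidth figures above follow from simple unit arithmetic. The sketch below assumes only that there are 8 bits per byte and that each 100G Ethernet port is full duplex; the function name `aggregate_throughput` is hypothetical, and the calculation ignores protocol overhead (framing and headers), so real payload throughput will be somewhat lower.

```python
# Sketch: convert per-port line rate into aggregate throughput.
BITS_PER_BYTE = 8

def aggregate_throughput(ports, gbps_per_port):
    """Return (total Gb/s, total GB/s) for one direction of traffic."""
    total_gbps = ports * gbps_per_port
    return total_gbps, total_gbps / BITS_PER_BYTE

gbps, gbytes = aggregate_throughput(ports=4, gbps_per_port=100)
print(f"{gbps} Gb/s = {gbytes:.0f} GB/s per direction")

# Full-duplex Ethernet carries this much in each direction at the same time,
# so transmit plus receive together is twice the per-direction figure.
print(f"{2 * gbytes:.0f} GB/s transmit + receive combined")
```

Keeping the per-direction and combined numbers separate avoids a common source of confusion when comparing bandwidth figures quoted in different places.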

Cognitive Networking Implications

AI servers, together with an Arista leaf-and-spine network and storage appliances, can form an important AI nucleus. We have tested these solutions with NVIDIA and Pure Storage to offer the highest I/O density per appliance. The common theme is that both AI storage and networking have an insatiable need for bandwidth to feed these powerful applications. The NVIDIA DGX-1 system occupies just a 3U footprint, yet its four 100G interfaces can carry up to 100 Gigabytes/second of combined transmit and receive traffic (50 Gigabytes/second in each direction).

Cloud titans can migrate easily between different kinds of AI workloads without compromising AI applications. This improves monetization by optimizing the ads or movies they recommend, driving real-time user experiences. Yet the potential extends beyond the cloud to enterprises as well. In small steps, Arista has already begun its journey with machine learning implementations in CloudVision®. For example, if there is an abnormal traffic rate, the anomaly is quickly pinpointed and corrected.
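To make the idea of flagging an abnormal traffic rate concrete, here is a deliberately simple sketch. It is not CloudVision’s implementation, which the post does not describe; it swaps in a basic rolling z-score detector, and the function name `find_rate_anomalies`, the window size, the threshold and the sample rates are all hypothetical. Each new sample is compared against the mean and standard deviation of the samples just before it, and flagged if it lies too many standard deviations away.

```python
from statistics import mean, stdev

def find_rate_anomalies(rates, window=5, threshold=3.0):
    """Flag samples whose z-score against the preceding window exceeds threshold.

    Hypothetical helper: a rolling z-score detector used for illustration only.
    Returns a list of (index, rate, rounded z-score) tuples.
    """
    anomalies = []
    for i in range(window, len(rates)):
        history = rates[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            continue  # flat history: z-score is undefined, skip this sample
        z = (rates[i] - mu) / sigma
        if abs(z) > threshold:
            anomalies.append((i, rates[i], round(z, 1)))
    return anomalies

# Hypothetical per-interface traffic rate samples, in Gb/s.
samples = [41.0, 40.2, 39.8, 40.5, 40.9, 40.1, 95.3, 40.4, 39.9]
for index, rate, z in find_rate_anomalies(samples):
    print(f"sample {index}: {rate} Gb/s (z = {z:.1f})")
```

Only the spike at index 6 is reported. A known limitation of this simple approach is that the spike then inflates the window’s standard deviation for the next few samples, making the detector temporarily less sensitive; production systems typically use more robust statistics or learned baselines.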

At Arista, we are on the cusp of building new, transformative technologies in our Arista EOS® architecture for machine learning, telemetry and failure mitigation. I am excited by the prospects ahead in the decade of transformation and innovation. Welcome to 2018 and the age of cognitive cloud networking.


Opinions expressed here are the personal opinions of the original authors, not of Arista Networks. The content is provided for informational purposes only and is not meant to be an endorsement or representation by Arista Networks or any other party.
Written by Jayshree Ullal
As CEO and Chairperson of Arista, Jayshree Ullal is responsible for Arista's business and thought leadership in AI and cloud networking. She led the company from zero to a multibillion-dollar business, including a historic and successful IPO in June 2014. Formerly, Jayshree was Senior Vice President at Cisco, responsible for a $10B business in datacenter, switching and services. With more than 40 years of networking experience, she has received numerous awards, including E&Y's "Entrepreneur of the Year" in 2015, Barron's "World's Best CEOs" in 2018 and one of Fortune's "Top 20 Business persons" in 2019. Jayshree holds a B.S. in Engineering (Electrical) and an M.S. in engineering management. She received the SFSU and SCU Distinguished Alumni Awards in 2013 and 2016.
