Redefining Cloud-Networking Resilience & Visibility

by Jayshree Ullal on Oct 1, 2013 7:36:33 PM

In modern two-tier leaf-spine cloud networks, the increasing dominance of east-west traffic patterns, the sheer volume of traffic, and the increase in data rates from 10G to 40G/100G combine to make it challenging to predict and analyze performance issues proactively. The scale involved in connecting 100K+ physical servers, 1M+ virtual machines and large big data storage elements is redefining expectations of a resilient network. Self-healing networks and new levels of visibility are no longer optional; they are mandatory.


So how is today’s Network Software doing in these environments?

Self Healing & Programmable Software is Paramount



Most network operating systems are unfortunately decades old and were designed for the enterprise data center applications of the 1990s. Traditional vendors do try valiantly to band-aid these legacy OSes, but modern networks need a new ground-up architecture. Arista’s Extensible OS (EOS) is the ONLY purpose-built data center networking software to address this requirement. EOS was designed with support for mission-critical clouds and data centers as its primary goal. Brilliant engineers lead the EOS team, and our SVP of Software Engineering and CTO, Ken Duda, pioneered the architecture. Indeed, it is this engineering feat of software excellence that drew me to the company several years ago.


Published studies have shown that, over time, the operational costs of running a network are many times the capital expenditures. The cost of operational downtime from lack of visibility into the network infrastructure is estimated to be $5,600 per minute. At that rate, an outage of about three hours costs more than a million dollars. Cloud-scale operators must reduce downtime and detect, isolate, and resolve application performance problems proactively in order to meet their customers' expectations and Service Level Agreements.
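
To make the downtime arithmetic concrete, here is a minimal sketch that scales the $5,600-per-minute estimate quoted above to outages of a few hours. The function name outage_cost and the chosen durations are illustrative assumptions; only the per-minute figure comes from the post.

```python
# Per-minute downtime cost quoted in the post (an estimate, not a measurement).
COST_PER_MINUTE_USD = 5_600


def outage_cost(minutes, cost_per_minute=COST_PER_MINUTE_USD):
    """Return the estimated cost in USD of an outage lasting `minutes`."""
    return minutes * cost_per_minute


# Illustrative outage durations, in hours (assumed for this example).
for hours in (1, 2, 3, 4):
    print(f"{hours} h outage: ${outage_cost(hours * 60):,.0f}")
```

Under this estimate, the million-dollar mark is crossed at roughly three hours of downtime.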


The secret sauce of Arista EOS is a multi-process state-sharing architecture that is self-healing and exposes open APIs to enable programmability. EOS keeps all system state in a central database (Sysdb), which validates that state and propagates updates to the processes (agents) that depend on it. The schema-specific code in Sysdb is machine generated, providing the performance of hand-written code without the errors. The stateful publish-subscribe approach of EOS is intrinsically deterministic, borrowing heavily from the world of databases, where state survives application shutdown. Many alternate data center vendors claim “improved” operating systems yet deploy archaic message-passing schemes, in which agents send messages back and forth to convey state, adding complexity and delays. Such systems often add check-pointing services that are used only for restart. This is error-prone because agents read their checkpoints only during a restart rather than all the time, so the recovery path is rarely exercised. In EOS, both the initialization and the restart of agents are handled consistently through the same repository, with no separate recovery path.
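
The state-sharing idea can be illustrated with a toy sketch. This is not Sysdb or any Arista API: the StateStore and RoutingAgent classes, the "interface/" path prefix, and the interface names are all hypothetical, chosen only to show how an agent that subscribes to a central store goes through the same code path at first start and at restart.

```python
from collections import defaultdict


class StateStore:
    """Toy central state database (hypothetical, not Sysdb)."""

    def __init__(self):
        self._state = {}
        self._subscribers = defaultdict(list)

    def subscribe(self, prefix, callback):
        """Register a callback and replay existing state under `prefix`."""
        self._subscribers[prefix].append(callback)
        for path, value in sorted(self._state.items()):
            if path.startswith(prefix):
                callback(path, value)

    def unsubscribe(self, prefix, callback):
        self._subscribers[prefix].remove(callback)

    def publish(self, path, value):
        """Store the new value, then notify every matching subscriber."""
        self._state[path] = value
        for prefix, callbacks in list(self._subscribers.items()):
            if path.startswith(prefix):
                for callback in list(callbacks):
                    callback(path, value)


class RoutingAgent:
    """Hypothetical agent that mirrors interface state from the store."""

    def __init__(self, store):
        self.view = {}
        self._store = store
        # Start and restart are the same call: subscribe and receive state.
        store.subscribe("interface/", self._on_update)

    def _on_update(self, path, value):
        self.view[path] = value

    def stop(self):
        """Simulate the agent process going away."""
        self._store.unsubscribe("interface/", self._on_update)


store = StateStore()
agent = RoutingAgent(store)
store.publish("interface/Ethernet1", "up")

agent.stop()                                  # agent crashes
store.publish("interface/Ethernet2", "down")  # state changes meanwhile

restarted = RoutingAgent(store)               # same path as a first start
print(restarted.view)
```

In this sketch the restarted agent ends up with both interfaces in its view even though it missed the second update, because state lives in the store rather than in messages exchanged between agents. No checkpoint file or special recovery routine is involved.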

Virtual to Physical to Application Visibility



To reduce downtime and save costs, dynamic network troubleshooting and monitoring tool sets are needed. We must provide both fine-grained visibility into application performance and more global, network-wide monitoring capabilities. How can you capture, analyze and troubleshoot traffic between two virtual servers when there are literally hundreds of paths between the racks where the servers are located and the exact location of each server is unknown?


Arista Network Telemetry works in conjunction with applications so that the network is no longer in the way. It reduces application downtime and network operational costs through improved real-time system and network performance visibility, correlation to application behavior and advanced end-to-end path monitoring tools. This can save millions of dollars and hours of downtime. Arista Tracers are enhancements to the Arista Network Telemetry application that bring deeper application-level visibility by integrating with distributed applications in big data, cloud, and virtualized environments (see Figure below).



Figure: Arista EOS Tracer Technologies
Examples such as Health Tracer, Path Tracer, VM Tracer and MapReduce Tracer redefine resiliency and visibility.

The programmable foundation of Arista EOS combined with Network Tracers provides a real-world solution to the real-world problems of cloud network visibility, monitoring and troubleshooting. It enables tight linkages between the physical, virtual and application infrastructure that result in considerable savings in operational expenditures.

Welcome to the new world of software-defined cloud networking, with increased visibility, lower operational costs and reduced downtime. I look forward to your comments at: feedback@arista.com

Opinions expressed here are the personal opinions of the original authors, not of Arista Networks. The content is provided for informational purposes only and is not meant to be an endorsement or representation by Arista Networks or any other party.
Written by Jayshree Ullal
As CEO and Chairperson of Arista, Jayshree Ullal is responsible for Arista's business and thought leadership in AI and cloud networking. She led the company to a historic and successful IPO in June 2014 from zero to a multibillion-dollar business. Formerly Jayshree was Senior Vice President at Cisco, responsible for a $10B business in datacenter, switching and services. With more than 40 years of networking experience, she is the recipient of numerous awards including E&Y's "Entrepreneur of the Year" in 2015, Barron's "World's Best CEOs" in 2018 and one of Fortune's "Top 20 Business persons" in 2019. Jayshree holds a B.S. in Engineering (Electrical) and an M.S. degree in engineering management. She is a recipient of the SFSU and SCU Distinguished Alumni Awards in 2013 and 2016.
