CloudVision: A Cognitive Management Plane

by Kenneth Duda on May 7, 2018 4:04:58 PM

The last 40 years have seen tremendous growth and progress in the data networking industry. Ethernet, IP, MPLS, GRE, IPsec, MACsec, and VXLAN enable operators to build secure, multiservice, high-performance data planes that interoperate across multiple vendors, multiple operators, and multiple administrative domains. Likewise, BGP, OSPF, IS-IS, LDP, RSVP, BFD, LACP, L3VPN, VPLS, and EVPN enable operators to build scalable multi-vendor control planes that federate across organizational boundaries, supporting mission-critical networks with global reach.

There is a striking contrast between the maturity of the data and control planes on the one hand and the void in the management plane on the other. What do vendors provide operators in the management plane? Command-line interfaces designed for manual, device-by-device operation; low-performance SNMP access to a small subset of device state; and a hodgepodge of proprietary programmatic interfaces that take custom engineering work to harness. The world's most sophisticated operators invest hundreds of millions of dollars to build a proper management plane from this primitive starting point. That approach is out of reach for the vast majority of operators, who simply cope without a proper management plane, scripting the most common tasks but doing most work by hand. This approach is labor-intensive; worse, it is fundamentally error-prone, and it leaves operator error as the most common cause of service disruption. Surely the industry can do better.

At Arista, we believe we see a path forward. We have created CloudVision®, the industry's first cognitive management plane (CMP) cluster, and we hope the CMP will become a new industry-wide approach to network management. CloudVision harnesses cloud computing, big data, and machine learning: it collects and archives all network state over all time and runs a suite of applications that provide visibility, automate deployment, and report and analyze important events. CloudVision brings the benefits of a custom in-house network management system (NMS) at much lower cost and with much broader applicability across the operator community.

The CloudVision architecture is shown in the figure below.

[Figure: CloudVision architecture]

The CloudVision cluster is a horizontally scalable pod of compute and storage running three layers of software: NetDB state storage (built on Kafka and HBase), stream computation, and applications. Devices export all of their state to NetDB via NetDB streaming (gNMI Notification messages over gRPC), so NetDB captures all device state over all time. As state enters NetDB, stream processors transform, clean, aggregate, and analyze it, writing derived state back into NetDB. Applications read this state to provide visibility and alerting, to let operators take specific actions (changing policy, reconfiguring, upgrading, and so on), and to offer higher-level management functions that apply policies uniformly across hundreds or thousands of network nodes.
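To make the ingest-then-derive flow concrete, here is a minimal in-memory sketch in Python. It is not NetDB's API (NetDB itself is built on Kafka and HBase); the `Notification`, `StateStore`, and `interfaces_down` names, the path layout, and the assumption that updates for a path arrive in timestamp order are all hypothetical. Each notification is loosely shaped like a gNMI Notification (a timestamp plus path/value updates). The store keeps every value ever written, so it can answer what a path held at any time t, and a stream processor writes a derived per-device count back into the same store.

```python
from __future__ import annotations

import bisect
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class Notification:
    """Hypothetical stand-in for a gNMI Notification: a timestamp plus path -> value updates."""
    timestamp: int
    updates: dict[str, Any]


class StateStore:
    """Hypothetical append-only store that keeps every value of every path over time."""

    def __init__(self) -> None:
        self._times: dict[str, list[int]] = defaultdict(list)
        self._values: dict[str, list[Any]] = defaultdict(list)
        self._processors: list[Callable[[StateStore, Notification], None]] = []

    def subscribe(self, processor: Callable[[StateStore, Notification], None]) -> None:
        """Register a stream processor that runs on every ingested notification."""
        self._processors.append(processor)

    def ingest(self, n: Notification) -> None:
        # Assumes updates for any given path arrive in timestamp order.
        for path, value in n.updates.items():
            self._times[path].append(n.timestamp)
            self._values[path].append(value)
        for processor in self._processors:
            processor(self, n)

    def get_at(self, path: str, t: int) -> Optional[Any]:
        """Value of `path` as of time `t`, or None if it had not been set yet."""
        times = self._times.get(path, [])
        i = bisect.bisect_right(times, t)
        return self._values[path][i - 1] if i else None

    def latest(self, prefix: str) -> dict[str, Any]:
        """Most recent value of every path under `prefix`."""
        return {p: v[-1] for p, v in self._values.items() if p.startswith(prefix) and v}


def interfaces_down(store: StateStore, n: Notification) -> None:
    """Stream processor: maintain a derived per-device count of DOWN interfaces."""
    devices = {
        p.split("/")[2]
        for p in n.updates
        if p.startswith("/devices/") and p.endswith("/oper-status")
    }
    for dev in sorted(devices):
        statuses = store.latest(f"/devices/{dev}/interfaces/")
        down = sum(1 for p, v in statuses.items() if p.endswith("/oper-status") and v == "DOWN")
        # Derived state is written back into the same store. Its path is not under
        # /devices/, so ingesting it does not re-trigger this processor.
        store.ingest(Notification(n.timestamp, {f"/derived/{dev}/interfaces-down": down}))


store = StateStore()
store.subscribe(interfaces_down)
store.ingest(Notification(100, {
    "/devices/leaf1/interfaces/Ethernet1/oper-status": "UP",
    "/devices/leaf1/interfaces/Ethernet2/oper-status": "UP",
}))
store.ingest(Notification(200, {"/devices/leaf1/interfaces/Ethernet2/oper-status": "DOWN"}))
store.ingest(Notification(300, {"/devices/leaf1/interfaces/Ethernet2/oper-status": "UP"}))

print(store.get_at("/devices/leaf1/interfaces/Ethernet2/oper-status", 250))  # DOWN
print(store.get_at("/derived/leaf1/interfaces-down", 250))  # 1
print(store.get_at("/derived/leaf1/interfaces-down", 300))  # 0
```

Because derived values live alongside raw ones, an application can query the history of both through the same interface.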

The cognitive management plane architecture scales out through multiple CMP clusters (CloudVision instances), each managing a subset of devices, typically grouped by vendor, geographic region, and/or administrative domain. The managed devices and the CMP cluster software typically come from the same vendor (e.g., CloudVision for Arista switches), and the cluster ingests all device state in a mix of standardized and proprietary representations. Through stream computation, the CMP cluster transforms vendor-specific representations into standardized models, which can then be exported via OpenConfig streaming to other vendors' CMP clusters. In this way, one vendor's application can work with state from many vendors' devices, providing end-to-end visibility and uniform policy control across multiple geographies, vendors, and administrative domains.
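The vendor-to-standard translation step might look something like the following Python sketch. The vendor-native path `/vendor-x/intf/<name>/link` and its `linkUp`/`linkDown` values are invented for illustration, as are the function names; the target path follows the `oper-status` leaf of the OpenConfig interfaces model. Only updates that have a standardized form are passed on for export to another vendor's cluster; anything unmapped stays local.

```python
from __future__ import annotations

import re
from typing import Optional

# Hypothetical vendor-native layout: /vendor-x/intf/<name>/link = "linkUp" | "linkDown".
VENDOR_LINK = re.compile(r"^/vendor-x/intf/(?P<name>[^/]+)/link$")
LINK_VALUES = {"linkUp": "UP", "linkDown": "DOWN"}


def to_openconfig(path: str, value: str) -> Optional[tuple[str, str]]:
    """Translate one vendor-native update into an OpenConfig-style update, or None if unmapped."""
    m = VENDOR_LINK.match(path)
    if m is None or value not in LINK_VALUES:
        return None
    oc_path = f"/interfaces/interface[name={m['name']}]/state/oper-status"
    return oc_path, LINK_VALUES[value]


def standardize_for_export(updates: dict[str, str]) -> dict[str, str]:
    """Keep only updates that have a standardized form; these are what a peer cluster receives."""
    exported: dict[str, str] = {}
    for path, value in updates.items():
        translated = to_openconfig(path, value)
        if translated is not None:
            oc_path, oc_value = translated
            exported[oc_path] = oc_value
    return exported


native = {
    "/vendor-x/intf/Ethernet1/link": "linkUp",
    "/vendor-x/intf/Ethernet2/link": "linkDown",
    "/vendor-x/fan/1/speed": "4200",  # no standardized mapping in this sketch, so not exported
}
exported = standardize_for_export(native)
for path, value in exported.items():
    print(path, value)
```

Keeping the mapping as a stream processor inside the vendor's own cluster means the peer cluster only ever sees the standardized model, never the proprietary one.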

The CMP architecture contemplates many possible deployment models. CMP clusters can run on-prem (in the operator's datacenter) or off-prem (in the cloud). They can be single-tenant or multi-tenant. They can be operated by the network operator or by the vendor. We expect that smaller operators are more likely to prefer cost-effective cloud-hosted multi-tenant deployments; the most sophisticated operators will operate their own on-prem CMP clusters. 

The cognitive management plane is very powerful. It provides:

  • full state history. Operators can see all state of any device at any point in time. Historical visibility is a big help in debugging transient or intermittent issues.

  • full network view. Arista CloudVision fully supports all Arista switches and routers today. We expect other vendors to provide comprehensive support for their devices in their own management plane clusters.

  • high availability. A pair of clusters co-ingest state from the same set of devices, such that if one cluster is down, the other continues managing the devices.

  • machine learning. Machine learning requires large amounts of data. By collecting all data from all devices in one place, the CMP enables machine learning algorithms that automatically identify which alerts are important, which device states are cause for concern, and what the likely root causes of anomalous behavior are.

  • in-service roll-out. Because the management plane is external to the managed devices, it can be upgraded independently of the physical infrastructure. A management plane failure has no impact on the network control plane or data plane, so it does not disrupt applications; hence, management plane upgrades are low risk, and new features can be deployed frequently.

  • multi-vendor scalability. Each vendor can provide its own CMP software and can host that software on behalf of its customers. The vendor may choose to partition its hosted management plane into multiple clusters for better geographic segmentation, to reduce the failure blast radius, or to accommodate administrative divisions within the operator. These clusters work together (via state export) to provide a unified management experience.

  • cross-cluster applications. Through state export, an application can run in one cluster based on state in other clusters. For example, suppose the blue vendor has a really nice segmentation management application: you specify high-level policy, and the app automatically generates appropriate access lists and pushes them to each physical device. You'd like to use this app for all of your devices, green as well as blue. The CMP supports this use case. The operator configures the green cluster to export topology and inventory state to the blue cluster, and configures the blue cluster to export access lists to the green cluster. Then, when the blue segmentation app generates access lists, the blue cluster exports them to the green cluster, which pushes them to the green devices.
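The blue/green cross-cluster example can be modeled in a few lines of Python. This is a hypothetical sketch, not CloudVision's export mechanism: `Cluster`, `segmentation_app`, the `/inventory/` and `/acl/` path prefixes, and the ACL rule strings are all assumptions, topology export is omitted for brevity, and the `pushed` list stands in for actually configuring devices. Each export rule forwards writes under a path prefix to a peer cluster; the sketch assumes the rules do not form a cycle.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical CMP cluster: a path -> value state map plus prefix-based export rules."""
    name: str
    devices: set[str]  # devices this cluster manages
    state: dict[str, object] = field(default_factory=dict)
    exports: list[tuple[str, Cluster]] = field(default_factory=list)
    pushed: list[tuple[str, object]] = field(default_factory=list)  # stand-in for device config pushes

    def export(self, prefix: str, peer: Cluster) -> None:
        """Forward every future write under `prefix` to `peer` (rules assumed acyclic)."""
        self.exports.append((prefix, peer))

    def write(self, path: str, value: object) -> None:
        self.state[path] = value
        # Push an ACL only if it targets a device this cluster manages.
        if path.startswith("/acl/"):
            device = path.split("/")[2]
            if device in self.devices:
                self.pushed.append((device, value))
        for prefix, peer in self.exports:
            if path.startswith(prefix):
                peer.write(path, value)


def segmentation_app(cluster: Cluster, deny: list[tuple[str, str]]) -> None:
    """Hypothetical app: turn high-level deny policy into an ACL for every device in inventory."""
    acl = [f"deny {src} -> {dst}" for src, dst in deny] + ["permit any -> any"]
    for path in sorted(cluster.state):  # sorted() copies keys, so writes below are safe
        if path.startswith("/inventory/"):
            device = path.split("/")[2]
            cluster.write(f"/acl/{device}", acl)


blue = Cluster("blue", devices={"blue-leaf1"})
green = Cluster("green", devices={"green-leaf1", "green-leaf2"})
green.export("/inventory/", blue)  # green -> blue: inventory state
blue.export("/acl/", green)        # blue -> green: generated access lists

for cluster in (blue, green):
    for dev in sorted(cluster.devices):
        cluster.write(f"/inventory/{dev}", {"cluster": cluster.name})

segmentation_app(blue, deny=[("guest", "prod")])
print("blue pushed to:", [dev for dev, _ in blue.pushed])
print("green pushed to:", [dev for dev, _ in green.pushed])
```

The app itself only ever runs in the blue cluster; the green cluster's role is limited to exporting inventory and pushing whatever ACLs arrive for its own devices.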

Arista is committed both to making CloudVision the industry's best network management system and to making the multi-vendor CMP vision a reality. As more customers experience CloudVision, we hope to generate the kind of operator pull that will convince other equipment vendors that it is in their interest (as well as in the interests of operators and the industry) to cooperate in creating the tools and building blocks that operators need to create a multi-vendor cognitive management plane.

Opinions expressed here are the personal opinions of the original authors, not of Arista Networks. The content is provided for informational purposes only and is not meant to be an endorsement or representation by Arista Networks or any other party.

Written by Kenneth Duda
Kenneth Duda is a pioneer in high-performance networking software and lead architect of Arista Networks EOS, a stateful modular operating system for all Arista Networks products. He is also the co-author of network virtualization specifications including VXLAN with VMware and NVGRE with Microsoft. From 2005 to 2008, Ken was also the Acting President of Arista Networks.
