Once again, new applications are pushing the envelope for modern cloud networking, a radical departure from traditional enterprise networks! Big Data was a hot topic at the GigaOm Structure conference last month.
As companies acquire and analyze vast amounts of structured and unstructured data, they increasingly re-engineer their data centers for new applications, which in turn fuels their need for a new cloud network. We see an increasing trend towards Big Data for storing and efficiently processing petabytes to exabytes of unstructured data. Big Data and its associated storage archives are changing how the network must be designed to support these new storage behaviors.
Storage Area Networks (SANs) using block-based Fibre Channel were considered the only option in the past decade. Then came Network Attached Storage (NAS) with file protocols, as well as iSCSI over high-speed Ethernet. In the 2010 decade, the advent of dense SSD storage, combined with the popularity of Hadoop, is driving a third generation of storage that resembles the early direct-attached storage models. What goes around comes around, with greater speed, scale and intelligence.
Hadoop, a Big Data software framework, drives the compute engines in data centers from IBM to Google, a Yahoo spin-off, eBay and Facebook. The framework comprises distributed file systems, databases, and data-mining algorithms. A noteworthy aspect is the ability to create Hadoop clusters out of standards-based computing and networking elements to run parallelized, data-intensive workloads. This is a departure from convention, where storage archival happens up front as the first step in the data's lifecycle.
Whether efficiently accessing stored results or simply calculating new ones, Hadoop performance improves significantly on a well-designed network with full any-to-any capacity, low latency, and high bandwidth. Also, as workloads grow, it is important that the network can sustain the incremental addition of servers. Hadoop only scales in proportion to the compute resources networked together at any given time.
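To give a rough feel for that scaling relationship, the back-of-envelope sketch below estimates how long a full scan of a dataset takes as nodes (and their directly attached disks) are added. The dataset size and per-node disk bandwidth are illustrative assumptions, not measurements from any particular cluster.

```python
# Back-of-envelope: time to scan a dataset when every node reads its own
# locally stored chunk in parallel. All figures are illustrative assumptions.

DATASET_BYTES = 1e15   # 1 PB of unstructured data (assumption)
NODE_DISK_BW = 1e9     # ~1 GB/s of directly attached storage bandwidth per node (assumption)

def scan_time_hours(num_nodes: int) -> float:
    """Ideal scan time if the dataset is spread evenly across num_nodes."""
    per_node_bytes = DATASET_BYTES / num_nodes
    return per_node_bytes / NODE_DISK_BW / 3600

for nodes in (10, 100, 1000):
    print(f"{nodes:>5} nodes -> ~{scan_time_hours(nodes):.1f} hours to read 1 PB")
```

Real clusters fall short of this ideal whenever data placement forces reads across the network instead of from local disks, which is exactly the situation the next paragraph describes.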
Hadoop's use of the MapReduce algorithm to process large quantities of unstructured data is quickly changing the storage paradigm. In the map phase, large data sets are broken up into small chunks that are spread across the cluster nodes. Servers are given tasks relevant to the data already present in their directly attached storage (DAS). Pushing the computation to the data is a critical part of processing petabytes - even with the fattest pipe of 100 GbE, a badly allocated workload could take weeks simply to read in all the data! Once this initial processing is complete, the resulting outputs are sent to a smaller subset of the nodes for further processing and summarization. This data movement is called a “shuffle”, and it funnels large amounts of data into a few nodes, which is often demanding on the underlying Cloud Networking infrastructure.
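To make the map/shuffle distinction concrete, here is a minimal, single-process sketch of the MapReduce pattern described above, using a word-count style job as a stand-in workload. The chunking, chunk contents, and reducer count are hypothetical; the point is simply how map output from many nodes fans in to a smaller set of nodes during the shuffle.

```python
# Minimal single-process sketch of the MapReduce pattern: map tasks run
# against local chunks, then a "shuffle" routes intermediate results to a
# smaller set of reducers. Chunk data and reducer count are illustrative only.
from collections import defaultdict

def map_task(chunk: str):
    """Map phase: emit (key, 1) pairs from the chunk stored on this node's DAS."""
    for word in chunk.split():
        yield word.lower(), 1

def reduce_task(key: str, values: list) -> tuple:
    """Reduce phase: summarize all values the shuffle delivered for one key."""
    return key, sum(values)

# The dataset is broken into chunks, each assumed to live on a different node.
chunks = [
    "big data drives cloud networking",
    "hadoop moves computation to the data",
    "the shuffle moves data across the cloud network",
]

# Shuffle: hash each intermediate key to one of a few reducers. This is the
# step where large volumes of map output converge on a small subset of nodes.
NUM_REDUCERS = 2
shuffle = [defaultdict(list) for _ in range(NUM_REDUCERS)]
for chunk in chunks:                       # each map task reads its local chunk
    for key, value in map_task(chunk):
        shuffle[hash(key) % NUM_REDUCERS][key].append(value)

# Reduce: each reducer summarizes only the keys routed to it.
for reducer_id, partition in enumerate(shuffle):
    for key, values in sorted(partition.items()):
        print(reducer_id, *reduce_task(key, values))
```

In a real cluster the map tasks run on the nodes that already hold the chunks, so it is primarily the shuffle traffic that crosses the network, which is why that phase places the heaviest demand on the fabric.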
Arista is once again at the forefront of delivering the attributes required to build a reliable Big Data cloud network: full any-to-any capacity, low latency, high bandwidth, and the ability to scale incrementally as workloads grow.
Big Data is indeed creating an extraordinary challenge and opportunity, requiring critical decisions on application and storage performance. As always, I welcome your views at feedback@arista.com, and I am excited by this trend.