IC2E 2015 Practitioners Talks

Cloud Storage Infrastructure Optimization Analytics

Ramani Routray; Senior Technical Staff Member (STSM) and Manager, IBM Research - Almaden

Abstract

Cloud computing has emerged and been widely adopted, given the value proposition it brings to an enterprise in terms of agility and cost effectiveness. Big data analytical capabilities delivered through cloud delivery models (specifically, treating storage/system management as a big data problem for a service provider) are referred to as Analytics as a Service or Software as a Service. Such a service simplifies obtaining useful insights from an operational enterprise data center, leading to cost and performance optimizations. Software-defined environments decouple the control planes from the data planes, which were often vertically integrated in traditional networking or storage systems. This decoupling creates opportunities for improved security, resiliency, and IT optimization in general. This talk describes our novel approach to hosting the systems management platform (a.k.a. the control plane) in the cloud and offering it to enterprises in a Software as a Service (SaaS) model. Specifically, the presentation focuses on the analytics layer, where the SaaS paradigm enables data centers to visualize, optimize, and forecast infrastructure via a simple capture, analyze, and govern framework. At its core, the platform uses big data analytics to extract actionable insights from system management metrics data. Our system was developed in research and is deployed across customers, with a core focus on the agility, elasticity, and scalability of the analytics framework. We present a few system/storage management analytics case studies that demonstrate cost and performance optimization for both the cloud consumer and the service provider. Actionable insights generated by the analytics platform are implemented in an automated fashion via an OpenStack-based platform.

Biography

Ramani Routray is a Senior Technical Staff Member and Manager in the Storage Services Research Department at IBM Almaden Research Center. Ramani joined IBM in 1998. He has architected and delivered several key products and service offerings in the areas of Storage Disaster Recovery, Storage Management and Virtualization, and Cloud Service Architectures. He received a Bachelor’s degree in Computer Science from Bangalore University and a Master’s degree in Computer Science from Illinois Institute of Technology. He has received multiple IBM Outstanding Technical Achievement awards, has authored over 30 scientific publications and technical standards, and is an IBM Master Inventor, being author or co-author of 40+ patents.

In-Memory Computing for Scalable Data Analytics

Jun Li; Principal Research Scientist, Hewlett-Packard Laboratories

Abstract

Current data analytics software stacks are tailored to run on a large number of commodity machines in clusters, with each machine containing a small amount of memory. These stacks therefore put significant effort into partitioning the data into small chunks and processing the chunks in parallel. Recent advances in memory technology now promise machines with two or more orders of magnitude more memory. For example, The Machine [1], currently under development at HP Labs, plans to use memristors, a new type of non-volatile random access memory with much higher density at access speeds comparable to today’s dynamic random access memory. Such technologies offer the possibility of a flat memory/storage hierarchy, in-memory data processing, and instant persistence of intermediate and final processing results. Photonic fabrics provide large communication bandwidth to move large volumes of data between processing units at very low latency. Moreover, multicore architectures adopt system-on-chip (SoC) designs to achieve significant compute performance with high power efficiency. Such machines may require significant changes to software developed for current big data and cloud applications.

Within this context, we have begun to characterize workloads and develop novel platforms and techniques for big-data analytics. In this presentation, I will share our experience with two systems developed for in-memory computing: an in-memory port of Hadoop and a custom-built in-memory search framework for high-dimensional data such as images.

By making only a small number of changes to Apache Hadoop, we have developed an in-memory Hadoop platform that runs on a multicore big-memory machine. Specifically, we partitioned a large multicore server into virtual nodes, each consisting of a number of cores, a unique IP address, and a Non-Uniform Memory Access (NUMA)-aware in-memory file system. Hadoop server nodes (such as Task Trackers and Data Nodes) are bound to these virtual nodes, as if they were assigned to different cluster machines. Our in-memory Hadoop implementation currently runs on an HP DL980 containing 80 cores and 2 TB of main memory. Compared to a cluster of 16 DL380 machines with the total number of CPU cores and the CPU frequency matched to our single DL980, we observe a 780% gain in read throughput and a 103% gain in write throughput on the DL980. However, at the application level, the performance gain in terms of latency for the DL980 is reduced to 48% in the TeraSort benchmark, and actually becomes -41% in the WordCount benchmark. Our experiment demonstrates that simply replacing the disk-based file system with an in-memory file system can provide significant low-level I/O performance gains. However, such low-level gains do not translate into significant gains at the application level. This is because high-level processing layers such as the Map and Reduce engines are designed for machines with low memory capacity: they process small data chunks one at a time in the limited memory allocated to them and rely on a large disk-based file system to support large data processing via the memory/disk hierarchy.

To better understand the implications of large-memory machines, we built a second system that searches for the most similar images within a large image corpus using similarity indexes based on Locality Sensitive Hashing (LSH). High search accuracy (95% or higher) demands that a large number of hash tables be constructed, and low search latency (100 milliseconds or lower) demands that the hash tables be built as in-memory data structures and accessed randomly at memory speed. In our system implementation, the entire image data set is divided into distributed partitions. Each partition conducts an LSH-based search to return the most similar images in that partition, and a coordinator aggregates all local search results to form the global search results. To understand the gains possible in in-memory environments, the entire distributed index/search framework was developed from scratch to ensure that each partition’s hash tables fit in the local memory of the processor searching the partition. To achieve high-performance search, the LSH code kernel has been extensively tuned for the DL980 multicore architecture using techniques such as software prefetching and code vectorization. We used three DL980 machines (each with 80 cores and 2 TB of RAM) to host the search framework, and achieved about 110 milliseconds per image search query over a set of 80 million images. As a comparison, our in-memory Hadoop implementation spends about 50 seconds just on the Map phase of the search (that is, image scanning over the partitions) on the same data set across the three machines.

These two experiments show that, to achieve the high performance that in-memory computing makes possible, the software stack needs to be redone from the ground up, including data structures designed for large memory and fast access, optimization for CPU caching, and proper data placement to match the underlying NUMA architecture. Our investigations currently take advantage only of large-memory machines. In addition to large memory, The Machine also provides other features such as byte-addressable non-volatile memory at data-center scale with optical speeds, and specialized cores. To truly take advantage of The Machine’s features, we will have to carefully re-examine and rebuild the software architecture of today’s data analytics platforms. This opens up many research problems in fast in-memory data transportation over system fabrics, multicore- and caching-aware in-memory data structures, high-performance primitive analytics operators, bulk-synchronous versus asynchronous processing on partitioned in-memory data structures, and caching and persistence in the new system-wide memory hierarchy, among many others.
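
The partition-and-coordinator search described in this abstract can be illustrated with a small sketch. It is not the authors' implementation, which is a tuned native kernel on DL980 hardware. The abstract does not say which LSH family is used, so random-hyperplane hashing with cosine similarity is assumed here, and the names `LSHPartition`, `coordinator_search`, and `build_demo`, as well as all parameter values, are hypothetical.

```python
import heapq
import random
from collections import defaultdict


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    norm = (dot(a, a) * dot(b, b)) ** 0.5
    return dot(a, b) / norm if norm else 0.0


class LSHPartition:
    """One partition: several in-memory hash tables over its own vectors.

    Assumption: random-hyperplane LSH; each table hashes a vector to the
    sign pattern of its projections onto `bits` random hyperplanes.
    """

    def __init__(self, dim, num_tables=8, bits=6, seed=0):
        rng = random.Random(seed)
        self.planes = [
            [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(bits)]
            for _ in range(num_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.vectors = {}

    @staticmethod
    def _key(planes, vec):
        return tuple(dot(p, vec) >= 0 for p in planes)

    def add(self, item_id, vec):
        self.vectors[item_id] = vec
        for planes, table in zip(self.planes, self.tables):
            table[self._key(planes, vec)].append(item_id)

    def search(self, query, k):
        # Collect candidates from the matching bucket of every table,
        # then rank only those candidates by exact similarity.
        candidates = set()
        for planes, table in zip(self.planes, self.tables):
            candidates.update(table.get(self._key(planes, query), ()))
        scored = ((cosine(query, self.vectors[i]), i) for i in candidates)
        return heapq.nlargest(k, scored)


def coordinator_search(partitions, query, k):
    """Merge each partition's local top-k into a global top-k."""
    local_hits = [hit for p in partitions for hit in p.search(query, k)]
    return heapq.nlargest(k, local_hits)


def build_demo(num_items=300, dim=16, num_partitions=3, seed=42):
    """Spread random vectors round-robin over a few partitions."""
    rng = random.Random(seed)
    partitions = [LSHPartition(dim, seed=seed) for _ in range(num_partitions)]
    data = {}
    for item_id in range(num_items):
        vec = [rng.gauss(0, 1) for _ in range(dim)]
        data[item_id] = vec
        partitions[item_id % num_partitions].add(item_id, vec)
    return partitions, data


if __name__ == "__main__":
    partitions, data = build_demo()
    for score, item_id in coordinator_search(partitions, data[7], k=5):
        print(f"item {item_id}: cosine {score:.3f}")
```

In this sketch, more hash tables raise the chance that a true neighbor shares at least one bucket with the query, at the cost of more memory per partition. This is the accuracy-versus-memory pressure that the abstract gives as the reason for keeping the tables in local memory.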

References

[1] The Machine: A new kind of computer. http://www.hpl.hp.com/research/systems-research/themachine/