Cloud Storage Infrastructure Optimization Analytics
Ramani Routray; Senior Technical Staff Member (STSM) and Manager, IBM Research - Almaden
The emergence and adoption of cloud computing have become widely prevalent, given the value proposition the cloud brings to an enterprise in terms of agility and cost effectiveness. Delivering big data analytical capabilities (specifically, treating storage/system management as a big data problem for a service provider) through a cloud delivery model is defined as Analytics as a Service or Software as a Service. This service simplifies obtaining useful insights from an operational enterprise data center, leading to cost and performance optimizations. Software-defined environments decouple the control planes from the data planes that were often vertically integrated in traditional networking or storage systems. This decoupling of the control planes from the data planes enables opportunities for improved security, resiliency, and IT optimization in general. This talk describes our novel approach of hosting the systems management platform (a.k.a. the control plane) in the cloud, offered to enterprises in a Software as a Service (SaaS) model. Specifically, this presentation focuses on the analytics layer of the SaaS paradigm, which enables data centers to visualize, optimize, and forecast infrastructure via a simple capture, analyze, and govern framework. At its core, it uses big data analytics to extract actionable insights from system management metrics data. Our system was developed in research and is deployed across customers, with a core focus on the agility, elasticity, and scalability of the analytics framework. We present a few system/storage management analytics case studies that demonstrate cost and performance optimization for both the cloud consumer and the service provider. Actionable insights generated from the analytics platform are implemented in an automated fashion via an OpenStack-based
Ramani Routray is a Senior Technical Staff Member and Manager in the Storage Services Research Department at IBM Almaden Research Center. Ramani joined IBM in 1998. He has architected and delivered several key products and service offerings in the areas of storage disaster recovery, storage management and virtualization, and cloud service architectures. He received a Bachelor's degree in Computer Science from Bangalore University and a Master's degree in Computer Science from the Illinois Institute of Technology. He has received multiple IBM Outstanding Technical Achievement awards, has authored over 30 scientific publications and technical standards, and is an IBM Master Inventor, having authored or co-authored 40+ patents.
In-Memory Computing for Scalable Data Analytics
Jun Li; Principal Research Scientist, Hewlett-Packard Laboratories
Current data analytics software stacks are tailored to use a large number of commodity machines in clusters, with each machine containing a small amount of memory. Thus, significant effort is made in these stacks to partition the data into small chunks and process these chunks in parallel. Recent advances in memory technology now promise the availability of machines with the amount of memory increased by two or more orders of magnitude. For example, The Machine, currently under development at HP Labs, plans to use the memristor, a new type of non-volatile random access memory with much larger memory density at an access speed comparable to today's dynamic random access memory. Such technologies offer the possibility of a flat memory/storage hierarchy, in-memory data processing, and instant persistence of intermediate and final processing results. Photonic fabrics provide large communication bandwidth to move large volumes of data between processing units at very low latency. Moreover, multicore architectures adopt system-on-chip (SoC) designs to achieve significant compute performance with high power efficiency.
Such machines may require significant changes in software developed for current big data and cloud applications. Within this context, we have begun to characterize workloads and develop novel analytics platforms and techniques for big-data analytics. In this presentation, I will share our experience with two systems developed for in-memory computing: an in-memory port of Hadoop, and a custom-built in-memory search framework for high-dimensional data such as images.
By making only a small number of changes to Apache Hadoop, we have developed an in-memory Hadoop platform that runs on a multicore big-memory machine. Specifically, we partitioned a large multi-core server into virtual nodes, each consisting of a number of cores, a unique IP address, and a Non-Uniform Memory Access (NUMA)-aware in-memory file system. Hadoop server nodes (such as Task Trackers and Data Nodes) are bound to these virtual nodes, as if they were assigned to different cluster machines. Our in-memory Hadoop implementation currently runs on an HP DL980 containing 80 cores and 2 TB of main memory. Compared to a cluster of 16 DL380 machines with the total number of CPU cores and CPU frequency matched to our single DL980, we observe a 780% gain in read throughput and a 103% gain in write throughput on the DL980. However, at the application level, the performance gain in terms of latency for the DL980 drops to 48% on the TeraSort benchmark, and actually becomes -41% on the WordCount benchmark.
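The virtual-node partitioning described above can be sketched as follows. This is a minimal illustration only; the core counts per node, the IP addressing scheme, and the mount-point naming are illustrative assumptions, not the actual HP configuration.

```python
# Hypothetical sketch: carve a big-memory multicore server into "virtual
# nodes", each with a disjoint core range, a unique IP address, and its own
# NUMA-local in-memory file system mount (assumed layout, for illustration).

def make_virtual_nodes(total_cores=80, cores_per_node=10, base_ip="10.0.0."):
    """Assign disjoint core ranges, IPs, and FS mounts to virtual nodes."""
    nodes = []
    for i in range(total_cores // cores_per_node):
        nodes.append({
            "id": i,
            # cores i*10 .. i*10+9 belong exclusively to this virtual node
            "cores": list(range(i * cores_per_node, (i + 1) * cores_per_node)),
            "ip": base_ip + str(i + 1),        # unique IP per virtual node
            "fs_mount": f"/mnt/imfs/node{i}",  # NUMA-aware in-memory FS
        })
    return nodes

nodes = make_virtual_nodes()  # 8 virtual nodes on an 80-core machine
```

In a real deployment, each Hadoop server process (Task Tracker or Data Node) would then be bound to one such virtual node's cores, address, and file system, so that it behaves as if it ran on a separate cluster machine.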
Our experiment demonstrates that simply replacing the disk-based file system with an in-memory file system can provide significant low-level I/O performance gains. However, such low-level gains do not translate into significant gains at the application level. This is because high-level processing layers such as the Map and Reduce engines are designed to support low-memory-capacity machines: they process small data chunks one at a time within the limited memory allocated to them, and rely on a large disk-based file system to support large data processing via the memory/disk hierarchy.
To better understand the implications of large-memory machines, we built a second system that searches for the most similar images within a large image corpus using similarity indexes based on Locality-Sensitive Hashing (LSH). High search accuracy (95% or higher) demands that a large number of hash tables be constructed, and low search latency (100 milliseconds or lower) demands that the hash tables be built as in-memory data structures and accessed randomly at memory speed. In our system implementation, the entire image data set is split into distributed partitions. Each partition conducts an LSH-based search to return the most similar images within that partition, and a coordinator aggregates all local search results to form the global result. To understand the gains possible in in-memory environments, the entire distributed index/search framework was developed from scratch to ensure that each partition's hash tables fit in the local memory of the processor searching that partition. To achieve high search performance, the LSH code kernel has been extensively tuned for the DL980 multicore architecture using techniques such as software prefetching and code vectorization. We used three DL980 machines (each with 80 cores and 2 TB of RAM) to host the search framework, and achieved about 110 milliseconds per image search query over a set of 80 million images. As a comparison, our in-memory Hadoop implementation spends roughly 50 seconds just on the Map phase of the search (that is, image scanning over the partitions) on the same data set across the same three machines.
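The partition-plus-coordinator search pattern described above can be sketched in miniature. This is not the tuned production kernel: the hash family (random hyperplanes), dimensionality, and table counts below are illustrative assumptions, and the exact re-ranking step stands in for whatever scoring the real system uses.

```python
# Hypothetical sketch of partitioned LSH search with coordinator aggregation.
import heapq
import random

random.seed(0)
DIM, NUM_TABLES, BITS = 16, 4, 8

# Random-hyperplane LSH: each table hashes a vector to the sign pattern of
# its dot products with BITS random Gaussian vectors.
planes = [[[random.gauss(0, 1) for _ in range(DIM)]
           for _ in range(BITS)] for _ in range(NUM_TABLES)]

def sign_hash(vec, table):
    bits = 0
    for plane in planes[table]:
        dot = sum(a * b for a, b in zip(vec, plane))
        bits = (bits << 1) | (dot >= 0)
    return bits

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

class Partition:
    """One shard of the corpus; its hash tables live entirely in local memory."""
    def __init__(self, vectors):
        self.vectors = vectors
        self.tables = [{} for _ in range(NUM_TABLES)]
        for idx, v in enumerate(vectors):
            for t in range(NUM_TABLES):
                self.tables[t].setdefault(sign_hash(v, t), []).append(idx)

    def search(self, query, k):
        # Union of candidate buckets across tables, then exact re-ranking.
        cands = set()
        for t in range(NUM_TABLES):
            cands.update(self.tables[t].get(sign_hash(query, t), []))
        scored = [(dist2(self.vectors[i], query), self.vectors[i]) for i in cands]
        return heapq.nsmallest(k, scored)

def coordinator_search(partitions, query, k):
    """Merge each partition's local top-k into the global top-k."""
    local = []
    for p in partitions:
        local.extend(p.search(query, k))
    return heapq.nsmallest(k, local)
```

The key property the abstract relies on is visible even at this scale: each partition touches only its own in-memory tables, so the per-query work is a handful of random memory accesses plus a small re-ranking, and the coordinator's merge cost grows with the number of partitions, not the corpus size.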
These two experiments show that, in order to realize the full performance benefit of in-memory computing, the software stack needs to be redone from the ground up, including data structures designed for large memory and fast access, optimization for CPU caching, and proper data placement to match the underlying NUMA architecture.
Our investigations so far take advantage only of large-memory machines. In addition to large memory, The Machine also provides other features, such as byte-addressable non-volatile memory at data-center scale with optical speeds, and specialized cores. To truly take advantage of The Machine's features, we will have to carefully re-examine and rebuild the software architecture of today's data analytics platforms. This opens up many research problems: fast in-memory data transportation over system fabrics, multicore- and caching-aware in-memory data structures, high-performance primitive analytics operators, bulk-synchronous versus asynchronous processing on partitioned in-memory data structures, and caching and persistence in the new system-wide memory hierarchy, among many others.
 The Machine: A new kind of computer. http://www.hpl.hp.com/research/systems-research/themachine/