Welcome to the EcoCloud 2015 Annual Event. This year’s event kicked off with a keynote followed by a poster session and cocktail reception on the evening of Monday, June 22nd, 2015. On Tuesday, June 23rd, 2015, EcoCloud researchers and industrial speakers highlighted the latest trends in energy-efficient data-centric technologies.
Raghu Ramakrishnan, Microsoft
Until recently, data was gathered for well-defined objectives such as auditing, forensics, reporting and line-of-business operations; now, exploratory and predictive analysis is becoming ubiquitous, and the default increasingly is to capture and store any and all data, in anticipation of potential future strategic value. These differences in data heterogeneity, scale and usage are leading to a new generation of data management and analytic systems, where the emphasis is on supporting a wide range of very large datasets that are stored uniformly and analyzed seamlessly using whatever techniques are most appropriate, including traditional tools like SQL and BI and newer tools, e.g., for machine learning and stream analytics. These new systems are necessarily based on scale-out architectures for both storage and computation.
Hadoop has become a key building block in the new generation of scale-out systems. On the storage side, HDFS has provided a cost-effective and scalable substrate for storing large heterogeneous datasets. However, as key customer and system touch points are instrumented to log data and Internet of Things applications become common, data in the enterprise is growing at a staggering pace, and the need to leverage different storage tiers (ranging from tape to main memory) is posing new challenges and motivating in-memory caching technologies such as those in Spark. On the analytics side, the emergence of resource managers such as YARN has opened the door for analytics tools to bypass the Map-Reduce layer and directly exploit shared system resources while computing close to the data. This trend is especially significant for iterative computations such as graph analytics and machine learning, for which Map-Reduce is widely recognized to be a poor fit.
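To make the last point concrete, here is a minimal PySpark sketch (illustrative only, not taken from the talk) of an iterative computation that caches its working set in memory and reuses it across passes; the HDFS path and the update rule are hypothetical.

```python
# Minimal PySpark sketch (illustrative, not from the talk): an iterative job
# reuses a cached dataset instead of re-reading it from HDFS on every pass,
# which is exactly the access pattern Map-Reduce handles poorly.
from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("iterative-sketch")
sc = SparkContext(conf=conf)

# Hypothetical HDFS path; each line holds one numeric value.
points = sc.textFile("hdfs:///data/points.txt").map(float).cache()

estimate = 0.0
for _ in range(10):                      # ten refinement passes over the same data
    mean = points.mean()                 # served from memory after the first pass
    estimate = 0.5 * (estimate + mean)   # toy update rule, purely illustrative

print("estimate:", estimate)
sc.stop()
```

In a Map-Reduce implementation, each of those passes would typically be a separate job that re-reads its input from HDFS, which is why iterative workloads benefit from in-memory, scale-out execution.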
I will examine these trends, and ground the talk by discussing the Microsoft Big Data stack.
Raghu Ramakrishnan is a Technical Fellow in the Cloud and Enterprise (C&E) Division at Microsoft Corp. He focuses his work on big data and on integration between C&E’s cloud offerings and the Online Services Division’s platform assets. He has more than 15 years of experience in the fields of database systems, data mining, search and cloud computing.
Over five years at Yahoo! Inc., Ramakrishnan served as chief scientist for three divisions (Audience, Cloud Platforms and Search), as well as a Yahoo! Fellow leading applied science and research teams in Yahoo! Labs. He led the science teams for major Yahoo! initiatives, including the CORE personalization project, the PNUTS geo-replicated cloud service platform and the creation of Yahoo!’s Web of Objects through Web-scale information extraction. Before joining Yahoo! in 2006, he had been a member of the computer science faculty at the University of Wisconsin-Madison since 1987, and was founder and chief technical officer of QUIQ, a company that pioneered crowd-sourced question-answering communities.
His work in database systems has influenced query optimization in commercial database systems and the design of window functions in SQL:1999. He has written the widely used text “Database Management Systems” (with Johannes Gehrke). Ramakrishnan has received several awards, including the ACM SIGKDD Innovations Award, the ACM SIGMOD Contributions Award and 10-Year Test-of-Time Award, a Distinguished Alumnus Award from IIT Madras, and a Packard Foundation Fellowship. He is a Fellow of the ACM and IEEE, serves on the steering committee of the ACM Symposium on Cloud Computing and the board of directors of ACM SIGKDD, and is a past chair of ACM SIGMOD and a member of the board of trustees of the Very Large Data Base Endowment.
He earned a bachelor’s degree in electrical engineering from the Indian Institute of Technology Madras and a doctorate in computer science from the University of Texas at Austin. Ramakrishnan and his wife, Apu, brought up their two sons in Madison, Wis., where he taught for many years. He can attest that while you might freeze there, you would do so with a smile on your face. He likes to play tennis (both the lawn and table varieties) and read fiction, and tries to stay fit, with middling success.
This year’s industrial session featured the following prominent speakers from the IT industry.
Cost efficiency is one of the major driving forces of cloud adoption. Dense and cost-effective storage is critical to cloud providers, especially for storing large volumes of data in the cloud. In this talk, I will present storage technologies that enable cost efficiency for different types of cloud storage. Starting with the high-performance segment of storage, I will present SALSA, an I/O stack optimized for Flash that can elevate the performance and endurance of low-cost, consumer Flash-based SSDs to meet datacenter requirements, thereby enabling all-Flash cloud storage at low cost. Next, I will talk about archival storage for the cloud, focusing on IceTier, a research prototype that enables the seamless integration of tape as an archival back-end to cloud object stores, offering dramatically reduced cost for cold data. Finally, I will present MCStore, a cloud gateway technology that enables traditional storage systems to take advantage of the cloud, thus bringing the merits of cost-effective cloud storage to the traditional datacenter.
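To illustrate the kind of policy an archival back-end enables, here is a hypothetical Python sketch, assuming simple get/put/delete object-store interfaces; it is not IBM code and does not reflect IceTier’s actual design, only the general idea of migrating cold objects to a cheaper, tape-backed tier.

```python
# Hypothetical sketch of a cold-data tiering policy (not IBM code): objects
# untouched for longer than a threshold are migrated from the hot object
# store to a tape-backed archival tier.
import time

ARCHIVE_AFTER_SECONDS = 90 * 24 * 3600   # assume 90 days of inactivity marks data as "cold"

def tier_objects(objects, hot_store, archive_store, now=None):
    """objects: iterable of dicts with 'key' and 'last_access' (epoch seconds).
    hot_store / archive_store: hypothetical stores exposing get/put/delete."""
    now = now or time.time()
    for obj in objects:
        if now - obj["last_access"] > ARCHIVE_AFTER_SECONDS:
            data = hot_store.get(obj["key"])        # read from the hot tier
            archive_store.put(obj["key"], data)     # write to the tape-backed tier
            hot_store.delete(obj["key"])            # free expensive hot capacity
```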
Evangelos Eleftheriou received a B.S. degree in Electrical Engineering from the University of Patras, Greece, in 1979, and M.Eng. and Ph.D. degrees in Electrical Engineering from Carleton University, Ottawa, Canada, in 1981 and 1985, respectively.
He joined the IBM Research – Zurich laboratory in Rüschlikon, Switzerland, as a Research Staff Member in 1986. Since 1998, he has held various management positions and currently heads the Cloud and Computing Infrastructure department of IBM Research – Zurich, which focuses on enterprise solid-state storage, storage for big data, microserver/cloud server and accelerator technologies, high-speed I/O links, storage security, and memory and cognitive technologies.
He holds over 100 patents (granted and pending applications). In 2002, he became a Fellow of the IEEE. He was co-recipient of the prestigious 2005 Technology Award of the Eduard Rhein Foundation in Germany. Also in 2005, he was appointed an IBM Fellow and inducted into the IBM Academy of Technology. In 2009, he was co-recipient of the IEEE CSS Control Systems Technology Award.
ASHRAE has for many years recommended that data center owners save energy by reducing air conditioning and warming up the data center. For the first 25 years of the computer industry, this was very sound advice, because there was no relationship between the operating temperature of enterprise computing servers and their energy efficiency. For roughly the past four to five years, however, this has no longer been the case. Extensive Oracle research has demonstrated that the latest generations of enterprise computing servers exhibit very temperature-sensitive “energy wastage” mechanisms in IT server and storage systems that not only waste significant energy in warm data centers, but also degrade compute performance. This presentation shows novel Oracle temperature-aware algorithms that enable intelligent optimization of data center ambient temperatures to minimize or avoid these heretofore non-observable energy wastage mechanisms in IT systems. Oracle’s suite of “Energy Aware Data Center” (EADC) algorithms predicts an optimal ambient temperature set point, decreasing energy wastage throughout the data center, significantly increasing overall compute performance, decreasing the carbon footprint and increasing return on assets for data center owners.
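As a rough illustration of the underlying idea (not Oracle’s EADC algorithms), the sketch below uses two toy models, one for cooling power that falls as the ambient set point rises and one for IT-side wastage (fan power, leakage) that grows with temperature, and picks the set point that minimizes their modeled sum.

```python
# Hypothetical illustration of temperature-aware set-point selection (not
# Oracle's EADC algorithms): cooling energy falls as the ambient set point
# rises, while IT-side wastage grows, so the optimum minimizes the total.
def cooling_power_kw(t_c):
    return 400.0 - 8.0 * (t_c - 18.0)       # toy model: chillers work less when warmer

def it_wastage_kw(t_c):
    return 2.0 * (t_c - 18.0) ** 2          # toy model: fans and leakage grow super-linearly

candidates = [18 + 0.5 * i for i in range(21)]   # evaluate 18 C .. 28 C in 0.5 C steps
best = min(candidates, key=lambda t: cooling_power_kw(t) + it_wastage_kw(t))
print(f"modeled optimal ambient set point: {best:.1f} C")
```

With these toy curves the modeled optimum lands at 20 C; the point is only that neither the coldest nor the warmest setting minimizes total energy once IT-side wastage is taken into account.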
Kenny Gross is a Distinguished Engineer at Oracle and a researcher with the System Dynamics Characterization and Control team in Oracle’s Physical Sciences Research Center in San Diego. Kenny specializes in advanced pattern recognition, continuous system telemetry, and dynamic system characterization for improving the reliability, availability, and energy efficiency of enterprise computing systems, as well as the datacenters in which the systems are deployed. Kenny has 227 US patents issued and pending and 186 scientific publications. He received a 1998 R&D 100 Award, for one of the top 100 technological innovations of that year, for an advanced statistical pattern recognition technique that was originally developed for nuclear and aerospace applications and is now used in a variety of applications to improve the quality of service, availability, and energy efficiency of enterprise computer servers. Kenny holds a Ph.D. in nuclear engineering from the University of Cincinnati.
The Cloudera Impala project is pioneering the next generation of Hadoop capabilities: the convergence of fast SQL queries with the capacity, scalability, and flexibility of a Hadoop cluster. With Impala, the academic and Hadoop communities now have an open-source codebase for querying data stored in HDFS and Apache HBase in real time, using familiar SQL syntax. In contrast with other SQL-on-Hadoop initiatives, Impala’s operations are fast enough to run interactively on native Hadoop data rather than as long-running batch jobs.
This talk starts out with an overview of Impala from the user’s perspective, followed by a presentation of Impala’s architecture and implementation. It concludes with a summary of Impala’s benefits when compared with the available SQL-on-Hadoop alternatives.
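As a concrete illustration of the user’s perspective, here is a minimal sketch of an interactive query against Impala from Python using the impyla DB-API client; the host name and the web_logs table are assumptions made for the example.

```python
# Minimal sketch of an interactive Impala query over data in HDFS, using the
# impyla DB-API client; host, port, and table name are assumptions.
from impala.dbapi import connect

conn = connect(host="impalad.example.com", port=21050)   # default Impala daemon port
cur = conn.cursor()

# Familiar SQL syntax, executed directly on native Hadoop data.
cur.execute("""
    SELECT page, COUNT(*) AS hits
    FROM web_logs            -- hypothetical table backed by files in HDFS
    WHERE status = 200
    GROUP BY page
    ORDER BY hits DESC
    LIMIT 10
""")
for page, hits in cur.fetchall():
    print(page, hits)

cur.close()
conn.close()
```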
Ippokratis Pandis is a software engineer at Cloudera working on the Impala project. Before joining Impala and Cloudera, Ippokratis was a member of the research staff at the IBM Almaden Research Center. At IBM, he was a member of the core team that designed and implemented the BLU column-store engine, which currently ships as part of IBM’s DB2 LUW v10.5 with BLU Acceleration. Ippokratis received his PhD from the Electrical and Computer Engineering department at Carnegie Mellon University. He is the recipient of Best Demonstration awards at ICDE 2006 and SIGMOD 2011 and a Best Paper award at CIDR 2013. He has served as PC chair of DaMoN 2014 and DaMoN 2015.
As the number of on-die transistors continues to grow, new computing models are needed to utilize this growing compute capacity despite a clock-frequency scaling wall and relatively sluggish improvements in I/O bandwidth. The spatial compute and programming model, as introduced by the OpenSPL specification, provides a method for taking advantage of the compute capacity offered by current and emerging hardware technology. With spatial computing, compute processing units are laid out in space (either physically or conceptually) and connected by flows of data. The result of this approach is compute implementations that are naturally highly parallel and thus make very effective use of modern, transistor-rich hardware. In this talk, I will describe both the spatial computing model and a practical realization of the OpenSPL specification: Multiscale Dataflow Engines, a platform that directly implements the spatial computing model in hardware while supporting tight integration with conventional CPU-based compute resources.
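The core idea can be sketched with plain Python generators (a conceptual analogy only, not the OpenSPL or MaxJ API): compute stages are laid out as a pipeline and connected by flows of data, and in a real spatial implementation every stage operates on a different data element at the same time.

```python
# Conceptual sketch of the dataflow idea (Python generators, not OpenSPL/MaxJ):
# compute stages are laid out as a pipeline and connected by flows of data.
def source(n):
    for i in range(n):
        yield float(i)

def scale(stream, factor):
    for x in stream:
        yield x * factor          # one "processing unit" laid out in the pipeline

def offset(stream, delta):
    for x in stream:
        yield x + delta           # a second unit, fed by the flow from `scale`

pipeline = offset(scale(source(8), factor=2.0), delta=1.0)
print(list(pipeline))             # [1.0, 3.0, 5.0, ..., 15.0]
```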
Steve Pawlowski is Vice President of Advanced Computing Solutions at Micron Technology. He is responsible for defining and developing innovative memory solutions for the enterprise and high performance computing markets. Prior to joining Micron in July 2014, Mr. Pawlowski was a Senior Fellow and the Chief Technology Officer for Intel’s Data Center and Connected Systems Group. Mr. Pawlowski’s extensive industry experience includes 31 years at Intel, where he held several high-level positions and led teams in the design and development of next-generation system architectures and computing platforms. Mr. Pawlowski earned bachelor’s degrees in electrical engineering and computer systems engineering technology from the Oregon Institute of Technology and a master’s degree in computer science and engineering from the Oregon Graduate Institute. He also holds 58 patents.
This talk will highlight the challenges that big corporations face in embracing innovation: agility vs. legacy, keeping up with the pace of innovation, and digital transformation and its challenges for a company like AXA-Tech.
Daniele Tonella joined the AXA Group in 2013 as CEO of AXA Technology Services. In this role he is responsible for the overall vision, strategy and operation of AXA’s global IT infrastructure, including cloud and developer platforms, thus contributing to AXA’s digital transformation. Before joining AXA, Daniele was Global CIO of Evalueserve, a knowledge process outsourcing company headquartered in India. From 2002 to 2010 he held various IT leadership roles with increasing responsibility at Swiss Life, notably as CTO and CIO. He spent the initial part of his professional career as a consultant, first for Mercer Management Consulting and subsequently for McKinsey and Company. He is currently a member of the Foundation Board of the International Risk Governance Council (IRGC). Daniele Tonella was born in 1971 and is a Swiss citizen. He holds an engineering degree from the Swiss Federal Institute of Technology in Zurich (ETH).