Welcome to the EcoCloud 2017 Annual Event. This year’s event kicked off with a keynote followed by a poster session and cocktail on the evening of Monday June 12th, 2017. On Tuesday June 13th, 2017, EcoCloud researchers and industrial speakers highlighted the latest trends in sustainable cloud computing technologies.
The data deluge caused by the proliferation of connected data sources is creating an unprecedented imbalance in the ability of our IT infrastructure to access and mine data. While it is relatively easy to provision compute resources, it is much more challenging to provision the data resources needed to feed them. This is true across the entire memory and storage hierarchy, which is going through a profound transformation with the appearance of “storage-class memory” technology in the mix. This talk discusses why we need a new approach to architecting memory and storage systems, what the motivating use cases are, what the obstacles are, and what we can do to address them. It then introduces the technology behind the recently announced Gen-Z consortium (www.genzconsortium.org), an open systems interconnect designed to provide memory-semantic access to data and devices via direct-attached, switched, or fabric topologies.
Paolo Faraboschi is a Fellow and VP at Hewlett Packard Enterprise. His interests are at the intersection of system architecture and software. He is currently working on The Machine project, researching how we can build better memory-driven computing systems and how we can apply them towards Exascale. From 2010 to 2014, he worked on low-energy servers and HP project Moonshot. From 2004 to 2009, at HP Labs in Barcelona, he led a research activity on scalable system-level simulation and modelling. From 1995 to 2003, at HP Labs Cambridge, he was the principal architect of the Lx/ST200 family of VLIW cores, widely used in video SoCs and HP’s printers. Paolo is an IEEE Fellow and an active member of the computer architecture community. He is an author of 30 patents, over 100 publications, and the book “Embedded Computing: a VLIW approach”. Before joining HP in 1994, he received a Ph.D. in EECS from the University of Genoa, Italy.
This year’s industrial session will feature prominent speakers from the IT industry, including:
Michaela Blott, Xilinx
Ongoing research on neural networks has started to focus on reducing their computation and storage requirements to make deployment feasible in energy-constrained compute environments. One promising opportunity is to reduce compute and storage down to a few bits of precision, at which these networks achieve close to state-of-the-art accuracy compared to their floating-point counterparts. In this talk, we will show an automated framework for implementing these reduced-precision (and, in the extreme case, fully binarized) neural networks on reconfigurable logic, scaling them onto an FPGA-based inference accelerator given a set of fixed design constraints.
We show that compute performance can scale well beyond typical floating-point performance, currently demonstrating tens of thousands to millions of images per second for inference and 14 TOPS of compute performance with power consumption below 25 W on today’s devices. Results on accuracy, architectural comparisons with other approaches, and detailed implementations of the latest large networks will also be presented.
Michaela Blott graduated from the University of Kaiserslautern in Germany. She has worked in both research institutions (ETH and Bell Labs) and development organizations, and was deeply involved in large-scale international collaborations such as NetFPGA-10G. Today, she works as a principal engineer at the Xilinx labs in Dublin, heading a team of international researchers. She leads Xilinx’s strategy for bringing FPGAs into new markets and investigates system architectures with emerging memory technologies for different application domains. Her expertise spans data centers, machine learning, high-speed networking, emerging memory technologies and distributed computing systems, with an emphasis on building complete implementations. She is on the industry advisory council to the Irish Center for High-End Computing (ICHEC), serves on the technical program committees of numerous conferences (DATE, FPL, GLOBALSIP, Hipeac, PPoPP), is an organizer of the SC’2015/6/7 workshop, and serves on the industry review panel of various research projects.
Recent developments have caused data access and transfer to surpass actual data processing as the dominant cost factor in high-performance computing systems. As a result, interest has grown in hybrid computer systems that offload part of the processing to programmable near-memory accelerators, which tightly integrate computation within the memory system. IBM’s POWER8® processors introduced open (OpenPOWER™) high-bandwidth interconnects with the ability to attach such accelerators. The upcoming POWER9 processor, with its new openCAPI™ interface, will further extend those capabilities. Our team at IBM Research – Zurich is investigating novel ways to extend this data-centric concept beyond the reduction of expensive data transfers in order to improve power efficiency further. In this presentation, I will give an overview of our research activities, addressing in particular how the memory system can play a more important role in (dynamic) workload optimization by adapting its operation to the behavior of the accelerated workload, and how a novel accelerator architecture and programming model can be exploited to reduce the programmability overhead compared with that of traditional architectures.
Jan van Lunteren is a Research Staff Member in the Cloud & Computing Infrastructure department at IBM Research – Zurich, which he joined in 1994. Since his Ph.D. research on advanced adaptive memory systems, he has been investigating and designing a broad range of accelerators, including search engines, packet classifiers, XML accelerators, protocol processing engines, and regular expression scanners. His current interests include near-memory processing, deep learning architectures, and high-performance programmable accelerators. Jan has an M.Sc. and a Ph.D. degree in Electrical Engineering from the Technical University of Eindhoven, The Netherlands. He holds more than 60 issued and pending patents.
What if the history of deep learning had started not with graphics processing units (GPUs) but with hundreds of thousands of devices, each with limited fast on-chip memory, connected to each other via a network? What kind of models would we develop? What building blocks would we use to construct these models? How would we train a model on such a system? We explore these questions and propose that such a system may allow us to efficiently train a class of neural network models that exhibit complex and instance-dependent control flows, which are known to be challenging for GPUs.
Ryota Tomioka is currently a researcher at Microsoft Research Cambridge, UK. Prior to that, he was a research assistant professor at Toyota Technological Institute at Chicago and an assistant professor at the University of Tokyo. His interests lie at the intersection of deep learning, optimization, and systems.
Stratis Viglas
New advances in memory technology, like scalable distributed and persistent memory, make it possible to have a truly universal storage model, accessed directly through the programming language in the context of a fully managed runtime. This environment is further enhanced by language-integrated query, which has picked up significant traction and has emerged as a generic, safe method of combining programming languages with databases with considerable software engineering benefits.
We present the results of our work in integrating database and programming language runtimes through code generation and extensive just-in-time adaptation. Our techniques deliver significant performance improvements over non-integrated solutions. Our work makes important first steps towards a future where data processing applications will commonly access datasets as if they were fully in-memory, and will be written in a single programming language employing higher-level APIs and language-integrated query to provide transparent and highly efficient data processing.
Stratis Viglas is an engineer at Google’s Data Infrastructure and Analysis group, working on building software abstractions for petascale query processing. He is also a Professor in the School of Informatics at the University of Edinburgh, UK where he holds the Chair of Data Management on New Hardware. He has made contributions to data stream processing, XML data management, query processing and optimization, and data management over flash memory. His current work involves integrating scalable managed runtimes with database systems through just-in-time compilation of SQL queries and incorporating technologies like remote memory access and non-volatile memory into the data processing stack.