This year’s industrial session will feature prominent speakers from the IT industry, including:
Scaling Neural Network Performance on Reconfigurable Logic
Michaela Blott, Xilinx
Ongoing research on neural networks has started to focus on reducing computation and storage requirements to make deployment feasible in energy-constrained compute environments. One promising opportunity is reducing compute and storage to a few bits of precision, whereby these networks achieve close to state-of-the-art accuracy compared to their floating-point counterparts. In this talk, we will show an automated framework that implements these reduced-precision (and in the extreme case fully binarized) neural networks on reconfigurable logic and scales them onto an FPGA-based inference accelerator, given a set of fixed design constraints.
We show that compute performance can scale well beyond typical floating-point performance, currently demonstrating tens of thousands to millions of images per second for inference and 14 TOPS of compute performance at a power consumption below 25 W on today’s devices. Results on accuracy, an architectural comparison with other approaches, and detailed implementations of the latest large networks will also be presented.
Michaela Blott graduated from the University of Kaiserslautern in Germany. She has worked at both research institutions (ETH and Bell Labs) and development organizations, and was deeply involved in large-scale international collaborations such as NetFPGA-10G. Today, she works as a principal engineer at the Xilinx labs in Dublin, heading a team of international researchers. She leads Xilinx’s strategy for bringing FPGAs into new markets and investigates system architectures with emerging memory technologies for different application domains. Her expertise spans data centers, machine learning, high-speed networking, emerging memory technologies and distributed computing systems, with an emphasis on building complete implementations. She is on the industry advisory council of the Irish Center for High-End Computing (ICHEC), serves on the technical program committees of numerous conferences (DATE, FPL, GlobalSIP, HiPEAC, PPoPP), is an organizer of the SC’2015/6/7 workshop, and sits on the industry review panels of various research projects.
Jan van Lunteren, IBM
Recent developments have caused data access and transfer to surpass actual data processing as the dominant cost factors in high-performance computing systems. As a result, interest has grown in hybrid computer systems that offload part of the processing to programmable near-memory accelerators, which tightly integrate computation within the memory system. IBM’s POWER8® processors introduced open (OpenPOWER™) high-bandwidth interconnects with the ability to attach such accelerators. The upcoming POWER9 processor, with its new OpenCAPI™ interface, will further extend those capabilities. Our team at IBM Research – Zurich is investigating novel ways to extend this data-centric concept beyond the reduction of expensive data transfers in order to improve power efficiency further. In this presentation, I will give an overview of our research activities. In particular, I will address how the memory system can play a more important role in (dynamic) workload optimization by enabling different ways to adapt its operation to the accelerated workload behavior, and how a novel accelerator architecture and programming model can be exploited to reduce the programmability overhead compared with that of traditional architectures.
Jan van Lunteren is a Research Staff Member in the Cloud & Computing Infrastructure department at IBM Research – Zurich, which he joined in 1994. Following his PhD research on advanced adaptive memory systems, he has investigated and designed a broad range of accelerators, including search engines, packet classifiers, XML accelerators, protocol processing engines, and regular expression scanners. His current interests include near-memory processing, deep learning architectures, and high-performance programmable accelerators. Jan has an M.Sc. and a Ph.D. degree in Electrical Engineering from the Technical University of Eindhoven, The Netherlands. He holds more than 60 issued and pending patents.
AMPNet: Distributed Training for Flexible Neural Network Models
Ryota Tomioka, Microsoft
What if we had started the history of deep learning not with graphics processing units (GPUs) but with hundreds of thousands of devices, each with limited fast on-chip memory, connected to each other via a network? What kind of models would we develop? What building blocks would we use to construct these models? How would we train a model on such a system? We explore these questions and propose that such a system may allow us to efficiently train a class of neural network models that exhibit complex, instance-dependent control flow, which is known to be challenging for GPUs.
Ryota Tomioka is currently a researcher at Microsoft Research Cambridge, UK. Prior to that, he was a research assistant professor at Toyota Technological Institute at Chicago and an assistant professor at the University of Tokyo. His interests lie at the intersection of deep learning, optimization, and systems.
Declarative Query Processing in Imperative Managed Runtimes
Stratis Viglas, Google
New advances in memory technology, such as scalable distributed and persistent memory, make it possible to have a truly universal storage model, accessed directly through the programming language in the context of a fully managed runtime. This environment is further enhanced by language-integrated query, which has gained significant traction and has emerged as a generic, safe method of combining programming languages with databases, with considerable software engineering benefits.
We present the results of our work in integrating database and programming language runtimes through code generation and extensive just-in-time adaptation. Our techniques deliver significant performance improvements over non-integrated solutions. Our work makes important first steps towards a future where data processing applications will commonly access datasets as if they were fully in-memory, and will be written in a single programming language employing higher-level APIs and language-integrated query to provide transparent and highly efficient data processing.
Stratis Viglas is an engineer in Google’s Data Infrastructure and Analysis group, working on building software abstractions for petascale query processing. He is also a Professor in the School of Informatics at the University of Edinburgh, UK, where he holds the Chair of Data Management on New Hardware. He has made contributions to data stream processing, XML data management, query processing and optimization, and data management over flash memory. His current work involves integrating scalable managed runtimes with database systems through just-in-time compilation of SQL queries, and incorporating technologies like remote memory access and non-volatile memory into the data processing stack.