Rapid Exploration of Accelerator-rich Architectures:

Automation from Concept to Prototyping

Full Day Tutorial: Saturday, June 13, 2015 at ISCA 2015

Hardware acceleration in the form of customized datapath and control circuitry tuned to specific applications has gained popularity for its promise to utilize transistors more efficiently. Historically, the computer architecture community has focused on general-purpose processors, and extensive research infrastructure, such as power-performance modeling and workload characterization tools, has been developed to support research in this domain. As future computing systems combine a diverse set of general-purpose cores and accelerators, researchers must add accelerator-centric research infrastructure to their toolboxes to explore these heterogeneous systems.

In this tutorial, we discuss state-of-the-art research infrastructure available for accelerator research, ranging from applications to power-performance simulation to hardware prototyping. The first half of the tutorial focuses on modeling and simulation. We start with Aladdin, a pre-RTL power-performance simulator for fixed-function accelerators that helps computer architects explore the accelerator design space. We then introduce two complementary system-level simulation frameworks for studying the interaction between accelerators and the rest of the system: gem5-Aladdin, an integration of Aladdin and gem5, and PARADE, an integration of high-level synthesis and gem5. To help designers understand intrinsic workload characteristics, we continue with WIICA, an ISA-independent workload characterization tool for accelerators.

The second half of the tutorial focuses on applications and prototyping. We will present two accelerator benchmark suites, MachSuite and the Medical Imaging Benchmarks, followed by a discussion of recent advances in high-level synthesis tools. We will then introduce an FPGA prototyping flow and evaluation framework for Accelerator-Rich Architectures (ARA). A live demo will be given on a Zynq FPGA cluster at UCLA.

Before the hands-on exercise, we will hold a panel on the accelerator roadmap for the computer architecture community. Researchers from both industry and academia will discuss what is needed to build a healthy ecosystem that enables ARA research.

Panelists

Ameen Akel (Micron)
Chris Batten (Cornell)
Derek Chiou (UT-Austin/Microsoft)
Michael Kishinevsky (Intel)
Boris Ginzburg (Intel)

Tutorial Schedule

Slides

Introduction
Aladdin
HLS
PARADE
gem5-Aladdin
MachSuite
Medical Imaging Pipeline

Tutorial Organizers

Prof. David Brooks, Harvard University (dbrooks@eecs.harvard.edu)
Yu-Ting Chen, UCLA (ytchen@cs.ucla.edu)
Prof. Jason Cong, UCLA (cong@cs.ucla.edu)
Zhenman Fang, UCLA (zhenman@cs.ucla.edu)
Brandon Reagen, Harvard University (reagen@fas.harvard.edu)
Prof. Glenn Reinman, UCLA (reinman@cs.ucla.edu)
Yakun Sophia Shao, Harvard University (shao@eecs.harvard.edu)
Prof. Gu-Yeon Wei, Harvard University (guyeon@eecs.harvard.edu)
Sam Xi, Harvard University (samxi@eecs.harvard.edu)

Software Download

Aladdin: A pre-RTL, power-performance-area simulator for accelerators.

LLVM-Tracer: An LLVM optimization pass to print a dynamic LLVM IR trace.

MachSuite: A benchmark suite for accelerators.

WIICA: An ISA-independent workload characterization tool for accelerators.

Medical Imaging Pipeline: Tools and algorithms for developing an accelerated medical image processing pipeline.

References

Accelerator-rich CMPs: From Concept to Real Hardware
Yu-Ting Chen, Jason Cong, Mohammad Ali Ghodrat, Muhuan Huang, Chunyue Liu, Bingjun Xiao and Yi Zou

International Conference on Computer Design (ICCD), Oct 2013.

Customizable Computing
Yu-Ting Chen, Jason Cong, Michael Gill, Glenn Reinman, and Bingjun Xiao

Morgan & Claypool, July 2015.

ARACompiler: A Prototyping Flow and Evaluation Framework for Accelerator-Rich Architectures
Yu-Ting Chen, Jason Cong and Bingjun Xiao

International Symposium on Performance Analysis of Systems and Software (ISPASS), March 2015.

PARADE: A Cycle-Accurate Full-System Simulation Platform for Accelerator-Rich Architectural Design and Exploration
Jason Cong, Zhenman Fang, Michael Gill, Glenn Reinman

International Conference on Computer-Aided Design (ICCAD) 2015, to appear.

Toward Cache-Friendly Hardware Accelerators
Yakun Sophia Shao, Sam Xi, Viji Srinivasan, Gu-Yeon Wei and David Brooks

HPCA Sensors and Cloud Architectures Workshop (SCAW), Feb 2015.

Quantifying Acceleration: Power/Performance Trade-Offs of Application Kernels in Hardware
Brandon Reagen, Yakun Sophia Shao, Gu-Yeon Wei and David Brooks

International Symposium on Low Power Electronics and Design (ISLPED), Sept 2013.

MachSuite: Benchmarks for Accelerator Design and Customized Architectures
Brandon Reagen, Bob Adolf, Yakun Sophia Shao, Gu-Yeon Wei and David Brooks

International Symposium on Workload Characterization (IISWC), October 2014.

Aladdin: A Pre-RTL, Power-Performance Accelerator Simulator Enabling Large Design Space Exploration of Customized Architectures
Yakun Sophia Shao, Brandon Reagen, Gu-Yeon Wei and David Brooks

International Symposium on Computer Architecture (ISCA), June 2014.

ISA-Independent Workload Characterization and its Implications for Specialized Architectures
Yakun Sophia Shao and David Brooks

International Symposium on Performance Analysis of Systems and Software (ISPASS), April 2013.