Tarek El-Ghazawi
The George Washington University, Washington DC

 Reconfigurable Computing with Nanophotonics

Keynote Video Is Now Available!
More information can be found here.

The virtual conference format is new to us, so please
report any website issues to

Technical Program

Session Title Authors
Messages from the Chairs
Message from the General Chair David Andrews (University of Arkansas)
Message from the Program Chair Ken Eguro (Microsoft)
In Memory of Mike Butts
Mike Butts Memorial Jan Gray (Gray Research LLC) and Nachiket Kapre (University of Waterloo)
Share Your Favorite Stories and Memories
Paper Session 1:
Machine Learning
Best Paper Award Nominee: High-Throughput Convolutional Neural Network on an FPGA by Customized JPEG Compression Hiroki Nakahara (Tokyo Institute of Technology), Zhiqiang Que (Imperial College London), and Wayne Luk (Imperial College London)
Optimizing Reconfigurable Recurrent Neural Networks Zhiqiang Que (Imperial College London), Hiroki Nakahara (Tokyo Institute of Technology), Eriko Nurvitadhi (Intel Corporation), Hongxiang Fan (Imperial College London), Chenglong Zeng (Corerain Technologies Ltd.), Jiuxi Meng (Imperial College London), Xinyu Niu (Corerain Technologies Ltd.), and Wayne Luk (Imperial College London)
Accelerating Proximal Policy Optimization on CPU-FPGA Heterogeneous Platforms Yuan Meng (University of Southern California), Sanmukh Kuppannagari (University of Southern California), and Viktor Prasanna (University of Southern California)
Short Paper: Evaluating Low-Memory GEMMs for Convolutional Neural Network Inference on FPGAs Wentai Zhang (Peking University), Ming Jiang (Peking University), and Guojie Luo (Peking University)
Short Paper: CNN-based Feature-point Extraction for Real-time Visual SLAM on Embedded FPGA Zhilin Xu (Tsinghua University), Jincheng Yu (Tsinghua University), Chao Yu (Tsinghua University), Hao Shen (Meituan-Dianping Group), Yu Wang (Tsinghua University), and Huazhong Yang (Tsinghua University)
Paper Session 2:
Networks and Security
Corundum: An Open-Source 100-Gbps NIC Alex Forencich (University of California, San Diego), Alex C. Snoeren (University of California, San Diego), George Porter (University of California, San Diego), and George Papen (University of California, San Diego)
FFShark: A 100G FPGA Implementation of BPF Filtering for Wireshark Juan Camilo Vega (University of Toronto), Marco Antonio Merlini (University of Toronto), and Paul Chow (University of Toronto)
Hardware Architecture of a Number Theoretic Transform for a Bootstrappable RNS-based Homomorphic Encryption Scheme Sunwoong Kim (University of Washington), Keewoo Lee (Seoul National University), Wonhee Cho (Seoul National University), Yujin Nam (Seoul National University), Jung Hee Cheon (Seoul National University), and Rob A. Rutenbar (University of Pittsburgh)
Short Paper: Power-hammering through Glitch Amplification – Attacks and Mitigation Kaspar Matas (The University of Manchester), Tuan Minh La (The University of Manchester), Khoa Dang Pham (The University of Manchester), and Dirk Koch (The University of Manchester)
Short Paper: Exploring the Impact of Switch Arity on Butterfly Fat Tree FPGA NoCs Ian Lang (University of Waterloo), Ziqiang Huang (University of Waterloo), and Nachiket Kapre (University of Waterloo)
Paper Session 3:
Best Paper Award Winner: Comparison of Arithmetic Number Formats for Inference in Sum-Product Networks on FPGAs Lukas Sommer (TU Darmstadt), Lukas Weber (TU Darmstadt), Martin Kumm (Fulda University of Applied Sciences), and Andreas Koch (TU Darmstadt)
High Density 8-bit Multiplier Systolic Arrays for FPGA Martin Langhammer (Intel Corporation), Sergey Gribok (Intel Corporation), and Gregg Baeckler (Intel Corporation)
Low-Cost Approximate Constant Coefficient Hybrid Binary-Unary Multiplier for DSP Applications S. Rasoul Faraji (University of Minnesota, Twin Cities), Pierre Abillama (University of Minnesota, Twin Cities), and Kia Bazargan (University of Minnesota, Twin Cities)
Paper Session 4:
Virtualization, HBM, and Soft Processors
Enabling Efficient and Flexible FPGA Virtualization for Deep Learning in the Cloud Shulin Zeng (Tsinghua University), Guohao Dai (Tsinghua University), Hanbo Sun (Tsinghua University), Kai Zhong (Tsinghua University), Guangjun Ge (Tsinghua University), Kaiyuan Guo (Tsinghua University), Yu Wang (Tsinghua University), and Huazhong Yang (Tsinghua University)
Shuhai: Benchmarking High Bandwidth Memory on FPGAs Zeke Wang (Zhejiang University), Hongjing Huang (Zhejiang University), Jie Zhang (Zhejiang University), and Gustavo Alonso (ETH Zurich)
Exploring Writeback Designs for Efficiently Leveraging Parallel-Execution Units in FPGA-Based Soft-Processors Eric Matthews (Simon Fraser University), Yuhui Gao (Simon Fraser University), and Lesley Shannon (Simon Fraser University)
Paper Session 5:
Safely Preventing Unbounded Delays During Bus Transactions in FPGA-based SoC Francesco Restuccia (Scuola Superiore Sant’Anna), Alessandro Biondi (Scuola Superiore Sant’Anna), Mauro Marinoni (Scuola Superiore Sant’Anna), and Giorgio Buttazzo (Scuola Superiore Sant’Anna)
Best Paper Award Nominee: Grapefruit: An Open-Source, Full-Stack, and Customizable Automata Processing on FPGAs Reza Rahimi (University of Virginia), Elaheh Sadredini (University of Virginia), Mircea Stan (University of Virginia), and Kevin Skadron (University of Virginia)
FP-AMG: FPGA-Based Acceleration Framework for Algebraic Multigrid Solvers Pouya Haghi (Boston University), Tong Geng (Boston University), Anqi Guo (Boston University), Tianqi Wang (University of Science and Technology of China), and Martin Herbordt (Boston University)
Algorithm-Hardware Co-design for BQSR Acceleration in Genome Analysis ToolKit Michael Lo (University of California, Los Angeles), Zhenman Fang (Simon Fraser University), Jie Wang (University of California, Los Angeles), Peipei Zhou (University of California, Los Angeles), Mau-Chung Frank Chang (University of California, Los Angeles), and Jason Cong (University of California, Los Angeles)
Short Paper: A Turbo Maximum-a-Posteriori Equalizer for Faster-than-Nyquist Applications Mohamed Omran Matar (University of British Columbia), Mrinmoy Jana (University of British Columbia), Jeebak Mitra (Huawei Canada), Lutz Lampe (University of British Columbia), and Mieszko Lis (University of British Columbia)
Short Paper: FPGA-accelerated Automatic Alignment for Three-dimensional Tomography Shuang Wen (Peking University) and Guojie Luo (Peking University)
Paper Session 6:
HLS and Tooling
Artisan: a Meta-Programming Approach For Codifying Optimisation Strategies Jessica Vandebon (Imperial College London), Jose G. F. Coutinho (Imperial College London), Wayne Luk (Imperial College London), Eriko Nurvitadhi (Intel Corporation), and Tim Todman (Imperial College London)
Hierarchical Modelling of Generators in Design-Space Exploration Charles Lo (University of Toronto) and Paul Chow (University of Toronto)
Investigating Performance Losses in High-Level Synthesis for Stencil Computations Wesson Altoyan (Stanford University) and Juan J. Alonso (Stanford University)
Poster Session 1:
Arithmetic and Security
Proposing a Fast and Scalable Systolic Array for Matrix Multiplication Bahar Asgari (Georgia Institute of Technology), Ramyad Hadidi (Georgia Institute of Technology), and Hyesoon Kim (Georgia Institute of Technology)
An Automated Tool for Design Space Exploration of Matrix Vector Multiplication (MVM) Kernels Using OpenCL Based Implementation on FPGAs Jannatun Naher (North Carolina A & T State University), Clay Gloster (North Carolina A & T State University), Christopher C. Doss (North Carolina A & T State University), and Shrikanth S. Jadhav (North Carolina A & T State University)
Fast Arithmetic Hardware Library For RLWE-Based Homomorphic Encryption Rashmi Agrawal (Boston University), Lake Bu (The Charles Stark Draper Laboratory), and Michel A. Kinsy (Boston University)
Primitive Instantiation for Speed-Area Efficient Architecture Design of Cellular Automata based Mageto Logic on FPGA with Built-In Testability Ayan Palchaudhuri (Indian Institute of Technology Kharagpur) and Anindya Sundar Dhar (Indian Institute of Technology Kharagpur)
TBOX-Based Mask Scrambling Against SCA João Carlos Resende (Universidade de Lisboa), Ricardo J. R. Maçãs (Universidade de Lisboa), and Ricardo Chaves (Universidade de Lisboa)
FPGA Implementation of Post-Quantum DME Cryptosystem José L. Imaña (Complutense University) and Ignacio Luengo (Complutense University)
A Dynamic Frequency Scaling Framework Against Reliability and Security Issues in Multi-tenant FPGA Yukui Luo (University of Illinois at Chicago) and Xiaolin Xu (University of Illinois at Chicago)
Poster Session 2:
Datacenter and Infrastructure
SHIP: Storage for Hybrid Interconnected Processors Juan Camilo Vega (University of Toronto), Qianfeng (Clark) Shen (University of Toronto), and Paul Chow (University of Toronto)
RISC-V Barrel Processor for Accelerator Control MohammadHossein AskariHemmat (Ecole Polytechnique Montreal), Olexa Bilaniuk (University of Montreal), Sean Wagner (IBM Canada), Yvon Savaria (Ecole Polytechnique Montreal), and Jean-Pierre David (Ecole Polytechnique Montreal)
Update Latency Optimization of Packet Classification for SDN Switch on FPGA Chenglong Li (National University of Defense Technology), Tao Li (National University of Defense Technology), Junnan Li (National University of Defense Technology), Zilin Shi (National University of Defense Technology), and Baosheng Wang (National University of Defense Technology)
Accommodating Multi-Tenant FPGAs in the Cloud Joel Mandebi Mbongue (University of Florida) and Christophe Bobda (University of Florida)
Accelerating MPI Collectives with FPGAs in the Network and Novel Communicator Support Qingqing Xiong (Boston University), Chen Yang (Boston University), Pouya Haghi (Boston University), Anthony Skjellum (University of Tennessee at Chattanooga), and Martin Herbordt (Boston University)
MeXT-SE: A System-Level Design Tool to Transparently Generate Secure MPSoC Md Jubaer Hossain Pantho (University of Florida) and Christophe Bobda (University of Florida)
Poster Session 3:
Early-stage Automated Identification Tool for Shared Accelerators Parnian Mokri (Tufts University) and Mark Hempstead (Tufts University)
An Analytical Model of Memory-Bound Applications Compiled with High Level Synthesis Maria A. Dávila-Guzmán (Universidad de Zaragoza), Rubén Gran Tejero (Universidad de Zaragoza), María Villarroya-Gaudó (Universidad de Zaragoza), and Darío Suárez Gracia (Universidad de Zaragoza)
FPGA Virtualization for Deprecated Devices Ian D. Taras (University of Toronto) and Andrew G. Schmidt (University of Southern California)
ZRLMPI: A Unified Programming Model for Reconfigurable Heterogeneous Computing Clusters Burkhard Ringlein (Friedrich-Alexander University Erlangen-Nürnberg), Francois Abel (IBM Research Europe), Alexander Ditter (Friedrich-Alexander University Erlangen-Nürnberg), Beat Weiss (IBM Research Europe), Christoph Hagleitner (IBM Research Europe), and Dietmar Fey (Friedrich-Alexander University Erlangen-Nürnberg)
Designing Domain Specific Computing Systems Anthony M. Cabrera (Washington University in St. Louis) and Roger D. Chamberlain (Washington University in St. Louis)
Poster Session 4:
Applications and Architectures
Improving the Availability of Secure Space Links through the Partial Reconfiguration of FPGAs Emmanuel Lesser (European Space Agency)
An FPGA-Optimized Architecture of Real-time Farneback Optical Flow Zhe Pan (Zhejiang University), Yuruo Jin (Zhejiang University), Xiaohong Jiang (Zhejiang University), and Jian Wu (Zhejiang University)
High-Performance Parallel Radix Sort on FPGA Bashar Romanous (University of California, Riverside), Mohammadreza Rezvani (University of California, Riverside), Junjie Huang (University of California, Riverside), Daniel Wong (University of California, Riverside), Evangelos E. Papalexakis (University of California, Riverside), Vassilis J. Tsotras (University of California, Riverside), and Walid Najjar (University of California, Riverside)
FPGA-Based Gesture Recognition with Capacitive Sensor Array using Recurrent Neural Networks Haoyan Liu (University of Arkansas), Atiyehsadat Panahi (University of Arkansas), David Andrews (University of Arkansas), and Alexander Nelson (University of Arkansas)
Gbit/s Non-Binary LDPC Decoders: High-Throughput using High-Level Specifications Oscar Ferraz (University of Coimbra), Srinivasan Subramaniyan (Amrita Vishwa Vidyapeetham), Guohui Wang (Rice University), Joseph R. Cavallaro (Rice University), Gabriel Falcao (University of Coimbra), and Madhura Purnaprajna (Amrita Vishwa Vidyapeetham)
A Quaternary FPGA Architecture Using Floating Gate Memories Ayokunle Fadamiro (Auburn University), Pouyan Rezaie (Auburn University), Christopher Harris (Auburn University), and Spencer Millican (Auburn University)
Rotary Register File: A Micro-Architectural Primitive on FPGA Reza Nakhjavani (University of Toronto) and Jianwen Zhu (University of Toronto)
Poster Session 5:
Machine Learning 1
Tiny On-Chip Memory Realization of Weight Sparseness Split-CNNs on Low-end FPGAs Akira Jinguji (Tokyo Institute of Technology), Shimpei Sato (Tokyo Institute of Technology), and Hiroki Nakahara (Tokyo Institute of Technology)
An Efficient FPGA-based Architecture for Contractive Autoencoders Madis Kerner (Tallinn University of Technology), Kalle Tammemäe (Tallinn University of Technology), Jaan Raik (Tallinn University of Technology), and Thomas Hollstein (Tallinn University of Technology)
Systolic-CNN: An OpenCL-defined Scalable Run-time-flexible FPGA Accelerator Architecture for Accelerating Convolutional Neural Network Inference in Cloud/Edge Computing Akshay Dua (Arizona State University), Yixing Li (Arizona State University), and Fengbo Ren (Arizona State University)
Explore Efficient LUT-based Architecture for Quantized Convolutional Neural Networks on FPGA Yanpeng Cao (Southeast University), Chengcheng Wang (Southeast University), and Yongming Tang (Southeast University)
Realization of Quantized Neural Network for Super-resolution on PYNQ Feng Yu (Southeast University), Yanpeng Cao (Southeast University), and Yongming Tang (Southeast University)
Scalable Full Hardware Logic Architecture for Gradient Boosted Tree Training Tamon Sadasue (RICOH Company) and Tsuyoshi Isshiki (Tokyo Institute of Technology)
Optimized Distribution of an Accelerated Convolutional Neural Network across Multiple FPGAs Alaa Maarouf (American University of Beirut), Nour El Droubi (American University of Beirut), Raghid Morcel (American University of Beirut), Hazem Hajj (American University of Beirut), Mazen A. R. Saghir (American University of Beirut), and Haitham Akkary (American University of Beirut)
Poster Session 6:
Machine Learning 2
SqueezeJet-3: An Accelerator Utilizing FPGA MPSoCs for Edge CNN Applications Panagiotis Mousouliotis (Aristotle University of Thessaloniki), Ioannis Papaefstathiou (Aristotle University of Thessaloniki), and Loukas Petrou (Aristotle University of Thessaloniki)
Automatic Generation of FPGA Kernels From Open Format CNN Models Dimitrios Danopoulos (NTUA), Christoforos Kachris (Democritus University of Thrace), and Dimitrios Soudris (NTUA)
High-Throughput DNN Inference with LogicNets Yaman Umuroglu (Xilinx Research Labs), Yash Akhauri (Xilinx Research Labs), Nicholas J. Fraser (Xilinx Research Labs), and Michaela Blott (Xilinx Research Labs)
AIgean: An Open Framework for Machine Learning on Heterogeneous Clusters Naif Tarafdar (University of Toronto), Giuseppe Di Guglielmo (Columbia University), Philip C Harris (Massachusetts Institute of Technology), Jeffrey D Krupa (Massachusetts Institute of Technology), Vladimir Loncar (CERN), Dylan S Rankin (Massachusetts Institute of Technology), Nhan Tran (Fermilab), Zhenbin Wu (University of Illinois), Qianfeng Shen (University of Toronto), and Paul Chow (University of Toronto)
FPGA Based High-Throughput Real-Time Feature Extraction for Modulation Classification Joshua Mack (University of Arizona) and Ali Akoglu (University of Arizona)
Accelerating Large Scale GCN Inference on FPGA Bingyi Zhang (University of Southern California), Hanqing Zeng (University of Southern California), and Viktor Prasanna (University of Southern California)
EASpiNN: Effective Automated Spiking Neural Network Evaluation on FPGA Sathish Panchapakesan (Simon Fraser University), Zhenman Fang (Simon Fraser University), and Nitin Chandrachoodan (Indian Institute of Technology - Madras)
A High-performance Inference Accelerator Exploiting Patterned Sparsity in CNNs Ning Li (Tsinghua University), Leibo Liu (Tsinghua University), Shaojun Wei (Tsinghua University), and Shouyi Yin (Tsinghua University)


All times shown in Pacific Standard Time (UTC-8)

May 6-8 (Wednesday – Friday)

FPGA Tutorials will be held on May 6 through May 8.

Please use the Registration Links to register for each tutorial. All tutorials will send emails to registered participants.

Intel FPGA Clouds for Academic Research and Teaching.

Time: May 6, 11:00 AM – 12:00 PM CDT

Organizer: Intel
Flyer: PDF

This workshop presents Intel FPGA clouds (HARP and DevCloud) for academic research and teaching. These clouds offer academics access to server nodes with one or more Intel FPGAs per node, along with development tools to facilitate productive research and teaching. We will give an overview of these clouds, provide a tutorial on how to use them, and highlight several projects that have successfully used them (e.g., from recently published FPGA work).

The Future of FPGA-Acceleration in Cloud and Datacenters

Time: May 6, 11:00 AM – 5:30 PM CDT

Organizers: Christophe Bobda (University of Florida) and Peter Hofstee (IBM)
Flyer: PDF
Detailed Program: Abstracts & Bios
Recorded Videos Will be Available Soon!

Field-Programmable Gate Arrays (FPGAs) are becoming integral components of general-purpose heterogeneous cloud computing systems and datacenters because they can serve as energy-efficient, domain-customizable accelerators. Major players such as Microsoft, Amazon, Intel, Baidu, Huawei, and IBM now expose FPGAs to application developers in their cloud and datacenter infrastructures. Besides commercial infrastructure, a growing number of projects are under way across the globe, in academia and other research organizations, to provide the benefits of acceleration and flexibility remotely to users. Current developments are taking place behind closed doors, and companies and institutions disclose very little about the challenges they encounter or the approaches they currently use to tackle them. This workshop will bring together experts in cloud, FPGA, computer architecture, and applications to 1) discuss the status of FPGA acceleration in cloud computers and 2) explore the future of, and the challenges to, broad adoption of FPGAs in the datacenter.

All times shown in Central Daylight Time (UTC-5)

Time Title Presenter Slides
11:00 Opening Peter Hofstee, IBM - Austin, TX
11:05 NSF Funding Opportunities and Priorities in CNS Erik Brunvand, NSF Available Here
11:30 The Future of FPGAs Needs Open Middleware Now Paul Chow, University of Toronto Available Here
12:00 Secure and Virtualized FPGA Management for FPGAs in Cloud and Datacenters Dirk Koch, University of Manchester
12:30 cloudFPGA: Promote FPGAs to 1st Citizen in the Cloud Francois Abel, IBM Research Europe Available Here
1:00 Break
1:15 The Open Cloud FPGA Testbed: Supporting Experiments on Emerging Datacenter Configurations Martin Herbordt, Boston University and Miriam Leeser, Northeastern University Available Here
1:45 openRole: Do we need a POSIX for FPGAs? Burkhard Ringlein, IBM Research Europe Available Here
2:15 Security and Privacy Concerns for the FPGA-Accelerated Cloud and Datacenters Russell Tessier, University of Massachusetts Amherst Available Here
2:45 Cloud-scale Key Value Store in FPGA John W Lockwood, Algo-Logic Available Here
3:15 Break
3:30 Powering Cloud and Datacenters with Xilinx Adaptive Compute Acceleration platforms Cathal McCabe, Xilinx Available Here
4:00 Global-Scale FPGA-Accelerated Deep Learning Inference with Microsoft's Project Brainwave Gabriel Weisz, Microsoft Available Here
4:30 Single-Tenant Cloud FPGA Security Jakub Szefer, Yale University Available Here
5:00 Gator Reconfigurable Cloud Computing: Hardware Virtualization Challenges Christophe Bobda, University of Florida Available Here

Compute Acceleration Workflow using Vitis and PYNQ

May 7 and May 8
**Same times both days**
10:00 AM – 1:00 PM (Presentation)
1:00 PM – 4:00 PM (Lab)

Organizer: Xilinx

Xilinx has recently introduced the open-source, freely downloadable Vitis unified software platform, which enables the development of embedded software and accelerated applications on heterogeneous Xilinx platforms, including FPGAs, SoCs, and Versal ACAPs. It provides a unified programming model for accelerating edge, cloud, and hybrid computing applications. Vitis allows integration of high-level frameworks, development in C, C++, or Python using accelerated libraries, or the use of RTL-based accelerators and low-level runtime APIs for more fine-grained control over implementation.

Xilinx provides datacenter-centric development boards (Alveo) that are suitable for application acceleration and well supported by Vitis. Recently, Xilinx’s open-source project, PYNQ, has also been ported to Alveo boards using Vitis and Python. In this tutorial you’ll get hands-on experience with the compute acceleration features of Vitis and PYNQ.

Tentative Agenda

May 7 Lecture Session
10:00 AM - 1:00 PM CDT
Webinar Introduction Agenda, outline
Introduction to Vitis Introduce the open-source unified software platform environment and supported boards
Execution Model Understand how XRT works
Vitis Tool Flows Discuss Makefile and GUI flows
Design Analysis Understanding profiling and timing analysis reports
Optimization Methodology Describe optimization methodology for accelerating applications
May 7 Lab Session
1:00 PM - 4:00 PM CDT
Connecting to AWS Use provided credentials to connect to AWS 
GUI Flow Use Vitis GUI IDE to develop application 
Improving Performance Use memory targeting and other techniques to improve performance
May 8 Lecture Session
10:00 AM - 1:00 PM CDT
Host Code Optimization Describe techniques for optimizing the host program
Kernel Optimization  Describe techniques for developing a high-performance kernel
RTL Kernel Wizard How to integrate your own IP, developed using HDL, in the Vitis application development environment
Debugging Various methods and components of hardware/software debugging
Vitis Accelerated Libraries Describe the structure of the accelerated libraries that are available for domain-specific and common applications
PYNQ for Compute Acceleration Learn how PYNQ can simplify driving acceleration kernels in the datacenter.
May 8 Lab Session
1:00 PM - 4:00 PM CDT
Optimization Lab Use DATAFLOW and PIPELINING techniques to optimize an application
RTL Kernel Lab Use Vitis RTL Kernel wizard to develop and integrate custom IP developed in HDL
Debug Lab Perform hardware/software debugging
Computer Vision Lab Use Vitis OpenCV support to develop a vision application
PYNQ Lab Using JupyterLab, this lab shows how to develop and optimize Python code targeting Vitis designs