Tarek El-Ghazawi
The George Washington University, Washington DC

 Reconfigurable Computing with Nanophotonics

Keynote Video Is Now Available!
More information can be found here.

The virtual conference format is new to us, so please
report any website issues to

Technical Program

Messages from the Chairs
Message from the General Chair
David Andrews (University of Arkansas)
Message from the Program Chair
Ken Eguro (Microsoft)
In Memory of Mike Butts
Mike Butts Memorial
Jan Gray (Gray Research LLC) and Nachiket Kapre (University of Waterloo)
Share Your Favorite Stories and Memories
Paper Session 1: Machine Learning
Best Paper Award Nominee: High-Throughput Convolutional Neural Network on an FPGA by Customized JPEG Compression
Hiroki Nakahara (Tokyo Institute of Technology), Zhiqiang Que (Imperial College London), and Wayne Luk (Imperial College London)
Optimizing Reconfigurable Recurrent Neural Networks
Zhiqiang Que (Imperial College London), Hiroki Nakahara (Tokyo Institute of Technology), Eriko Nurvitadhi (Intel Corporation), Hongxiang Fan (Imperial College London), Chenglong Zeng (Corerain Technologies Ltd.), Jiuxi Meng (Imperial College London), Xinyu Niu (Corerain Technologies Ltd.), and Wayne Luk (Imperial College London)
Accelerating Proximal Policy Optimization on CPU-FPGA Heterogeneous Platforms
Yuan Meng (University of Southern California), Sanmukh Kuppannagari (University of Southern California), and Viktor Prasanna (University of Southern California)
Short Paper: Evaluating Low-Memory GEMMs for Convolutional Neural Network Inference on FPGAs
Wentai Zhang (Peking University), Ming Jiang (Peking University), and Guojie Luo (Peking University)
Short Paper: CNN-based Feature-point Extraction for Real-time Visual SLAM on Embedded FPGA
Zhilin Xu (Tsinghua University), Jincheng Yu (Tsinghua University), Chao Yu (Tsinghua University), Hao Shen (Meituan-Dianping Group), Yu Wang (Tsinghua University), and Huazhong Yang (Tsinghua University)
Paper Session 2: Networks and Security
Corundum: An Open-Source 100-Gbps NIC
Alex Forencich (University of California, San Diego), Alex C. Snoeren (University of California, San Diego), George Porter (University of California, San Diego), and George Papen (University of California, San Diego)
FFShark: A 100G FPGA Implementation of BPF Filtering for Wireshark
Juan Camilo Vega (University of Toronto), Marco Antonio Merlini (University of Toronto), and Paul Chow (University of Toronto)
Hardware Architecture of a Number Theoretic Transform for a Bootstrappable RNS-based Homomorphic Encryption Scheme
Sunwoong Kim (University of Washington), Keewoo Lee (Seoul National University), Wonhee Cho (Seoul National University), Yujin Nam (Seoul National University), Jung Hee Cheon (Seoul National University), and Rob A. Rutenbar (University of Pittsburgh)
Short Paper: Power-hammering through Glitch Amplification – Attacks and Mitigation
Kaspar Matas (The University of Manchester), Tuan Minh La (The University of Manchester), Khoa Dang Pham (The University of Manchester), and Dirk Koch (The University of Manchester)
Short Paper: Exploring the Impact of Switch Arity on Butterfly Fat Tree FPGA NoCs
Ian Lang (University of Waterloo), Ziqiang Huang (University of Waterloo), and Nachiket Kapre (University of Waterloo)
Paper Session 3:
Best Paper Award Winner: Comparison of Arithmetic Number Formats for Inference in Sum-Product Networks on FPGAs
Lukas Sommer (TU Darmstadt), Lukas Weber (TU Darmstadt), Martin Kumm (Fulda University of Applied Sciences), and Andreas Koch (TU Darmstadt)
High Density 8-bit Multiplier Systolic Arrays for FPGA
Martin Langhammer (Intel Corporation), Sergey Gribok (Intel Corporation), and Gregg Baeckler (Intel Corporation)
Low-Cost Approximate Constant Coefficient Hybrid Binary-Unary Multiplier for DSP Applications
S. Rasoul Faraji (University of Minnesota, Twin Cities), Pierre Abillama (University of Minnesota, Twin Cities), and Kia Bazargan (University of Minnesota, Twin Cities)
Paper Session 4: Virtualization, HBM, and Soft Processors
Enabling Efficient and Flexible FPGA Virtualization for Deep Learning in the Cloud
Shulin Zeng (Tsinghua University), Guohao Dai (Tsinghua University), Hanbo Sun (Tsinghua University), Kai Zhong (Tsinghua University), Guangjun Ge (Tsinghua University), Kaiyuan Guo (Tsinghua University), Yu Wang (Tsinghua University), and Huazhong Yang (Tsinghua University)
Shuhai: Benchmarking High Bandwidth Memory on FPGAs
Zeke Wang (Zhejiang University), Hongjing Huang (Zhejiang University), Jie Zhang (Zhejiang University), and Gustavo Alonso (ETH Zurich)
Exploring Writeback Designs for Efficiently Leveraging Parallel-Execution Units in FPGA-Based Soft-Processors
Eric Matthews (Simon Fraser University), Yuhui Gao (Simon Fraser University), and Lesley Shannon (Simon Fraser University)
Paper Session 5:
Safely Preventing Unbounded Delays During Bus Transactions in FPGA-based SoC
Francesco Restuccia (Scuola Superiore Sant’Anna), Alessandro Biondi (Scuola Superiore Sant’Anna), Mauro Marinoni (Scuola Superiore Sant’Anna), and Giorgio Buttazzo (Scuola Superiore Sant’Anna)
Best Paper Award Nominee: Grapefruit: An Open-Source, Full-Stack, and Customizable Automata Processing on FPGAs
Reza Rahimi (University of Virginia), Elaheh Sadredini (University of Virginia), Mircea Stan (University of Virginia), and Kevin Skadron (University of Virginia)
FP-AMG: FPGA-Based Acceleration Framework for Algebraic Multigrid Solvers
Pouya Haghi (Boston University), Tong Geng (Boston University), Anqi Guo (Boston University), Tianqi Wang (University of Science and Technology of China), and Martin Herbordt (Boston University)
Algorithm-Hardware Co-design for BQSR Acceleration in Genome Analysis ToolKit
Michael Lo (University of California, Los Angeles), Zhenman Fang (Simon Fraser University), Jie Wang (University of California, Los Angeles), Peipei Zhou (University of California, Los Angeles), Mau-Chung Frank Chang (University of California, Los Angeles), and Jason Cong (University of California, Los Angeles)
Short Paper: A Turbo Maximum-a-Posteriori Equalizer for Faster-than-Nyquist Applications
Mohamed Omran Matar (University of British Columbia), Mrinmoy Jana (University of British Columbia), Jeebak Mitra (Huawei Canada), Lutz Lampe (University of British Columbia), and Mieszko Lis (University of British Columbia)
Short Paper: FPGA-accelerated Automatic Alignment for Three-dimensional Tomography
Shuang Wen (Peking University) and Guojie Luo (Peking University)
Paper Session 6: HLS and Tooling
Artisan: a Meta-Programming Approach For Codifying Optimisation Strategies
Jessica Vandebon (Imperial College London), Jose G. F. Coutinho (Imperial College London), Wayne Luk (Imperial College London), Eriko Nurvitadhi (Intel Corporation), and Tim Todman (Imperial College London)
Hierarchical Modelling of Generators in Design-Space Exploration
Charles Lo (University of Toronto) and Paul Chow (University of Toronto)
Investigating Performance Losses in High-Level Synthesis for Stencil Computations
Wesson Altoyan (Stanford University) and Juan J. Alonso (Stanford University)
Poster Session 1: Arithmetic and Security
Proposing a Fast and Scalable Systolic Array for Matrix Multiplication
Bahar Asgari (Georgia Institute of Technology), Ramyad Hadidi (Georgia Institute of Technology), and Hyesoon Kim (Georgia Institute of Technology)
An Automated Tool for Design Space Exploration of Matrix Vector Multiplication (MVM) Kernels Using OpenCL Based Implementation on FPGAs
Jannatun Naher (North Carolina A & T State University), Clay Gloster (North Carolina A & T State University), Christopher C. Doss (North Carolina A & T State University), and Shrikanth S. Jadhav (North Carolina A & T State University)
Fast Arithmetic Hardware Library For RLWE-Based Homomorphic Encryption
Rashmi Agrawal (Boston University), Lake Bu (The Charles Stark Draper Laboratory), and Michel A. Kinsy (Boston University)
Primitive Instantiation for Speed-Area Efficient Architecture Design of Cellular Automata based Mageto Logic on FPGA with Built-In Testability
Ayan Palchaudhuri (Indian Institute of Technology Kharagpur) and Anindya Sundar Dhar (Indian Institute of Technology Kharagpur)
TBOX-Based Mask Scrambling Against SCA
João Carlos Resende (Universidade de Lisboa), Ricardo J. R. Maçãs (Universidade de Lisboa), and Ricardo Chaves (Universidade de Lisboa)
FPGA Implementation of Post-Quantum DME Cryptosystem
José L. Imaña (Complutense University) and Ignacio Luengo (Complutense University)
A Dynamic Frequency Scaling Framework Against Reliability and Security Issues in Multi-tenant FPGA
Yukui Luo (University of Illinois at Chicago) and Xiaolin Xu (University of Illinois at Chicago)
Poster Session 2: Datacenter and Infrastructure
SHIP: Storage for Hybrid Interconnected Processors
Juan Camilo Vega (University of Toronto), Qianfeng (Clark) Shen (University of Toronto), and Paul Chow (University of Toronto)
RISC-V Barrel Processor for Accelerator Control
MohammadHossein AskariHemmat (Ecole Polytechnique Montreal), Olexa Bilaniuk (University of Montreal), Sean Wagner (IBM Canada), Yvon Savaria (Ecole Polytechnique Montreal), and Jean-Pierre David (Ecole Polytechnique Montreal)
Update Latency Optimization of Packet Classification for SDN Switch on FPGA
Chenglong Li (National University of Defense Technology), Tao Li (National University of Defense Technology), Junnan Li (National University of Defense Technology), Zilin Shi (National University of Defense Technology), and Baosheng Wang (National University of Defense Technology)
Accommodating Multi-Tenant FPGAs in the Cloud
Joel Mandebi Mbongue (University of Florida) and Christophe Bobda (University of Florida)
Accelerating MPI Collectives with FPGAs in the Network and Novel Communicator Support
Qingqing Xiong (Boston University), Chen Yang (Boston University), Pouya Haghi (Boston University), Anthony Skjellum (University of Tennessee at Chattanooga), and Martin Herbordt (Boston University)
MeXT-SE: A System-Level Design Tool to Transparently Generate Secure MPSoC
Md Jubaer Hossain Pantho (University of Florida) and Christophe Bobda (University of Florida)
Poster Session 3:
Early-stage Automated Identification Tool for Shared Accelerators
Parnian Mokri (Tufts University) and Mark Hempstead (Tufts University)
An Analytical Model of Memory-Bound Applications Compiled with High Level Synthesis
Maria A. Dávila-Guzmán (Universidad de Zaragoza), Rubén Gran Tejero (Universidad de Zaragoza), María Villarroya-Gaudó (Universidad de Zaragoza), and Darío Suárez Gracia (Universidad de Zaragoza)
FPGA Virtualization for Deprecated Devices
Ian D. Taras (University of Toronto) and Andrew G. Schmidt (University of Southern California)
ZRLMPI: A Unified Programming Model for Reconfigurable Heterogeneous Computing Clusters
Burkhard Ringlein (Friedrich-Alexander University Erlangen-Nürnberg), Francois Abel (IBM Research Europe), Alexander Ditter (Friedrich-Alexander University Erlangen-Nürnberg), Beat Weiss (IBM Research Europe), Christoph Hagleitner (IBM Research Europe), and Dietmar Fey (Friedrich-Alexander University Erlangen-Nürnberg)
Designing Domain Specific Computing Systems
Anthony M. Cabrera (Washington University in St. Louis) and Roger D. Chamberlain (Washington University in St. Louis)
Poster Session 4: Applications and Architectures
Improving the Availability of Secure Space Links through the Partial Reconfiguration of FPGAs
Emmanuel Lesser (European Space Agency)
An FPGA-Optimized Architecture of Real-time Farneback Optical Flow
Zhe Pan (Zhejiang University), Yuruo Jin (Zhejiang University), Xiaohong Jiang (Zhejiang University), and Jian Wu (Zhejiang University)
High-Performance Parallel Radix Sort on FPGA
Bashar Romanous (University of California, Riverside), Mohammadreza Rezvani (University of California, Riverside), Junjie Huang (University of California, Riverside), Daniel Wong (University of California, Riverside), Evangelos E. Papalexakis (University of California, Riverside), Vassilis J. Tsotras (University of California, Riverside), and Walid Najjar (University of California, Riverside)
FPGA-Based Gesture Recognition with Capacitive Sensor Array using Recurrent Neural Networks
Haoyan Liu (University of Arkansas), Atiyehsadat Panahi (University of Arkansas), David Andrews (University of Arkansas), and Alexander Nelson (University of Arkansas)
Gbit/s Non-Binary LDPC Decoders: High-Throughput using High-Level Specifications
Oscar Ferraz (University of Coimbra), Srinivasan Subramaniyan (Amrita Vishwa Vidyapeetham), Guohui Wang (Rice University), Joseph R. Cavallaro (Rice University), Gabriel Falcao (University of Coimbra), and Madhura Purnaprajna (Amrita Vishwa Vidyapeetham)
A Quaternary FPGA Architecture Using Floating Gate Memories
Ayokunle Fadamiro (Auburn University), Pouyan Rezaie (Auburn University), Christopher Harris (Auburn University), and Spencer Millican (Auburn University)
Rotary Register File: A Micro-Architectural Primitive on FPGA
Reza Nakhjavani (University of Toronto) and Jianwen Zhu (University of Toronto)
Poster Session 5: Machine Learning 1
Tiny On-Chip Memory Realization of Weight Sparseness Split-CNNs on Low-end FPGAs
Akira Jinguji (Tokyo Institute of Technology), Shimpei Sato (Tokyo Institute of Technology), and Hiroki Nakahara (Tokyo Institute of Technology)
An Efficient FPGA-based Architecture for Contractive Autoencoders
Madis Kerner (Tallinn University of Technology), Kalle Tammemäe (Tallinn University of Technology), Jaan Raik (Tallinn University of Technology), and Thomas Hollstein (Tallinn University of Technology)
Systolic-CNN: An OpenCL-defined Scalable Run-time-flexible FPGA Accelerator Architecture for Accelerating Convolutional Neural Network Inference in Cloud/Edge Computing
Akshay Dua (Arizona State University), Yixing Li (Arizona State University), and Fengbo Ren (Arizona State University)
Explore Efficient LUT-based Architecture for Quantized Convolutional Neural Networks on FPGA
Yanpeng Cao (Southeast University), Chengcheng Wang (Southeast University), and Yongming Tang (Southeast University)
Realization of Quantized Neural Network for Super-resolution on PYNQ
Feng Yu (Southeast University), Yanpeng Cao (Southeast University), and Yongming Tang (Southeast University)
Scalable Full Hardware Logic Architecture for Gradient Boosted Tree Training
Tamon Sadasue (RICOH Company) and Tsuyoshi Isshiki (Tokyo Institute of Technology)
Optimized Distribution of an Accelerated Convolutional Neural Network across Multiple FPGAs
Alaa Maarouf (American University of Beirut), Nour El Droubi (American University of Beirut), Raghid Morcel (American University of Beirut), Hazem Hajj (American University of Beirut), Mazen A. R. Saghir (American University of Beirut), and Haitham Akkary (American University of Beirut)
Poster Session 6: Machine Learning 2
SqueezeJet-3: An Accelerator Utilizing FPGA MPSoCs for Edge CNN Applications
Panagiotis Mousouliotis (Aristotle University of Thessaloniki), Ioannis Papaefstathiou (Aristotle University of Thessaloniki), and Loukas Petrou (Aristotle University of Thessaloniki)
Automatic Generation of FPGA Kernels From Open Format CNN Models
Dimitrios Danopoulos (NTUA), Christoforos Kachris (Democritus University of Thrace), and Dimitrios Soudris (NTUA)
High-Throughput DNN Inference with LogicNets
Yaman Umuroglu (Xilinx Research Labs), Yash Akhauri (Xilinx Research Labs), Nicholas J. Fraser (Xilinx Research Labs), and Michaela Blott (Xilinx Research Labs)
AIgean: An Open Framework for Machine Learning on Heterogeneous Clusters
Naif Tarafdar (University of Toronto), Giuseppe Di Guglielmo (Columbia University), Philip C Harris (Massachusetts Institute of Technology), Jeffrey D Krupa (Massachusetts Institute of Technology), Vladimir Loncar (CERN), Dylan S Rankin (Massachusetts Institute of Technology), Nhan Tran (Fermilab), Zhenbin Wu (University of Illinois), Qianfeng Shen (University of Toronto), and Paul Chow (University of Toronto)
FPGA Based High-Throughput Real-Time Feature Extraction for Modulation Classification
Joshua Mack (University of Arizona) and Ali Akoglu (University of Arizona)
Accelerating Large Scale GCN Inference on FPGA
Bingyi Zhang (University of Southern California), Hanqing Zeng (University of Southern California), and Viktor Prasanna (University of Southern California)
EASpiNN: Effective Automated Spiking Neural Network Evaluation on FPGA
Sathish Panchapakesan (Simon Fraser University), Zhenman Fang (Simon Fraser University), and Nitin Chandrachoodan (Indian Institute of Technology - Madras)
A High-performance Inference Accelerator Exploiting Patterned Sparsity in CNNs
Ning Li (Tsinghua University), Leibo Liu (Tsinghua University), Shaojun Wei (Tsinghua University), and Shouyi Yin (Tsinghua University)


All times shown in Pacific Standard Time (UTC-8)

May 6-8 (Wednesday – Friday)

FPGA Tutorials will be held on May 6 through May 8.

Please use the Registration Links to register for each tutorial. The organizers of each tutorial will email further details to registered participants.

Intel FPGA Clouds for Academic Research and Teaching

Time: May 6, 11:00 AM – 12:00 PM CDT

Organizer: Intel
Flyer: PDF

This workshop presents Intel FPGA clouds (HARP and DevCloud) for academic research and teaching. These clouds offer academics access to server nodes with one or more Intel FPGAs per node, along with development tools to facilitate productive research and teaching. We will provide an overview of these clouds and a tutorial on how to use them, and highlight several projects that have successfully used them (e.g., from recently published FPGA work).

The Future of FPGA-Acceleration in Cloud and Datacenters

Time: May 6, 11:00 AM – 5:30 PM CDT

Organizers: Christophe Bobda (University of Florida) and Peter Hofstee (IBM)
Flyer: PDF
Detailed Program: Abstracts & Bios
Recorded Videos Will be Available Soon!

Field-Programmable Gate Arrays (FPGAs) are becoming integral components of general-purpose heterogeneous cloud computing systems and datacenters because they can serve as energy-efficient, domain-customizable accelerators. Major players such as Microsoft, Amazon, Intel, Baidu, Huawei, and IBM now expose FPGAs to application developers in their cloud and datacenter infrastructures. Besides commercial infrastructure, a growing number of projects are under way across the globe, in academia and other research organizations, to provide the benefits of acceleration and flexibility remotely to users. Current developments are taking place behind closed doors, and companies and institutions disclose very little about the challenges they encounter or the approaches they currently use to tackle them. This workshop will bring together experts in cloud computing, FPGAs, computer architecture, and applications to 1) discuss the status of FPGA acceleration in cloud computers and 2) explore the future of, and the challenges to, broad adoption of FPGAs in the datacenter.

All times shown in Central Daylight Time (UTC-5)

11:00 Opening
Peter Hofstee, IBM - Austin, TX
11:05 NSF Funding Opportunities and Priorities in CNS
Erik Brunvand, NSF
11:30 The Future of FPGAs Needs Open Middleware Now
Paul Chow, University of Toronto
12:00 Secure and Virtualized FPGA Management for FPGAs in Cloud and Datacenters
Dirk Koch, University of Manchester
12:30 cloudFPGA: Promote FPGAs to 1st Citizen in the Cloud
Francois Abel, IBM Research Europe
1:00 Break
1:15 The Open Cloud FPGA Testbed: Supporting Experiments on Emerging Datacenter Configurations
Martin Herbordt, Boston University and Miriam Leeser, Northeastern University
1:45 openRole: Do we need a POSIX for FPGAs?
Burkhard Ringlein, IBM Research Europe
2:15 Security and Privacy Concerns for the FPGA-Accelerated Cloud and Datacenters
Russell Tessier, University of Massachusetts Amherst
2:45 Cloud-scale Key Value Store in FPGA
John W Lockwood, Algo-Logic
3:15 Break
3:30 Powering Cloud and Datacenters with Xilinx Adaptive Compute Acceleration platforms
Cathal McCabe, Xilinx
4:00 Global-Scale FPGA-Accelerated Deep Learning Inference with Microsoft's Project Brainwave
Gabriel Weisz, Microsoft
4:30 Single-Tenant Cloud FPGA Security
Jakub Szefer, Yale University
5:00 Gator Reconfigurable Cloud Computing: Hardware Virtualization Challenges
Christophe Bobda, University of Florida

Compute Acceleration Workflow using Vitis and PYNQ

May 7 and May 8
**Same times both days**
10:00 AM – 1:00 PM (Presentation)
1:00 PM – 4:00 PM (Lab)

Organizer: Xilinx

Xilinx has recently introduced the open-source, freely downloadable Vitis unified software platform, which enables the development of embedded software and accelerated applications on heterogeneous Xilinx platforms including FPGAs, SoCs, and Versal ACAPs. It provides a unified programming model for accelerating edge, cloud, and hybrid computing applications. Vitis supports integration with high-level frameworks, development in C, C++, or Python using accelerated libraries, and the use of RTL-based accelerators and low-level runtime APIs for more fine-grained control over the implementation.

Xilinx provides datacenter-centric development boards (Alveo) that are suitable for application acceleration and well supported by Vitis. Recently, Xilinx’s open-source project, PYNQ, has also been ported to Alveo boards using Vitis and Python. In this tutorial you’ll get hands-on experience with the compute acceleration features of Vitis and PYNQ.

Tentative Agenda

May 7 Lecture Session
10:00 AM - 1:00 PM CDT
Webinar Introduction: Agenda, outline
Introduction to Vitis: Introduce the open-source unified software platform environment and supported boards
Execution Model: Understand how XRT works
Vitis Tool Flows: Discuss Makefile and GUI flows
Design Analysis: Understanding profiling and timing analysis reports
Optimization Methodology: Describe optimization methodology for accelerating applications
May 7 Lab Session
1:00 PM - 4:00 PM CDT
Connecting to AWS: Use provided credentials to connect to AWS
GUI Flow: Use Vitis GUI IDE to develop application
Improving Performance: Use memory targeting and other techniques to improve performance
May 8 Lecture Session
10:00 AM - 1:00 PM CDT
Host Code Optimization: Describe techniques for optimizing the host program
Kernel Optimization: Describe techniques for developing a high-performance kernel
RTL Kernel Wizard: How to integrate your own IP, developed using HDL, into the Vitis application development environment
Debugging: Various methods and components of hardware/software debugging
Vitis Accelerated Libraries: Describe the structure of the accelerated libraries and which ones are available for domain-specific and common applications
PYNQ for Compute Acceleration: Learn how PYNQ can simplify driving acceleration kernels in the datacenter
May 8 Lab Session
1:00 PM - 4:00 PM CDT
Optimization Lab: Use DATAFLOW and PIPELINE techniques to optimize an application
RTL Kernel Lab: Use the Vitis RTL Kernel wizard to develop and integrate custom IP developed in HDL
Debug Lab: Perform hardware/software debugging
Computer Vision Lab: Use Vitis OpenCV support to develop a vision application
PYNQ Lab: Using JupyterLab, this lab shows how to develop and optimize Python code targeting Vitis designs