Workshops & Tutorials

FCCM 2021 Workshops & Tutorials

All times shown in Eastern Daylight Time (UTC-4)
Links will be accessed through the virtual platform.

Date | Time | Type | Name | Organizer
May 9th | 11:00 AM - 1:00 PM | Workshop | Intel FPGA Cloud Services and Remote Learning | Landis Lawrence (Intel)
May 9th | 11:00 AM - 2:00 PM | Tutorial | FPGA High-Level Synthesis: Good Practices for Quality and Productivity | Vanderlei Bonato (USP, Brazil)
May 12th | 11:00 AM - 2:00 PM | Tutorial | AI Optimized Intel® Stratix® 10 NX FPGA | Eriko Nurvitadhi (Intel)
May 12th | 11:00 AM - 2:00 PM | Tutorial | Productive Construction of High-Performance Systolic Arrays on FPGAs | Zhiru Zhang (Cornell)
May 12th | 11:00 AM - 2:00 PM | Workshop | Using Intel® oneAPI Toolkits with FPGAs | Susannah Martin (Intel)
May 12th | 10:00 AM - 2:00 PM | Workshop | From RTL to Compute Acceleration using Vitis and Cloud Computing (Day 1) | Parimal Patel (Xilinx)
May 13th | 10:00 AM - 2:00 PM | Workshop | From RTL to Compute Acceleration using Vitis and Cloud Computing (Day 2) | Parimal Patel (Xilinx)

Intel FPGA Cloud Services and Remote Learning

Date: Sunday May 9

Organizer: Landis Lawrence (Intel)

Najmeh Nazari, Research Assistant, UC Davis School of Electrical and Computer Engineering

Gain hands-on experience using Intel® FPGA development tools and kits/accelerator cards in a remote environment. The first half of the course focuses on teaching undergraduate-level courses using Verilog, schematics, and prebuilt IP while accessing Terasic’s DE10-Lite kit remotely. Topics covered are network setup, installation, compilation, and download. The second half focuses on graduate-level heterogeneous computing teaching and research on the Intel® FPGA DevCloud and Hardware Accelerator Research Program clouds. These Intel cloud services offer the latest configurations of Quartus (RTL), OpenCL, oneAPI, and OpenVINO workload compilation in a Xeon + Arria 10/Stratix 10 FPGA development environment, available free to the academic community. Labs include using the Intel FPGA DevCloud for DPC++ workload compilation and performance analysis.

FPGA High-Level Synthesis: Good Practices for Quality and Productivity

Date: Sunday May 9

Organizer: Vanderlei Bonato (USP, Brazil)


  • BSc. Andre B. Perina (Institute of Mathematics and Computer Sciences – University of São Paulo – Brazil)
  • Dr. Leandro S. Rosa (Event-Driven Perception – Istituto Italiano di Tecnologia – Italia)
  • Dr. Vanderlei Bonato (Institute of Mathematics and Computer Sciences – University of São Paulo – Brazil)

This tutorial is aimed at newcomers to HLS. It gives a broad view of HLS, connecting the software-like input to functional and timing simulations and to the final hardware design. The goal is to understand how data types, arithmetic, loops, interfaces, and memory organisation inferred from software affect the result, giving designers a set of good practices that improve the final hardware quality while minimising implementation effort. Practical experiments will be conducted in Vivado and Vitis HLS, and participants are encouraged to replicate the experiments remotely and to share their experiences. Participants are also encouraged to install the Vitis Core Development Kit – 2020.2 in advance (select Vitis in the Xilinx Unified Installer so that the Vivado Design Suite is installed together with it).
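As a rough illustration of the kind of choices this tutorial refers to (data types, loops, and memory organisation), the sketch below shows a small dot product written in an HLS-friendly style. It is not taken from the tutorial material: the function names, the vector length N, and the LANES factor are hypothetical, and the exact pragma spellings should be checked against the documentation for the installed Vitis HLS version. A regular C++ compiler ignores the HLS pragmas (possibly with a warning), so the same code can be run as a plain software simulation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t N = 64;     // fixed trip count, so loop bounds are known at compile time
constexpr std::size_t LANES = 4;  // hypothetical parallelism factor (N must be a multiple of it)

// Narrow 8-bit inputs with a 32-bit accumulator: the widths are chosen on purpose,
// since an HLS tool sizes multipliers and adders from the declared types.
std::int32_t dot_product(const std::int8_t a[N], const std::int8_t b[N]) {
    // Independent partial sums avoid a single accumulator dependency chain.
    std::int32_t partial[LANES] = {0, 0, 0, 0};
#pragma HLS ARRAY_PARTITION variable=partial complete dim=1

    for (std::size_t i = 0; i < N; i += LANES) {
#pragma HLS PIPELINE II=1
        for (std::size_t l = 0; l < LANES; ++l) {
#pragma HLS UNROLL
            partial[l] += static_cast<std::int32_t>(a[i + l]) * b[i + l];
        }
    }

    // Final reduction of the partial sums.
    std::int32_t sum = 0;
    for (std::size_t l = 0; l < LANES; ++l) {
        sum += partial[l];
    }
    return sum;
}

// Straightforward software reference used to check the restructured version.
std::int32_t reference_dot(const std::int8_t a[N], const std::int8_t b[N]) {
    std::int32_t sum = 0;
    for (std::size_t i = 0; i < N; ++i) {
        sum += static_cast<std::int32_t>(a[i]) * b[i];
    }
    return sum;
}

// Functional check: both versions must agree on the same inputs.
bool matches_reference(const std::int8_t a[N], const std::int8_t b[N]) {
    return dot_product(a, b) == reference_dot(a, b);
}
```

Comparing the restructured function against a plain reference in software, before any synthesis run, mirrors the functional-simulation step mentioned above.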

AI Optimized Intel® Stratix® 10 NX FPGA

Date: Wednesday May 12

Organizer: Eriko Nurvitadhi (Intel)


  • Eriko Nurvitadhi (Intel)
  • Rohit B DSouza (Intel)
  • Andrew M Boutros (Intel)
  • Tim Vanderhoek (Intel)

Tutorial Description:

The Intel® Stratix® 10 NX FPGA is Intel’s first AI-optimized FPGA. It introduces a new type of AI-optimized tensor arithmetic block called the AI Tensor Block and is designed for high-bandwidth, low-latency artificial intelligence (AI) applications. The Intel® Stratix® 10 NX FPGA delivers accelerated AI compute solutions with up to 143 INT8 TOPS at ~1 TOPS/W, in-package 3D-stacked HBM2 high-bandwidth DRAM, and up to 57.8G PAM4 transceivers. In this tutorial, we will first provide an overview of the Intel Stratix 10 NX FPGA, followed by example designs such as a text-to-speech application and a PE array design. We also offer an application evaluation and comparison against GPUs. Using an approach such as the soft AI processor overlay we developed in our recently published research [FPT’20], we will show how the Intel Stratix 10 NX FPGA can be programmed purely in software to deliver excellent performance in real-time AI workloads. The agenda of this tutorial is as follows:

  • Part 1: Introduction
  • Part 2: Stratix 10 NX FPGA
    • Overview & Platforms
    • Text-to-Speech Application Study
  • Part 3: PE Array Example Design for Stratix 10 NX
  • Part 4: AI Soft Processor on Stratix 10 NX
    • Intro and Motivation
    • Optimized AI Soft Processor for Stratix 10 NX
  • Part 5: Demo/Lab

Productive Construction of High-Performance Systolic Arrays on FPGAs

Date: Wednesday May 12

Organizer: Zhiru Zhang (Cornell)


  • Jason Cong, UCLA
  • Hongbo Rong, Intel Labs
  • Zhiru Zhang, Cornell University

Recent years have seen a growing number of application-specific systolic arrays (SAs) implemented on modern FPGAs for efficient compute acceleration. The characteristics of near-neighbor connections make SAs a great match for FPGAs, where it is particularly important to minimize long interconnects to meet the target clock frequency. However, it requires a tremendous amount of human effort to design and implement a high-performance SA for a given algorithm using the traditional RTL-based methodology. On the other hand, existing high-level synthesis (HLS) tools force the programmers to do “micro-coding” where many optimizations must be carried out through tedious code restructuring and/or insertion of vendor-specific pragmas.

In this tutorial, we introduce our recent efforts on developing new programming models and automatic synthesis capabilities that enable FPGA programmers to productively build high-performance SAs. More specifically, the tutorial consists of three major segments, each of which will include a technical presentation (30-35 mins) followed by a short demo and Q&A (5-10 mins). The outline of the proposed tutorial is as follows:

  • Introduction / Overview
  • Segment 1 (led by Cong) — AutoSA [FPGA’21], an end-to-end compilation framework for generating systolic arrays on FPGAs. AutoSA is based on the polyhedral framework and incorporates a set of optimizations along different dimensions to boost performance. As an example, we also show how AutoSA is used in FlexCNN [FPGA’20], an end-to-end deep learning acceleration framework.
  • Segment 2 (led by Rong) — T2S/SuSy [FCCM’19, ICCAD’20], a programming framework built upon Halide for productively building high-performance SAs on FPGAs. T2S decouples the algorithm specification from spatial optimizations, where the former can concisely express any systolic algorithm while the latter can describe essential optimizations for systolic arrays.
  • Segment 3 (led by Zhang) — HeteroCL [FPGA’19], a Python-based DSL and an automated compilation flow that maps the input algorithm into special-purpose accelerators through HLS. HeteroCL integrates AutoSA as a compiler backend for mapping systolic algorithms to efficient SA architectures.
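To make the near-neighbor structure of a systolic array concrete, here is a minimal cycle-by-cycle software model of an output-stationary array computing a matrix product. It is only an illustrative sketch and is not code from AutoSA, T2S/SuSy, or HeteroCL: the size N, the Matrix alias, and the function names are hypothetical. Each processing element (PE) reads only from its left and upper neighbors, which is the property that keeps interconnects short on an FPGA.

```cpp
#include <array>
#include <cassert>

constexpr int N = 3;  // hypothetical size: an N x N grid of PEs multiplying N x N matrices
using Matrix = std::array<std::array<int, N>, N>;

// Software model of an output-stationary systolic array computing C = A * B.
// Rows of A enter from the left edge and columns of B from the top edge, each
// skewed by one cycle per row/column so matching operands meet in the right PE.
Matrix systolic_matmul(const Matrix& A, const Matrix& B) {
    Matrix C{};      // one accumulator per PE
    Matrix a_reg{};  // operand register passed to the right-hand neighbor
    Matrix b_reg{};  // operand register passed to the neighbor below

    // The last operands reach PE(N-1, N-1) at cycle 3N-3, so 3N-2 cycles suffice.
    for (int t = 0; t < 3 * N - 2; ++t) {
        Matrix a_next{};
        Matrix b_next{};
        for (int i = 0; i < N; ++i) {
            for (int j = 0; j < N; ++j) {
                const int ka = t - i;  // row i of A is delayed by i cycles
                const int kb = t - j;  // column j of B is delayed by j cycles
                // Boundary PEs take (zero-padded) inputs; inner PEs read a neighbor.
                a_next[i][j] = (j == 0) ? ((ka >= 0 && ka < N) ? A[i][ka] : 0)
                                        : a_reg[i][j - 1];
                b_next[i][j] = (i == 0) ? ((kb >= 0 && kb < N) ? B[kb][j] : 0)
                                        : b_reg[i - 1][j];
            }
        }
        a_reg = a_next;
        b_reg = b_next;
        // Every PE performs one multiply-accumulate per cycle.
        for (int i = 0; i < N; ++i) {
            for (int j = 0; j < N; ++j) {
                C[i][j] += a_reg[i][j] * b_reg[i][j];
            }
        }
    }
    return C;
}

// Plain triple-loop reference for checking the model.
Matrix reference_matmul(const Matrix& A, const Matrix& B) {
    Matrix C{};
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            for (int k = 0; k < N; ++k)
                C[i][j] += A[i][k] * B[k][j];
    return C;
}

// Functional check of the systolic model against the reference.
bool matches_reference(const Matrix& A, const Matrix& B) {
    return systolic_matmul(A, B) == reference_matmul(A, B);
}
```

Writing even this small model by hand requires getting the input skew and cycle count right, which hints at why the frameworks above aim to generate such structures automatically.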

Using Intel® oneAPI Toolkits with FPGAs

Date: Wednesday May 12

Organizer: Susannah Martin (Intel)

In this tutorial, you will learn to write and compile Data Parallel C++ (DPC++) code to target an Intel FPGA. You will learn and practice the development flow to (1) emulate your code to ensure functionality, (2) optimize your code using reports, and (3) generate and profile the hardware bitstream created from your code. You will also be introduced to the concepts and strategies needed to ensure your code is optimized for performance. A hands-on lab will take you through multiple stages of optimization of example DPC++ code. The hands-on lab portion of this tutorial will make use of the Intel DevCloud. You will receive instructions and practice on the use of the Intel DevCloud during the tutorial.

From RTL to Compute Acceleration using Vitis and Cloud Computing

Date: Wednesday May 12 and Thursday May 13

Organizer: Parimal Patel (Xilinx)

Xilinx technology has evolved over several generations of innovation from FPGAs that supported simple glue logic to very complex but efficient heterogeneous programmable platform devices. To enable efficient, high-performance use of this architecture, a correspondingly sophisticated set of development tools is needed. Xilinx has developed Vitis, a unified software development environment that supports RTL design using the included Vivado tools, high-level synthesis that raises the level of abstraction by allowing users to write models in C/C++, and complete domain-specific application development and deployment. Its features include system-level profiling, automated software acceleration in programmable logic, automated system connectivity generation, and libraries to speed up performance. The Vitis environment enables the user to easily and productively develop accelerated algorithms and then efficiently implement and deploy them onto heterogeneous CPU-FPGA-ACAP systems. Xilinx has also developed an open-source framework, PYNQ, which improves productivity by using Python to drive already-designed hardware. You will use a board to validate your initial designs. You will also use Amazon Web Services (AWS) Elastic Compute Cloud (EC2), which offers the Vitis environment for cloud development and in-cloud acceleration.

Topics covered:

  • FPGA Technology and Vitis Intro
  • Introduction to RTL development using Vivado
  • Introduction to High-Level Synthesis and Vivado HLS
  • Embedded application development using Vitis
  • Introduction to PYNQ
  • Lab 1: Vivado Design Flow using RTL on a PYNQ-Z2 board
  • Lab 2: Embedded application development targeting PYNQ-Z2
  • Lab 3: Interacting with PYNQ
  • Introduction to Vitis for Acceleration Platforms
  • Vitis flow for project creation and design analysis
  • Vitis design methodology
  • Host and Kernel optimization
  • Vitis accelerated libraries
  • Lab 4: Creating a Vitis project using Vitis GUI flow
  • Lab 5: Design analysis of the created application
  • Lab 6: Improving performance through bandwidth improvement
  • Lab 7: Applying various optimization techniques
  • Lab 8: Using the Vision accelerated library