FCCM 2021 Workshops & Tutorials

All times shown in Eastern Daylight Time (UTC-4)
Links will be available through the virtual platform.

Date	Time	Type	Name	Organizer
May 9th	11:00 AM - 1:00 PM	Workshop	Intel FPGA Cloud Services and Remote Learning	Landis Lawrence (Intel)
May 9th	11:00 AM - 2:00 PM	Tutorial	FPGA High-Level Synthesis: Good Practices for Quality and Productivity	Vanderlei Bonato (USP, Brazil)
May 12th	11:00 AM - 2:00 PM	Tutorial	AI Optimized Intel® Stratix® 10 NX FPGA	Eriko Nurvitadhi (Intel)
May 12th	11:00 AM - 2:00 PM	Tutorial	Productive Construction of High-Performance Systolic Arrays on FPGAs	Zhiru Zhang (Cornell)
May 12th	11:00 AM - 2:00 PM	Workshop	Using Intel® oneAPI Toolkits with FPGAs	Susannah Martin (Intel)
May 12th	10:00 AM - 2:00 PM	Workshop	From RTL to Compute Acceleration using Vitis and Cloud Computing (Day 1)	Parimal Patel (Xilinx)
May 13th	10:00 AM - 2:00 PM	Workshop	From RTL to Compute Acceleration using Vitis and Cloud Computing (Day 2)	Parimal Patel (Xilinx)

Intel FPGA Cloud Services and Remote Learning

Date: Sunday May 9

Organizer: Landis Lawrence (Intel)

Speaker
Najmeh Nazari, Research Assistant, UC Davis School of Electrical and Computer Engineering

Gain hands-on experience using Intel® FPGA development tools and kits/accelerator cards in a remote environment. The first half of the course will focus on how to teach undergraduate-level courses using Verilog, schematics, and prebuilt IP while accessing Terasic’s DE10-Lite kit in a remote environment. Topics covered are network setup, installation, compilation and download. The second half of the course will focus on graduate-level heterogeneous computing teaching and research on the Intel® FPGA DevCloud and Hardware Accelerator Research Program clouds. These Intel cloud services offer the latest configurations of Quartus (RTL), OpenCL, oneAPI and OpenVINO workload compilation in a Xeon + Arria 10/Stratix 10 FPGA development environment, available free to the academic community. Labs include using the Intel FPGA DevCloud for DPC++ workload compilation and performance analysis.

FPGA High-Level Synthesis: Good Practices for Quality and Productivity

Date: Sunday May 9

Organizer: Vanderlei Bonato (USP, Brazil)

Speakers

BSc. Andre B. Perina (Institute of Mathematics and Computer Sciences – University of São Paulo – Brazil)
Dr. Leandro S. Rosa (Event-Driven Perception – Istituto Italiano di Tecnologia – Italy)
Dr. Vanderlei Bonato (Institute of Mathematics and Computer Sciences – University of São Paulo – Brazil)

This tutorial is aimed at newcomers to HLS. It gives a broad view of HLS, connecting the software-like input to functional and timing simulations and to the final hardware design. The goal is to understand the effects of data types, arithmetic, loops, interfaces and memory organisation inferred from software, giving designers a set of good practices that improve the quality of the final hardware while minimising implementation effort. Practical experiments will be conducted in Vivado and Vitis HLS, and participants are encouraged to replicate the experiments remotely and to share their experiences. For the tutorial, participants are encouraged to install in advance the Vitis Core Development Kit – 2020.2 (select Vitis in the Xilinx Unified Installer so that the Vivado Design Suite is installed together with it).
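
As an illustration of why the choice of data type matters in HLS, the following minimal Python sketch models a signed fixed-point type and shows how the representation error shrinks as fractional bits are added. It is not taken from the tutorial material; the function name `quantize` and its parameters are hypothetical, and the rounding and saturation behaviour is an assumption chosen for simplicity rather than a model of any particular tool.

```python
def quantize(x, frac_bits, total_bits=16):
    """Round x to a signed fixed-point grid and saturate on overflow.

    total_bits is the full word width (sign bit included) and frac_bits
    is how many of those bits sit after the binary point. Both names are
    illustrative assumptions, not a tool API.
    """
    scale = 1 << frac_bits                # steps per unit
    lo = -(1 << (total_bits - 1))         # most negative raw integer
    hi = (1 << (total_bits - 1)) - 1      # most positive raw integer
    raw = round(x * scale)                # nearest representable step
    raw = max(lo, min(hi, raw))           # saturate instead of wrapping
    return raw / scale


if __name__ == "__main__":
    value = 0.1
    for bits in (4, 8, 12):
        q = quantize(value, bits)
        print(f"{bits:2d} fractional bits: {q!r} (error {abs(q - value):.6f})")
```

A narrower type usually costs less hardware but introduces more error, so the bit widths are a design decision to be checked in functional simulation.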

AI Optimized Intel® Stratix® 10 NX FPGA

Date: Wednesday May 12

Organizer: Eriko Nurvitadhi (Intel)

Speakers:

Eriko Nurvitadhi (Intel)
Rohit B DSouza (Intel)
Andrew M Boutros (Intel)
Tim Vanderhoek (Intel)

Tutorial Description:

The Intel® Stratix® 10 NX FPGA is Intel’s first AI-optimized FPGA. It introduces a new type of AI-optimized tensor arithmetic block called the AI Tensor Block and is designed for high-bandwidth, low-latency artificial intelligence (AI) applications. The Intel® Stratix® 10 NX FPGA delivers accelerated AI compute solutions with up to 143 INT8 TOPS at ~1 TOPS/W, in-package 3D-stacked HBM2 high-bandwidth DRAM, and up to 57.8G PAM4 transceivers. In this tutorial, we will first provide an overview of the Intel Stratix 10 NX FPGA, followed by example designs such as a text-to-speech application and a PE array design. We also offer an application evaluation and comparison against GPUs. Using an approach such as the soft AI processor overlay we developed in our recently published research [FPT’20], we will show how the Intel Stratix 10 NX FPGA can be programmed purely in software to deliver excellent performance in real-time AI workloads. The agenda of this tutorial is as follows:

Part 1: Introduction
Part 2: Stratix 10 NX FPGA
- Overview & Platforms
- Text-to-Speech Application Study
Part 3: PE Array Example Design for Stratix 10 NX
Part 4: AI Soft Processor on Stratix 10 NX
- Intro and Motivation
- Optimized AI Soft Processor for Stratix 10 NX
Part 5: Demo/Lab

Productive Construction of High-Performance Systolic Arrays on FPGAs

Date: Wednesday May 12

Organizer: Zhiru Zhang (Cornell)

Speakers:

Jason Cong, UCLA, <cong@cs.ucla.edu>
Hongbo Rong, Intel Labs, <hongbo.rong@intel.com>
Zhiru Zhang, Cornell University, <zhiruz@cornell.edu>

Recent years have seen a growing number of application-specific systolic arrays (SAs) implemented on modern FPGAs for efficient compute acceleration. The characteristics of near-neighbor connections make SAs a great match for FPGAs, where it is particularly important to minimize long interconnects to meet the target clock frequency. However, it requires a tremendous amount of human effort to design and implement a high-performance SA for a given algorithm using the traditional RTL-based methodology. On the other hand, existing high-level synthesis (HLS) tools force the programmers to do “micro-coding” where many optimizations must be carried out through tedious code restructuring and/or insertion of vendor-specific pragmas.

In this tutorial, we introduce our recent efforts to develop new programming models and automatic synthesis capabilities that enable FPGA programmers to productively build high-performance SAs. More specifically, the tutorial consists of three major segments, each of which will include a technical presentation (30-35 mins) followed by a short demo and Q&A (5-10 mins). The outline of the proposed tutorial is as follows:

Introduction / Overview
Segment 1 (led by Cong): AutoSA [FPGA’21], an end-to-end compilation framework for generating systolic arrays on FPGAs. AutoSA is based on the polyhedral framework, and further incorporates a set of optimizations on different dimensions to boost performance. As an example, we also show how AutoSA is used in an end-to-end deep learning acceleration framework, FlexCNN [FPGA’20].
Segment 2 (led by Rong): T2S/SuSy [FCCM’19, ICCAD’20], a programming framework built upon Halide for productively building high-performance SAs on FPGAs. T2S decouples the algorithm specification from spatial optimizations, where the former can concisely express any systolic algorithm while the latter can describe essential optimizations for systolic arrays.
Segment 3 (led by Zhang): HeteroCL [FPGA’19], a Python-based DSL and an automated compilation flow that maps the input algorithm into special-purpose accelerators through HLS. HeteroCL integrates AutoSA as a compiler backend for mapping systolic algorithms to efficient SA architectures.
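
For readers new to systolic arrays, the near-neighbor structure mentioned in the abstract can be sketched with a small cycle-by-cycle software model. The Python below is an illustrative assumption only: it simulates a generic output-stationary array for matrix multiplication and is not code from AutoSA, T2S/SuSy, or HeteroCL. The function name `systolic_matmul` is hypothetical.

```python
def systolic_matmul(A, B):
    """Simulate an n x m output-stationary systolic array computing A @ B.

    PE(i, j) accumulates C[i][j]. Each cycle it takes an A value from its
    left neighbour and a B value from the neighbour above, so data only
    ever moves one hop. Rows of A and columns of B enter at the array
    edges, skewed by one cycle per row/column.
    """
    n, k, m = len(A), len(B), len(B[0])
    acc = [[0] * m for _ in range(n)]     # per-PE accumulators
    a_reg = [[0] * m for _ in range(n)]   # A value held in each PE
    b_reg = [[0] * m for _ in range(n)]   # B value held in each PE

    for t in range(n + m + k - 2):        # cycles until the last PE finishes
        new_a = [[0] * m for _ in range(n)]
        new_b = [[0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                if j == 0:                # left edge: feed row i of A, skewed by i
                    s = t - i
                    new_a[i][j] = A[i][s] if 0 <= s < k else 0
                else:                     # interior: take from left neighbour
                    new_a[i][j] = a_reg[i][j - 1]
                if i == 0:                # top edge: feed column j of B, skewed by j
                    s = t - j
                    new_b[i][j] = B[s][j] if 0 <= s < k else 0
                else:                     # interior: take from neighbour above
                    new_b[i][j] = b_reg[i - 1][j]
        a_reg, b_reg = new_a, new_b
        for i in range(n):
            for j in range(m):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc


if __name__ == "__main__":
    print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Because every PE reads only from its immediate left and upper neighbours, no wire in this model spans more than one hop, which is the property that suits FPGA routing.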

Using Intel® oneAPI Toolkits with FPGAs

Date: Wednesday May 12

Organizer: Susannah Martin (Intel)

In this tutorial, you will learn to write and compile Data Parallel C++ (DPC++) code to target an Intel FPGA. You will learn and practice the development flow to (1) emulate your code to ensure functionality, (2) optimize your code using reports, and (3) generate and profile the hardware bitstream created from your code. You will also be introduced to the concepts and strategies needed to ensure your code is optimized for performance. A hands-on lab will take you through multiple stages of optimization of example DPC++ code. The hands-on lab portion of this tutorial will make use of the Intel DevCloud. You will receive instructions and practice on the use of the Intel DevCloud during the tutorial.

From RTL to Compute Acceleration using Vitis and Cloud Computing

Date: Wednesday May 12 and Thursday May 13
Schedule: https://www.xilinx.com/support/university/workshops/schedule.html

Organizer: Parimal Patel (Xilinx)

Over several generations of innovation, Xilinx technology has evolved from FPGAs that implemented simple glue logic to complex yet efficient heterogeneous programmable platform devices. Using this architecture efficiently and at high performance requires a correspondingly sophisticated set of development tools. Xilinx has developed Vitis, a unified software development environment that supports RTL design using the included Vivado tools, high-level synthesis that raises the level of abstraction by allowing users to write models in C/C++, and complete domain-specific application development and deployment. Its features include system-level profiling, automated software acceleration in programmable logic, automated system connectivity generation, and libraries to improve performance. The Vitis environment enables the user to easily and productively develop accelerated algorithms and then efficiently implement and deploy them onto heterogeneous CPU-FPGA-ACAP systems. Xilinx has also developed an open-source framework, PYNQ, which improves productivity by using Python to drive already-designed hardware. You will use a board to validate your initial designs. You will also use Amazon Web Services (AWS) Elastic Compute Cloud (EC2), which offers the Vitis environment for cloud development and in-cloud acceleration.

Topics covered:

FPGA Technology and Vitis Intro
Introduction to RTL development using Vivado
Introduction to High-Level Synthesis and Vivado HLS
Embedded application development using Vitis
Introduction to PYNQ
Lab 1: Vivado Design Flow using RTL on a PYNQ-Z2 board
Lab 2: Embedded application development targeting PYNQ-Z2
Lab 3: Interacting with PYNQ

Introduction to Vitis for Acceleration Platforms
Vitis flow for project creation and design analysis
Vitis design methodology
Host and Kernel optimization
Vitis accelerated libraries
Lab 4: Creating a Vitis project using Vitis GUI flow
Lab 5: Design analysis of the created application
Lab 6: Improving performance through bandwidth improvement
Lab 7: Applying various optimization techniques
Lab 8: Using the Vision accelerated library
