FCCM 2023 Workshops & Tutorials

Tentative Workshop and Tutorial Schedule

Date	Time	Name	Organizer
5/8/2023	9 AM – 12 PM	Exploring FPGAs in the Open Cloud Testbed: A Hands-On Tutorial [slides]	Suranga Handagala and Miriam Leeser
5/8/2023	9 AM – 12 PM	Streaming Time Series Analysis for FPGAs [slides]	Phillip Brisk and Eamonn Keogh (U.C. Riverside)
5/8/2023	9 AM – 5 PM	Introduction to the Versal ACAP Adaptable Intelligent Engine and to its Programming Model [slides]	Mario Ruiz, Naveen Purushotham, and Hugo Andrade (AMD)
5/8/2023	9 AM – 5 PM	NDK+OFM: Rapid Development of Accelerated Applications for FPGA SmartNICs [slides]	Jiří Matoušek, Daniel Kondys, and Jakub Cabal (CESNET)
5/8/2023	1 PM – 5 PM	Developing a community-driven cloud-based infrastructure for post-quantum cryptography side-channel attack analysis [slides]	Miaoqing Huang, Alexander Nelson, and David Andrews (University of Arkansas)
5/8/2023	1 PM – 5 PM	DSAGEN: A Full-stack End-to-End Framework for Domain-Specific Accelerator Generation [slides]	Sihao Liu, Jian Weng, Dylan Kupsh, and Tony Nowatzki (UCLA)
5/11/2023	9 AM – 12 PM	Leveraging MLIR to Design for AI Engines [slides]	Stephen Neuendorffer, Kristof Denolf, Erwei Wang, Jack Lo and Andra Bisca (AMD)
5/11/2023	9 AM – 12 PM	FPGA Architecture for Deep Learning [slides]	Vaughn Betz and Andrew Boutros (University of Toronto)
5/11/2023	1 PM – 5 PM	How to implement Intel OFS (Open FPGA Stack) in Stratix 10 Platform [slides]	John Tio (Intel)

Workshop and Tutorial Details

Exploring FPGAs in the Open Cloud Testbed: A Hands-On Tutorial

Suranga Handagala and Miriam Leeser (Northeastern University)

The Open Cloud Testbed (OCT) is a CISE Community Research Infrastructure (CCRI) project, supported by NSF at the Grand level, that provides researchers access to network-connected FPGA-enhanced server nodes through the CloudLab framework. CloudLab nodes are bare metal, meaning they are provided without an operating system or any pre-installed software or tools. This provides users with a flexible and powerful computing environment that can be customized to meet their unique needs. A distinctive feature of OCT is the exposure of its network interfaces to users, a capability not typically available on commercial and private clouds. This feature enables users to conduct advanced research using network-attached FPGAs, taking advantage of a high speed network, and making new discoveries in various research fields.

This tutorial will guide participants through the process of building and deploying FPGA applications in the OCT, providing an overview of the FPGA development tools required to create bitstreams and deploy them on FPGA hardware. Participants will also gain insight into the applications and benefits of network-attached FPGAs.

The tutorial will cover the following topics:
• Introduction to OCT and its FPGA capabilities
• Overview of the FPGA development process in OCT
• Building and deploying FPGA-based applications in OCT
• An example of using network-attached FPGAs
• Research directions for network-attached FPGAs

By the end of the tutorial, participants will have a thorough understanding of the FPGA build and deployment process in the OCT, and will also be able to apply this knowledge to their own research projects. This tutorial is ideal for researchers who are interested in exploring the potential of FPGA-enabled computing in various data center and cloud computing applications.

Streaming Time Series Analysis for FPGAs

Phillip Brisk and Eamonn Keogh (U.C. Riverside)

This tutorial will introduce streaming time series analysis as an application area that can benefit significantly from acceleration using FPGAs. The tutorial will be split into two parts. The first part will present fundamental topics and concepts that are germane to time series analysis, including algorithms and analyses that have stood the test of time, and will culminate with recent work that has evolved along two different axes: shape-based and feature-based approaches, the latter of which is closely connected to ongoing advances in artificial intelligence and machine learning. The second part will explain why FPGAs are an appropriate computational platform for streaming time series analysis and why CPUs and GPUs suffer from significant disadvantages. It will also present tips and techniques to achieve the highest overall performance and will outline directions for future research on FPGA-accelerated time series analysis, including novel algorithms, numerical formats and precision, and approximate arithmetic.
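As a rough illustration of what a shape-based streaming primitive can look like, the sketch below compares a fixed query against the most recent window of a stream using the z-normalized Euclidean distance, a measure widely used in shape-based time series work. This is an assumption chosen for illustration and is not taken from the tutorial material; the names `znormalize` and `StreamingMatcher` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Z-normalize a sequence (zero mean, unit standard deviation).
// A constant sequence is mapped to all zeros to avoid dividing by zero.
std::vector<double> znormalize(const std::vector<double>& x) {
    assert(!x.empty());
    const double n = static_cast<double>(x.size());
    double sum = 0.0, sumsq = 0.0;
    for (double v : x) { sum += v; sumsq += v * v; }
    const double mean = sum / n;
    const double sd = std::sqrt(std::max(sumsq / n - mean * mean, 0.0));
    std::vector<double> z(x.size(), 0.0);
    if (sd > 1e-12) {
        for (std::size_t i = 0; i < x.size(); ++i) z[i] = (x[i] - mean) / sd;
    }
    return z;
}

// Hypothetical streaming matcher: keeps the last m samples in a ring buffer
// and maintains running sums so the window mean and deviation cost O(1).
class StreamingMatcher {
public:
    explicit StreamingMatcher(const std::vector<double>& query)
        : query_(znormalize(query)), window_(query.size(), 0.0) {}

    // Feed one sample. Returns the z-normalized Euclidean distance between
    // the query and the latest window, or infinity until the window is full.
    double push(double sample) {
        const std::size_t m = window_.size();
        if (count_ >= m) {  // window already full: evict the oldest sample
            const double old = window_[head_];
            sum_ -= old;
            sumsq_ -= old * old;
        }
        window_[head_] = sample;
        head_ = (head_ + 1) % m;
        sum_ += sample;
        sumsq_ += sample * sample;
        ++count_;
        if (count_ < m) return std::numeric_limits<double>::infinity();

        const double n = static_cast<double>(m);
        const double mean = sum_ / n;
        const double sd = std::sqrt(std::max(sumsq_ / n - mean * mean, 0.0));
        double dist2 = 0.0;
        for (std::size_t i = 0; i < m; ++i) {
            // head_ now indexes the oldest sample in the ring buffer.
            const double w = window_[(head_ + i) % m];
            const double z = sd > 1e-12 ? (w - mean) / sd : 0.0;
            const double d = z - query_[i];
            dist2 += d * d;
        }
        return std::sqrt(dist2);
    }

private:
    std::vector<double> query_;   // z-normalized query
    std::vector<double> window_;  // ring buffer of recent samples
    std::size_t head_ = 0;        // next write position
    std::size_t count_ = 0;       // samples seen so far
    double sum_ = 0.0;            // running sum over the window
    double sumsq_ = 0.0;          // running sum of squares over the window
};
```

Because each window is normalized before comparison, a window such as 0, 10, 20 matches the query 1, 2, 3 with a distance near zero: only the shape matters, not the offset or scale. The per-sample inner loop has a fixed trip count and no data-dependent branching, which is the kind of regular computation that tends to pipeline well in hardware.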

Introduction to the Versal ACAP Adaptable Intelligent Engine and to its Programming Model

Mario Ruiz, Naveen Purushotham, and Hugo Andrade (AMD)

This tutorial will briefly introduce the heterogeneous Versal Adaptive Compute Acceleration Platform (ACAP). We will primarily focus on the Adaptable Intelligent Engine (AIE). The AI Engine is a tiled array of Very Long Instruction Word (VLIW) and Single Instruction Multiple Data (SIMD) processing elements that provide high compute density. We will describe the AI Engine tile and AI Engine array architecture as well as the different data movement alternatives. We will also introduce the AI Engine programming model, which consists of a Data Flow Graph Specification written in C++ and the kernel description written either in C or C++. The application can be compiled and executed using the AI Engine tool chain, which is part of the Vitis Unified Software Platform.

This tutorial will cover the following topics:
• Versal ACAP Architecture
• Versal AI Engine Architecture & Memory and Data Movement
• Scalar and Vector data types
• Windows and Streaming Data types
• Vitis tool flow for AI Engine
• The AI Engine Programming Model
• Optimized open-source libraries for the AI Engine
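To make the split between kernels and a data flow graph more concrete, here is a toy sketch in plain standard C++. It deliberately does not use the AI Engine tool chain or its graph library; `Stream`, `Kernel`, `Graph`, `scale_by_two`, and `add_one` are hypothetical names invented for this illustration, and the real programming model differs in many details.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// A stream is modeled as a simple FIFO of integer samples.
using Stream = std::queue<int>;
// A kernel reads from one input stream and writes to one output stream.
using Kernel = std::function<void(Stream&, Stream&)>;

// Kernel description: doubles every sample it consumes.
void scale_by_two(Stream& in, Stream& out) {
    while (!in.empty()) {
        out.push(in.front() * 2);
        in.pop();
    }
}

// Kernel description: adds one to every sample it consumes.
void add_one(Stream& in, Stream& out) {
    while (!in.empty()) {
        out.push(in.front() + 1);
        in.pop();
    }
}

// Graph specification: a linear chain of kernels connected by streams.
struct Graph {
    std::vector<Kernel> kernels;

    void add(Kernel k) { kernels.push_back(std::move(k)); }

    // Run the chain once over the given input and collect the output.
    std::vector<int> run(const std::vector<int>& input) const {
        // streams[i] feeds kernels[i]; the last stream holds the result.
        std::vector<Stream> streams(kernels.size() + 1);
        for (int v : input) streams.front().push(v);
        for (std::size_t i = 0; i < kernels.size(); ++i) {
            kernels[i](streams[i], streams[i + 1]);
        }
        std::vector<int> out;
        for (Stream& s = streams.back(); !s.empty(); s.pop()) {
            out.push_back(s.front());
        }
        return out;
    }
};
```

Adding `scale_by_two` and then `add_one` to a `Graph` and running it on 1, 2, 3 yields 3, 5, 7. The point of the separation is that the kernels say what to compute while the graph says how data moves between them, so the same kernels can be rewired without being rewritten.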

NDK+OFM: Rapid Development of Accelerated Applications for FPGA SmartNICs

Jiří Matoušek, Daniel Kondys, and Jakub Cabal (CESNET)

Since the number of FPGA SmartNICs deployed in modern networks is rising, researchers need platforms that would allow them to quickly explore and rapidly prototype novel accelerated applications for these devices. Although such platforms exist, they only support decade-old interface technologies. This tutorial, therefore, presents NDK (Network Development Kit), an FPGA-vendor-independent platform supporting up to 400G Ethernet network interfaces and up to PCIe Gen5 host interfaces. In addition, the tutorial also introduces OFM (Opensource FPGA Modules), a library of components that can be used for speeding up the development of accelerated networking applications. Both NDK and OFM are based on 20+ years of CESNET’s experience with research and development of accelerated applications for FPGA SmartNICs.

In the first half of this full-day tutorial, attendees will have the opportunity to get familiar, both theoretically and practically, with NDK via its reference application ndk-app-minimal. The second half of the tutorial will guide the attendees through the whole process of developing an accelerated application based on NDK and utilizing OFM. At the end, the tutorial will also provide a quick look at selected advanced topics under the hood of NDK.

Developing a community-driven cloud-based infrastructure for post-quantum cryptography side-channel attack analysis

Miaoqing Huang, Alexander Nelson, and David Andrews (University of Arkansas)

The goal of this workshop is to raise community awareness of a new cloud-based infrastructure being defined for developing and testing PQC side-channel attacks and countermeasures implemented on FPGAs and embedded processors. The infrastructure will allow researchers throughout the security community to develop an open source set of common benchmarks and hardware/software implementations for both attacks and countermeasures. This represents a current need within the research community.

More information can be found at: http://www.csce.uark.edu/~mqhuang/pqc_workshop_2023

DSAGEN: A Full-stack End-to-End Framework for Domain-Specific Accelerator Generation

Sihao Liu, Jian Weng, Dylan Kupsh, and Tony Nowatzki (UCLA)

Because of the slowing of technology scaling, recent research has shifted from general-purpose processors to specialized architectures, which can provide orders of magnitude acceleration and energy savings. However, developing new specialized architectures is engineering intensive: besides the architecture itself, it is also important to have an accessible software stack which enables productive application development. In this tutorial, we describe a paradigm and hardware/software stack that is a step towards automated accelerator codesign. The primary principle is to represent hardware in a rich design space as a graph of simple components with composable semantics. A modular compiler and performance/area model, robust to the presence of hardware/software features, together enable automated codesign. Our framework, DSAGEN, can be viewed as a high-level synthesis tool, but for programmable accelerators.

Leveraging MLIR to Design for AI Engines

Stephen Neuendorffer, Kristof Denolf, Erwei Wang, Jack Lo and Andra Bisca (AMD)

The AI Engine array of the AMD Versal ACAP device is a set of VLIW vector processors with adaptable interconnect. This tutorial is targeted at tool developers and system designers who are looking for fast and completely open source design tools to support their research. Participants will first get insight into the Versal ACAP architecture, more specifically the AI Engine compute and data movement capabilities. Through small design examples expressed in the MLIR-AIE dialect and executed on an ACAP device, participants will leverage AI Engine features for optimizing performance of increasingly complex designs. This will enable them to recognize how this physical-level dialect can be connected to higher level abstraction in the MLIR framework and understand how logical concepts can be expressed to increase productivity and reduce complexity. The labs will be done using AWS instances with opportunities to execute their own designs on real hardware.

FPGA Architecture for Deep Learning

Vaughn Betz and Andrew Boutros (University of Toronto)

Deep learning (DL) is becoming the cornerstone of numerous applications both at the edge and in large-scale datacenters. As a result, specialized DL hardware is being widely deployed to meet the performance requirements of state-of-the-art DL models. Field-programmable gate arrays (FPGAs) have unique capabilities that make them an attractive platform for accelerating DL inference. They offer the ability to customize processing pipelines and thus achieve lower latency and higher energy efficiency compared to general-purpose CPUs and GPUs, at a fraction of the development time and cost of custom-made application-specific integrated circuits (ASICs). Their diverse and high-speed IOs also enable directly interfacing the FPGA to the network and/or a variety of external sensors (e.g. cameras and lidar sensors in an autonomous vehicle), making them suitable for both datacenter and edge use cases.

FPGA architecture has been continuously evolving over the course of the past three decades to better suit key FPGA use cases. With DL inference becoming a major market segment, FPGA architecture is also evolving to match its requirements. FPGA vendors are announcing new FPGA families specifically targeted at DL workloads and many academic research efforts are proposing FPGA architecture modifications for DL. In this tutorial, we will focus on both academic and industrial FPGA architecture enhancements for DL that have been introduced in recent years. First, we will give a brief introduction to the basics of FPGA architecture and how the key components of FPGAs lead to strengths and weaknesses in DL applications. Then, we will cover DL-specific enhancements to traditional FPGA building blocks (e.g. logic blocks, DSPs, on-chip RAMs) as well as new specialized blocks that have been introduced for DL. We will also highlight promising directions for future research in this area. Finally, we will have a panel discussion with representatives from major FPGA vendors and academia to present their perspectives on the future of FPGA architecture and use cases in the DL domain. By the end of this tutorial, participants will have an understanding of:
• Key concepts in FPGA architecture
• Common design styles of DL accelerators on conventional FPGAs
• Research tools and methodologies for exploring new FPGA architectures
• FPGA architecture enhancements for DL
• Key challenges and future research directions in this area

How to implement Intel OFS (Open FPGA Stack) in Stratix 10 Platform

John Tio (Intel)

Scope and topic of the workshop: OFS is a software and hardware infrastructure that provides an efficient approach to developing a custom FPGA-based platform or workload. We will provide a tutorial on how to access the OFS database:
• Source-accessible RTL and software that can be modified to fit your design goals.
• Upstreamed Linux kernel drivers that provide management of your FPGA card.
• Automated compilation scripts with compile switches to add or remove features.

To take advantage of the OFS infrastructure, we will provide a tutorial on how to start using the OFS FPGA shell targeting the Intel® Stratix® 10 FPGA, which is provided with the GitHub release along with the OFS software stack. After the tutorial, participants will understand what OFS has to offer and will be able to leverage the repositories to begin customizing an OFS FPGA design, workload, or software application.