#### Security and Privacy Concerns for the FPGA-Accelerated Cloud and Datacenters

#### **Russell Tessier, Daniel Holcomb, and George Provelengios**

Electrical and Computer Engineering University of Massachusetts, Amherst

May 6, 2020

Research funded by the Intel Research Council and NSF grant CNS-1902532

Department of Electrical and Computer Engineering

### Overview

- Background
  - FPGAs in the cloud
  - Multi-tenant FPGAs
  - FPGA voltage attack approaches
- Characterizing voltage attacks on Arria 10
  - Experimental approach
  - Characterization test results
  - Fault induction
- RSA attack using power fluctuation on Cyclone V and Arria 10
  - Induce delay faults in RSA
  - Use Chinese remainder theorem to extract key

### Multi-Tenant FPGA

- Shared (multi-tenant) FPGAs
  - Devices are expensive. Desire to fully use resources
- Cloud computing: target for multi-tenant FPGAs?
  - Why not use partial reconfiguration?
  - User has no idea what "neighbor" is doing (side channels)
  - Don't want to risk leaking information
- Need to understand vulnerabilities
  - Previous: temperature, voltage
  - This work: no physical access needed



<sup>1</sup> Stratix V FPGAs: Built for Bandwidth, Intel Corporation, 2010

### Example: AmorphOS for Amazon EC2 F1

- Multiple users deploy circuits (Morphlets) on FPGA
- Virtualization software multiplexes software and memory interfaces
- Attempt to create "virtual machine" like environment on the FPGA
- Increases revenue from hardware use
- Security: not really a focus



<sup>1</sup> A. Khawaja et al., Sharing, Protection, and Compatibility for Reconfigurable Fabric with AmorphOS, OSDI, Oct. 2018

## What Type of FPGA Voltage Attacks are Possible?

- On-chip voltage sensors to extract encryption key
  - Ring oscillators used to extract RSA key<sup>1</sup>
  - Time-to-digital converters used to extract AES key<sup>2</sup>
- Voltage fluctuation-based communication
  - Communication on single FPGA<sup>3,4</sup>
- On-chip voltage supply attacks
  - Induce stealthy faults<sup>5, 6, 7</sup>
  - Drive FPGA into reset<sup>7</sup>



FPGA voltage sensors surrounding RSA core<sup>1</sup>

<sup>1</sup> Zhao and Suh, "FPGA-Based Remote Power Side-Channel Attacks," IEEE Symp. Security and Privacy, May 2018
<sup>2</sup> Schellenberg et al., "An Inside Job: Remote Power Analysis Attacks on FPGAs," DATE, March 2018
<sup>3</sup> Gnad et al., "Voltage-based covert channels in multi-tenant FPGAs," Cryptology ePrint Archive, Report 2019/1394, 2019
<sup>4</sup> Giechaskiel et al., "Reading between the dies: Cross-SLR covert channels on multi-tenant cloud FPGAs," ICCD, Oct. 2019
<sup>5</sup> Krautter et al., "FPGAhammer: Remote Voltage Fault Attacks on Shared FPGAs, suitable for DFA on AES," CHES, vol. 3, 2018
<sup>6</sup> Mahmoud and Stojilovic, "Timing violation induced faults in multi-tenant FPGAs," DATE, 2019
<sup>7</sup> Provelengios et al., "Characterizing Power Distribution Attacks in Multi-User FPGA Environments," FPL, 2019


#### Overview

- Two tenants use the device simultaneously
- Tenant A (attacker) consumes power aggressively in an attempt to induce timing faults in tenant B (victim)
- Threat model:
  - Tenants are spatially isolated but share the FPGA power distribution network (PDN)
  - Tenants do not have physical access to the board
  - The tools used for interacting with the FPGA are secure





## Contribution

- We investigate on-chip voltage attacks, specifically how their impact depends on:
  - Duration of the voltage disruption
  - Power consumed by the attacker
  - Distance between attacker and victim
- We evaluate the ability of power-wasting circuits to induce timing faults in the victim
- We examine the ability of power-wasting circuits to reveal an RSA encryption key through fault injection

## Voltage sensor architecture

- A regular rectangular grid of 46 sensors
- 19 inverting stages:
  - ✓ Meet timing constraints
  - ✓ Minimize local effects<sup>1</sup>
  - ✓ Fit in a single CV LAB
- Arria 10 parameters
  - f<sub>RO</sub>=150 MHz
  - Samp. period = 10µs



Controller reads and resets all the sensors simultaneously in every sampling period
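As a rough sanity check on the Arria 10 parameters above, the count a sensor accumulates in one sampling period is simply f<sub>RO</sub> times the period. The sketch below is illustrative only: the function names are hypothetical, and it assumes the sensor counter increments once per RO cycle.

```python
F_RO_NOMINAL_HZ = 150_000_000   # f_RO from the slide (150 MHz)
SAMPLE_PERIOD_NS = 10_000       # sampling period from the slide (10 µs)

def expected_count(f_ro_hz, period_ns):
    """RO cycles counted in one sampling period (integer math avoids float rounding)."""
    return f_ro_hz * period_ns // 1_000_000_000

def count_to_frequency_hz(count, period_ns):
    """Invert the relation: average RO frequency implied by a raw count."""
    return count * 1_000_000_000 // period_ns

nominal = expected_count(F_RO_NOMINAL_HZ, SAMPLE_PERIOD_NS)
print(nominal)                                         # 1500
print(count_to_frequency_hz(1450, SAMPLE_PERIOD_NS))   # 145000000
```

A count below the nominal value indicates a slower oscillator, which is what makes the RO usable as a supply-voltage sensor.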

<sup>1</sup> M. Barbareschi, G. Di Natale, and L. Torres, "Implementation and analysis of ring oscillator circuits on Xilinx FPGAs," in *Hardware Security and Trust.* N. Sklavos, R. Chaves, G. Di Natale, and F. Regazzoni, Eds. Springer, 2017, ch. 12, pp. 237-251


## Attacker circuitry

- $P_{dyn} = C \times V_{DD}^2 \times f_{SW}$
- 1-stage ROs as power wasters (PWs)
- Arria 10: 11,424 LABs fit up to 28K PWs
- Placed uniformly at random locations in the attack area



### Arria 10 sensor calibration

- To use ROs as on-chip voltage sensors:
  - Vary power waster count between 8,000 and 28,000 and record:
    - ✓ Voltage on on-chip sensor
    - ✓ RO counts from on-chip sensors
- Minimize the power drawn by the FPGA during measurements
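One way to turn the recorded (RO count, voltage) pairs into a count-to-voltage mapping is a least-squares line. The sketch below assumes a linear relation over the calibrated range, which the slides do not state, and the calibration pairs are made-up placeholders, not measurements from this work.

```python
def fit_line(counts, millivolts):
    """Ordinary least-squares fit: millivolts ~= slope * count + intercept."""
    n = len(counts)
    mean_c = sum(counts) / n
    mean_v = sum(millivolts) / n
    sxy = sum((c - mean_c) * (v - mean_v) for c, v in zip(counts, millivolts))
    sxx = sum((c - mean_c) ** 2 for c in counts)
    slope = sxy / sxx
    return slope, mean_v - slope * mean_c

# PLACEHOLDER calibration data (hypothetical, for illustration only)
cal_counts = [1500, 1450, 1400, 1350]
cal_mv = [900, 870, 840, 810]

slope, intercept = fit_line(cal_counts, cal_mv)

def count_to_mv(count):
    """Convert a raw RO count to an estimated voltage using the fitted line."""
    return slope * count + intercept

print(round(count_to_mv(1300), 1))
```

Each sensor would need its own fit, since RO frequency varies from location to location on the die.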



### Voltage drop characterization in Arria 10

- Evaluate the Arria 10 PDN response
- 28K RO-based PW instances
- 12 on-chip sensors at different distances to the center of the waster
- Peak voltage drop ~8 µs after activating PWs




## Characterizing timing faults

- Voltage drop causes delay of combinational logic to increase
- Wrong values captured if paths do not complete before capturing clock edge arrives
- Must overcome conservative timing models
- Use a ripple-carry adder as a representative test circuit, which allows us to sensitize various path lengths
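To illustrate why a ripple-carry adder makes a convenient test circuit, the sketch below shows how the choice of operands sets the length of the carry chain that is exercised. The operand pattern and function names are assumptions for illustration, not the test vectors used in this work.

```python
def longest_carry_run(a, b, width):
    """Longest run of consecutive bit positions that produce a carry-out."""
    carry = 0
    run = longest = 0
    for i in range(width):
        abit, bbit = (a >> i) & 1, (b >> i) & 1
        # Full-adder carry-out: generate (a & b) or propagate (carry & (a ^ b))
        carry = (abit & bbit) | (carry & (abit ^ bbit))
        run = run + 1 if carry else 0
        longest = max(longest, run)
    return longest

def operands_for_chain(k):
    """Operands whose sum ripples a carry through exactly k stages."""
    return (1 << k) - 1, 1

for k in (4, 8, 16):
    a, b = operands_for_chain(k)
    print(k, longest_carry_run(a, b, width=32))
```

A longer chain means a longer combinational path, so sweeping `k` probes how much delay the circuit can tolerate before a wrong sum is captured.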



### Arria 10 timing faults

- 28K PWs randomly placed in an area of 11,424 LABs (168×68)
- Steep voltage drop at 20 ns induces faults
- Faults peak at 8 µs
- Substantially fewer faults than Cyclone V



### Can a victim evade the attack?

- In Arria 10, the initial fast voltage drop is not location dependent
- Faults from legal paths reported even at the edge of the device





## Mapping the Arria 10 voltage drop

- Using 132 on-chip sensors for deriving the voltage contours
- Varying the magnitude of disturbance and location of attacker
- Center of attack:
  - 28K PWs: 767mV
  - 8K PWs: 862mV
- Upper right corner of the chip:
  - 28K PWs: 797mV
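The readings above can be put in relative terms with a short calculation. The 900 mV nominal core voltage used here is an assumption (it is not stated on this slide); the measured values are the ones listed above.

```python
NOMINAL_MV = 900  # ASSUMPTION: nominal core voltage, not given on the slide

def drop_percent(measured_mv, nominal_mv=NOMINAL_MV):
    """Voltage drop relative to nominal, in percent."""
    return 100.0 * (nominal_mv - measured_mv) / nominal_mv

readings_mv = {
    "center, 28K PWs": 767,
    "center, 8K PWs": 862,
    "upper right corner, 28K PWs": 797,
}

for label, mv in readings_mv.items():
    print(f"{label}: {drop_percent(mv):.1f}% drop")
```

Under that assumption the 28K-waster reading at the center of the attack corresponds to a drop of roughly 15%, and the corner of the chip still sees a drop above 11%.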



(A) 28K power waster attack



(B) 8K power waster attack

## Locating the Arria 10 attack area

- The disturbance of the shared PDN reveals the location of the attacker
- Evaluate how many sensors are required to find its location
- 64 sensors are sufficient to identify the attacker
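One simple way to estimate the attacker's position from a grid of sensor readings is a drop-weighted centroid. This is a sketch of a generic technique, not necessarily the method used in this work; the sensor coordinates, readings, and nominal voltage below are hypothetical.

```python
def estimate_attack_center(readings, nominal_mv):
    """Drop-weighted centroid of sensor positions.

    readings: iterable of (x, y, millivolts) tuples, one per sensor.
    Returns (x, y), or None if no sensor reports a drop.
    """
    drops = [(x, y, max(nominal_mv - mv, 0)) for x, y, mv in readings]
    total = sum(d for _, _, d in drops)
    if total == 0:
        return None
    cx = sum(x * d for x, _, d in drops) / total
    cy = sum(y * d for _, y, d in drops) / total
    return cx, cy

# HYPOTHETICAL readings: sensors on the right edge see the larger drop
sensors = [(0, 0, 890), (10, 0, 850), (0, 10, 890), (10, 10, 850)]
center = estimate_attack_center(sensors, nominal_mv=900)
print(center)
```

With fewer sensors the estimate becomes coarser, which is the trade-off behind asking how many sensors are needed.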

Resource utilization: Arria 10AX115N2F45E1SG

| Num. RO sensors | ALMs (avail.: 427,200) | Flip-flops (avail.: 1,708,800) |
|-----------------|------------------------|--------------------------------|
| 64              | 1,280 (<1%)            | 1,280 (<1%)                    |
| 132             | 2,640 (<1%)            | 2,640 (<1%)                    |
| Controller      | 1,008 (<1%)            | 134 (<1%)                      |



(A) 28K power waster attack



(B) 8K power waster attack

## Preliminary Results with Stratix 10 on DE10-Pro




### Attacking RSA through fault injection

 Exploiting the use of the Chinese Remainder Theorem (CRT)<sup>1</sup>:

| Direct RSA       | RSA with CRT (4x faster)                                                                                          |
|------------------|-------------------------------------------------------------------------------------------------------------------|
| $Y = X^e \mod N$ | $Y = aY_1 + bY_2 \mod N$<br>$Y_1 = (X \mod p)^{e \mod (p-1)} \mod p$<br>$Y_2 = (X \mod q)^{e \mod (q-1)} \mod q$ |

| Symbol | Meaning              |
|--------|----------------------|
| X, Y   | Input, output        |
| e      | Private key exponent |
| N      | n-bit modulus        |
| p, q   | n/2-bit primes       |
| a, b   | Constants            |

 Goal: Inject fault(s) while computing Y<sub>1</sub> or Y<sub>2</sub>



- Fault during CRT reveals key
  - Output Y is assembled with a faulty  $Y_1$
  - Prime number *q* is revealed
  - Private key *e* can be reconstructed
  - *e* can also be extracted with a faulty  $Y_2$
- The attack works for any key length
- A single interaction is sufficient<sup>2</sup>

<sup>1</sup> D. Boneh et al., On the Importance of Eliminating Errors in Cryptographic Computations, Journal of Cryptology, 2001

<sup>2</sup> A.K Lenstra, Memo on RSA signature generation in the presence of faults, 1996
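The single-fault key recovery can be sketched in a few lines. This is a toy illustration under stated assumptions: the key is tiny and hypothetical, the public exponent is named `pub` here because the slides only name the private exponent *e*, and the fault is modeled as one flipped bit in Y<sub>1</sub> rather than a real timing fault.

```python
from math import gcd

def rsa_crt(x, e, p, q, fault_y1=False):
    """RSA-CRT private-key operation Y = a*Y1 + b*Y2 mod N, optionally faulting Y1."""
    n = p * q
    y1 = pow(x % p, e % (p - 1), p)
    y2 = pow(x % q, e % (q - 1), q)
    if fault_y1:
        y1 ^= 1  # model the injected fault as a single flipped bit in Y1
    a = q * pow(q, -1, p)  # a = 1 mod p and 0 mod q (needs Python 3.8+)
    b = p * pow(p, -1, q)  # b = 0 mod p and 1 mod q
    return (a * y1 + b * y2) % n

# Toy key (hypothetical, far too small for real use)
p, q = 61, 53
n = p * q
pub = 17   # public exponent (assumed name)
e = 2753   # private exponent: e * pub = 1 mod (p-1)(q-1)

x = 65
good = rsa_crt(x, e, p, q)
bad = rsa_crt(x, e, p, q, fault_y1=True)

# The faulty output is still correct mod q but wrong mod p,
# so a gcd with N exposes q from this one faulty result.
leaked_q = gcd((pow(bad, pub, n) - x) % n, n)
leaked_p = n // leaked_q
recovered_e = pow(pub, -1, (leaked_p - 1) * (leaked_q - 1))
print(leaked_q, recovered_e)   # 53 2753
```

The recovery needs only the public values N and `pub`, the input X, and one faulty output, which is why a single successful fault is enough.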

## RSA experimental setup

- 128-bit RSA implementation is placed in an area of 256 LABs
- Wasters are placed at random locations around the RSA core covering an area of 1,940 LABs
- A script running on the host PC controls the experiment


(Quartus Prime 17.1 - ChipPlanner)

Resource utilization: Cyclone V 5CSEMA5F31C6

| RSA core | ALMs (avail.: 32,070) | Flip-flops (avail.: 128,280) | Memory [Kb] (avail.: 3,970) | F<sub>max</sub> [MHz] |
|----------|-----------------------|------------------------------|-----------------------------|-----------------------|
| 128-bit  | 1,236 (3.9%)          | 1,925 (1.5%)                 | 16                          | 94.74                 |

### Extracting the RSA private key




### How many wasters are required?

- Vary the number of wasters and find the probability of extracting the key
- Cyclone V:
  - 11K-12K PWs: high chance of extracting the key undetected
  - F<sub>max</sub>: 94.74 MHz, F<sub>break</sub>: 166 MHz (w/o wasters), i.e. ~4.5 ns of timing margin
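The ~4.5 ns figure follows from the two clock periods. The calculation below assumes F<sub>break</sub> is the lowest clock frequency at which the design fails without wasters, so that its period approximates the true critical-path delay; the slowdown factor is derived here and is not a number from the slides.

```python
def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz

F_MAX_MHZ = 94.74     # reported by timing analysis (from the slide)
F_BREAK_MHZ = 166.0   # observed failure frequency without wasters (from the slide)

margin_ns = period_ns(F_MAX_MHZ) - period_ns(F_BREAK_MHZ)
slowdown = F_BREAK_MHZ / F_MAX_MHZ

print(f"margin = {margin_ns:.2f} ns, required slowdown = {slowdown:.2f}x")
```

Under that reading, the wasters must stretch the critical path by roughly 75% before a design clocked at F<sub>max</sub> starts to fail.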



- Number of PWs that can safely be activated
- Yield in less timing margin

work in progress




#### How many wasters are required? (cont'd)

- Vary the number of wasters and find the probability of extracting the key
- 11K-12K PWs: high chance of extracting the key undetected
- F<sub>max</sub>: 94.74MHz, F<sub>break</sub>: 166MHz (w/o wasters)





#### Summary

- Multi-tenant FPGAs
  - Logical next step for cloud computing
- Voltage based attacks
  - Easy to create power wasting circuits that induce faults or crash FPGA
- Characterizing voltage attacks on Arria 10
  - 15% core voltage drop within 8 µs
  - Induces faults throughout device
- RSA attack
  - Single fault sufficient to expose key
  - Effective for Cyclone V (even defeats built-in timing margin)
  - Effective for Arria 10 if design is overclocked