# OPEN POSSIBILITIES.

### **Composable Security Architectures**



Security


- Andrés Lagar-Cavilla, Principal Engineer, Google
- Alberto Muñoz, Senior Principal Engineer, Intel
- Bryan Kelly, Principal Engineer, Microsoft
- Indranil Banerjee, Security Partner, Meta
- Prabhu Jayanna, Director Product Security, AMD






# **Premise and Problem Statement**



*Figure: logos of related efforts (Project Cerberus, Trusted Computing Group, OpenTitan, CyRes, Intel PFR, DMTF, Open Compute Project), labeled Attestation, Secure Boot, SPDM.*


1. **Server Boot Integrity:** A system boots the intended mutable code.
2. **Remote Attestation:** A system can provide cryptographic proof of its identity and firmware integrity.



#### **System Integrators**

"we don't know how to build machines that will satisfy **all** customers/hyperscalers"

#### Hyperscalers

"There are too many different solutions for the same problem. OEM/OTS/OCP products do not fit our needs (yet)"

#### Suppliers

"there are so many [OCP] specs"

All: We would benefit from alignment and consistency in attaining boot integrity and attestation.

Presentation goal: admissible architectures that drive alignment and consistency.




# Admissible Architecture: Approach



*"Journey of Discovery"*: we will build the idea of an admissible architecture by breaking it down into three steps:

| Capability | Interoperability | Orchestration |
|---|---|---|
| Establish a definition of "critical devices" of a server platform | Physical interfaces (in-band vs. sideband) for critical devices | Establish a common mission for an orchestrating device ("pRoT") |
| Set the "Minimum Elements" needed for critical devices | Functional APIs for the "Minimum Elements" of each device | Root critical device in a server platform hierarchy/topology |
| *Examples:* CPU, SSD, dGPU, SmartNIC, etc. | | *Examples:* PRoT, BMC, Control Plane Processor, etc. |

### Server Critical Devices: Criteria

#### NIST <u>SP 800-193</u> definition of a "Critical Platform Device":

"the set of platform devices necessary to minimally restore operation of the system, and sufficient to restore reasonable functionality, should themselves be resilient. We call this set of devices *critical platform devices*"

#### Start with CPUs and Peripherals Handling User Data in the Data Center








### Server Critical Device RoT Structure



#### NIST 800-193, Platform Firmware Resiliency Guidelines

| Function | Root of Trust | TCB role | Implementation notes |
|---|---|---|---|
| Detection | <b>RTM</b>: RoT for Measurement<br>(a.k.a. RTD) | <b>Integrity</b> TCB: without this, we can't tell what code is handling user data | <ul> <li>Integrated silicon RoT</li> <li>Higher resiliency bar</li> <li>Protects against MITM attacks</li> <li>Limited fuses</li> </ul> |
| Protection | <b>RTU</b>: RoT for Update | <b>Availability</b> TCB: without this, service may be denied at scale | <ul> <li>Discrete RoT chip</li> <li>Platform owner enforces resiliency policy</li> </ul> |
| Recovery | RTRec | | <ul> <li>OOB updates: to what version, signed by whom</li> <li>Integrated flash for unlimited renewability</li> <li>Single portable implementation</li> </ul> |





# Server Critical Devices: RoT for Measurement



- First to execute out of power-on/reset
- Must require no *programmable* external power/clock elements to function
- Measures **all** firmware for the device, without exception
- Measures (and protects) life-cycle and hardware-state fuses
- Controls internal reset for all sub-systems in the device
- Provides (and protects) a unique device identity
- Provides unforgeable attestation rooted in that unique identity
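To make "measures **all** firmware" concrete, here is a minimal sketch of an RTM measurement loop. It assumes a TPM-style extend operation (`register = H(register || H(component))`) and SHA-384; the component names and the `MeasurementLog`/`rtm_boot` helpers are hypothetical. Because the register can only be extended, never overwritten, any change to any component, or to the order in which components are measured, changes the final value a verifier sees.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class MeasurementLog:
    """Hypothetical RTM state: an ordered event log plus an extend-only register."""
    register: bytes = bytes(48)  # SHA-384-sized register, starts as all zeros
    events: list = field(default_factory=list)

    def extend(self, name: str, image: bytes) -> None:
        # Measure the component, then fold the digest into the register.
        digest = hashlib.sha384(image).digest()
        self.register = hashlib.sha384(self.register + digest).digest()
        self.events.append((name, digest.hex()))


def rtm_boot(components: dict) -> MeasurementLog:
    """Measure every firmware component, in order, before it is released from reset."""
    log = MeasurementLog()
    for name, image in components.items():
        log.extend(name, image)
        # A real RTM would release this sub-system from reset only after this point.
    return log


if __name__ == "__main__":
    good = {"fmc": b"first mutable code v1", "runtime": b"runtime fw v7"}
    bad = {"fmc": b"first mutable code v1", "runtime": b"runtime fw v7 (tampered)"}
    print(rtm_boot(good).register != rtm_boot(bad).register)  # True
```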





## **RTM Identity**

#### RTM firmware wields an identity certificate chain rooted in device hardware



Mutable RTM code must use secure boot: No unsigned code runs within the RTM perimeter



Each identity certifies the next
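A verifier checks "each identity certifies the next" by walking the chain from the hardware-rooted device identity to the leaf. The sketch below assumes a simplified `Cert` record and leaves the actual signature check to a caller-supplied `verify_sig` hook. Both are hypothetical; real deployments use X.509 certificates with an asymmetric scheme such as ECDSA.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Cert:
    subject: str       # identity being certified (e.g. the next firmware layer)
    issuer: str        # identity that issued the certificate (the previous layer)
    public_key: bytes  # subject's public key, used to check the next certificate
    signature: bytes   # issuer's signature over this certificate


def verify_chain(chain: Sequence[Cert], root_name: str, root_key: bytes,
                 verify_sig: Callable[[bytes, Cert], bool]) -> bool:
    """Accept the chain only if each identity certifies the next, starting at the device root."""
    if not chain:
        return False
    issuer_name, issuer_key = root_name, root_key
    for cert in chain:
        if cert.issuer != issuer_name or not verify_sig(issuer_key, cert):
            return False
        # The subject just verified becomes the issuer expected for the next link.
        issuer_name, issuer_key = cert.subject, cert.public_key
    return True
```

In use, `verify_sig` would wrap a real signature library. The point of the sketch is the linkage rule: a certificate is trusted only if its issuer is exactly the identity certified one step earlier, starting from the device root.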



### **RTU - Update and Protection**

For large, automated fleets, the **update** path is a **denial-of-service (DoS) risk at scale**
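One way to keep the update path from becoming a fleet-wide DoS vector is for the RTU to admit only images that match owner policy on *who signed it*, *which version* it is, and *which exact build* was approved for the rollout. Below is a minimal sketch. It assumes signature verification has already happened upstream and uses hypothetical `UpdateImage`/`RtuPolicy` types together with a security-version (SVN) anti-rollback counter.

```python
import hashlib
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class UpdateImage:
    payload: bytes
    signer_id: str          # key that signed the image (signature assumed checked upstream)
    security_version: int   # anti-rollback counter (SVN)


@dataclass(frozen=True)
class RtuPolicy:
    allowed_signers: FrozenSet[str]
    min_security_version: int
    approved_digest: str    # SHA-256 of the exact build the platform owner approved

    def admit(self, image: UpdateImage) -> Tuple[bool, str]:
        """Answer 'signed by whom' and 'to what version' before anything is written to flash."""
        if image.signer_id not in self.allowed_signers:
            return False, "untrusted signer"
        if image.security_version < self.min_security_version:
            return False, "rollback below minimum security version"
        if hashlib.sha256(image.payload).hexdigest() != self.approved_digest:
            return False, "image not approved for this rollout"
        return True, "ok"
```

These checks are cheap and deterministic, so a bad or unapproved image is rejected on the first device that sees it instead of propagating across the fleet.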





### **RTRec - Recovery**

For large, automated fleets, operator-based repairs are not scalable

- Need an additional layer of automated protection for availability
- Risk: Adversaries and bugs
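Below is a minimal sketch of the automated decision an RTRec might make without an operator. It assumes a local golden image plus an out-of-band fallback, such as the BMC supplying recovery images as in the platform RoT model later in this deck. The `Action` names and the SHA-256 allow-list are illustrative assumptions.

```python
import hashlib
from enum import Enum
from typing import Optional, Set


class Action(Enum):
    BOOT_ACTIVE = "boot the active image"
    RESTORE_FROM_GOLDEN = "rewrite the active slot from the local golden image"
    REQUEST_OOB_RECOVERY = "request a recovery image out of band"


def is_approved(image: Optional[bytes], approved_digests: Set[str]) -> bool:
    return image is not None and hashlib.sha256(image).hexdigest() in approved_digests


def recovery_decision(active: Optional[bytes], golden: Optional[bytes],
                      approved_digests: Set[str]) -> Action:
    """Pick the least disruptive path back to known-good firmware, with no operator."""
    if is_approved(active, approved_digests):
        return Action.BOOT_ACTIVE
    if is_approved(golden, approved_digests):
        return Action.RESTORE_FROM_GOLDEN
    # Both local copies are missing or corrupt (bug or adversary): escalate out of band.
    return Action.REQUEST_OOB_RECOVERY
```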











# Call to Action (1/2)

- Define a standardized firmware descriptor and its signing
  - TCG CoRIM going to ballot in 2021, aiming for a 1.0 release in 2022/2023
- OCP-defined manifest format for SPDM reporting of attestation blocks
  - Using CoMID, CoSWID, and Tag IDs; join Piotr and Alberto in the Security WG calls
- Need an MCTP hardware channel in all physical connector specifications
  - Not I2C: we propose **I3C**
  - **Requirements**: out-of-band, scalable, switchable, bi-directional, high bandwidth (e.g., push 1 Gbit of firmware in seconds)
  - Hardware specifications: PCIe CEM (PCI-SIG), small-form-factor SSD, OCP NIC 3.0
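The bandwidth requirement is the main argument against I2C. Below is a back-of-the-envelope calculation. It assumes nominal bus signalling rates, one bit per clock, and zero protocol overhead; all three are simplifications, and real MCTP/PLDM throughput will be lower.

```python
def transfer_seconds(payload_bits: float, bits_per_second: float, efficiency: float = 1.0) -> float:
    """Idealized push time: payload size divided by effective bus rate."""
    return payload_bits / (bits_per_second * efficiency)


PAYLOAD_BITS = 1e9  # "push 1 Gbit of firmware"
NOMINAL_RATES = {  # nominal signalling rates; protocol overhead ignored (efficiency = 1.0)
    "I2C Fast-mode": 400e3,
    "I2C Fast-mode Plus": 1e6,
    "I3C SDR": 12.5e6,
}

for bus, rate in NOMINAL_RATES.items():
    print(f"{bus}: {transfer_seconds(PAYLOAD_BITS, rate):.0f} s")
# I2C Fast-mode: 2500 s
# I2C Fast-mode Plus: 1000 s
# I3C SDR: 80 s
```

Even under these idealized assumptions, I2C needs roughly 17 to 42 minutes per gigabit, while I3C SDR is about 30 times faster than I2C Fast-mode. I3C's HDR modes raise the rate further, and real framing overhead increases all three times.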








# Admissible Architecture: Platform Root of Trust



#### NIST 800-193, Platform Firmware Resiliency Guidelines

| Function | Platform RoT responsibility | Realization |
|------------|-----------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Detection | <b>Aggregate</b> all RTM <b>measurements</b> and <b>verify fitness</b> of the system against the <b>policy</b> of target firmware | <ul> <li>Control Plane aggregates measurements from all RTMs</li> <li>Trusted fabric service verifies fitness and authorizes service</li> </ul> |
| Protection | Centralize the <b>out-of-band update</b> flow | BMC centralizes OOB firmware updating |
| Recovery | Provide out-of-band recovery with <b>golden images</b> to the RTRec of unresponsive critical devices | <ul> <li><b>pRoT</b> chip is the <b>minimum recovery foothold</b></li> <li>pRoT is its own RTM/RTU/RTRec</li> <li>Bootstraps recovery of BMC</li> <li>BMC supplies recovery images for critical devices</li> </ul> |
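The Detection row implies a simple appraisal step once measurements are aggregated. Below is a minimal sketch. It assumes each device's SPDM measurement responses have already been signature-checked against its identity chain and reduced to one digest per measurement index; the data shapes and the `verify_fitness` name are hypothetical.

```python
from typing import Dict, List, Set


def verify_fitness(reported: Dict[str, Dict[int, str]],
                   policy: Dict[str, Dict[int, Set[str]]]) -> List[str]:
    """Compare measurements aggregated from every RTM against target-firmware policy.

    reported: device -> {measurement index -> digest (hex)}, as collected over SPDM
    policy:   device -> {measurement index -> set of acceptable digests}
    Returns a list of failures; an empty list means the system is fit for service.
    """
    failures = []
    for device, expected in policy.items():
        got = reported.get(device)
        if got is None:
            failures.append(f"{device}: no measurements reported")
            continue
        for index, acceptable in expected.items():
            if got.get(index) not in acceptable:
                failures.append(f"{device}[{index}]: {got.get(index)} not in policy")
    # Devices the policy does not know about are also a fitness failure.
    for device in sorted(reported.keys() - policy.keys()):
        failures.append(f"{device}: not covered by policy")
    return failures
```

Keeping the policy as a *set* of acceptable digests per index lets a fleet accept both the old and the new firmware during a staged rollout.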







# Composable Admissible Architecture

TL;DR for offline viewing: alignment on

- NIST 800-193 structure with the RTM in the SoC and discrete RTU/RTRec on the PCB
- <u>DICE</u>-like key derivation for RTM renewable security
- <u>SPDM</u> 1.2+ as the attestation lingua franca
  - Key derivation and transport consistent with OCP Attestation
  - Call to action on CoRIM for the firmware descriptor, CoMID/CoSWID for SPDM attestation blocks
- OOB firmware push path: <u>PLDM</u>
- Call to action on a scalable, bi-directional, out-of-band path for MCTP: I3C
- Use of OCP <u>DC-SCM</u> for system and PRoT modularity
- Disaggregated PRoT model with a trusted external verifier
- Work in progress documented in the <u>OCP Platform Security Overview</u>
  - Consistent with the <u>OCP threat model</u> and <u>security checklist</u>




# Call to Action (2/2)

- **Contribute** to <u>OCP Platform Root of Trust</u> document
- Make scalable, switchable, high-bandwidth I3C **sidebands** for **MCTP** a standard
  - Hardware specifications: PCIe CEM (PCI-SIG), small-form-factor SSD, OCP NIC 3.0
- Define a standardized firmware descriptor and its signing
  - TCG CoRIM going to ballot in 2021, aiming for a 1.0 release in 2022/2023
- OCP-defined manifest format for SPDM reporting of attestation blocks
  - Using CoMID, CoSWID, and Tag IDs; join Piotr and Alberto in the Security WG calls
- Timeline for ratification of the OCP documents
  - <u>Attestation, Checklist, Secure Boot, Platform Security Overview</u>
- Upcoming silicon products that comply as "critical devices"

Project wiki with the latest specifications: <u>https://www.opencompute.org/wiki/Security</u>



#### **Open Discussion**







### **Critical Devices in Server: Minimal Elements**

| Resiliency RoT service | Critical device | Orchestrator device |
|---|---|---|
| RoT for Detection (RTD) | Mandatory | Mandatory |
| RoT for Update (RTU) | Optional | Mandatory |
| RoT for Recovery (RTRec) | Optional | Mandatory |

#### Minimum Elements to Enable RTD

- Identity
- Confidentiality
- Integrity
  - Authentication
  - Measurement
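The table and element list above can be read as a compliance checklist. Below is a minimal sketch that encodes them, with Integrity expanded into its two listed sub-elements (Authentication and Measurement). The role names, element labels, and `missing_requirements` helper are our own; optional services are deliberately not flagged.

```python
from typing import Set

# Encodes the "Minimal Elements" table: which resiliency services each role must provide.
REQUIRED_SERVICES = {
    "critical_device": {"RTD"},
    "orchestrator": {"RTD", "RTU", "RTRec"},
}
# Minimum elements to enable RTD, with Integrity expanded into its listed sub-elements.
RTD_MINIMUM_ELEMENTS = {"identity", "confidentiality", "authentication", "measurement"}


def missing_requirements(role: str, services: Set[str], rtd_elements: Set[str]) -> Set[str]:
    """Return the mandatory services and RTD elements that the device does not provide."""
    missing = REQUIRED_SERVICES[role] - services
    if "RTD" in services:
        missing |= {f"RTD:{e}" for e in RTD_MINIMUM_ELEMENTS - rtd_elements}
    return missing
```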




### Alternate View: Critical Device RoT




#### NIST 800-193, Platform Firmware Resiliency Guidelines

| Function | Root of Trust | TCB role |
|------------|------------------------------------------------|-------------------------------------------------------------------------------|
| Detection | <b>RTM</b> (RoT for Measurement)<br>a.k.a. RTD | <b>Integrity</b> TCB: without this, we can't tell what code is handling user data |
| Protection | <b>RTU</b> (RoT for Update) | <b>Availability</b> TCB: without this, service may be denied at scale |
| Recovery | RTRec | State |







# Renewable DICE problem

**Immutable keys** should be wielded by **immutable code**. But what if there's a bug in the code that wields that key?



Mutable RTM code must use secure boot: No unsigned code runs within the RTM perimeter







Options:

1. Hardware identity is wielded by mask ROM.
2. Hardware identity is wielded by FMC (first mutable code) and is bound to the FMC's code identity.
   - FMC is effectively immutable.
3. Hardware identity is wielded by FMC and is bound to the FMC's *signer identity*.
   - FMC is mutable, but its code identity cannot be attested.

Available for **nascent** implementations

Recommended for **mature** implementations
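The trade-off between options 2 and 3 appears directly in what the key derivation consumes. Below is a minimal DICE-style sketch, in which a Compound Device Identifier (CDI) is a one-way function of the Unique Device Secret (UDS) and a measurement. HMAC-SHA256, the 4-byte security version (SVN) encoding, and the function names are assumptions for illustration. Option 1 has no derivation step from mutable code, so it is not shown.

```python
import hashlib
import hmac


def kdf(secret: bytes, context: bytes) -> bytes:
    """One-way derivation step; HMAC-SHA256 is an assumption (DICE allows other OWFs)."""
    return hmac.new(secret, context, hashlib.sha256).digest()


def cdi_bound_to_code(uds: bytes, fmc_image: bytes) -> bytes:
    """Option 2: the FMC's identity is derived from a hash of the FMC code itself."""
    return kdf(uds, hashlib.sha256(fmc_image).digest())


def cdi_bound_to_signer(uds: bytes, signer_pubkey: bytes, svn: int) -> bytes:
    """Option 3: the identity is derived from the FMC *signer* and a security version.

    ROM must first verify the FMC signature (secure boot). The code hash is deliberately
    not an input, so routine signed updates keep the same identity, while bumping the
    (hypothetical) SVN after a security fix renews it.
    """
    return kdf(uds, hashlib.sha256(signer_pubkey).digest() + svn.to_bytes(4, "big"))


if __name__ == "__main__":
    uds = bytes(32)  # placeholder Unique Device Secret; a real UDS lives in fuses
    signer = b"vendor FMC signing key"
    print(cdi_bound_to_code(uds, b"FMC v1") == cdi_bound_to_code(uds, b"FMC v2"))  # False
    # FMC v1 and v2 signed by the same key at the same SVN derive the same CDI:
    print(cdi_bound_to_signer(uds, signer, 1) == cdi_bound_to_signer(uds, signer, 1))  # True
    print(cdi_bound_to_signer(uds, signer, 1) == cdi_bound_to_signer(uds, signer, 2))  # False
```

Binding to code identity makes every FMC update change the device identity, which is workable only while FMC is effectively frozen. Binding to signer identity lets FMC evolve, but the identity then no longer attests which exact FMC build is running.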



# SPDM Profile - measurement manifest

- FW Manifest and signing
  - SBOM produced by manufacturers
  - SWID: Software Identification Tag, XML-based
  - CoSWID: Concise Software Identification Tag, CBOR-based (specified in CDDL)
  - CoMID: Concise Module Identification Tag, for hardware and embedded firmware, CDDL-specified (CBOR/JSON)
  - CoRIM: Concise Reference Integrity Manifest, a signing container for CoMID and CoSWID
- Measurement manifest:
  - Use TCG CoMID as baseline
  - Direct mapping between CoMID and the SPDM Measurement Manifest (i.e., provide the CoMID schema as an additional format for the Measurement Manifest)
- Define the minimum set of measurements required per device class
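To make the SPDM side of the mapping concrete, below is a sketch that parses SPDM measurement blocks in the DMTF measurement format. The layout follows our reading of DMTF DSP0274; verify it against the spec before relying on it. Each block holds a 1-byte Index, a 1-byte MeasurementSpecification, and a 16-bit little-endian MeasurementSize, followed by a DMTF value-type byte, a 16-bit value size, and the value itself. The per-class minimum sets are hypothetical placeholders, since the deck states those sets are still to be defined.

```python
import struct
from typing import Dict, List, NamedTuple, Set


class MeasurementBlock(NamedTuple):
    index: int
    value_type: int  # DMTFSpecMeasurementValueType bits [6:0]
    is_raw: bool     # bit 7 set: raw bit stream; clear: digest
    value: bytes


def parse_blocks(record: bytes) -> List[MeasurementBlock]:
    """Parse concatenated SPDM measurement blocks that use the DMTF measurement format."""
    blocks, off = [], 0
    while off < len(record):
        if off + 4 > len(record):
            raise ValueError("truncated block header")
        index, spec, size = struct.unpack_from("<BBH", record, off)
        body = record[off + 4:off + 4 + size]
        off += 4 + size
        if len(body) != size or size < 3:
            raise ValueError("truncated or malformed block")
        if not spec & 0x01:
            raise ValueError("only the DMTF measurement specification is handled here")
        vtype, vsize = struct.unpack_from("<BH", body, 0)
        if 3 + vsize > size:
            raise ValueError("value overruns block")
        blocks.append(MeasurementBlock(index, vtype & 0x7F, bool(vtype & 0x80), body[3:3 + vsize]))
    return blocks


# Hypothetical per-class minimums, expressed as DMTF value types
# (0 = immutable ROM, 1 = mutable firmware, 3 = firmware configuration).
MIN_TYPES_BY_CLASS: Dict[str, Set[int]] = {
    "ssd": {0, 1},
    "smartnic": {0, 1, 3},
}


def missing_types(device_class: str, blocks: List[MeasurementBlock]) -> Set[int]:
    """Value types the class requires but the device did not report."""
    return MIN_TYPES_BY_CLASS[device_class] - {b.value_type for b in blocks}
```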




### SPDM Measurement - per device class

- Different classes of devices may need different minimum sets of measurements. Some examples:





### PRoT and DC-SCM

- OCP <u>DC-SCM</u> provides a natural "house" for PRoT
- RTM in the CPU neutralizes DC-SCI interposition attacks
- BMC within the SCM behaves as an availability-only actor
  - BMC can route SPDM/PLDM traffic, attestation challenges, and firmware images
  - End-to-end cryptography keeps the BMC out of the boot TCB
  - BMC can always shut down the system, but it cannot brick it with bad firmware or forge attestations
- SCM enables PRoT modularity




### pRoT Alternatives Considered



#### DC-SCM

- Cerberus, PFR, Titan: chip is the RTM/RTU/RTRec for the BMC
- RoT chip is the minimum recovery foothold
- Bootstraps recovery of an entire system
- BMC media can supply recovery images for leaves
- BMC centralizes OOB firmware updating
- BMC aggregates measurements from all RTMs
- Trusted fabric service verifies fitness and authorizes service
- Chip verifies fitness against a manifest deployed to its NVM

#### All-in-one

- BMC + RoT are a single PRoT ASIC
- All functions merged: BMC integrity, system recovery, fitness
- Typical in OEMs
- May not stay in the presentation






NOVEMBER 9-10, 2021