

## Datacenter-ready Secure Control Module for Open Accelerator Infrastructure (OAI-SCM)

#### Siamak Tavallaei, Principal Architect, Microsoft Azure; **OCP** Server Project co-Lead

September 27, 2019



















# Modular Building Block Architecture (MBA)

- Clearly defined physical dimensions, power, cooling, management, and input/output ports
- Module interoperability in various OAI Systems

**MBA** is a <u>Catalyst</u> for interoperable Innovation! 










# **OAI-SCM Facilitates MBA**

- As a datacenter-ready secure control module for OAI, OAI-SCM streamlines the design and deployment of OAI Modules into Hyperscale Datacenters
- It drives a common and Secure Monitoring, Control, and Remote Debug for various OAI Modules and Systems (firmware, diagnostic tools, manufacturing tools, at-scale remote debug)



# **OAI-SCM Tenets**

- Modular, Interoperable: Clear delineation of **OAI-SCM** from the rest of the OAI System via a standard Interface (**OAI-SCI**)
- OAI-SCM holds control bits secured via a Root of Trust Chip (RoT) and Chains of Trust; OAI Modules are stateless
- OAI-SCM provides a common datacenter interface (1GE, BMC, RoT, ...)
- Flexible Architecture for modular implementation: a common set of SCM circuitry and logic may reside on the Host Interface Board (OAI-HIB)
- Flexible form factors while maintaining the same interface (OAI-SCI)
  - The same firmware
  - The same look and feel





# The Datacenter-ready Secure Control Module for OAI (OAI-SCM) & the Datacenter-ready Secure Control Interface (OAI-SCI)



# Standalone OAI-SCM

- Remote Control, Management, and Debug at Cloud-scale
- Receives Power (12V, 3.3V Stand-by, ...) over OAI-SCI from the PDB or the HIB
- Includes:
  - FAN/PSU/HIB/PDB/UBB/OAM/Chassis/Module Control
  - Realtime Clock (RTC) and its battery
  - RoT and TPM for Security (attestation for all firmware and programmable devices)
  - Clearly defined JTAG and I<sup>2</sup>C Trees
  - The main PowerEn, PowerGood, and Reset functions for power-up and power-down sequences
  - BMC for OAI-level monitoring, control, diagnostics, and remote debug, and for the Rack Management Interface
  - Flash Devices for infrastructure-level FW controlled by RoT (OAM maintains its own FW and provides attestation to RoT)
  - Front-IO (LEDs, {RJ45: 1GE for BMC}, {RJ45: Presence/PowerEn/Throttle Control}, {RJ45: 1GE for Connection Manager})
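The Standalone OAI-SCM includes a RoT and TPM that attest all firmware and programmable devices. The deck does not describe the measurement format, so the following is only a sketch of one common technique, a TPM-style "extend" chain: each image's SHA-256 digest is folded into a register as SHA-256(register || digest), so the final value commits to every image and to their order. The image contents, the function names, and the compare-against-known-good policy are illustrative assumptions, not the OAI-SCM design.

```python
import hashlib


def extend(register: bytes, measurement: bytes) -> bytes:
    """TPM-style extend: new register = SHA-256(old register || measurement)."""
    return hashlib.sha256(register + measurement).digest()


def measure_chain(images: list[bytes]) -> bytes:
    """Fold each image's SHA-256 digest, in order, into a zeroed 32-byte register."""
    register = bytes(32)  # the register starts as all zeros
    for image in images:
        register = extend(register, hashlib.sha256(image).digest())
    return register


def attest(images: list[bytes], expected: bytes) -> bool:
    """Hypothetical RoT policy: accept only if the recomputed chain matches a known-good value."""
    return measure_chain(images) == expected


# Hypothetical stage images (placeholders, not real firmware).
golden = [b"scm-bmc-fw-v1", b"hib-cpld-bitstream-v1", b"ubb-cpld-bitstream-v1"]
known_good = measure_chain(golden)

print(attest(golden, known_good))                               # True
print(attest([golden[1], golden[0], golden[2]], known_good))    # False: order matters
tampered = [golden[0], b"hib-cpld-bitstream-EVIL", golden[2]]
print(attest(tampered, known_good))                             # False: one image changed
```

Because each step hashes the previous register value, a verifier cannot be satisfied by swapping, dropping, or altering any single stage without changing the final value.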







# OAI Host Interfaces

Accommodating various implementations:

- Disaggregated Host
- Integrated Host
- Multi-UBB or Multi-Host OAI Systems







# Host DC-SCM and OAI-SCM (**Disaggregated** Host)

- The Host and the OAI System each have their own independent BMC and RoT chip









# Disaggregated Host-to-OAI Interface (draft) via a Re-timer Card in a PCIe Slot

| Signal Name              | Pin Count (MB1) | Direction | Function                                                                                      |
|--------------------------|-----------------|-----------|-----------------------------------------------------------------------------------------------|
| Host interface x16 Diff. | 32              | BI        | Host interface x16 differential signals                                                       |
| SMCLK/SMDAT              | 2               | BI        | Host BMC to OAM system BMC                                                                    |
| Bifur_ID0                | 1               | OUT       | To host, for host-interface bifurcation                                                       |
| Bifur_ID1                | 1               | OUT       | To host, for host-interface bifurcation                                                       |
| Present Pin              | 1               | OUT       | Cable present for MB; pull-down (PD) on baseboard, pull-up (PU) on re-timer card to IO EXT    |
| 100M CLK                 | 2               | IN        | 100 MHz host-interface reference clock (from the host's clock generator)                      |
| WarmRST                  | 1               | IN        | Warm reset from host                                                                          |
| Host interface Reset     | 1               | IN        | Host-interface reset (PCIe RST if the interface is PCIe)                                      |
| Diff. reference GND      |                 |           | Ground reference for the differential signals                                                 |
| USB                      | 2               | BI        | USB from host to OAM system BMC                                                               |
| Accel_PWR_EN             | 1               | IN        | From host SYS_PWRGD; enables Accel system power                                               |
| Accel_PWRGOOD            | 1               | OUT       | Accel system power-good back to host                                                          |
| …                        | 15              |           |                                                                                               |














# Host DC-SCM and OAI-SCM (Integrated Host)

- BMC and RoT of the Integrated Host extend management, control, and security to the whole OAI System
- Expansion Manager to control Expansion Ports







# Host DC-SCM and OAI-SCM (multi-UBB or multi-Host OAI Systems)

- Consolidated power/cooling for the **OAI** System
- Independent BMC/RoT
- Each UBB/HIB will have its own BMC/RoT
















# Power Enable, Power Good, Reset# Sequence

1. OAI-SCM provides Presence# to, and receives Chassis\_PowerEn and Throttle signals from, the Rack Manager (Chassis\_PowerEn controls all DC power, including standby power)
2. OAI-SCM asserts Power\_En to Modules such as the UBB
3. Every Module enables its VRs, HSCs, etc.
4. The outputs of all VRs, HSCs, etc. are reflected in Module\_OK
5. When all Modules report OK, the SCM drives OAI\_PowerGood
6. Clocks/PLLs stabilize
7. Later, the SCM negates OAI\_Reset# to allow the OAMs, etc. to run

The BMC of the SCM may choose to hold the OAI in Reset#, to negate PowerGood, or not to assert Power\_En, based on firmware, etc.
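The gating in this sequence can be captured in a few lines. The sketch below is a simplified, level-based Python model, not the SCM's actual CPLD or BMC logic: given a snapshot of the inputs, it reports how far power-up can proceed and which outputs the SCM would drive. The BMC policy flags model the three holds described above; the Module names and state labels are hypothetical, and power-down and the Throttle signal are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Inputs:
    chassis_power_en: bool               # Chassis_PowerEn from the Rack Manager
    module_ok: dict[str, bool]           # Module_OK per Module (names are hypothetical)
    clocks_stable: bool                  # clocks/PLLs have settled
    bmc_allow_power_en: bool = True      # BMC policy: may decline to assert Power_En
    bmc_negate_power_good: bool = False  # BMC policy: may keep PowerGood negated
    bmc_hold_reset: bool = False         # BMC policy: may hold the OAI in Reset#


@dataclass
class Outputs:
    power_en: bool = False
    oai_power_good: bool = False
    oai_reset_n: bool = False            # active low: False means Reset# is asserted


def sequence(i: Inputs) -> tuple[str, Outputs]:
    """Return the furthest step the inputs allow and the outputs the SCM would drive.
    Each step is gated on every earlier step having completed."""
    o = Outputs()
    if not i.chassis_power_en:                 # step 1: no Chassis_PowerEn, DC power off
        return "OFF", o
    if not i.bmc_allow_power_en:               # BMC withholds Power_En
        return "POWER_EN_WITHHELD", o
    o.power_en = True                          # step 2: assert Power_En to Modules
    if not i.module_ok or not all(i.module_ok.values()):
        return "WAIT_MODULE_OK", o             # steps 3-4: VRs/HSCs still ramping
    if i.bmc_negate_power_good:                # BMC keeps PowerGood negated
        return "POWER_GOOD_NEGATED", o
    o.oai_power_good = True                    # step 5: all Modules OK
    if not i.clocks_stable:
        return "WAIT_CLOCKS", o                # step 6: clocks/PLLs settling
    if i.bmc_hold_reset:                       # BMC holds the OAI in Reset#
        return "HELD_IN_RESET", o
    o.oai_reset_n = True                       # step 7: negate OAI_Reset#
    return "RUNNING", o


# Walk through a power-up: one Module not yet OK, then all OK, then clocks settle.
for mods, clk in [
    ({"UBB": True, "HIB": False}, False),
    ({"UBB": True, "HIB": True}, False),
    ({"UBB": True, "HIB": True}, True),
]:
    state, out = sequence(Inputs(True, mods, clk))
    print(state)  # WAIT_MODULE_OK, then WAIT_CLOCKS, then RUNNING
```

Gating each step on all earlier ones keeps the key invariant visible: OAI\_Reset# is never negated unless PowerGood is driven, and PowerGood is never driven unless every Module reports OK.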







# Power Enable, Power Good, Reset#







# OAI-SCM Interface (OAI-SCI)

- **Receives Power Rails**
- Controls Power-Up and Power-Down Sequences of OAI Modules
- Controls PowerGood, SystemHardReset# (clears sticky bits), and WarmReset# (resets state machines but preserves error status registers)
- JTAG to control devices on HIB and UBB
- I<sup>2</sup>C to control devices on HIB, UBB, and other OAI Modules
- UBB/OAM Power Brake control
- Critical Signals such as ThermTrip#
- Communications (UART, I<sup>2</sup>C, ...)
- Pin-reducing sGPIO (LD, CLK, Di, Do) between SCM/HIB CPLD and UBB CPLD
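The sGPIO link trades many parallel GPIO pins for four wires. As an illustration only, the sketch below models a generic shift-register exchange between the SCM/HIB CPLD and the UBB CPLD: LD latches each side's parallel bits, and on each CLK both sides shift out their MSB (the SCM on Do, the UBB CPLD on Di) and shift in the bit they receive. The frame width, MSB-first order, Di/Do direction, and bit meanings are assumptions, not taken from the OAI-SCI definition.

```python
def sgpio_frame(scm_bits: int, ubb_bits: int, width: int) -> tuple[int, int]:
    """Simulate one sGPIO frame; return (bits received by UBB CPLD, bits received by SCM)."""
    mask = (1 << width) - 1
    scm_reg = scm_bits & mask   # LD: SCM latches its outgoing control bits
    ubb_reg = ubb_bits & mask   # LD: UBB CPLD latches its outgoing status bits
    for _ in range(width):      # one CLK per bit
        do_bit = (scm_reg >> (width - 1)) & 1     # SCM drives its MSB on Do
        di_bit = (ubb_reg >> (width - 1)) & 1     # UBB CPLD drives its MSB on Di
        scm_reg = ((scm_reg << 1) & mask) | di_bit  # SCM shifts in the Di bit
        ubb_reg = ((ubb_reg << 1) & mask) | do_bit  # UBB CPLD shifts in the Do bit
    # After `width` clocks each register holds the other side's latched bits.
    return ubb_reg, scm_reg


# Hypothetical 8-bit frame: 8 control bits out, 8 status bits back.
to_ubb, to_scm = sgpio_frame(0b1010_0101, 0b1111_0000, width=8)
print(f"{to_ubb:08b} {to_scm:08b}")  # 10100101 11110000
```

Because the wire count stays at four (LD, CLK, Di, Do) regardless of frame width, adding control or status bits costs clock cycles rather than connector pins.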















# JTAG Tree







Figure: UBB JTAG tree (Secondary CPLD with SEL[3:0]; BMC\_JTAG\_B and SEC\_CPLD\_JTAG; per-OAM chains OAM0\_L/H\_JTAG through OAM7\_L/H\_JTAG; Aux\_3V3, Main\_3V3, Vref)





# **USB Ports** (for OAM Debug via UART; not for production)









# I<sup>2</sup>C Tree

Figure: I<sup>2</sup>C tree (BMC connector)









# Power Brake Tree





Figure: UBB power-brake tree (PWRBRK#, CPLD GPIO0 through GPIO8, per-OAM PWRBRK\_0# through PWRBRK\_7#; P3V3 and P3V3\_AUX rails)



# OAM Presence# and ThermTrip#







Figure: UBB PRSNT and THERMTRIP signals



# Summary

**MBA** is a *Catalyst* for interoperable *Innovation*!

- Clearly defining physical dimensions, power, cooling, management, and input/output ports enables Module interoperability in various OAI Systems

**OAI-SCM** facilitates Modular Building Block Architecture

- As a datacenter-ready secure control module for OAI, OAI-SCM streamlines the design and deployment of OAI Modules for Hyperscale Datacenters by driving a common and Secure Monitoring, Control, and Remote Debug for various OAI Modules and Systems (firmware, diagnostic tools, manufacturing tools, at-scale remote debug)



# Call to Action

Design your OAI System with OAI-SCM in mind

Make your solution Datacenter-Ready!

Join the effort to enhance OAI: https://www.opencompute.org/wiki/Server/OAI













# Open. Together.

OCP Regional Summit, September 26–27, 2019





# Presenter

<u>Siamak Tavallaei</u> is a Principal Architect at Microsoft Azure, co-chair of the OCP Server Project, and co-chair of the CXL Technical Task Force. Collaborating with industry partners, he drives several initiatives in research, design, and deployment of hardware for Microsoft's cloud-scale services at Azure. He is interested in Big Compute, Big Data, and Artificial Intelligence solutions based on distributed, heterogeneous, accelerated, and energy-efficient computing. His current focus is the optimization of large-scale mega-datacenters for general-purpose computing and for accelerated, tightly connected problem-solving machines built on collaborative designs of secure hardware, software, and management.





