# OPEN POSSIBILITIES.

Introduction to the OCP HPC SubProject's HPCM (High Performance Computing Module)

Nomenclature : Read "Processor" as CPU and/or Accelerator



SERVER

### Re-Inventing HPC Architectures for a "Domain Specific Architecture" Computing World

Allan Cantle, CEO, Nallasway







#### System Level Domain Specific Architecture Template


HPC is increasingly Data Bound & Less So Compute Bound






#### Heterogeneous w/ blurred Storage/Memory Boundaries






#### HIGH PERFORMANCE COMPUTING



### Today's HPC Compute & Storage Challenge

- CORAL Summit HPC Machine example
  - 18 Minutes to Load 2.8PB Memory from Filesystem once!
  - 1.2 Days to Push ALL 250PB Filesystem thru Compute Racks!
- Need to Bring Compute, Memory and Storage much closer
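As a rough sanity check, the two Summit figures above can be turned into an implied filesystem bandwidth. This is a minimal sketch, assuming decimal petabytes (1 PB = 1e15 bytes) and that each quoted time covers a single pass over the data; the function name is illustrative only.

```python
# Implied filesystem bandwidth from the two Summit figures quoted above.
# Assumption: decimal units (1 PB = 1e15 bytes, 1 TB = 1e12 bytes).
PB = 1e15
TB = 1e12


def implied_bandwidth_tb_per_s(data_bytes: float, seconds: float) -> float:
    """Average bandwidth in TB/s needed to move data_bytes in the given time."""
    return data_bytes / seconds / TB


# 2.8 PB of memory loaded in 18 minutes.
memory_load = implied_bandwidth_tb_per_s(2.8 * PB, 18 * 60)

# 250 PB filesystem pushed through the compute racks in 1.2 days.
full_sweep = implied_bandwidth_tb_per_s(250 * PB, 1.2 * 24 * 3600)

print(f"Memory load: {memory_load:.2f} TB/s")
print(f"Full sweep:  {full_sweep:.2f} TB/s")
```

Both figures point to roughly the same aggregate bandwidth, which suggests the filesystem link, not compute, sets the pace in each case.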







### Data Centric HPC Solution - Abstract View

- Tightly Couple Compute with ALL/ANY Memory Types
- Efficiently share Processors Near Memory with Other Processors




### If Tesla can "Re-Invent" then why not OCP?

Figure: Training Tile








### We need to Innovate across Silos!












Image sourced from Stage 2 Planning Partners

### Disaggregated Racks to Hyper-converged Chiplets






|                    | Rack                 | Node                 | SIP (Chiplets)       |
|--------------------|----------------------|----------------------|----------------------|
| Composability      | Software Composable  | Baseline Physical    | Expensive Physical   |
| Power              | Power Ignored        | Power Baseline       | Power Optimized      |
| Interconnect       | Rack, >20pJ/bit      | Node, 5-10pJ/bit     | Chiplet, <1pJ/bit    |
| Latency            | Poor Latency         | Baseline Latency     | Optimal Latency      |
| Volume             | >53K Cubic Inches    | >800 Cubic Inches    | <1 Cubic Inch        |




NOVEMBER 9-10, 2021







## HPCM brings the two Together



|               | Rack                | Node               | HPCM (Module)                    | SIP (Chiplets)     |
|---------------|---------------------|--------------------|----------------------------------|--------------------|
| Composability | Software Composable | Baseline Physical  | Software & Physical              | Expensive Physical |
| Power         | Power Ignored       | Power Baseline     | Power Optimized                  | Power Optimized    |
| Interconnect  | Rack, >20pJ/bit     | Node, 5-10pJ/bit   | Flexible Chiplet, 1-2pJ/bit      | Chiplet, <1pJ/bit  |
| Latency       | Poor Latency        | Baseline Latency   | Optimal Latency                  | Optimal Latency    |
| Volume        | >53K Cubic Inches   | >800 Cubic Inches  | <150 Cubic Inches                | <1 Cubic Inch      |
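To make the pJ/bit figures concrete, interconnect power is simply energy per bit multiplied by throughput. The sketch below is illustrative only: the 1 Tb/s throughput is a hypothetical value that does not appear in the slides, and each tier uses the boundary of its quoted range (the rack figure is a lower bound, the SIP figure an upper bound).

```python
# Interconnect power = energy per bit x throughput.
# 1 pJ/bit * 1 Tb/s = 1e-12 J/bit * 1e12 bit/s = 1 W, so the units cancel neatly.


def interconnect_power_watts(energy_pj_per_bit: float, throughput_tbps: float) -> float:
    """Power in watts for a link at the given energy cost and throughput."""
    return energy_pj_per_bit * throughput_tbps


# Boundary values of the ranges quoted for each tier (pJ/bit).
TIER_ENERGY_PJ_PER_BIT = {
    "rack (at least)": 20.0,
    "node (upper end)": 10.0,
    "hpcm (upper end)": 2.0,
    "sip (at most)": 1.0,
}

THROUGHPUT_TBPS = 1.0  # hypothetical throughput, chosen only for illustration

for tier, energy in TIER_ENERGY_PJ_PER_BIT.items():
    watts = interconnect_power_watts(energy, THROUGHPUT_TBPS)
    print(f"{tier:>18}: {watts:5.1f} W per Tb/s")
```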




#### OCP HPC Module (HPCM), Populated with E3.S, NIC-3.0, & Cable IO


Overview of OCP HPC SubProject's HPCM (High Performance Computing Module)

Allan Cantle, CEO, Nallasway





### High Performance Computing Module, HPCM

- Modular, Flexible and Composable Module: Protocol Agnostic!
  - Memory, Storage & IO interchangeable depending on Application Need
  - Processor must use HBM or have Serially Attached Memory



Could support today's processors, e.g.:

- NVIDIA Ampere
- Google TPU
- IBM POWER10
- Xilinx FPGAs
- Intel FPGAs
- Graphcore IPU
- PCIe Switches
- Ethernet Switches




HPCM Interconnect for all Processor / Switch types:

- 16x EDSFF 4C/4C+ Connectors
- 8x Nearstack x8 Connectors
- Total of 320x Transceivers
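The 320 transceiver total can be reproduced from the connector counts. This is a sketch under one assumption not stated on the slide: each EDSFF 4C/4C+ connector carries 16 lanes (the x16 width defined for the 4C size in SFF-TA-1002), with one transceiver per lane.

```python
# Reproduce the transceiver total from the connector counts.
# Assumption: one transceiver per lane, and 16 lanes per EDSFF 4C/4C+ connector.
LANES_PER_EDSFF_4C = 16
LANES_PER_NEARSTACK_X8 = 8

EDSFF_CONNECTORS = 16
NEARSTACK_CONNECTORS = 8

edsff_lanes = EDSFF_CONNECTORS * LANES_PER_EDSFF_4C
nearstack_lanes = NEARSTACK_CONNECTORS * LANES_PER_NEARSTACK_X8
total_transceivers = edsff_lanes + nearstack_lanes

print(f"EDSFF lanes:     {edsff_lanes}")
print(f"Nearstack lanes: {nearstack_lanes}")
print(f"Total:           {total_transceivers}")
```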



**Example HPCM Bottom View** populated with 8x E3.S Modules, 2x OCP NIC 3.0 Modules, 4x TA1002 4C Cables & 8x Nearstack x8 Cables








# Memory IO is finally going Serial!

- Making Memory Composable with EDSFF E3.S, like Storage & IO




### Modular Building Blocks Available Today


Network, Memory, & IO use Common EDSFF Interconnect




# Dense Modularity = Power Saving Opportunity

- Processor Die Bump to E3.S ASIC <5 Inches Manhattan Distance
  - Opportunity to reduce PHY Channel to 5-10dB, 1-2pJ/bit
- Enabling Low Power











# Installing 8 HPCMs in OAI Chassis

Inspur 21" Co-Planar system



- 21 inch 3OU, 34.6" (800mm) depth
- 8\*OAMs
- UBB: Combined FC+ 6 port HCM Topology

- 4\*PCIE Gen4 x16 Link to connect Hosts
- 4\*PCIE Gen4 x16 Slots support 100G Infiniband or Ethernet for expansion



- 19 inch 6RU, 30 inch (762mm) depth
- 8\*OAMs
- UBB: Combined FC+ 6 port HCM Topology
- 4\*PCIE Gen3 x16 slots for host uplink
- 12\*PCIE Gen3 x16 slots for flexible IO expansion

(PCIE interface will be revised to Gen4 in next release.)

ZT Systems 19" Co-Planar System



- 19 inch 4RU, 34.6" (880mm) depth
- 8\*OAMs
- UBB: 8-port HCM topology
- 2\*PCIE Gen4 x16 Uplinks for Multi-Host
- 4\*PCIE Gen4 x16 Slots
- 4\*2.5" NVME hot plug drives in front






#### Re-Architect - Start with a Cold Plate For High Wattage HPCM Modules

- Capillary Heatspreader on module to dissipate die heat across module surface area
- Heatsinks are the largest mass, so make them the structure of the assembly
- Integrate liquid cooling into the main cold plate



### Cold Plate from Backside

- 54V Power Bus Bars shown, powering HPCMs





# Add Topology Cabling - No Retimers

Fully Connected Topology + Connections to HIB & QDD IO



## Add E3.S and NIC 3.0 Modules

- Pluggable into OCP OAI Chassis












How HPCM provides Efficient & Flexible Interconnect to support increased Fabric Speeds

Allan Cantle, CEO, Nallasway

Tang Junyan, Mahesh Bohra, Dan Dreps, IBM

Bob Dillman & Gus Panella, Molex





### Challenges of Compute Interconnect

- Growing demand for Faster and wider Interconnect
  - IO increasing % of Total Power
  - More IO = More Complex PCBs
  - PCB Losses increase
    - Shorter traces
- Retimers increasingly required
  - Add latency, Power, cost, & consume real estate
  - Zero return on investment!




Retimers and Active Cables increasingly required as Fabric Speeds increase




### **HPCM Interconnect Innovation**

- HPCM Increases System Level Density
  - 3D Construction brings Compute, Media & IO Closer together
- Leverage TA-1002 Interconnect to support Media & IO Modules as well as Direct IO
- Leverage Nearstack-PCIe for motherboard-less cabled Fabric Topology Interconnect



### HPCM Processor to Media/IO Module

- Processor to Media / IO Module Manhattan Distance
  - 128mm (~5 Inches) worst case
  - ~10dB Channel with opportunity to reduce IO Power
- Possible further improvement using HPCM as Processor Substrate
















#### **HPCM** Processor to Module Interconnect With Packaged Processor and Controller Chips

#### Channel based on OMI Module:

- GL102 pkg wiring (30mm)
- Module Via S12
- 24mm Pin Area Wiring
- Meg6 Open Area (67mm)
- DIMM PCB Via S12
- DIMM Conn (C2)
- Meg6 Open Area (37mm)
- E3.S Controller Package (Nominal PCB and PKG corner)



### Insertion Loss allocation Table - Conservative

With Packaged Processor and Controller Chips

| Channel Section                   | Loss @ 32Gb/s | Comments                                                                                    |
|-----------------------------------|---------------|---------------------------------------------------------------------------------------------|
| GL102 package wiring (30mm)       | 2.8dB         |                                                                                             |
| Module Via S12                    | 1dB           | Assume 1.6mm via length & 15mil back drilled stub                                           |
| Meg6 - 24mm Pin Area Wiring       | 1.2dB         | Conservative assumption with 30mm package wiring & 24mm PCB zig-zag wiring under package    |
| Meg6 - Open Area PCB Trace (67mm) | 2.6dB         |                                                                                             |
| DIMM PCB Via S12                  | 0.9dB         | Assume 1.6mm via length & 15mil back drilled stub                                           |
| DIMM Conn (C2)                    | 1dB           |                                                                                             |
| Meg6 - DIMM Open Area (37mm)      | 1.4dB         |                                                                                             |
| E3.S Controller Package           | 0.4dB         |                                                                                             |
| Total Channel                     | 11.3dB        | Measured channel difference due to impedance discrepancies & behavior                       |

#### Insertion Loss allocation Table - Conservative, Derived

With Bare Die Processor and Controller Chips

| Channel Section                   | Loss @ 32Gb/s | Comments                                          |
|-----------------------------------|---------------|---------------------------------------------------|
| Module Via S12                    | 1dB           | Assume 1.6mm via length & 15mil back drilled stub |
| Meg6 - 24mm Open Area Wiring      | 1dB           |                                                   |
| Meg6 - Open Area PCB Trace (67mm) | 2.6dB         |                                                   |
| DIMM PCB Via S12                  | 0.9dB         | Assume 1.6mm via length & 15mil back drilled stub |
| DIMM Conn (C2)                    | 1dB           |                                                   |
| Meg6 - DIMM Open Area (37mm)      | 1.4dB         |                                                   |
| Total Channel                     | 7.9dB         | Empirical estimate only                           |
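The totals in both allocation tables can be checked by summing the per-section losses. The sketch below copies the values from the tables; the variable names are illustrative, and the "saving" is simply the difference between the two totals.

```python
# Sum the per-section insertion losses (dB at 32Gb/s) from the two tables above.
PACKAGED_DB = [
    ("GL102 package wiring (30mm)", 2.8),
    ("Module Via S12", 1.0),
    ("Meg6 - 24mm Pin Area Wiring", 1.2),
    ("Meg6 - Open Area PCB Trace (67mm)", 2.6),
    ("DIMM PCB Via S12", 0.9),
    ("DIMM Conn (C2)", 1.0),
    ("Meg6 - DIMM Open Area (37mm)", 1.4),
    ("E3.S Controller Package", 0.4),
]

BARE_DIE_DB = [
    ("Module Via S12", 1.0),
    ("Meg6 - 24mm Open Area Wiring", 1.0),
    ("Meg6 - Open Area PCB Trace (67mm)", 2.6),
    ("DIMM PCB Via S12", 0.9),
    ("DIMM Conn (C2)", 1.0),
    ("Meg6 - DIMM Open Area (37mm)", 1.4),
]


def total_loss_db(sections):
    """Total channel loss, rounded to 0.1 dB to match the tables."""
    return round(sum(loss for _, loss in sections), 1)


packaged_total = total_loss_db(PACKAGED_DB)
bare_die_total = total_loss_db(BARE_DIE_DB)
saving = round(packaged_total - bare_die_total, 1)

print(f"Packaged: {packaged_total} dB")
print(f"Bare die: {bare_die_total} dB")
print(f"Saving:   {saving} dB")
```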

### OAI Node Fully Connected Topology: Module to Module Interconnect

- Assume 112G PAM4 Fabric Speed
- HPCM Loss, w/ packaged Proc ~8.5dB
- Longest Cable - 12 Inches ~ 5.8dB
  - 34awg Cable Loss = 0.37 dB/inch
  - Connector Loss = 0.7dB/ea (typ)
- Total Channel Loss Estimate
  - Proc to Proc = 8.5 + 5.8 + 8.5 = ~22.8dB
  - 7.2dB Spare on a 30dB 112G channel

### OAI Node HPCM to HIB Interconnect

- HPCM TA-1002 to HIB Examax Backplane
- 64G PAM4 CXL/PCIe G6 Fabric Speed
- HPCM Loss, w/ packaged Proc ~8.5dB
- Longest Cable ~12 Inches
  - TA-1002 Loss + PCB fingers ~ 2 dB
  - 34awg Cable loss = 0.37 dB/in
  - Backplane Loss ~ 0.7 dB
- Total Proc to HIB Backplane Loss
  - ~ 8.5 + 2 + 4.4 + 0.7 = 15.6dB
  - 14.4dB spare for HIB PCB and Switch package losses
  - Retimerless compared to UBB implementations



### OAI Node HPCM to QDD Interconnect

- HPCM Nearstack to QDD Fabric IO
- Assume 112G PAM4 Fabric Speed
- HPCM Loss, w/ packaged Proc ~8.5dB
- Longest Cable ~17 Inches
  - Connector Loss ~ 0.7 dB
  - 34awg Cable Loss ~ 0.37 dB/in
  - QDD Loss ~ 2.5 dB
- Total Proc to QDD Loss
  - ~ 8.5 + 0.7 + 6.3 + 2.5 = 18dB
  - 12dB spare May Support Passive QDD Cables
  - Retimerless compared to UBB implementations
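The loss estimates for the OAI node paths all follow the same pattern: add up the segment losses and subtract from the 30dB channel budget. A minimal sketch, using only the figures quoted in the slides. Two things are assumptions: the module-to-module cable is taken to include two connectors (inferred from the ~5.8dB figure), and the same 30dB budget is applied to every path, as the quoted spare margins imply.

```python
# Channel loss budgets for the three OAI node paths, from the slide figures.
CABLE_DB_PER_INCH = 0.37   # 34awg cable
CONNECTOR_DB = 0.7         # typical, per connector
HPCM_DB = 8.5              # HPCM loss with packaged processor
BUDGET_DB = 30.0           # channel budget implied by the quoted spare margins


def cable_db(inches: float) -> float:
    """Loss of a bare 34awg cable run of the given length."""
    return inches * CABLE_DB_PER_INCH


# Each path is a list of segment losses in dB.
PATHS = {
    # Assumption: the 12 inch cable has a connector at each end.
    "proc_to_proc": [HPCM_DB, cable_db(12) + 2 * CONNECTOR_DB, HPCM_DB],
    # HPCM + TA-1002 and PCB fingers + 12 inch cable + backplane.
    "proc_to_hib": [HPCM_DB, 2.0, cable_db(12), 0.7],
    # HPCM + connector + 17 inch cable + QDD.
    "proc_to_qdd": [HPCM_DB, CONNECTOR_DB, cable_db(17), 2.5],
}


def total_db(path: str) -> float:
    return round(sum(PATHS[path]), 1)


def margin_db(path: str) -> float:
    return round(BUDGET_DB - sum(PATHS[path]), 1)


for name in PATHS:
    print(f"{name}: loss {total_db(name)} dB, spare {margin_db(name)} dB")
```

Keeping each path as a list of segments makes it easy to see which segment dominates and to try a longer cable or an extra connector.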




### Cabled Solutions are reliable

- IBM's High Reliability E1080 Server













How HPCM's Thermal Management Cold Plate solution turns traditional approaches on their head

Chris Chapman, Boyd Corporation

Bob Dillman, Molex

Allan Cantle, Nallasway







### **HPCM proposed Thermal Solution**

- Thermal Heat Spreader
  - Required to normalize different HPCM Modules for mating to the main cold plate
  - Cavities in Heat Spreaders required for surrounding components, primarily PSUs
  - Necessitates 2 Thermal Interfaces
    - Silicon to Heat Spreader
    - Heat Spreader to Cold Plate



### **HPCM** proposed Thermal Solution

- Water Cooled Cold Plate
  - Provides HPCM Mechanical Infrastructure
  - Cold water to each HPCM site












### **Cold Plate Feasibility**

A single cold plate concept that delivers power to OAM modules and uses 8 mesochannel "cooling cores" should perform similarly to a cooling loop array, provided each of the 8 OAM interfaces is independently fastened to the cold plate.





### Evaluate "Cooling Core"


- Initial CFD analyzed the new form factor required
- Similar performance was obtained compared to a traditional OAM module cold plate





(Figure: CFD temperature plot; color scale T (°C) from 35.3 to 44.3.)

### 4x2 and 8x1 Flow Network

- Two flow network models were developed for the cold plate assembly
- The all-parallel 8x1 array shows the lower pressure drop, as shown in the 'PQ' curve
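A simple idealized model shows why the all-parallel arrangement would be expected to have the lower pressure drop. This is a sketch, not the flow network model used in the study. It assumes identical cooling cores with a quadratic pressure-drop law (dP = k * q^2), ignores manifold and fitting losses, and uses a hypothetical core coefficient `k`.

```python
# Idealized pressure drop of the two cold plate flow networks.
# Assumptions: identical cores, dP = k * q**2 per core, no manifold losses.


def pressure_drop_8x1(total_flow: float, k: float) -> float:
    """All 8 cores in parallel: each core sees 1/8 of the total flow."""
    per_core_flow = total_flow / 8
    return k * per_core_flow ** 2


def pressure_drop_4x2(total_flow: float, k: float) -> float:
    """Two stages of 4 parallel cores in series: each core sees 1/4 of the
    total flow, and the drops of the two stages add."""
    per_core_flow = total_flow / 4
    return 2 * k * per_core_flow ** 2


TOTAL_FLOW = 8.0  # arbitrary flow units, for illustration only
K_CORE = 1.0      # hypothetical core loss coefficient

dp_8x1 = pressure_drop_8x1(TOTAL_FLOW, K_CORE)
dp_4x2 = pressure_drop_4x2(TOTAL_FLOW, K_CORE)
ratio = dp_4x2 / dp_8x1

print(f"8x1 pressure drop: {dp_8x1}")
print(f"4x2 pressure drop: {dp_4x2}")
print(f"4x2 / 8x1 ratio:   {ratio}")
```

Under these assumptions the ratio does not depend on `k` or on the total flow; real curves will differ once manifold losses and non-quadratic behavior are included.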




### Cold Plate Thermal Performance

- The thermal resistance "RQ" curves are shown. The 4x2 array is split into two curves: one for the parallel cores 1-4 and another for cores 5-8, which follow them in series
- The 8x1 resistance is lower; however, cores 1-4 in the 4x2 array will run cooler than any 8x1 core











### Summary

- Initial study indicates that a cold plate with meso-channel cooling cores will achieve the required cooling, comparable to conventional cooling loops
- Further study is recommended: now that we understand the keep-out area necessary for cooling, additional electro-mechanical and packaging features can be incorporated into the cold plate






### Air Cooling 128x E3.S Modules

- Up to 128x E3.S Modules @ 25W each
- Maximum Total Power 3.2kW
- Proposed airflow from bottom to top of E3.S modules
- Large cooling surface area per module
- Baffling and managing airflow is a challenge




### Call to Action

- Please help bring HPCM to reality by Joining the OCP HPC Sub Project
- We are also seeking Funding in order to build PoCs to prove out Concepts
- Where to find additional information:

Project Wiki with latest HPC Charter and Meeting Recordings: http://www.opencompute.org/wiki/HPC

Mailing list: https://ocp-all.groups.io/g/OCP-HPC

Meeting Calendar: https://www.opencompute.org/projects/high-performance-computing-incubation





### Thank you! Any Questions?

