Open Domain-Specific Architecture: Protocols to PHYs

Dave Kehlet
Sam Fuller
Jaideep Dastidar
Imran Ahmed
Legal Information

This presentation contains the general insights and opinions of Intel Corporation ("Intel"). The information in this presentation is provided for information only and is not to be relied upon for any other purpose than educational. Statements in this document that refer to Intel's plans and expectations for the quarter, the year, and the future, are forward-looking statements that involve a number of risks and uncertainties. A detailed discussion of the factors that could affect Intel's results and plans is included in Intel's SEC filings, including the annual report on Form 10-K.

Intel technologies' features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at www.intel.com.

Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and provided to you for informational purposes. Any differences in your system hardware, software or configuration may affect your actual performance.

† Tests measure performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.

Copyright © 2019 Intel Corporation.
Intel, the Intel logo, the Intel Experience What's Inside logo, eASIC, and Stratix are trademarks of Intel Corporation in the U.S. and/or other countries.
* Other names and brands may be claimed as the property of others
Contributors and Collaborators

Imran Ahmed – AnalogX
Jaideep Dastidar – Xilinx
Kevin Drucker – Facebook
Sam Fuller – NXP
Quinn Jacobson – Achronix
Peter Jenkins – Avera Semi
Brian Kahne – NXP
David Kehlet – Intel
Mark Kuemerle – Avera Semi
Paul Mattos – Avera Semi
Dimitry Melts – Facebook
Gary Miller – NXP
Millind Mittal – Xilinx
Mats Myrberg – Microsoft
Bapi Vinnakota – OCP
Two Broad Use Cases

Package Level Integration
Standardized motherboard interfaces enable the PC ecosystem

SOC Disaggregation
Standardized chiplet interfaces enable a package-level integration ecosystem
Standardized SOC interfaces (AMBA/AXI) enable foundry ecosystem

Use cases drive protocols!
ODSA Protocol Stack – 9/2019

- PIPE Interface
- Firmware Infrastructure for: Accelerator Use, Host Integration
- Coherent Protocol (CCIX or TileLink or...)
- Instruction-Driven Transfer (ISF Transport Layer)
- DMA
- Routing (To Be Developed From ISF Routing Layer)
- Inter-Chiplet (e.g. PCIe, TileLink or new Link Layer)
- Intra-Chiplet (From ISF Link Layer)
- PCIe PIPE
- PHY
- USR SerDes
- Bunch of Wires (BOW), AIB
- Organic Substrate Packaging + Interconnect
- NFP, RISC, FPGA, ML ASIC, SerDes

Common Link Layer Exploration

Link Layer
- Can we find a useful standard link layer interface?
- Typical link layer services are:
  - Framing (delimiting a bundle of die-to-die transfers)
  - Flow control
  - Error detection and correction
- Link layer interface upwards (toward application)
- Link layer definition downwards (to PHY)
  - Flit (Flow Control Unit, a stream unit) format

Protocols (e.g. AXI/TileLink/CHI) are framed and sent to the D2D PHY.
Flit width needs to be adaptable to the D2D physical interface.
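
As a rough illustration of framing and an adaptable flit width, the sketch below wraps a transfer in a frame and cuts it into fixed-width flits. The frame layout (2-byte length prefix, payload, 1-byte additive checksum) and the function names are assumptions made for this example, not part of any ODSA definition; flow control is not modeled.

```python
def frame(payload: bytes) -> bytes:
    """Delimit one die-to-die transfer: length prefix + payload + checksum."""
    if len(payload) > 0xFFFF:
        raise ValueError("payload too long for the assumed 2-byte length field")
    checksum = sum(payload) & 0xFF  # simple additive checksum, illustration only
    return len(payload).to_bytes(2, "big") + payload + bytes([checksum])


def to_flits(framed: bytes, flit_bytes: int) -> list[bytes]:
    """Cut a framed transfer into fixed-width flits, zero-padding the last one."""
    if flit_bytes <= 0:
        raise ValueError("flit width must be positive")
    flits = []
    for i in range(0, len(framed), flit_bytes):
        chunk = framed[i:i + flit_bytes]
        flits.append(chunk.ljust(flit_bytes, b"\x00"))
    return flits


def from_flits(flits: list[bytes]) -> bytes:
    """Reassemble flits, drop padding via the length field, verify the checksum."""
    stream = b"".join(flits)
    length = int.from_bytes(stream[:2], "big")
    payload = stream[2:2 + length]
    if stream[2 + length] != (sum(payload) & 0xFF):
        raise ValueError("checksum mismatch")
    return payload


# The same framed transfer adapts to two different PHY widths.
transfer = frame(bytes(range(20)))
narrow = to_flits(transfer, 8)    # 8-byte flits
wide = to_flits(transfer, 16)     # 16-byte flits
print(len(narrow), len(wide))
```

Only `flit_bytes` changes between the two PHY widths; the framing above it stays the same, which is the property a common link layer interface would need.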
ODSA Protocol Stack

**Legend**
- Already Exists
- TPA New
- On-Die Protocol Tx/Rx Channelization New
- On-Die Protocol Tx/Rx Channelization & TPA New

**Package Level Integration**
- **Transport optimized**
  - **PHY optimized**
    - Protocol (e.g. PCIe)
      - PIPE I/F
      - PHY
    - TPA?
    - PHY-specific Link Layer
    - PHY Specific I/F?

**SOC Disaggregation**
- **PHY optimized**
  - **Transport optimized**
    - Protocol (e.g. AXI4)
      - TPA?
      - PHY-specific Link Layer
      - PHY
Coming Up

**DiPort Overview**, Sam Fuller
- Adapts the AXI4/ACE-Lite application interface to a 2-channel die-to-die PHY

**Transport and Protocol Agnostic Interface**, Jaideep Dastidar
- Encapsulates and sends/receives multi-protocol packets between chiplets

**PIPE Adapter**, Imran Ahmed
- Maps the industry-standard PHY Interface for PCI Express (PIPE) to your own PHY
NXP’S DIPORT DIE TO DIE TECHNOLOGY

TECHNICAL INTRODUCTION TO ODSA

SAM FULLER,
GARY MILLER, BRIAN KAHNE
18.DEC.19
Goals of the DiPort Technology

- Optimized for die-to-die communication
  - Latency, bandwidth, power, and cost
- Automatic integration of a unified memory map at start-up
- Connected die appears to software as on-die resources
- Interoperability of mixed-technology die
- Critical system information and messages are automatically shared across die
- Quality of service and decoupling of latency
- Low-cost and scalable SiP interface for use with low-density SiP assembly technologies
- Solve start-up, reset, frequency change, and errors for SiPs (e.g., power up/down, localized/catastrophic errors)
- Error detection, prevention and functional safety
- Robust test support
- Optional Plug-and-Play if needed for a generalized solution
Serial diPort General Features

• Provides a virtual interconnection of the AXI bus between two die
• Supports signaling of hundreds of system states, messages, etc. between two die
  - Automatic replication on other die
  - Handles all clock-domain crossing for scalars and vectors
• Error detection, corruption prevention and resiliency (e.g., functional safety)
• Optimized for the AXI channel protocol
  - Tracks AXI state flow with a minimal amount of AXI attributes transferred between die
  - Minimal packetizing delays for both single and burst transactions
  - Pipelining for many simultaneous reads and writes for improved latency and bandwidth
  - Expandable for ACE-Lite (or ACE) support
  - Virtual-linked/hardware-synchronized AXI bus channels between die
Architectural Overview

Serial diPort (SD) Master/Slave

- AXI Master (raddr, waddr, wdata)
  - SAXI Interface
  - SAXI Msg Gen

- AXI Slave (wresp, rdata)
  - MAXI Interface
  - MAXI Msg Gen
  - TX Manager

- Internal signals (die output)
  - Signalling Interface
  - Signalling Msg Gen

- Internal signals (die input)
  - Signalling Interface
  - Sig/FC Manager

- AXI Master (wresp, rdata)
  - SAXI Interface
  - SAXI Transact Gen

- AXI Slave (raddr, waddr, wdata)
  - MAXI Interface
  - MAXI Transact Gen

- APB
  - Registers

- TX EB
  - Link Layer (LL)
  - Sig/FC EB
  - Splitter
  - AXI EB

- Physical Interface Macrocell (DPIM)
  - Override Muxing

- RX IN
  - RX
  - TX Out

Serial diPort Bus
Overview of Messages

- **Message types**
  - AXI, Flow Control (FC) and Signaling (SIG)

- **Message packing**
  - Packed for back-to-back AXI messages
  - Packed for back-to-back SIG/FC messages
  - Lose 1 clock when transitioning between AXI and SIG/FC packing

- **Message ID and CRC**
  - 6-bit header for all messages
  - 8-bit CRC for AXI and 6-bit for FC and SIG
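
A minimal sketch of the header-plus-CRC idea for an AXI message follows. The slides give only the field widths, so the CRC polynomial (0x07), the zero initial value, the placement of the 6-bit header in its own byte, and the function names are all assumptions for illustration; the 6-bit CRC used for FC and SIG messages is not shown.

```python
CRC8_POLY = 0x07  # assumed polynomial x^8 + x^2 + x + 1; the slides do not name one


def crc8(data: bytes) -> int:
    """Bitwise CRC-8, MSB first, initial value 0, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ CRC8_POLY) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def build_axi_message(msg_id: int, payload: bytes) -> bytes:
    """Prefix a 6-bit message ID (stored in one byte here) and append the CRC."""
    if not 0 <= msg_id < 64:
        raise ValueError("message ID must fit in 6 bits")
    body = bytes([msg_id]) + payload
    return body + bytes([crc8(body)])


def check_axi_message(message: bytes) -> bool:
    """With this CRC variant, the CRC over body plus appended CRC is zero."""
    return len(message) >= 2 and crc8(message) == 0


msg = build_axi_message(0x2A, b"\x01\x02\x03\x04")
corrupted = bytes([msg[0] ^ 0x01]) + msg[1:]  # flip one header bit
print(check_axi_message(msg), check_axi_message(corrupted))
```

The receiver-side check needs no knowledge of the payload layout, so one checker can cover every AXI message type.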

### AXI/FC Message

<table>
<thead>
<tr>
<th>AXI/FC Message</th>
<th># of Bytes</th>
</tr>
</thead>
<tbody>
<tr>
<td>Write Address 32</td>
<td>10</td>
</tr>
<tr>
<td>Write Data 8 bits</td>
<td>6</td>
</tr>
<tr>
<td>Write Data 16 bits</td>
<td>6</td>
</tr>
<tr>
<td>Write Data 32 bits</td>
<td>8</td>
</tr>
<tr>
<td>Write Data 64 bits</td>
<td>12</td>
</tr>
<tr>
<td>Write Data Last 64 bits</td>
<td>12</td>
</tr>
<tr>
<td>Write Data Response</td>
<td>4</td>
</tr>
<tr>
<td>Read Address 32</td>
<td>10</td>
</tr>
<tr>
<td>Read Address 48</td>
<td>10</td>
</tr>
<tr>
<td>Read Data 8</td>
<td>6</td>
</tr>
<tr>
<td>Read Data 16</td>
<td>6</td>
</tr>
<tr>
<td>Read Data 32</td>
<td>8</td>
</tr>
<tr>
<td>Flow Control</td>
<td>4</td>
</tr>
<tr>
<td>Idle</td>
<td>2</td>
</tr>
</tbody>
</table>

All AXI/FC messages provide flow control
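
To get a feel for the packetizing overhead, the byte counts in the AXI/FC table can be turned into a payload fraction. The sketch assumes each listed size is the complete on-wire message (header, attributes, data, and CRC), which the slides imply but do not state.

```python
# (message name, payload bits carried, total message bytes from the table above)
DATA_MESSAGES = [
    ("Write Data 8 bits", 8, 6),
    ("Write Data 16 bits", 16, 6),
    ("Write Data 32 bits", 32, 8),
    ("Write Data 64 bits", 64, 12),
]


def payload_efficiency(payload_bits: int, message_bytes: int) -> float:
    """Fraction of the transmitted bits that are AXI data."""
    return payload_bits / (message_bytes * 8)


for name, bits, size in DATA_MESSAGES:
    print(f"{name}: {payload_efficiency(bits, size):.1%} payload")
```

Under that assumption a 64-bit write data beat is about two-thirds payload, while an 8-bit beat is about one-sixth, so wider beats amortize the fixed header and CRC cost better.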

### SIG Messages

<table>
<thead>
<tr>
<th>SIG Messages</th>
<th># of Bytes</th>
</tr>
</thead>
<tbody>
<tr>
<td>Signal write</td>
<td>6</td>
</tr>
<tr>
<td>Signal read</td>
<td>6</td>
</tr>
</tbody>
</table>
Transport and Protocol Agnostic (TPA) Chiplet Interface

ODSA Workshop
Dec 18, 2019
ODSA Protocol Stack

Use Cases: PHY-Optimized, Transport Optimized

- Txn Layer / Port: PCIe Txn, CCIX Txn, CXL Txn
- Link Layer: PCIe Link Layer, CXL Link Layer
- PIPE I/F
- Link Layer
- optional bypass
- Open TPA Interface that is Multi Protocol capable
- Pipe Agnostic Port (PA Port) - Mux/Demux/Port forwarding (optional)

Legend: ODSA Standardized Interfaces (no logic), ODSA Spec Scope


© Copyright 2019 Xilinx
What the proposal covers

- On-chip interface + framing between Protocol’s Transaction Layer and Data Link Layer of Chip-to-Chip Interface
- On-chip interface + framing is the same for serial and parallel Chip-to-Chip PHYs and scales across Protocol Types
- Standard bandwidth quanta for Interface width:
  - Common Interface Protocol constructs at 0.5Tbps, 1Tbps, and 2Tbps
    - Normalized to 1GHz: 512b, 1024b, and 2048b interface
- Interface definition sanity check against Protocols:
  - AXI, PCIe, CCIX, CXL
    - PCIe and CCIX are already packetized, so they can map to the interface framing
    - AXI and CXL are not packetized formats – ODSA can create a packetization or consider proposals for one – e.g. DiPort
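
The bandwidth quanta follow directly from interface width times clock rate. The short sketch below reproduces the arithmetic; the function name is illustrative, and the slide's 0.5, 1, and 2 Tbps figures are the rounded values of the results.

```python
def interface_bandwidth_tbps(width_bits: int, clock_ghz: float = 1.0) -> float:
    """Raw interface bandwidth in Tbps: bits per cycle times cycles per second."""
    return width_bits * clock_ghz / 1000.0


for width in (512, 1024, 2048):
    print(f"{width}b @ 1 GHz = {interface_bandwidth_tbps(width):.3f} Tbps")

# The same quantum can be reached with a narrower interface at a faster clock.
print(f"512b @ 2 GHz = {interface_bandwidth_tbps(512, 2.0):.3f} Tbps")
```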
TPA Framing for Packetized Protocols
512b Interface with Max 4 Start/End Markers per cycle

4 16B-aligned Packet Start/End Markers

512b Interface (0.5Tbps @1GHz)

1024b Interface with Max 8 Start/End Markers per cycle

8 16B-aligned Packet Start/End Markers

1024b Interface (1Tbps @1GHz)
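
The marker counts above follow from dividing the interface width into 16-byte slots. The sketch below shows that arithmetic and lays packets out back to back on slot boundaries. The per-slot tuple `(pid, sm, em)` and the rule that a packet sets its start marker on its first slot and its end marker on its last slot are assumptions for illustration, not the TPA encoding.

```python
SLOT_BYTES = 16  # packets start and end on 16-byte boundaries (from the slide)


def marker_slots(width_bits: int) -> int:
    """Number of 16B-aligned start/end marker positions per cycle."""
    width_bytes, rem = divmod(width_bits, 8)
    if rem or width_bytes % SLOT_BYTES:
        raise ValueError("width must be a whole number of 16-byte slots")
    return width_bytes // SLOT_BYTES


def slots_for_packet(packet_bytes: int) -> int:
    """Slots a packet occupies, rounding up to the next 16-byte boundary."""
    return -(-packet_bytes // SLOT_BYTES)


def lay_out(packets, width_bits):
    """Place (pid, length) packets back to back; return (pid, sm, em) per slot, per cycle."""
    per_cycle = marker_slots(width_bits)
    slots = []
    for pid, length in packets:
        n = slots_for_packet(length)
        for i in range(n):
            slots.append((pid, i == 0, i == n - 1))
    while len(slots) % per_cycle:          # pad the final cycle with idle slots
        slots.append((None, False, False))
    return [slots[i:i + per_cycle] for i in range(0, len(slots), per_cycle)]


print(marker_slots(512), marker_slots(1024))
# Four packets from four hypothetical protocol IDs sharing one 1024b cycle.
for slot in lay_out([(0, 16), (1, 32), (2, 48), (3, 32)], 1024)[0]:
    print(slot)
```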
Example 4 Protocol Mix over 1024b Interface

PID: Protocol ID
SM: Start Marker
EM: End Marker

<table>
<thead>
<tr>
<th>PID</th>
<th>SM</th>
<th>EM</th>
</tr>
</thead>
<tbody>
<tr>
<td>11</td>
<td>1 0</td>
<td>xx</td>
</tr>
<tr>
<td>xx</td>
<td>0 0</td>
<td>xx</td>
</tr>
<tr>
<td>xx</td>
<td>0 0</td>
<td>xx</td>
</tr>
<tr>
<td>xx</td>
<td>0 0</td>
<td>xx</td>
</tr>
<tr>
<td>xx</td>
<td>0 0</td>
<td>xx</td>
</tr>
<tr>
<td>xx</td>
<td>0 1</td>
<td>01</td>
</tr>
<tr>
<td>xx</td>
<td>0 0</td>
<td>xx</td>
</tr>
<tr>
<td>xx</td>
<td>0 0</td>
<td>00</td>
</tr>
<tr>
<td>xx</td>
<td>0 0</td>
<td>xx</td>
</tr>
</tbody>
</table>
For lower levels, extend TPA with PHY framing and EDC
PIPE adapter group contributors and collaborators

- Imran Ahmed – AnalogX
- Halil Cirit – Facebook
- Ramin Farjad – Aquantia
- Brian Holden – Kandou Bus
- Rita Horner – Synopsys
- David Kehlet – Intel
- Mark Kuemerle – Global Foundries
- Paul Mattos – Global Foundries
- Bapi Vinnakota – Netronome
- Robert Wang – AnalogX
- Jerrold Wheeler – Synopsys
Overview

1. Why do we need a PIPE interface for Chiplets?
2. How to implement a PIPE interface for Chiplets
3. Progress on first draft
4. Next steps
Which Chiplet PHY to use?

- Choice of optimal PHY depends on application and physical interconnect

**Interposer based** (Die #1, Die #2)

- Higher pin count, parallel bus
- E.g. PHYs: AIB, BoW
- Lower speed bus
- Higher cost packaging

**Organic substrate** (Die #1, Die #2)

- Lower pin count, serial bus
- E.g. PHYs: AXDieIO, CNRZ-5
- Higher speed bus
- Lower cost packaging

- No universal PHY solution to connect chiplets
- **An open Chiplet ecosystem needs to accommodate different PHYs**
How to connect the PHY to upper layers in a chiplet?

- Given that there are many choices of chiplet PHYs, how do we efficiently connect the PHY to the upper layers?

1. Fully custom MAC/PHY interface
   - Custom design, one-off
   - Cannot mix/match PHYs

2. Common MAC/PHY interface
   - Enables interoperability, ecosystem
   - PIPE is a good example of a common MAC/PHY interface that is already used extensively
Use PIPE to interface PHY to upper layers

- PIPE defines the interface between MAC controller and PHY
  - Establishes connectivity and features to be implemented on either side of the interface
- Enables the PHY to be designed independently of the MAC and vice-versa
- PIPE is compliant with PCIe, SATA, USB, DP, Converged IO, CXL, and CCIX
- PIPE-compliant IP is readily available
Adapting PIPE for Chiplets

- Some features of PIPE supported protocols may not be needed for chiplets
- Provide guidance on how to adapt PIPE for Chiplets while optimizing for shorter reach, lower power, and lower latency
How to implement PIPE for Chiplets

• PIPE specification requirements are categorized into the following options:

1. **Mandatory**
   - Any requirement that is essential to PIPE functional correctness is mandatory
   - I.e. do not break the controller, the PIPE interface, or the firmware

2. **Optional / Not necessary**
   - Chiplet PHY can assert and behave as though it met the PIPE requirement even if it does not implement the PIPE feature
   - Chiplet PHY can spoof/mimic PIPE behavior to preserve upper layers
   - To the MAC and upper layers it looks like PIPE is satisfied/preserved, so off-the-shelf PIPE-compliant upper-layer IP can be used
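
The mandatory versus optional split can be sketched as a small adapter model. Everything named here is hypothetical: the requirement names, the `ChipletPhyAdapter` class, and the boolean completion status are placeholders for illustration and are not taken from the PIPE specification or the working document.

```python
from enum import Enum


class Category(Enum):
    MANDATORY = "mandatory"
    OPTIONAL = "optional"


# Hypothetical requirement names, for illustration only.
REQUIREMENTS = {
    "data_path": Category.MANDATORY,
    "status_handshake": Category.MANDATORY,
    "receiver_detect": Category.OPTIONAL,
    "low_power_states": Category.OPTIONAL,
}


class ChipletPhyAdapter:
    """Presents a complete PIPE-like face to the MAC over a simpler chiplet PHY."""

    def __init__(self, implemented):
        missing = [name for name, cat in REQUIREMENTS.items()
                   if cat is Category.MANDATORY and name not in implemented]
        if missing:
            raise ValueError(f"mandatory requirements not implemented: {missing}")
        self.implemented = set(implemented)
        self.spoofed = []  # debug record; invisible to the MAC

    def request(self, feature: str) -> bool:
        """Return the completion status the MAC sees for a requested feature."""
        if feature not in REQUIREMENTS:
            raise KeyError(f"unknown requirement: {feature}")
        if feature not in self.implemented:
            # Optional feature the PHY lacks: report success anyway.
            self.spoofed.append(feature)
        return True


phy = ChipletPhyAdapter({"data_path", "status_handshake"})
print(phy.request("receiver_detect"), phy.spoofed)
```

The MAC receives the same status whether a feature is real or mimicked, which is what lets unmodified PIPE-compliant controller IP sit on top.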
First draft will target PIPE compliant rates

- To take advantage of existing IP, the first draft supports PIPE-compliant rates, e.g. PCIe Gen 5 (32 Gbps), between MAC and PHY

- Future releases to support non-standard rates and configurations
Next steps

• Good progress on converging to a public release document
• Working document accommodates different PHY topologies
• Building first draft release of document by OCP Global Summit
• Looking for further feedback especially from the controller community

Join our weekly meetings
Tuesday mornings 9AM PST!