

# BoW: A Die-to-Die Interface Solutions Specification Update

ODSA Project Workshop, June 10, 2019

Ramin Farjadrad, Mark Kuemerle

#### High-level Targets for an Inter-Die Connectivity IP in MCM Packages

- Performance Targets:
  - Throughput Efficiency → ~1Tbps/mm (die edge)
  - Energy Efficiency → 1pJ/bit to 0.5pJ/bit
- Small silicon area per IO port for dense integration
  - Goal: IP silicon area not to limit IO density
- Minimal analog/complex circuitry to offer easy/fast process porting
  - Limit the maximum baud rate of the interface → ~10G-16Gbaud
  - Common clock so that a DLL can replace the CDR; digital PLL to generate the clock, etc.
- Single-supply IP supporting a wide Vdd range: 0.70V-0.90V
  - To be compatible with most existing SoC/ASIC in popular/available process nodes
- Logical compatibility with AIB for ease of adoption
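
The two performance targets combine into a rough power budget per mm of die edge: power equals throughput times energy per bit. The short sketch below does that arithmetic; the function name is illustrative and not part of any BoW specification.

```python
def power_per_mm_watts(throughput_tbps_per_mm: float, energy_pj_per_bit: float) -> float:
    """Power per mm of die edge.

    Tbps = 1e12 bit/s and pJ = 1e-12 J, so the scale factors cancel:
    W/mm = (Tbps/mm) x (pJ/bit).
    """
    return throughput_tbps_per_mm * energy_pj_per_bit


for pj in (1.0, 0.5):
    print(f"1 Tbps/mm at {pj} pJ/bit -> {power_per_mm_watts(1.0, pj):.2f} W/mm")
```

At the ~1Tbps/mm target this gives 1.00 W/mm at 1pJ/bit and 0.50 W/mm at 0.5pJ/bit.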

## BoW (ODSA Proposal): Single-ended Signaling

- BoW Base
  - Unterminated lanes → up to 4Gbps/wire over 10mm
  - Source synchronous with clock alignment
- BoW Fast
  - Terminated lanes → up to 16Gbps/wire over 50mm
  - DDR source synchronous with clock alignment
- BoW Turbo
  - Simultaneous bidirectional → data travels in both directions on each wire at the same time
  - Terminated → up to 2 x 16Gbps/wire = 32Gbps/wire over 50mm
  - Source synchronous with clock alignment







Note: The baud rate is limited to 16Gbaud for simplicity of design and ease of porting.

#### Top Level View: BoW Backward Compatibility



- A BoW Fast slice can be configured to be backward compatible with BoW Base
  - By disconnecting the line terminations
- A BoW Turbo slice can be configured to be backward compatible with BoW Base/Fast
  - BoW Fast: by disabling Tx or Rx per lane
  - BoW Base: by disabling Tx or Rx per lane and disconnecting the terminations
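
The backward-compatibility rules above amount to a per-lane configuration choice. The sketch below is a hypothetical model (`LaneConfig`, `configure`, and the capability ordering are illustrative names, not a BoW register interface): "simultaneous bidirectional" stands for having both Tx and Rx enabled on a lane, so disabling one of them drops a Turbo lane to Fast behavior, and disconnecting the terminations drops it further to Base.

```python
from dataclasses import dataclass

# Hypothetical capability ordering: a slice can be configured down, never up.
LEVELS = {"Base": 0, "Fast": 1, "Turbo": 2}


@dataclass(frozen=True)
class LaneConfig:
    simultaneous_bidir: bool  # Tx and Rx both enabled on the same wire (Turbo only)
    terminated: bool          # line termination connected


def configure(native: str, target: str) -> LaneConfig:
    """Per-lane settings a `native` slice would use to interoperate as `target`."""
    if LEVELS[target] > LEVELS[native]:
        raise ValueError(f"a {native} slice cannot operate as {target}")
    if target == "Turbo":
        return LaneConfig(simultaneous_bidir=True, terminated=True)
    if target == "Fast":
        # Turbo -> Fast: disable Tx or Rx per lane, keep the terminations
        return LaneConfig(simultaneous_bidir=False, terminated=True)
    # -> Base: one direction per lane and terminations disconnected
    return LaneConfig(simultaneous_bidir=False, terminated=False)


print(configure("Turbo", "Base"))
```

Modeling the variants as an ordered set makes the one-way nature of the compatibility explicit: a Base slice has no terminations or bidirectional drivers to enable, so asking it to run as Fast or Turbo is rejected.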

#### BoW Slice (Building Block) Bump Map



- The BoW Slice has the same circuitry/layout in all cases but comes in 2 bump maps with different RDL, making it more efficient to build larger BoW modules

- A common bump ordering is used for BoW Slice Base/Fast/Turbo
- **BoW Slice**: 16 Data + 2 Clock ports
  - Clock pads are configurable as output or input clocks when connected to non-Turbo Slices
- Data/Control Mode bump
  - Indicates whether the signal bumps carry data or control
- Vertical stacking of Bow Slices increases throughput per die edge
- BoW Slice maximum throughput
  - BoW Base: 16 x 4Gbps/pad = 64Gbps
  - BoW Fast: 16 x 16Gbps/pad = 256Gbps
  - BoW Turbo: 16 x 2 x 16Gbps/pad = 512Gbps
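
The slice throughput figures, and the stacked values in the specification table, follow from a single product: data bumps x per-bump rate x directions per bump x number of stacked slices. A minimal sketch; the names are illustrative.

```python
DATA_BUMPS = 16  # data bumps per BoW Slice

# variant: (max Gbps per wire per direction, directions per wire)
VARIANTS = {
    "Base": (4, 1),
    "Fast": (16, 1),
    "Turbo": (16, 2),  # simultaneous bidirectional: both directions at once
}


def slice_throughput_gbps(variant: str, stack: int = 1) -> int:
    """Aggregate throughput of `stack` vertically stacked slices."""
    if not 1 <= stack <= 3:
        raise ValueError("the specification table lists stacking up to 3x")
    gbps, directions = VARIANTS[variant]
    return DATA_BUMPS * gbps * directions * stack


for name in VARIANTS:
    print(name, [slice_throughput_gbps(name, s) for s in (1, 2, 3)])
```

This prints `Base [64, 128, 192]`, `Fast [256, 512, 768]` and `Turbo [512, 1024, 1536]`, matching the 1x/2x/3x stacked rows of the table.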

### **BoW Slice Specifications**

| Parameter                                   | Value                                                                                                                                      |
|---------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------|
| Single Supply Voltage                       | 0.70V-0.90V (+/-5%)                                                                                                                        |
| Baud rate/bump                              | Base: 1-4Gbaud, Fast: 1-16Gbaud, Turbo: 1-16Gbaud                                                                                          |
| Max Throughput/mm (stackable up to 3x)      | 1x stacked: Base 64Gbps, Fast 256Gbps, Turbo 512Gbps<br>2x stacked: Base 128Gbps, Fast 512Gbps, Turbo 1024Gbps<br>3x stacked: Base 192Gbps, Fast 768Gbps, Turbo 1536Gbps |
| BoW Slice Dimensions<br>(Bump Pitch = 130um) | Height: 330um, Chip Edge: 1170um                                                                                                          |
| Energy Efficiency (14nm)                    | < 0.7pJ/bit                                                                                                                                |
| Trace length                                | Base: 10mm, Fast: 50mm, Turbo: 50mm                                                                                                        |
| Latency                                     | < 30 UI, i.e. < 30/Gbaud ns (< 3ns @ 10Gbaud)                                                                                              |
| Slice Bump Allocation (excludes global AUX) | 16 Data, 2 Clock, 1 Mode, 4 Ground, 4 Power                                                                                                |
| BER Target                                  | < 1E-15 (no ECC)                                                                                                                           |
| ESD / CDM protection                        | 400V/100V                                                                                                                                  |
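
The latency entry (< 30/Gbaud, i.e. < 3ns at 10Gbaud) is a fixed number of unit intervals: one UI lasts 1/baud, so a 30 UI bound shrinks in absolute time as the baud rate rises. A small sketch of that conversion; the constant and function names are illustrative.

```python
LATENCY_UI = 30  # latency bound from the table, in unit intervals


def latency_bound_ns(gbaud: float) -> float:
    """One UI lasts 1/gbaud ns, so 30 UI last 30/gbaud ns."""
    return LATENCY_UI / gbaud


for rate in (4, 10, 16):
    print(f"{rate} Gbaud -> < {latency_bound_ns(rate):g} ns")
```

This gives < 7.5ns at 4Gbaud (the top BoW Base rate), < 3ns at 10Gbaud, and < 1.875ns at 16Gbaud.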

## **BoW Compatibility with AIB**



**AIB Signal Types** 



**Dual Mode Interfaces** 

- From a logical viewpoint, BoW uses interface signals similar to AIB's, with the following differences:
  - The Master interface is also the clock master in Data mode
  - Sideband controls are limited to 32 registers for both Master and Slave
  - AUX signals are not duplicated as in AIB; they are hardwired independently
  - BoW provides a dual-mode Master and Slave option
  - 16-bit bus (BoW) vs. 20+ bit bus (AIB)

## BoW Master/Slave Clock Scheme



## SDR & DDR Clocking



## Transmit SDR & DDR I/O Mapping



## Receive SDR & DDR I/O Mapping










## **BoW Sideband Control Signals**

| Signal name                                      | Signal function                    | Bits           | Signal<br>origin<br>(far-side<br>or MAC) |
|--------------------------------------------------|------------------------------------|----------------|------------------------------------------|
| Calibration (Section 3.2.3)                      |                                    |                |                                          |
| ms_osc_transfer_en<br>sl_osc_transfer_en         | Oscillator calibration complete    | 1              | FS                                       |
| ms_tx_dcc_cal_done<br>sl_tx_dcc_cal_done         | TX DCC calibration complete        | 1              | FS                                       |
| ms_rx_transfer_en<br>sl_rx_transfer_en           | RX calibration complete            | 1              | FS                                       |
| ms_rx_dll_dcc_lock_req<br>sl_rx_dll_dcc_lock_req | Start RX calibration               | 1              | MAC                                      |
| ms_rx_dll_lock<br>sl_rx_dll_lock                 | RX DLL locked                      | 1              | FS                                       |
| ms_tx_transfer_en<br>sl_tx_transfer_en           | TX calibration complete            | 1              | FS                                       |
| ms_tx_dll_dcc_lock_req<br>sl_tx_dll_dcc_lock_req | Start TX calibration               | 1              | MAC                                      |
| User Defined                                     |                                    |                |                                          |
| External Control                                 | Defined by protocol or application | MS: 8<br>SL: 8 | FS or MAC                                |
| Other                                            |                                    |                |                                          |
| Reserved                                         |                                    | MS/SL:9        | NA                                       |

- BoW uses the same 7 fixed sideband controls as AIB, 16 external controls (8 Master, 8 Slave), and 9 reserved controls
- The sideband clock is limited to 200-400MHz
  - Fewer control bits relax the required clock frequency
- The fixed sideband controls have the same functions and the same sources as in AIB:
  - Far-side (link partner)
  - MAC (local)
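
Why fewer control bits relax the sideband clock can be shown with simple arithmetic, assuming the sideband registers are shifted serially over one data line, one bit per sideband clock cycle (an assumption for illustration; the slides do not define the frame format). The sketch uses the 32-register budget and the 200-400MHz range from this deck; the function names are hypothetical.

```python
# Serial sideband timing, assuming one bit is shifted per sideband clock cycle.

def shift_time_ns(num_bits: int, sbnd_clk_mhz: float) -> float:
    """Time to shift num_bits through the serial sideband at sbnd_clk_mhz."""
    period_ns = 1e3 / sbnd_clk_mhz  # e.g. 200 MHz -> 5 ns per bit
    return num_bits * period_ns


def required_clk_mhz(num_bits: int, update_ns: float) -> float:
    """Minimum sideband clock that refreshes num_bits within update_ns."""
    return num_bits * 1e3 / update_ns


for f in (200, 400):
    print(f"32 bits @ {f} MHz -> {shift_time_ns(32, f):.0f} ns")

# Keeping the same 160 ns refresh with twice as many bits doubles the clock:
print(required_clk_mhz(64, 160.0))
```

Under these assumptions a 32-bit update takes 160ns at 200MHz and 80ns at 400MHz; holding the refresh time fixed, the required clock scales linearly with the number of control bits.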

#### BoW Async and Sideband Control Communication



- Bumps are valuable in a BoW solution, so every bump must be used efficiently. To avoid spending extra bumps on control signals, BoW reuses the data bumps for controls in Calibration mode
- Mod is a shared open-drain bump between connected Master/Slave Slices
  - Mod=1 → BoW Slice is in Data mode
  - Mod=0 → BoW Slice goes into Calibration mode and the bump functions change as follows:

    - S2: `ms_mac_rdy`
    - S3: `ms_adapter_rstn`
    - S4/S11: `ms_sbnd_Clk/Clkb`
    - S9: `ms_sr_data`
    - S10: `ms_sr_load`
    - S6/S13: `sl_sbnd_Clk/Clkb`
    - S7: `sl_adapter_rstn`
    - S8: `sl_mac_rdy`
    - S14: `sl_sr_data`
    - S15: `sl_sr_load`
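
The Mod handshake and bump remapping can be modeled as a lookup. The sketch assumes a pull-up on the shared open-drain Mod bump, so it reads 1 only when neither side pulls it low and either side can force Calibration mode; `mod_level`, `bump_function`, and `CAL_MODE_MAP` are hypothetical names, and bumps not listed on the slide (for example S0) return `None`.

```python
from typing import Optional

# Calibration-mode (Mod=0) functions of the signal bumps, as listed on the slide.
CAL_MODE_MAP = {
    "S2": "ms_mac_rdy",
    "S3": "ms_adapter_rstn",
    "S4": "ms_sbnd_Clk",
    "S11": "ms_sbnd_Clkb",
    "S9": "ms_sr_data",
    "S10": "ms_sr_load",
    "S6": "sl_sbnd_Clk",
    "S13": "sl_sbnd_Clkb",
    "S7": "sl_adapter_rstn",
    "S8": "sl_mac_rdy",
    "S14": "sl_sr_data",
    "S15": "sl_sr_load",
}


def mod_level(master_pulls_low: bool, slave_pulls_low: bool) -> int:
    """Open-drain (wired-AND) Mod bump, assuming a pull-up: 1 only if nobody pulls low."""
    return 0 if (master_pulls_low or slave_pulls_low) else 1


def bump_function(bump: str, mod: int) -> Optional[str]:
    """What a signal bump carries for the given Mod level."""
    if mod == 1:
        return "data"  # Data mode: signal bumps carry data
    return CAL_MODE_MAP.get(bump)  # None: no calibration function listed


mod = mod_level(master_pulls_low=False, slave_pulls_low=True)
print(mod, bump_function("S9", mod))  # slave forces Calibration mode
```

Because Master and Slave signals occupy disjoint bumps in Calibration mode, both sides can run their sideband links at the same time over the shared data bumps.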

#### **Test & Calibration Options**

- IEEE 1149.1 (legacy) / IEEE 1149.6 (high-speed) JTAG scan
- IEEE 1500 (HBM type systems)
- At-Speed Self Test
  - Serial At-Speed PRBS Self Loopback: Tx(Port-N\_Chiplet-A) → Rx(Port-N\_Chiplet-A)
  - High-Coverage Wafer Test Screen (pre-package) → loopback traces on ATE load board
- At-Speed System Test/Compliance
  - External PRBS Loopback: Tx(Port-N\_Chiplet-A) → Rx(Port-M\_Chiplet-B)
  - Eye Monitor for At-Speed Test/Compliance: Tx(Port-N\_Chiplet-A) → Rx(Port-M\_Chiplet-B)
    - Per data bump: measure the error rate at each sampling phase and voltage threshold
- Calibrations
  - PLL/DLL Lock
  - Rx Phase Lock
  - DCC Calibration (optional)
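
The serial PRBS loopback test compares received bits against a locally regenerated pattern and counts mismatches. The slides do not name a pattern; the sketch below assumes PRBS7 (polynomial x^7 + x^6 + 1, a common choice) and models the loopback path as a function that can flip selected bits. All function names are hypothetical.

```python
from typing import Iterable, List


def prbs7_bits(n: int, seed: int = 0x7F) -> List[int]:
    """Generate n bits of PRBS7 (x^7 + x^6 + 1) with a 7-bit Fibonacci LFSR."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero (the all-zero state locks the LFSR)")
    bits = []
    for _ in range(n):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1  # taps at stages 7 and 6
        state = ((state << 1) | new_bit) & 0x7F
        bits.append(new_bit)
    return bits


def loopback(bits: List[int], flip_at: Iterable[int] = ()) -> List[int]:
    """Model the Tx -> Rx loopback path; flip_at injects bit errors."""
    flips = set(flip_at)
    return [b ^ 1 if i in flips else b for i, b in enumerate(bits)]


def count_errors(expected: List[int], received: List[int]) -> int:
    """Error counter: positions where Rx differs from the reference pattern."""
    return sum(e != r for e, r in zip(expected, received))


tx = prbs7_bits(10_000)
rx = loopback(tx, flip_at=(17, 4_242))
errors = count_errors(tx, rx)
print(f"{errors} errors in {len(tx)} bits, BER ~ {errors / len(tx):.1e}")
```

The pattern repeats every 127 bits. In hardware, an Rx checker typically seeds its own LFSR from the incoming bit stream rather than sharing a seed with Tx, so it can lock to the pattern without extra signaling.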

#### Summary

- Concept proven in 14nm silicon (hybrid design, easy to port to other nodes)
- Over 1Tbps/mm of chip edge over 50mm of organic substrate (bump pitch = 130um)
  - Full-duplex Throughput/bump up to 32Gbps (2x16Gbps)
- Small area per port of <0.02mm<sup>2</sup>
- Less than 0.7pJ/bit in 14nm (<0.5pJ/bit estimated in 7nm)
- Single power supply 0.7V-0.9V: Compatible with synthesized logic circuits
- Easy and quick to port into other process nodes (16G NRZ vs 112G PAM4)
- Backward compatibility
  - A chiplet with BoW Turbo interoperates with other BoW interfaces
- High Wafer-level test coverage per Chiplet to improve final product yield

#### Target Specification Milestone

- We Are Targeting:
  - Spec Draft V0.5 by Hot Interconnects in August
  - Spec Draft V0.9 by next ODSA Workshop in September
  - Spec Draft V1.0 by ODSA Workshop in November

#### Why BoW Specification Makes Sense

- Multiple companies are building their own versions of die-to-die interfaces today
  - An open standard helps converge these design points
- A common and simple parallel interface at relatively low data rate enables chips in older nodes to be integrated into the system
  - Example: Critical for RF component integration
- Master/Slave mode simplifies clock generation requirements
  - Also helpful for legacy technology implementations
- Scaling the same pins to higher data rates with BoW Fast and Turbo allows chiplets to serve multiple applications
  - Example: streaming data with several RF chips
  - Example: the same die switching accelerator data with a NIC or FPGA





#### How You Can Help

Seeking volunteer(s) for:

- Foundries to support implementation of the BoW PHY IP in their high-end process nodes (especially FinFET)
- Development of the BoW PHY IP in multiple process nodes (28nm to 7nm and even 5nm)
  - Availability of the BoW in wide range of process nodes will encourage larger and faster adoption
- Development of PHY interfaces to different standards and protocols
  - The BoW PHY needs a dedicated interface layer for each transmission standard, since each has different coding and partitioning
    - Example: PCIe/CXL, Ethernet, etc.



## **BACK UP**

## BoW Turbo: IO Block Diagram

