OCP FTI · Future Technologies Initiative

Scaling AI Clusters at Neoclouds

2026 Focus Survey Results  ·  March 2026  ·  Working Draft — Workgroup Distribution
18 responses · 16 organizations · 6 neocloud operators · 4 regions · 12 active contributors
Key Signal: Interoperability (BMC/BIOS/firmware/telemetry) scores highest on both importance AND perceived OCP leverage — making it the clearest immediate contribution target.
Workstream scores — Importance / OCP Leverage (avg, 1–5 scale):
1. HW Mgmt & Interop · 4.6 / 4.2
2. Benchmarking & Performance · 4.2 / 3.8
3. Networking Patterns · 4.1 / 4.1
4. Rack & Power 30–100kW · 3.7 / 4.2
5. Power Sourcing & Distribution · 4.1 / 3.7
6. DC Facilities & Cooling · 3.9 / 3.5
7. Sustainability & Efficiency · 3.9 / 3.4
8. Storage for AI Workloads · 3.6 / 3.7
HW Management · Interoperability — BMC / BIOS / Firmware / Telemetry
Importance: 4.6  ·  OCP Leverage: 4.2  ·  Primary OCP Contribution Target
Why it matters
Ranked the #1 challenge by the largest number of respondents. Denvr ×2, Scaleway, FarmGPU, Vertiv, Opengear, and Fujitsu all flag BMC/BIOS firmware inconsistency as a Day-2 ops blocker; coordinated vendor pressure is needed.
OCP Alignment
OCP HW Mgmt Project · OpenBMC (LF) · OCP S.A.F.E. FW · DMTF Redfish
Proposed Deliverable
Interoperability checklist: minimum BMC/BIOS/telemetry requirements vendors must meet for neocloud deployment. Gap-ticket proposals into OCP HW Mgmt Project. 60–90 day quick win: interop pain-point roundup doc.
Who Will Contribute
Denvr Dataworks (lead), Scaleway, FarmGPU, Opengear, Vertiv
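The interoperability checklist could eventually be machine-checkable against a vendor's BMC profile. A minimal sketch of that idea, where the requirement names, the Redfish version floor, and the sample profile are all hypothetical placeholders rather than agreed OCP values:

```python
# Hypothetical minimum checklist items for neocloud deployment (illustrative only)
MINIMUM_REQUIREMENTS = {
    "redfish_version": "1.15.0",   # assumed minimum DMTF Redfish version
    "telemetry_service": True,     # Redfish telemetry service present
    "secure_fw_update": True,      # signed firmware update path
    "serial_over_lan": True,       # SOL console for Day-2 ops
}

def version_at_least(actual: str, minimum: str) -> bool:
    """Compare dotted version strings numerically."""
    return tuple(map(int, actual.split("."))) >= tuple(map(int, minimum.split(".")))

def check_bmc(profile: dict) -> list:
    """Return the checklist items a vendor BMC profile fails."""
    failures = []
    if not version_at_least(profile.get("redfish_version", "0.0.0"),
                            MINIMUM_REQUIREMENTS["redfish_version"]):
        failures.append("redfish_version")
    for key in ("telemetry_service", "secure_fw_update", "serial_over_lan"):
        if not profile.get(key, False):
            failures.append(key)
    return failures

# Example vendor profile (hypothetical): fails only the signed-FW-update item
vendor = {"redfish_version": "1.17.0", "telemetry_service": True,
          "secure_fw_update": False, "serial_over_lan": True}
print(check_bmc(vendor))
```

A format like this would let the checklist double as an automated acceptance test during vendor qualification.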
Benchmarking + Performance Assurance
Importance: 4.2  ·  OCP Leverage: 3.8  ·  Most original neocloud contribution — gap exists
Why it matters
All five neocloud operators cite this. No existing OCP benchmark playbook for neoclouds. Operators need repeatable NCCL/app-level reporting to build customer trust. "Benchmark standardization" is Denvr's #1 end-of-2026 win.
OCP Alignment
Open Cluster Designs for AI · Open Systems for AI SI · Gap — no current OCP spec
Proposed Deliverable
Benchmarking playbook: repeatable NCCL-style + app-level tests, KPI definitions, reporting template. Target Q3 2026. Open-source harness repo from FarmGPU + Denvr.
Who Will Contribute
Denvr Dataworks ×2 (lead), FarmGPU, Crusoe, Lambda, Fujitsu
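The playbook's KPI definitions would need agreed bandwidth formulas. A sketch following the standard nccl-tests convention for all-reduce, where bus bandwidth is derived from algorithm bandwidth as busBW = algBW × 2(n−1)/n for n ranks; the message size and timing below are illustrative, not survey data:

```python
# NCCL-style KPI definitions for an all-reduce measurement (nccl-tests convention)
def all_reduce_bandwidth(bytes_transferred: int, seconds: float, ranks: int):
    """Return (algBW, busBW) in GB/s for one all-reduce of the given size."""
    alg_bw = bytes_transferred / seconds / 1e9          # data size / time
    bus_bw = alg_bw * 2 * (ranks - 1) / ranks           # per-link traffic factor
    return alg_bw, bus_bw

# Example: 1 GiB all-reduce across 8 GPUs finishing in 5 ms (illustrative numbers)
alg, bus = all_reduce_bandwidth(1 << 30, 0.005, 8)
print(f"algBW={alg:.1f} GB/s  busBW={bus:.1f} GB/s")
```

Pinning the formula down in the reporting template is what makes results comparable across operators.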
Networking Patterns + Tuning — RoCE / IB / Congestion Control
Importance: 4.1  ·  OCP Leverage: 4.1  ·  Strong alignment with existing OCP Networking work
Why it matters
RoCEv2 fabric tuning and congestion control are top operational pain points. Denvr, FarmGPU, Lambda, Fujitsu, and Netpreme all willing to share topology and tuning lessons.
OCP Alignment
OCP Networking Project · ONIE · Ultra Ethernet Consortium
Proposed Deliverable
Neocloud fabric pattern guide: RoCEv2 tuning, congestion control config patterns, reference topologies for 30–100kW clusters. Coordinate with OCP Networking to avoid duplication.
Who Will Contribute
FarmGPU, Denvr, Netpreme, Fujitsu, Scaleway
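One number each reference topology in the fabric guide would pin down is leaf-layer oversubscription: server-facing versus spine-facing bandwidth. A minimal sketch, with illustrative port counts and speeds:

```python
# Leaf-switch oversubscription ratio (downlink bandwidth / uplink bandwidth)
def oversubscription(down_ports: int, down_gbps: int,
                     up_ports: int, up_gbps: int) -> float:
    """Ratio of server-facing to spine-facing bandwidth on a leaf switch."""
    return (down_ports * down_gbps) / (up_ports * up_gbps)

# Example leaf: 32x400G down to GPU servers, 16x400G up to spines -> 2:1
print(oversubscription(32, 400, 16, 400))
```

Training fabrics typically target 1:1 (non-blocking); documenting where 30–100kW clusters can tolerate higher ratios is exactly the kind of operator lesson this workstream would capture.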
Rack & Power · Open Pod Patterns (30–100kW)
Importance: 3.7  ·  OCP Leverage: 4.2  ·  Highest OCP leverage — validate existing specs from neocloud lens
Why it matters
OCP leverage rated highest here (4.2). Active specs exist at hyperscale (ORv3 HPR, Diablo 400 sidecar) but are underspecified for the 30–100kW neocloud scale. Gap analysis is this workstream's role.
OCP Alignment
ORv3 HPR Spec · Diablo 400 Sidecar · Open Data Center for AI SI
Proposed Deliverable
30–100kW rack/pod envelope reference with modular building blocks. Gap analysis: where ORv3 and Diablo don't address neocloud scale. Contribute gap tickets to OCP Rack & Power.
Who Will Contribute
Lambda, Crusoe, Denvr Dataworks, LightningAI
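A back-of-envelope helper the reference envelope could include: how many GPU servers fit a rack power budget. The 10.2 kW per 8-GPU server and the 90% derating headroom are assumptions for illustration, not spec values:

```python
# Whole GPU servers that fit a rack power budget, with derating headroom
def servers_per_rack(rack_kw: float, server_kw: float, headroom: float = 0.9) -> int:
    """Servers that fit within rack_kw after applying a headroom factor."""
    return int(rack_kw * headroom // server_kw)

# Sweep the 30-100 kW envelope with an assumed 10.2 kW 8-GPU server
for rack_kw in (30, 60, 100):
    n = servers_per_rack(rack_kw, server_kw=10.2)
    print(f"{rack_kw} kW rack -> {n} servers ({n * 8} GPUs)")
```

Even this crude arithmetic shows why the 30–100kW band behaves differently from hyperscale rows: small integer server counts per rack make pod-level granularity the unit that matters.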
Power Sourcing + Distribution (AC → HVDC)
Importance: 4.1  ·  OCP Leverage: 3.7  ·  Amplify operator requirements into OCP DC Power work
Why it matters
Hitachi Energy ×2, Crusoe, LightningAI, Scaleway, and AirTrunk flag power availability and AC→HVDC distribution as pressing. Neoclouds face grid constraints and co-generation tradeoffs hyperscale specs don't address.
OCP Alignment
OCP Rack & Power · DC Power Sub-project · Diablo 400 Sidecar · Current/OS partnership
Proposed Deliverable
Coordinate neocloud power architecture patterns into OCP DC Power sub-project. Hitachi Energy as domain contributor. Energy sourcing strategies as guidance document (not a formal spec).
Who Will Contribute
Hitachi Energy (lead), Crusoe, LightningAI, Scaleway, AirTrunk
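The guidance document could express AC versus HVDC tradeoffs as a product of conversion-stage efficiencies. A sketch of that framing; the stage values below are assumed placeholders for illustration, not measured data:

```python
# End-to-end distribution efficiency as a product of per-stage efficiencies
def chain_efficiency(stages) -> float:
    """Multiply per-stage conversion efficiencies into one end-to-end figure."""
    eff = 1.0
    for s in stages:
        eff *= s
    return eff

# Illustrative, assumed stage values -- not survey or vendor data
ac_chain = chain_efficiency([0.98, 0.96, 0.94])   # e.g. transformer, UPS, server PSU
hvdc_chain = chain_efficiency([0.98, 0.97])       # e.g. rectifier, rack DC-DC
print(f"AC chain:   {ac_chain:.3f}")
print(f"HVDC chain: {hvdc_chain:.3f}")
```

The point of the pattern is fewer conversion stages, not any specific numbers; the real figures would come from operator and vendor contributions.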
DC Facilities · Cooling Readiness (AI-Ready Colo)
Importance: 3.9  ·  OCP Leverage: 3.5  ·  Checklist is a quick win — no formal spec required
Why it matters
8 of 18 respondents selected this as a 60–90 day quick win. Traditional colo facilities lack AI prerequisites: power density, liquid-cooling distribution, busbar infrastructure. The checklist format enables immediate operator use.
OCP Alignment
OCP Ready™ for AI · OCP DCF · Deschutes CDU Spec
Proposed Deliverable
AI-ready colo requirements checklist: power density thresholds, cooling distribution requirements, busbar specs, install prerequisites. Decision tree for evaluating colo sites.
Who Will Contribute
AirTrunk, Vertiv, LightningAI, Lambda, Scaleway
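The site decision tree could be expressed as simple executable logic alongside the checklist. A sketch with hypothetical thresholds; the real values would come from this workstream's requirements work:

```python
# Hypothetical colo-site decision tree (thresholds are illustrative only)
def evaluate_colo(site: dict) -> str:
    """Classify a colo site as 'ready', 'upgradable', or 'not suitable'."""
    if site["kw_per_rack"] >= 30 and site["liquid_cooling"]:
        return "ready"                 # meets assumed density + cooling bar
    if site["kw_per_rack"] >= 15 and site["busbar_capable"]:
        return "upgradable"            # density path exists via retrofit
    return "not suitable"

print(evaluate_colo({"kw_per_rack": 40, "liquid_cooling": True, "busbar_capable": True}))
print(evaluate_colo({"kw_per_rack": 8, "liquid_cooling": False, "busbar_capable": False}))
```

Keeping the tree this simple is deliberate: a checklist operators can run in an afternoon is the stated quick win, not a formal spec.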
Sustainability + Efficiency — PUE / WUE / Circularity
Importance: 3.9  ·  OCP Leverage: 3.4  ·  Data sharing role — coordinate with OCP Sustainability
Why it matters
Crusoe (stranded energy), Scaleway (EMEA), and AirTrunk (embodied carbon) bring neocloud sustainability data not in hyperscale-focused OCP work. Lower OCP leverage — workstream role is data aggregation, not spec authorship.
OCP Alignment
OCP Sustainability Project · Net Zero Innovation Hub · Embodied Carbon WS
Proposed Deliverable
PUE/WUE benchmarks and anonymized efficiency metrics from neocloud operators. Feed into OCP Sustainability project. Coordinate and share — not a standalone spec.
Who Will Contribute
Crusoe, Scaleway, AirTrunk, Denvr Dataworks
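The two headline metrics use standard definitions: PUE is total facility energy over IT energy, and WUE is site water use in liters over IT energy in kWh. A minimal sketch with illustrative numbers (not survey data):

```python
# Standard facility-efficiency metric definitions
def pue(total_kwh: float, it_kwh: float) -> float:
    """Power Usage Effectiveness: total facility energy / IT energy."""
    return total_kwh / it_kwh

def wue(water_liters: float, it_kwh: float) -> float:
    """Water Usage Effectiveness: site water use (L) / IT energy (kWh)."""
    return water_liters / it_kwh

# Illustrative annual figures for a small facility
print(round(pue(1_300_000, 1_000_000), 2))   # -> 1.3
print(round(wue(1_800_000, 1_000_000), 2))   # -> 1.8 L/kWh
```

Agreeing on metering boundaries (what counts as "IT energy") before aggregating operator data is the main coordination task here.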
Storage Patterns for AI — GDS / NVMe-oF / Pipeline
Importance: 3.6  ·  OCP Leverage: 3.7  ·  Niche subgroup — real gap in OCP coverage
Why it matters
Ranked lower overall but FarmGPU, Denvr ×2, Scaleway show specific interest. GPUDirect Storage (GDS) and NVMe-oF patterns for AI training pipelines are not covered by OCP's existing Open Vault/Knox storage work — a genuine gap.
OCP Alignment
OCP Storage Project · Gap — GDS/NVMe-oF not addressed
Proposed Deliverable
NVMe-oF and GDS pattern documentation for AI training/inference pipelines. Submit as gap ticket to OCP Storage Project. Operate as a small subgroup with FarmGPU + Denvr leading.
Who Will Contribute
FarmGPU (lead), Denvr Dataworks ×2, Scaleway
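One sizing question the pattern documentation would answer: the aggregate read bandwidth a training pipeline needs so GPUs never stall on input. A sketch with assumed workload numbers for illustration:

```python
# Aggregate storage read bandwidth (GB/s) needed to keep a training pipeline fed
def required_read_gb_per_s(samples_per_sec_per_gpu: float, sample_mb: float,
                           gpus: int) -> float:
    """Sustained read bandwidth in GB/s for the whole cluster's input pipeline."""
    return samples_per_sec_per_gpu * sample_mb * gpus / 1000

# Example: 500 samples/s/GPU, 2 MB samples, 64 GPUs -> 64 GB/s aggregate
print(required_read_gb_per_s(500, 2.0, 64))
```

Numbers in this range are what push operators from local NVMe toward NVMe-oF and GDS, which is the gap the subgroup would document.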
Importance Score — avg 1–5, all respondents
Interoperability · 4.6
Benchmarking · 4.2
Networking · 4.1
Power Distribution · 4.1
Facilities + Cooling · 3.9
Sustainability · 3.9
Rack/Pod Patterns · 3.7
Storage for AI · 3.6
OCP Leverage Score — "How much can OCP help?"
Rack/Pod Patterns · 4.2
Interoperability · 4.2
Networking · 4.1
Benchmarking · 3.8
Storage for AI · 3.7
Power Distribution · 3.7
Facilities + Cooling · 3.5
Sustainability · 3.4
Deliverable Priority — avg rank, 1 = highest
01 Neocloud Reference Design Patterns · rack/pod/fabric templates for 30–100kW clusters
02 Facilities + Power Readiness Guide · AI-ready colo requirements + decision checklist
03 Interoperability Checklist · firmware / BMC / telemetry vendor requirements
04 Gap Map: Neocloud Needs → OCP Projects · where to fix, where to contribute
05 Benchmarking Playbook · repeatable tests + reporting template for customer trust
06 Best-Practices Whitepaper · vendor-neutral: what works + pitfalls
07 Procurement / Supply-Chain Playbook · what to standardize to simplify buying
08 Case Studies + TCO / Efficiency Metrics · anonymized patterns + numbers
Preferred Output Formats
Short whitepaper / guidance doc · 14
Checklists + decision trees · 11
Reference designs + BoM template · 10
Open-source repo (configs/scripts) · 6
OCP contribution proposal · 5
Recorded talks / workshops · 4
Data Sharing Willingness
Maybe (topic/NDA dependent) · 9
Yes — anonymized metrics · 4
Yes — attributed (non-anon) · 3
No — review/advise only · 2
How to read this: the table maps each workstream priority to active OCP projects and contributions, identifying where to contribute, where to consume existing work, and where gaps exist that only neoclouds can fill.
Neocloud Priority → OCP Project + Contribution Map
(# · Workstream Priority · Importance / OCP Leverage · Key OCP Projects / Specs · Neocloud Role)
1 · Interoperability — BMC / Firmware / Telemetry · 4.6 / 4.2 · HW Mgmt, S.A.F.E. FW, OpenBMC · Contribute: interop checklist + gap tickets
2 · Benchmarking + Performance Assurance · 4.2 / 3.8 · Open Cluster Designs for AI, Open Sys for AI SI · Create: neocloud benchmark playbook (new)
3 · Networking Patterns + Tuning · 4.1 / 4.1 · OCP Networking, ONIE, UEC · Contribute: neocloud fabric guide → Networking
4 · Rack/Pod Patterns (30–100kW) · 3.7 / 4.2 · ORv3 HPR, Diablo 400, OD4AI SI · Validate + gap-fill: sub-100kW operator lens
5 · Power Sourcing + Distribution · 4.1 / 3.7 · Rack & Power, DC Power Sub-proj, Current/OS · Amplify: operator requirements → DC Power spec
6 · Facilities + Cooling Readiness · 3.9 / 3.5 · OCP Ready™ for AI, OCP DCF, Deschutes CDU · Quick win: colo checklist (no spec needed)
7 · Sustainability + Efficiency · 3.9 / 3.4 · OCP Sustainability, Net Zero Hub · Feed data → OCP Sustainability project
8 · Storage for AI (GDS / NVMe-oF) · 3.6 / 3.7 · OCP Storage Project, Open Vault / Knox · Gap: GDS/NVMe-oF — subgroup candidate
Top Quick Wins Requested — 60–90 Day Horizon
Glossary + neocloud 101 onboarding pack · 10
AI-ready colo requirements checklist · 8
Interop pain point roundup (BMC/firmware) · 8
Cluster benchmarking report template · 6
Vendor/partner sourcing map · 6
30–100kW rack/pod reference envelope · 3
What NOT to Spend Time On in 2026
"Very large physical designs (10K+ GPU training) — largely specified by NVIDIA reference architectures." — Denvr
"Deciding what is the 'best' cooling solution (DLC vs immersion vs air) — that's prescriptive." — FarmGPU
"Too big a project scope — start with quick wins to get the ball rolling." — Scaleway
"Focusing solely on optimization — build resilient architectures first." — Lambda
"Complaints." — Hitachi Energy
End of 2026 — What Would Be a Clear Win?
"More OCP adoption in neocloud companies — lower TCO, better performance, faster time to market." — FarmGPU
"At least two whitepapers published." — Scaleway
"Benchmark standardization." — Denvr
"Clear path to faster AI cluster deployment with composable DC infrastructure spec changes." — Lambda
"A clear pathway for AI deployment in the next 4 years." — AirTrunk
Participating Organizations — 18 responses, 16 orgs
FarmGPU
Denvr Dataworks ×2
Crusoe ×2
Lambda
LightningAI (Voltage Park)
Scaleway
Vertiv
Hitachi Energy ×2
Fujitsu
Netpreme
Meinberg USA
Opengear
AirTrunk (APAC)
Optimetrix (APAC)
Edinburgh Intl DC Facility
Respondent categories: Neocloud Operator · Vendor / OEM · Colo / DC Operator · Research / Other
Participation Intent
Active contributor · 7
Lead / co-lead subgroup · 5
Occasional feedback · 4
Listen / learn · 3
Preferred Meeting Cadence
Monthly · 9
Bi-weekly · 9
Top Pressing Challenges — Aggregate, multi-select (n=18)
Interoperability & vendor lock-in · 11
Management & automation (Day-2) · 10
Networking architecture (RoCE/IB) · 10
Benchmarking & performance · 8
Power availability / grid · 8
High-power rack & pod patterns · 7
Facilities readiness (AI-ready colo) · 6
Security & compliance · 4
Supply chain constraints · 4
Sustainability & efficiency · 3