Data-Center-Engineering
Meta
Tue Oct 15 2024
OCP Summit 2024: The open future of networking hardware for AI
At the Open Compute Project (OCP) Summit 2024, we’re sharing details about our next-generation network fabric for our AI training clusters.
Meta’s open AI hardware vision
At the Open Compute Project (OCP) Global Summit 2024, we’re showcasing our latest open AI hardware designs with the OCP community.
Tue Sep 10 2024
Simulator-based reinforcement learning for data center cooling optimization
We’re sharing more about the role that reinforcement learning plays in helping us optimize our data centers’ environmental controls.
Wed Sep 04 2024
Read Meta’s 2024 Sustainability Report
Mon Aug 26 2024
RETINAS: Real-Time Infrastructure Accounting for Sustainability
We are introducing a new metric, real-time server fleet utilization effectiveness, as part of the RETINAS initiative to help reduce emission...
AI-Research
Tue Aug 20 2024
Aparna Ramani discusses the future of AI infrastructure
Delivering new AI technologies at scale also means rethinking every layer of our infrastructure – from silicon and software systems and even...
Mon Aug 05 2024
RoCE networks for distributed AI training at scale
AI networks play an important role in interconnecting tens of thousands of GPUs together, forming the foundational infrastructure for traini...
DCPerf: An open source benchmark suite for hyperscale compute applications
We are open-sourcing DCPerf, a collection of benchmarks that represents the diverse categories of workloads that run in data center cloud de...
Wed Jun 12 2024
Maintaining large-scale AI capacity at Meta
Meta is currently operating many data centers with GPU training clusters across the world.
Tue Mar 12 2024
Building Meta’s GenAI Infrastructure
Marking a major investment in Meta’s AI future, we are announcing two 24k GPU clusters.
Connectivity
Wed Feb 07 2024
Simple Precision Time Protocol at Meta
While deploying Precision Time Protocol (PTP) at Meta, we’ve developed a simplified version of the protocol (Simple Precision Time Protocol ...
Mon Apr 17 2023
A fine-grained network traffic analysis with Millisampler
What the research is: Millisampler is one of Meta’s latest characterization tools and allows us to observe, characterize, and debug network...