Data-Center-Engineering
Meta
Tue Sep 10 2024
Simulator-based reinforcement learning for data center cooling optimization
We’re sharing more about the role that reinforcement learning plays in helping us optimize our data centers’ environmental controls.
Wed Sep 04 2024
Read Meta’s 2024 Sustainability Report
Mon Aug 26 2024
RETINAS: Real-Time Infrastructure Accounting for Sustainability
We are introducing a new metric, real-time server fleet utilization effectiveness, as part of the RETINAS initiative to help reduce emission...
AI-Research
Tue Aug 20 2024
Aparna Ramani discusses the future of AI infrastructure
Delivering new AI technologies at scale also means rethinking every layer of our infrastructure – from silicon and software systems and even...
Mon Aug 05 2024
RoCE networks for distributed AI training at scale
AI networks play an important role in interconnecting tens of thousands of GPUs together, forming the foundational infrastructure for traini...
DCPerf: An open source benchmark suite for hyperscale compute applications
We are open-sourcing DCPerf, a collection of benchmarks that represents the diverse categories of workloads that run in data center cloud de...
Wed Jun 12 2024
Maintaining large-scale AI capacity at Meta
Meta is currently operating many data centers with GPU training clusters across the world.
Tue Mar 12 2024
Building Meta’s GenAI Infrastructure
Marking a major investment in Meta’s AI future, we are announcing two 24k GPU clusters.
Connectivity
Wed Feb 07 2024
Simple Precision Time Protocol at Meta
While deploying Precision Time Protocol (PTP) at Meta, we’ve developed a simplified version of the protocol (Simple Precision Time Protocol ...
Mon Apr 17 2023
A fine-grained network traffic analysis with Millisampler
What the research is: Millisampler is one of Meta’s latest characterization tools and allows us to observe, characterize, and debug network...