Data-Center-Engineering
Meta
Tue Oct 15 2024
OCP Summit 2024: The open future of networking hardware for AI
At the Open Compute Project (OCP) Summit 2024, we’re sharing details about our next-generation network fabric for our AI training clusters.
Meta’s open AI hardware vision
At the Open Compute Project (OCP) Global Summit 2024, we’re showcasing our latest open AI hardware designs with the OCP community.
AI-Research
Mon Aug 05 2024
RoCE networks for distributed AI training at scale
AI networks play an important role in interconnecting tens of thousands of GPUs, forming the foundational infrastructure for traini...
ML-Applications
Wed Jul 10 2024
Taming the tail utilization of ads inference at Meta scale
Tail utilization is a significant system issue and a major factor in overload-related failures and low compute utilization.
Networking-and-Traffic
Thu Mar 21 2024
Threads has entered the fediverse
Threads has entered the fediverse! As part of our beta experience, now available in a few countries, Threads users aged 18+ with public prof...
Wed Mar 20 2024
Optimizing RTC bandwidth estimation with machine learning
Bandwidth estimation (BWE) and congestion control play an important role in delivering high-quality real-time communication (RTC) across Met...
Connectivity
Wed Feb 07 2024
Simple Precision Time Protocol at Meta
While deploying Precision Time Protocol (PTP) at Meta, we’ve developed a simplified version of the protocol (Simple Precision Time Protocol ...
Wed Nov 15 2023
Watch: Meta’s engineers on building network infrastructure for AI
Meta is building for the future of AI at every level – from hardware like MTIA v1, Meta’s first-generation AI inference accelerator, to publi...
Thu Sep 07 2023
Using Chakra execution traces for benchmarking and network performance optimization
Meta presents Chakra execution traces, an open graph-based representation of AI/ML workload execution, laying the foundation for benchmarkin...
Data-Infrastructure
Arcadia: An end-to-end AI system performance simulator
We’re introducing Arcadia, Meta’s unified system that simulates the compute, memory, and network performance of AI training clusters.
Thu Jun 29 2023
Meta’s Evenstar is transitioning to OCP to accelerate open RAN adoption
Meta is transferring its IP for Evenstar, a program to accelerate the adoption of open RAN technologies, to the Open Compute Project (OCP).
Mon Apr 17 2023
A fine-grained network traffic analysis with Millisampler
What the research is: Millisampler is one of Meta’s latest characterization tools and allows us to observe, characterize, and debug network...