AI-Research
Meta
Wed Apr 10 2024
Introducing the next-gen Meta Training and Inference Accelerator
ML-Applications
Wed Mar 20 2024
Optimizing RTC bandwidth estimation with machine learning
Bandwidth estimation (BWE) and congestion control play an important role in delivering high-quality real-time communication (RTC) across Met...
Data-Infrastructure
Mon Mar 18 2024
Logarithm: A logging engine for AI training workflows and services
Systems and application logs play a key role in operations, observability, and debugging workflows at Meta.
Tue Mar 12 2024
Building Meta’s GenAI Infrastructure
Marking a major investment in Meta’s AI future, we are announcing two 24k GPU clusters.
Mon Jan 29 2024
Improving machine learning iteration speed with faster application build and packaging
Slow build times and inefficiencies in packaging and distributing execution files were costing our ML/AI engineers a significant amount of t...
DevInfra
Thu Jan 18 2024
Lazy is the new fast: How Lazy Imports and Cinder accelerate machine learning at Meta
At Meta, the quest for faster model training has yielded an exciting milestone: the adoption of Lazy Imports and the Python Cinder runtime.
Thu Jan 11 2024
How Meta is advancing GenAI
What’s going on with generative AI (GenAI) at Meta? And what does the future have in store? In this episode of the Meta Tech Podcast, Meta e...
Tue Dec 19 2023
AI debugging at Meta with HawkEye
HawkEye is the powerful toolkit used internally at Meta for monitoring, observability, and debuggability of the end-to-end machine learning ...
Wed Nov 15 2023
Watch: Meta’s engineers on building network infrastructure for AI
Meta is building for the future of AI at every level – from hardware like MTIA v1, Meta’s first-generation AI inference accelerator, to publi...
Wed Oct 18 2023
How Meta is creating custom silicon for AI
Olivia Wu, Meta’s Technical Lead for Infra Silicon, discusses the design and development of Meta’s first-generation AI inference accelerator...
Thu Sep 07 2023
Using Chakra execution traces for benchmarking and network performance optimization
Meta presents Chakra execution traces, an open graph-based representation of AI/ML workload execution, laying the foundation for benchmarkin...
Arcadia: An end-to-end AI system performance simulator
We’re introducing Arcadia, Meta’s unified system that simulates the compute, memory, and network performance of AI training clusters.
Thu Aug 24 2023
Code Llama: Meta’s state-of-the-art LLM for coding
Mon Aug 14 2023
Meta Connect 2023: September 27 – 28
Wed Aug 09 2023
Scaling the Instagram Explore recommendations system
Explore is one of the largest recommendation systems on Instagram.
Thu May 18 2023
MSVP is Meta’s first video processing ASIC
Meta introduces its first-generation AI inference accelerator