Data Infrastructure
Tue Nov 19 2024
Sequence learning: A paradigm shift for personalized ads recommendations
AI plays a fundamental role in creating valuable connections between people and advertisers within Meta’s family of apps.
Data Center Engineering
Tue Oct 15 2024
OCP Summit 2024: The open future of networking hardware for AI
At the Open Compute Project (OCP) Summit 2024, we’re sharing details about our next-generation network fabric for our AI training clusters.
Meta’s open AI hardware vision
At the Open Compute Project (OCP) Global Summit 2024, we’re showcasing our latest open AI hardware designs with the OCP community.
AI Research
Thu Oct 03 2024
How open source AI can improve population estimates, sustainable energy, and the delivery of climate change interventions
Data for Good at Meta is open-sourcing the data used to train our AI-powered population maps.
Tue Sep 10 2024
Simulator-based reinforcement learning for data center cooling optimization
We’re sharing more about the role that reinforcement learning plays in helping us optimize our data centers’ environmental controls.
Fri Aug 23 2024
How PyTorch powers AI training and inference
Learn about new PyTorch advancements for LLMs and how PyTorch is enhancing every aspect of the LLM lifecycle.
Thu Aug 22 2024
Inside the hardware and co-design of MTIA
In this talk from AI Infra @ Scale 2024, Joel Colburn, a software engineer at Meta, technical lead Junqiang Lan, and software engineer Jack ...
Wed Aug 21 2024
Bringing Llama 3 to life
Llama 3 is Meta’s most capable openly available LLM to date.
Tue Aug 20 2024
Aparna Ramani discusses the future of AI infrastructure
Delivering new AI technologies at scale also means rethinking every layer of our infrastructure – from silicon and software systems and even...
Wed Aug 14 2024
How Meta animates AI-generated images at scale
We launched Meta AI with the goal of giving people new ways to be more productive and unlock their creativity with generative AI (GenAI).
Mon Aug 05 2024
RoCE networks for distributed AI training at scale
AI networks play an important role in interconnecting tens of thousands of GPUs, forming the foundational infrastructure for traini...
ML Applications
Thu Jul 18 2024
Meet Caddy – Meta’s next-gen mixed reality CAD software
What happens when a team of mechanical engineers gets tired of looking at flat images of 3D models over Zoom? Meet the team behind Caddy, a n...
DevInfra
Tue Jul 16 2024
AI Lab: The secrets to keeping machine learning engineers moving fast
The key to developer velocity across AI lies in minimizing time to first batch (TTFB) for machine learning (ML) engineers.
Wed Jul 10 2024
Taming the tail utilization of ads inference at Meta scale
Tail utilization is a significant system issue and a major factor in overload-related failures and low compute utilization.
Meta’s approach to machine learning prediction robustness
Meta’s advertising business leverages large-scale machine learning (ML) recommendation models that power millions of ads recommendations per...
Mon Jun 24 2024
Leveraging AI for efficient incident response
We’re sharing how we streamline system reliability investigations using a new AI-assisted root cause analysis system.
Wed Jun 12 2024
Maintaining large-scale AI capacity at Meta
Meta is currently operating many data centers with GPU training clusters across the world.
Wed Apr 10 2024
Introducing the next-gen Meta Training and Inference Accelerator
Wed Mar 20 2024
Optimizing RTC bandwidth estimation with machine learning
Bandwidth estimation (BWE) and congestion control play an important role in delivering high-quality real-time communication (RTC) across Met...
Mon Mar 18 2024
Logarithm: A logging engine for AI training workflows and services
Systems and application logs play a key role in operations, observability, and debugging workflows at Meta.
Tue Mar 12 2024
Building Meta’s GenAI Infrastructure
Marking a major investment in Meta’s AI future, we are announcing two 24k GPU clusters.
Mon Jan 29 2024
Improving machine learning iteration speed with faster application build and packaging
Slow build times and inefficiencies in packaging and distributing execution files were costing our ML/AI engineers a significant amount of t...
Thu Jan 18 2024
Lazy is the new fast: How Lazy Imports and Cinder accelerate machine learning at Meta
At Meta, the quest for faster model training has yielded an exciting milestone: the adoption of Lazy Imports and the Python Cinder runtime.
Thu Jan 11 2024
How Meta is advancing GenAI
What’s going on with generative AI (GenAI) at Meta? And what does the future have in store? In this episode of the Meta Tech Podcast, Meta e...
Tue Dec 19 2023
AI debugging at Meta with HawkEye
HawkEye is the powerful toolkit used internally at Meta for monitoring, observability, and debuggability of the end-to-end machine learning ...
Wed Nov 15 2023
Watch: Meta’s engineers on building network infrastructure for AI
Meta is building for the future of AI at every level – from hardware like MTIA v1, Meta’s first-generation AI inference accelerator, to publi...
Wed Oct 18 2023
How Meta is creating custom silicon for AI
Olivia Wu, Meta’s Technical Lead for Infra Silicon, discusses the design and development of Meta’s first-generation AI inference accelerator...
Thu Sep 07 2023
Using Chakra execution traces for benchmarking and network performance optimization
Meta presents Chakra execution traces, an open graph-based representation of AI/ML workload execution, laying the foundation for benchmarkin...
Arcadia: An end-to-end AI system performance simulator
We’re introducing Arcadia, Meta’s unified system that simulates the compute, memory, and network performance of AI training clusters.
Thu Aug 24 2023
Code Llama: Meta’s state-of-the-art LLM for coding
Mon Aug 14 2023
Meta Connect 2023: September 27 – 28
Wed Aug 09 2023
Scaling the Instagram Explore recommendations system
Explore is one of the largest recommendation systems on Instagram.
Thu May 18 2023
MSVP is Meta’s first video processing ASIC
Meta introduces its first-generation AI inference accelerator