Data-Infrastructure
Meta
Mon Mar 18 2024
Logarithm: A logging engine for AI training workflows and services
Systems and application logs play a key role in operations, observability, and debugging workflows at Meta.
Tue Dec 19 2023
AI debugging at Meta with HawkEye
HawkEye is the powerful toolkit used internally at Meta for monitoring, observability, and debuggability of the end-to-end machine learning ...
infrastructure
Pinterest
Tue Nov 07 2023
Running Unified PubSub Client in Production at Pinterest
Tue Oct 31 2023
Automating data removal
Meta’s Systematic Code and Asset Removal Framework (SCARF) has a subsystem for identifying and removing unused data types.
Tue Oct 24 2023
Automating dead code cleanup
Meta’s Systematic Code and Asset Removal Framework (SCARF) has a subsystem for identifying and removing dead code.
Tue Oct 17 2023
Automating product deprecation
Systematic Code and Asset Removal Framework (SCARF) is Meta’s unused code and data deletion framework.
Thu Sep 07 2023
Arcadia: An end-to-end AI system performance simulator
We’re introducing Arcadia, Meta’s unified system that simulates the compute, memory, and network performance of AI training clusters.
Tue Aug 29 2023
Scheduling Jupyter Notebooks at Meta
At Meta, Bento is our internal Jupyter notebooks platform, used by many internal users.
Tue May 16 2023
Building and deploying MySQL Raft at Meta
We’re rolling out MySQL Raft with the aim of eventually replacing our current MySQL semisynchronous databases.