DevInfra

27 posts

We’re sharing details about Glean, Meta’s open source system for collecting, deriving, and working with facts about source code. In this blog post we’ll talk about why a system like Glean is important, explain the rationale for Glean’s design, and run through some of the ways we’re using Glean to supercharge our developer tooling at [...] Read More... The post Indexing code at scale with Glean appeared first on Engineering at Meta.

12/19/2024

Meta has been on a years-long undertaking to translate our entire Android codebase from Java to Kotlin. Today, despite having one of the largest Android codebases in the world, we’re well past the halfway point and still going. We’re sharing some of the tradeoffs we’ve made to support automating our transition to Kotlin, seemingly simple [...] Read More... The post Translating Java to Kotlin at Scale appeared first on Engineering at Meta.

12/18/2024

Ten years after the introduction of PEP 484, we surveyed the current state of the Python type system and the tools developers are using. Read More... The post Typed Python in 2024: Well adopted, yet usability challenges persist appeared first on Engineering at Meta.

12/9/2024

At Meta, we’re always looking for ways to enhance the productivity of our engineers and developers. But how exactly do you measure developer productivity? On this episode of the Meta Tech Podcast, Pascal Hartig (@passy) sits down with Sarita and Moritz, two engineers at Meta who have been working on Diff Authoring Time (DAT) – a [...] Read More... The post Diff Authoring Time: Measuring developer productivity at Meta appeared first on Engineering at Meta.

10/25/2024

At Open Compute Project Summit (OCP) 2024, we’re sharing details about our next-generation network fabric for our AI training clusters. We’ve expanded our network hardware portfolio and are contributing two new disaggregated network fabrics and a new NIC to OCP. We look forward to continued collaboration with OCP to open designs for racks, servers, storage [...] Read More... The post OCP Summit 2024: The open future of networking hardware for AI appeared first on Engineering at Meta.

10/15/2024

At the Open Compute Project (OCP) Global Summit 2024, we’re showcasing our latest open AI hardware designs with the OCP community. These innovations include a new AI platform, cutting-edge open rack designs, and advanced network fabrics and components. By sharing our designs, we hope to inspire collaboration and foster innovation. If you’re passionate about building [...] Read More... The post Meta’s open AI hardware vision appeared first on Engineering at Meta.

10/15/2024

This episode of the Meta Tech Podcast is all about Bento, Meta’s internal distribution of Jupyter Notebooks, an open-source web-based computing platform. Bento allows our engineers to mix code, text, and multimedia in a single document and serves a wide range of use cases at Meta from prototyping to complex machine learning workflows. Pascal Hartig [...] Read More... The post Inside Bento: Jupyter Notebooks at Meta appeared first on Engineering at Meta.

9/17/2024

The key to developer velocity across AI lies in minimizing time to first batch (TTFB) for machine learning (ML) engineers. AI Lab is a pre-production framework used internally at Meta. It allows us to continuously A/B test common ML workflows – enabling proactive improvements and automatically preventing regressions on TTFB. AI Lab prevents TTFB regressions [...] Read More... The post AI Lab: The secrets to keeping machine learning engineers moving fast appeared first on Engineering at Meta.

7/16/2024

The history of Rust at Meta goes all the way back to 2016, when we first started using it for source control. Today, it has been widely embraced at Meta and is one of our primary supported server-side languages (along with C++, Python, and Hack). But that doesn’t mean there weren’t any growing pains. Aida [...] Read More... The post The key to a happy Rust/C++ relationship appeared first on Engineering at Meta.

6/25/2024

We’re sharing how we streamline system reliability investigations using a new AI-assisted root cause analysis system. The system uses a combination of heuristic-based retrieval and large language model-based ranking to speed up root cause identification during investigations. Our testing has shown this new system achieves 42% accuracy in identifying root causes for investigations at their [...] Read More... The post Leveraging AI for efficient incident response appeared first on Engineering at Meta.

6/24/2024

At Meta, Bento, our internal Jupyter notebooks platform, is a popular tool that allows our engineers to mix code, text, and multimedia in a single document. Use cases run the entire spectrum from what we call “lite” workloads that involve simple prototyping to heavier and more complex machine learning workflows. However, even though the lite [...] Read More... The post Serverless Jupyter Notebooks at Meta appeared first on Engineering at Meta.

6/10/2024

In recent years, Meta’s data management systems have evolved into a composable architecture that creates interoperability, promotes reusability, and improves engineering efficiency. We’re sharing how we’ve achieved this, in part, by leveraging Velox, Meta’s open source execution engine, as well as work ahead as we continue to rethink our data management systems. Data is at [...] Read More... The post Composable data management at Meta appeared first on Engineering at Meta.

5/22/2024

Systems and application logs play a key role in operations, observability, and debugging workflows at Meta. Logarithm is a hosted, serverless, multitenant service, used only internally at Meta, that consumes and indexes these logs and provides an interactive query interface to retrieve and view logs. In this post, we present the design behind Logarithm, and [...] Read More... The post Logarithm: A logging engine for AI training workflows and services appeared first on Engineering at Meta.

3/18/2024

Andres Suarez and Michael Bolin, two software engineers at Meta, join Pascal Hartig (@passy) on the Meta Tech Podcast to discuss the ins and outs of DotSlash, a new open source tool from Meta. DotSlash takes the pain out of distributing binaries and toolchains to developers. Instead of committing large, platform-specific executables to a repository, [...] Read More... The post How DotSlash makes executable deployment simpler appeared first on Engineering at Meta.

2/26/2024

We’ve partnered with Voltron Data and the Arrow community to align and converge Apache Arrow with Velox, Meta’s open source execution engine. Apache Arrow 15 includes three new format layouts developed through this partnership: StringView, ListView, and Run-End-Encoding (REE). This new convergence helps Meta and the larger community build data management systems that are unified, [...] Read More... The post Aligning Velox and Apache Arrow: Towards composable data management appeared first on Engineering at Meta.

2/20/2024

By now you’re already aware that Python 3.12 has been released. But did you know that several of its new features were developed by Meta? Meta engineer Pascal Hartig (@passy) is joined on the Meta Tech Podcast by Itamar Oren and Carl Meyer, two software engineers at Meta, to discuss their teams’ contributions to the [...] Read More... The post Meta loves Python appeared first on Engineering at Meta.

2/12/2024

We’ve open sourced DotSlash, a tool that makes large executables available in source control with a negligible impact on repository size, thus avoiding I/O-heavy clone operations. With DotSlash, a set of platform-specific executables is replaced with a single script containing descriptors for the supported platforms. DotSlash handles transparently fetching, decompressing, and verifying the appropriate remote [...] Read More... The post DotSlash: Simplified executable deployment appeared first on Engineering at Meta.

2/6/2024

At Meta, the quest for faster model training has yielded an exciting milestone: the adoption of Lazy Imports and the Python Cinder runtime. The outcome? Up to 40 percent time to first batch (TTFB) improvements, along with a 20 percent reduction in Jupyter kernel startup times. This advancement facilitates swifter experimentation capabilities and elevates the [...] Read More... The post Lazy is the new fast: How Lazy Imports and Cinder accelerate machine learning at Meta appeared first on Engineering at Meta.

1/18/2024

HawkEye is the powerful toolkit used internally at Meta for monitoring, observability, and debuggability of the end-to-end machine learning (ML) workflow that powers ML-based products. HawkEye supports recommendation and ranking models across several products at Meta. Over the past two years, it has facilitated order-of-magnitude improvements in the time spent debugging production issues. [...] Read More... The post AI debugging at Meta with HawkEye appeared first on Engineering at Meta.

12/19/2023

Meta has a very large monorepo, with many different programming languages. To optimize build and performance, we developed our own build system called Buck, which was first open-sourced in 2013. Buck2 is the recently open-sourced successor. In our internal tests at Meta, we observed that Buck2 completed builds approximately 2x as fast as Buck1. Below [...] Read More... The post 5 Things you didn’t know about Buck2 appeared first on Engineering at Meta.

10/23/2023