Data

28 posts

Here we explain how we made our data pipeline scale to 700 million events per second while becoming more resilient than ever before. We share some math behind our approach and some of the designs of [...]

Constantin Pan · 1/27/2025

Spotify

Data Data-Science Machine-Learning

Building Confidence: A Case Study in How to Create Confidence Scores for GenAI Applications

TL;DR Getting a response from GenAI is quick and straightforward. But what about the... The post Building Confidence: A Case Study in How to Create Confidence Scores for GenAI Applications appeared first on Spotify Engineering.

alexandrawei · 12/12/2024

Cloudflare

Logs Data Log-Push

Cloudflare incident on November 14, 2024, resulting in lost logs

On November 14, 2024, Cloudflare experienced a Cloudflare Logs outage, impacting the majority of customers using these products. During the ~3.5 hours that these services were impacted, about 55% of the logs we normally send to customers were not sent and were lost. The details of what went wrong and why are interesting both for customers and practitioners.

Jamie Herre · 11/26/2024

Spotify

Data Machine-Learning machine-learning

How We Generated Millions of Content Annotations

With the fields of machine learning (ML) and generative AI (GenAI) continuing to rapidly evolve and expand, it has become [...] The post How We Generated Millions of Content Annotations appeared first on Spotify Engineering.

alexandrawei · 10/21/2024

Spotify

Data Data-Science People Platform engineering-culture

Are You a Dalia? How We Created Data Science Personas for Spotify’s Analytics Platform

On Spotify’s Analytics Platform, we’re dedicated to building products that empower data practitioners to discover, analyze, and share insights — [...] The post Are You a Dalia? How We Created Data Science Personas for Spotify’s Analytics Platform appeared first on Spotify Engineering.

Spotify Engineering · 9/5/2024

Spotify

Data Data-Science Design Platform engineering-culture

Unlocking Insights with High-Quality Dashboards at Scale

We have a lot of dashboards at Spotify. Our Insight teams and analysts from across the company are constantly whipping [...] The post Unlocking Insights with High-Quality Dashboards at Scale appeared first on Spotify Engineering.

Spotify Engineering · 8/28/2024

Spotify

Data Data-Science Platform

Data Platform Explained Part II

Check out Data Platform Explained Part I, where we started sharing the journey of building a data platform, its building [...] The post Data Platform Explained Part II appeared first on Spotify Engineering.

Spotify Engineering · 5/28/2024

Spotify

Data Data-Science experimentation

Fixed-Power Designs: It’s Not IF You Peek, It’s WHAT You Peek at

TL;DR Sometimes we cannot estimate the sample size needed to power an experiment before starting it. To alleviate this [...] The post Fixed-Power Designs: It’s Not IF You Peek, It’s WHAT You Peek at appeared first on Spotify Engineering.

Spotify Engineering · 5/15/2024

Spotify

Data Data-Science Infrastructure Platform

Data Platform Explained

As engineers working at Spotify, we frequently find ourselves explaining our robust data platform to fellow professionals who are contemplating [...] The post Data Platform Explained appeared first on Spotify Engineering.

Spotify Engineering · 4/2/2024

Spotify

Data Data-Science Platform experimentation

Risk-Aware Product Decisions in A/B Tests with Multiple Metrics

TL;DR We summarize the findings in our recent paper, Schultzberg, Ankargren, and Frånberg (2024), where we explain how Spotify’s decision-making [...] The post Risk-Aware Product Decisions in A/B Tests with Multiple Metrics appeared first on Spotify Engineering.

Spotify Engineering · 3/5/2024

infrastructure Data

Deployment of Exabyte-Backed Big Data Components

Co-authors: Arjun Mohnot, Jenchang Ho, Anthony Quigley, Xing Lin, Anil Alluri, Michael Kuchenbecker

LinkedIn operates one of the world’s largest Apache Hadoop big data clusters. These clusters are the backbone for storing and processing extensive data volumes, empowering us to deliver essential features and services to members, such as personalized recommendations, enhanced search functionality, and valuable insights. Historically, deploying code changes to Hadoop big data clusters has been complex. As workloads and clusters grow, operational overhead becomes even more challenging, […]

Anuj Maurice · 12/19/2023

Cloudflare

AI Data Developers Machine-Learning MLops Hardware

ML Ops Platform at Cloudflare

To help our team continue to innovate efficiently, our MLOps effort has collaborated with Cloudflare’s data scientists to implement the following best practices [...]

Keith Adler (http://blog.cloudflare.com/author/keith/) · 12/7/2023

Spotify

Data Data-Science

Recursive Embedding and Clustering

Large sets of diverse data present several challenges for clustering, but through a novel approach that combines dimensionality reduction, recursion, and supervised machine learning, we’ve been able to obtain strong results. The post Recursive Embedding and Clustering appeared first on Spotify Engineering.

Spotify Engineering · 12/5/2023

Spotify

Data Data-Science Machine-Learning Open-Source engineering-culture engineering-leadership machine-learning open-source

Introducing Voyager: Spotify’s New Nearest-Neighbor Search Library

For the past decade, Spotify has used approximate nearest-neighbor search technology to power our personalization, recommendation, and search systems. The post Introducing Voyager: Spotify’s New Nearest-Neighbor Search Library appeared first on Spotify Engineering.

Spotify Engineering · 10/25/2023

Spotify

Data Data-Science experimentation

How to Accurately Test Significance with Difference in Difference Models

When we want to determine the causal effect of a product or business change at Spotify, A/B testing is the gold standard. However, in some cases, it’s not possible to run A/B tests: for example, when the intervention is an exogenous shock we can’t control, such as the COVID pandemic, or when using experimental control [...] The post How to Accurately Test Significance with Difference in Difference Models appeared first on Spotify Engineering.

Spotify Engineering · 9/28/2023

Spotify

Data Data-Science experimentation

Encouragement Designs and Instrumental Variables for A/B Testing

At Spotify, we run a lot of A/B tests. Most of these tests follow a standard design, where we assign users randomly to control and treatment groups, and then observe the difference in outcomes between these two groups. Usually, the control group, also known as the “holdout” group, retains the current experience, while the treatment [...] The post Encouragement Designs and Instrumental Variables for A/B Testing appeared first on Spotify Engineering.

Spotify Engineering · 8/24/2023

Spotify

Data Data-Science experimentation

Experimentation at Spotify: Three Lessons for Maximizing Impact in Innovation

As companies mature, it’s easy to believe that the core experience and most user needs have been resolved, and all that’s left to work toward are the marginal benefits, the cherries on top. Cherries on top might add delight and panache, but they rarely cause fundamental shifts in performance and success. And as a business, [...] The post Experimentation at Spotify: Three Lessons for Maximizing Impact in Innovation appeared first on Spotify Engineering.

Spotify Engineering · 8/16/2023

Spotify

Data Data-Science Platform engineering-leadership experimentation

Coming Soon: Confidence — An Experimentation Platform from Spotify

TL;DR: Spotify is releasing a new commercial product for software development teams: a version of our homegrown experimentation platform that we’re calling Confidence. Based on everything we’ve learned over the last 10+ years about what it takes to enable experimentation at scale, the platform makes it easy for teams to set up, run, coordinate, and [...] The post Coming Soon: Confidence — An Experimentation Platform from Spotify appeared first on Spotify Engineering.

Spotify Engineering · 8/3/2023

Spotify

Data Data-Science Platform experimentation

Bringing Sequential Testing to Experiments with Longitudinal Data (Part 2): Sequential Testing

In Part 1 of this series, we introduced the within-unit peeking problem that we call the “peeking problem 2.0”. We showed that moving from single to multiple observations per unit in analyses of experiments introduces new challenges and pitfalls with regard to sequential testing. We discussed the importance of being clear about the distinctions between [...] The post Bringing Sequential Testing to Experiments with Longitudinal Data (Part 2): Sequential Testing appeared first on Spotify Engineering.

Spotify Engineering · 7/25/2023