Data
27 posts
TL;DR Getting a response from GenAI is quick and straightforward. But what about the... The post Building Confidence: A Case Study in How to Create Confidence Scores for GenAI Applications appeared first on Spotify Engineering.
On November 14, 2024, Cloudflare experienced a Cloudflare Logs outage, impacting the majority of customers using these products. During the ~3.5 hours that these services were impacted, about 55% of the logs we normally send to customers were not sent and were lost. The details of what went wrong and why are interesting both for customers and practitioners.
With the fields of machine learning (ML) and generative AI (GenAI) continuing to rapidly evolve and expand, it has become [...] The post How We Generated Millions of Content Annotations appeared first on Spotify Engineering.
On Spotify’s Analytics Platform, we’re dedicated to building products that empower data practitioners to discover, analyze, and share insights — [...] The post Are You a Dalia? How We Created Data Science Personas for Spotify’s Analytics Platform appeared first on Spotify Engineering.
We have a lot of dashboards at Spotify. Our Insight teams and analysts from across the company are constantly whipping [...] The post Unlocking Insights with High-Quality Dashboards at Scale appeared first on Spotify Engineering.
Check out Data Platform Explained Part I, where we started sharing the journey of building a data platform, its building [...] The post Data Platform Explained Part II appeared first on Spotify Engineering.
TL;DR Sometimes we cannot estimate the sample size needed to power an experiment before starting it. To alleviate this [...] The post Fixed-Power Designs: It’s Not IF You Peek, It’s WHAT You Peek at appeared first on Spotify Engineering.
As engineers working at Spotify, we frequently find ourselves explaining our robust data platform to fellow professionals who are contemplating [...] The post Data Platform Explained appeared first on Spotify Engineering.
TL;DR We summarize the findings in our recent paper, Schultzberg, Ankargren, and Frånberg (2024), where we explain how Spotify’s decision-making [...] The post Risk-Aware Product Decisions in A/B Tests with Multiple Metrics appeared first on Spotify Engineering.
Co-authors: Arjun Mohnot, Jenchang Ho, Anthony Quigley, Xing Lin, Anil Alluri, Michael Kuchenbecker. LinkedIn operates one of the world’s largest Apache Hadoop big data clusters. These clusters are the backbone for storing and processing extensive data volumes, empowering us to deliver essential features and services to members, such as personalized recommendations, enhanced search functionality, and valuable insights. Historically, deploying code changes to Hadoop big data clusters has been complex. As workloads and clusters grow, operational overhead becomes even more challenging, […]
To help our team continue to innovate efficiently, our MLOps effort has collaborated with Cloudflare’s data scientists to implement the following best practices
Large sets of diverse data present several challenges for clustering, but through a novel approach that combines dimensionality reduction, recursion, and supervised machine learning, we’ve been able to obtain strong results. The post Recursive Embedding and Clustering appeared first on Spotify Engineering.
For the past decade, Spotify has used approximate nearest-neighbor search technology to power our personalization, recommendation, and search systems. The post Introducing Voyager: Spotify’s New Nearest-Neighbor Search Library appeared first on Spotify Engineering.
When we want to determine the causal effect of a product or business change at Spotify, A/B testing is the gold standard. However, in some cases, it’s not possible to run A/B tests. For example, when the intervention is an exogenous shock we can’t control, such as the COVID pandemic. Or when using experimental control [...] The post How to Accurately Test Significance with Difference in Difference Models appeared first on Spotify Engineering.
At Spotify, we run a lot of A/B tests. Most of these tests follow a standard design, where we assign users randomly to control and treatment groups, and then observe the difference in outcomes between these two groups. Usually, the control group, also known as the “holdout” group, retains the current experience, while the treatment [...] The post Encouragement Designs and Instrumental Variables for A/B Testing appeared first on Spotify Engineering.
As companies mature, it’s easy to believe that the core experience and most user needs have been resolved, and all that’s left to work toward are the marginal benefits, the cherries on top. Cherries on top might add delight and panache, but they rarely cause fundamental shifts in performance and success. And as a business, [...] The post Experimentation at Spotify: Three Lessons for Maximizing Impact in Innovation appeared first on Spotify Engineering.
TL;DR: Spotify is releasing a new commercial product for software development teams: a version of our homegrown experimentation platform that we’re calling Confidence. Based on everything we’ve learned over the last 10+ years about what it takes to enable experimentation at scale, the platform makes it easy for teams to set up, run, coordinate, and [...] The post Coming Soon: Confidence — An Experimentation Platform from Spotify appeared first on Spotify Engineering.
In Part 1 of this series, we introduced the within-unit peeking problem that we call the “peeking problem 2.0”. We showed that moving from single to multiple observations per unit in analyses of experiments introduces new challenges and pitfalls with regard to sequential testing. We discussed the importance of being clear about the distinctions between [...] The post Bringing Sequential Testing to Experiments with Longitudinal Data (Part 2): Sequential Testing appeared first on Spotify Engineering.
Co-Authors: Sumedh Sakdeo, Lei Sun, Sushant Raikar, Stanislav Pak, and Abhishek Nath. At LinkedIn, we build and operate an open source data lakehouse deployment to power Analytics and Machine Learning workloads. Leveraging data to drive decisions allows us to serve our members with better job insights, and connect the world’s professionals with each other. Open source data lakehouse deployments are built on the foundations of compute engines (like Apache Spark, Trino, Apache Flink), distributed storage (HDFS, cloud blob stores), and metadata catalogs / table formats […]