The Causal Roundup #1

Anu Sharma
Tue Sep 28 2021

Mind over data at Netflix

The Causal Roundup is a biweekly review of the best articles on causality. Covering topics from experimentation to causal inference, the Statsig team brings you work from leaders who are building the future of product decision making.

Mind over data 📈

This month Netflix started a blog series about how they make decisions using A/B tests. Instead of restricting decision making to executives and experts, experimentation gives all their employees the opportunity to vote with their actions.

Now, you might already have a dozen analytics tools tracking scores of metrics, but without a causal chain you're still broadly shooting in the dark. Netflix shows this beautifully in a follow-up post on A/B testing using a hypothetical product launch with upside down cover art! This blurb made me weak in the knees…

Articulating the causal chain between the product change and changes in the primary decision metric, and monitoring secondary metrics along this chain, helps us build confidence that any movement in our primary metric is the result of the causal chain we are hypothesizing, and not the result of some unintended consequence of the new feature (or a false positive).

Pursuit of True North 🧭

If in pursuit of your destination, you plunge ahead, heedless of obstacles, and achieve nothing more than to sink in a swamp…What’s the use of knowing True North?
— Lincoln

One of the challenges with improving long-term metrics such as engagement is that these metrics are hard to move and often require long, drawn-out experiments. This 2019 paper from LinkedIn describes how they overcome this challenge.

LinkedIn proposes using a surrogate metric that predicts the long-term (north star) metric. As surrogate metrics rarely predict the north star metric perfectly, the paper discusses how to adjust A/B testing to ensure experiment results are trustworthy. For example, LinkedIn aims to improve their hiring products with a true north metric called confirmed hires (CH), which measures members who found jobs using LinkedIn products. However, the CH metric suffers from long lag times. To address this, the team introduces a surrogate metric called predicted confirmed hires (PCH), which leverages several signals including job segments, time of application, quality of application, and so on. The paper also neatly provides practical guidelines for choosing good surrogate metrics, such as sensitivity to a wide range of input variables that are worth experimenting on.
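To make the idea concrete, here's a minimal sketch of how a surrogate metric can stand in for a lagging north star metric in an A/B test. This is not LinkedIn's actual PCH model; the coefficients, signals, and simulated data are all made up for illustration, and in practice the surrogate would be fit on historical data linking short-term signals to confirmed hires.

```python
import random

random.seed(0)

def surrogate_metric(applications, quality):
    # Hypothetical linear surrogate: predicts long-term confirmed hires
    # from short-term signals (number and quality of applications).
    # In practice these coefficients are learned from historical data.
    return 0.05 * applications + 0.30 * quality

# Simulate an A/B test where the treatment nudges application quality up.
control = [(random.randint(0, 10), random.random()) for _ in range(10_000)]
treatment = [(random.randint(0, 10), random.random() + 0.1) for _ in range(10_000)]

def mean(xs):
    return sum(xs) / len(xs)

# Because the surrogate is available immediately, we can estimate the
# treatment's lift without waiting months for confirmed hires to accrue.
lift = (mean([surrogate_metric(a, q) for a, q in treatment])
        - mean([surrogate_metric(a, q) for a, q in control]))

print(f"estimated lift on surrogate metric: {lift:.4f}")
```

The catch, as the paper stresses, is that the surrogate's imperfect correlation with the north star has to be accounted for when judging statistical significance.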

‘Criminally underused in tech’ 🚨

This past summer, Ujwal Kharel described a great example of how Roblox measured the impact of the Avatar Shop on community engagement. They couldn't run an experiment, since it wasn't possible to just turn off the Avatar Shop for some users.

Causal inference was the way to go. They still needed an instrumental variable that’s (i) strongly associated with the treatment variable (Avatar Shop engagement) and (ii) associated with the outcome (community engagement) only via the treatment variable.

The fun part is that they found the instrumental variable in an experiment they had run a year earlier. Ujwal argues that teams discard experiment results too readily, and urges digging deeper into them for interesting evidence.

I can't wait to read more from the Roblox Tech Blog 🥰

Watch this space for more updates, stories, and practical tips on finding causality in user behavior and growing product adoption. Follow the Statsig blog to get the biweekly update!

Try Statsig Today

Explore Statsig’s smart feature gates with built-in A/B tests, or create an account instantly and start optimizing your web and mobile applications. You can also schedule a live demo or chat with us to design a custom package for your business.

