CUPED on Statsig

Platform

Developers

Resources

Pricing

Platform

Developers

Resources

OVERVIEW

ROLES

Build fast with Be Significant
Our exclusive startup program

GETTING STARTED

Documentation

Documentation to get you started with implementation

Walkthrough Guides

Guides to get you started with Statsig in no time

SDKs and APIs

Explore REST API and SDKs in more than 20 frameworks

Integrations

Learn more about connecting Statsig to your existing tools

A/B Testing Calculator

Looking for a place to start your A/B Test? Try out our calculator

How Statsig Works

Get under the hood and check out how Statsig scales with you

Open Source Code

All our SDKs and supporting libraries are Open Source and regularly updated

Product Updates

We ship fast, to help you ship faster. Check out all our product updates

System Status

Want to understand how reliable Statsig is as a service? Take a look at our system status

LEARN & CONNECT

Blog

Peak Velocity is our blog where we cover the latest in data

Support

Need help getting set up or have questions about our product?

Customer Stories

Find out how leading companies are using Statsig to grow

Events

Find out about our online and offline events and RSVP to them

Build vs Buy

Compare building an in-house platform vs. buying

Contact Sales

Want to connect with someone from the Sales team?

FEATURED BLOGS

Feature Flags Liberated

Gating features is a core part of the development process. And with Statsig, it's free.

How AI Companies Use Statsig

The best AI companies use Statsig to accelerate growth. Learn how you can do the same.

What is Product Observability?

Product observability means being able to monitor, control, and gain insight into all of your features.

Platform

Developers

Resources

Pricing

OVERVIEW

Statsig Blog

Peak Velocity is our blog where we cover the latest in experimentation and more

Feature Management

Ship faster and more confidently

Experimentation

Run 100s of randomized, multivariate experiments

Data Warehouse

Run experiments natively, in your warehouse

Analytics

Actionable intelligence at your fingertips

ROLES

Build fast with Be Significant
Our exclusive startup program

GETTING STARTED

Documentation

Documentation to get you started with implementation

Walkthrough Guides

Guides to get you started with Statsig in no time

SDKs and APIs

Explore REST API and SDKs in more than 20 frameworks

Integrations

Learn more about connecting Statsig to your existing tools

A/B Testing Calculator

Looking for a place to start your A/B Test? Try out our calculator

How Statsig Works

Get under the hood and check out how Statsig scales with you

Open Source Code

All our SDKs and supporting libraries are Open Source and regularly updated

Product Updates

We ship fast, to help you ship faster. Check out all our product updates

System Status

Want to understand how reliable Statsig is as a service? Take a look at our system status

LEARN & CONNECT

Blog

Peak Velocity is our blog where we cover the latest in data

Support

Need help getting set up or have questions about our product?

Customer Stories

Find out how leading companies are using Statsig to grow

Events

Find out about our online and offline events and RSVP to them

Build vs Buy

Compare building an in-house platform vs. buying

Contact Sales

Want to connect with someone from the Sales team?

FEATURED BLOGS

Experiments with Generative AI

We built a generative AI app in reactJS using OpenAI’s API and Statsig. Here’s what we learned:

Experimentation Platforms

The decision to build versus buy an experimentation and feature flagging platform is not an easy one.

CUPED Explained

CUPED is an implementation that uses pre-experiment data to explain the variance in the result data.

CUPED on Statsig

Thu Jul 07 2022

Run experiments with more speed and accuracy

We’re pleased to announce the rollout of CUPED for all our customers. Statsig will now automatically use CUPED to reduce variance and bias on experiments’ key metrics. This gives you access to a powerful experiment tool being used at top tech companies like Microsoft, Facebook, Airbnb and more.

CUPED is a technique where we leverage pre-experimental data to reduce variance and pre-exposure bias in experiment results. This can significantly shrink confidence intervals and p-values, reduce the required sample sizes (and correspondingly durations) needed to run experiments, and will ultimately help you to move more quickly and with more confidence.

How this will help you

Waiting for experiments to gather sufficient sample size can be a major blocker to companies at any scale. Because of this, various techniques have been developed to reduce the variance (or “noise”) in experimental data, and reduce the amount of time teams spend waiting for results.

CUPED (Controlled-experiment Using Pre-Experiment Data) is one of these techniques, publicized for this use case by Microsoft in 2013. In our implementation, we use users’ metrics from the period just before an experiment launched to adjust their metrics for that experiment. This can help to reduce variance and bias, which we’ll jump into below.

Variance Reduction

In the hypothetical example below, we run an experiment that increases our mean metric value from 4 (control) to 6 (variant). However, because of the noise in the population there’s a lot of uncertainty in this estimate, and it’s unclear if the data is showing a real difference.

Applying CUPED improves the precision of our mean estimates, and makes it clearer that the populations are different. For the experimenter, this means smaller confidence intervals and lower p-values at the same effect size.

This works by using pre-experiment data to “explain” some of the observed variance. Instead of treating users’ metric outcomes as only depending on their experiment group, we treat each user’s metrics as being based on both their experiment group and their own past behavior. The more users’ past behavior is correlated with their future behavior, the more variance we can explain away in this way.

Bias Correction

Randomization doesn’t result in perfectly balanced groups. In rare cases you will see large differences between groups even before an experiment started, which is referred to as pre-exposure bias. In addition to variance reduction, CUPED is also a way to reduce pre-exposure bias.

In the chart below, we see an experiment whose results would look fairly neutral. However, it’s clear that users saw a meaningful lift after they were exposed to the variant — they just started off with much lower values. (It’s worth mentioning that a bias this extreme is very rare!)

CUPED can shrink this gap. Users who had higher-than-average pre-exposure data will have their metrics adjusted downwards, and vice versa. Accordingly, if the average user in the Variant group had a lower pre-experiment metric than the average user in Control, CUPED will close this gap by reducing the average Control user’s metric and increasing the average Variant user’s metric.

If we apply the adjustment to both the pre and post-exposure values to demonstrate, the CUPED-adjusted values might look like the chart below, closing some of the gap between the pre-exposure metrics.

biased metric results after cuped correction

Now the results will reflect the reality that this experiment drove an increase for users in the test group!

Our Approach

Following Statsig’s values, we’re shipping this quickly so that you can ship quickly. We’re using a simple and battle-tested version of this approach, where we use a user’s metric value from before an experiment startsto reduce the variance on thatsame metric in your experiment results.

We’ll be monitoring this as it rolls out to understand how effective this is and if there’s any next steps we can take to level up our approach.

Early Results

We don’t want you to have to wait for a full-fledged white paper before we give you a peek under the hood, but we also don’t want to only give you vanity stats on this tool. Here’s a brief rundown of the results we’ve seen in testing.

Sample Size / Variance Reduction

To understand the impact this will have on sample sizes, we looked at relevant experiment metrics (see note below) from 300+ customers to see how much CUPED would reduce required sample sizes (and experiment run times) due to lower metric variance:

Based on those results, we’re confident that CUPED will help people to move faster, with more confidence. It isn’t a magic bullet which makes every experiment run twice as fast, but we can already see that the value is there on real customer experiments.

This drop in required sample size corresponds to more precise results for the same sample size. An experiment where CUPED would reduce the required sample size by ~50% would shrink confidence intervals by ~30% if you kept the original sample size — leading to more Statistically significant results.

Significance

On the same set of experiments, we saw that about 9% of experiments had a key metric turn out to be statisticallysignificant* after applying CUPED, where it was previously not statistically significant. This is because of some combination of variance reduction and reducing a pre-exposure bias that previously made the experiment look less impactful than it was.

Conversely, about 2% of experiments saw a previously statistically significant result turn out to not be statistically significant. This is an important result! The likely explanation is that experiments which were previously false positives due to pre-exposure bias are now correctly categorized — meaning those results are more accurate.

How to work with CUPED

The effectiveness of CUPED varies across user populations and metrics. It’s worth knowing what to look for!

The more stable a metric tends to be for the same user over time, the more CUPED can reduce variance and pre-experiment bias
CUPED utilizes
pre-exposure
data for users, so experiments on new users or newly logged metrics won’t be able to leverage this technique
Getting in the habit of setting up key metrics and starting to track metrics before an experiment starts will help you to get the most out of CUPED on Statsig

It’s common for growth-stage companies to be running lots of experiments focused on new user acquisition and conversion. We also have lots of companies which are new on the platform and tend not to have all of their logging set up. As companies mature in their usage and adoption of Statsig, we expect to see CUPED become more and more effective.

*For the analysis above, we limited our dataset to experiments where over 50% of the users in the experiments could have CUPED adjustments applied. This filters out experiments with mostly new users, or metric results on metrics which don’t have sufficient history to calculate pre-experiment data.

Let’s Ship!

CUPED-adjusted results will be available in your key metric results across all experiments in the new results card in the coming days. Look for the “CUPED” flag above your key metrics.

Give it a try, check out the docs, and let us know how it’s working for you!

Featured

Statsig for startups

Statsig offers a generous program for early-stage startups who are scaling fast and need a sophisticated experimentation platform.

Stay ahead of the curve

Get experimentation insights in your inbox!

Permalink: https://www.statsig.com/blog/cuped-on-statsig

Try Statsig Today

Get started for free. Add your whole team!

Try for Free

Platform

Developers

Resources

Statsig Blog

See All Features

Feature Management

Experimentation

Data Warehouse

Analytics

Engineering

Dev Ops

Data Science

Product Management

Artificial Intelligence

Gaming

B2B Saas

E-Commerce

Build fast with Be Significant Our exclusive startup program

Documentation

Walkthrough Guides

SDKs and APIs

Integrations

A/B Testing Calculator

How Statsig Works

Open Source Code

Product Updates

System Status

Blog

Support

Customer Stories

Events

Build vs Buy

Contact Sales

Feature Flags Liberated

How AI Companies Use Statsig

What is Product Observability?

Platform

Developers

Resources

Pricing

Statsig Blog

See All Features

Feature Management

Experimentation

Data Warehouse

Analytics

Engineering

Dev Ops

Data Science

Product Management

Artificial Intelligence

Gaming

B2B Saas

E-Commerce

Build fast with Be Significant Our exclusive startup program

Documentation

Walkthrough Guides

SDKs and APIs

Integrations

A/B Testing Calculator

How Statsig Works

Open Source Code

Product Updates

System Status

Blog

Support

Customer Stories

Events

Build vs Buy

Contact Sales

Experiments with Generative AI

Experimentation Platforms

CUPED Explained

Back to blog home

CUPED on Statsig

Craig Sexauer

Run experiments with more speed and accuracy

How this will help you

Our Approach

Early Results

Build fast with Be Significant
Our exclusive startup program

Build fast with Be Significant
Our exclusive startup program