How to plan test duration when using CUPED

Wed Aug 14 2024

Picture this: You're planning an experiment and are well-versed in concepts like Minimum Detectable Effect (MDE), significance level, statistical power, and sample size determination.

You understand that failing to plan the test duration can lead to underpowered tests and inflated false positive rates due to peeking. Recently, you've been introduced to CUPED, an advanced statistical method that reduces KPI variance, resulting in more sensitive tests (lower MDE) or shorter test durations (lower sample size). You prefer the latter.

If this description fits you, you’re in the right place 😊

In this post we’ll learn how to use CUPED in the planning phase to reduce sample size while maintaining statistical power and MDE. Pretty wild, right?

What is test planning and why is it important?

Test planning is crucial in controlled experiments to ensure accurate and reliable results. Planning the sample size in advance helps avoid underpowered tests, which can fail to detect real effects. It also reduces the risk of inflated false positive rates when you are not using sequential testing methods: without a predetermined sample size, you tend to peek at the results more than once. Here's the formula for calculating sample size for a t-test for two means:

$$ n = 2\left(\frac{\sigma \left(Z_{\alpha/2} + Z_{\beta}\right)}{\Delta}\right)^2 $$

n: Sample size for each group, assuming equal allocation
\( Z_{\alpha/2} \): Z-value for the desired significance level (assuming a two-tailed hypothesis)
\( Z_{\beta} \): Z-value for the desired power, \(1 - \beta\)
\( \Delta \): Minimum Detectable Effect (MDE) in absolute terms, which is the difference between two means
\( \sigma \): Standard deviation of the metric, assumed equal in both groups

Notice that the required sample size is directly proportional to the variance, \(\sigma^2\). Therefore, reducing variance by 40% means you need a 40% smaller sample. For instance, if the original sample size is n and the reduced variance is 0.6\(\sigma^2\), then your new required sample size is 0.6n.
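As a rough illustration of the formula and of the proportionality to variance, here is a minimal Python sketch. It uses the common two-sample form, which carries a factor of 2 for the per-group size. The function name `sample_size_per_group`, the defaults (significance level 0.05, power 0.8), and the input values are assumptions chosen for the example.

```python
import math
from statistics import NormalDist


def sample_size_per_group(sigma, mde, alpha=0.05, power=0.8):
    """Per-group sample size for a two-sided, two-sample comparison of means.

    Hypothetical helper: assumes equal allocation, equal variances, and the
    normal approximation n = 2 * (sigma * (z_alpha/2 + z_beta) / mde) ** 2.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    return math.ceil(2 * (sigma * (z_alpha + z_beta) / mde) ** 2)


# Illustrative inputs only: sigma = 10 and an absolute MDE of 1.
baseline = sample_size_per_group(sigma=10, mde=1)
# Cutting the variance by 40% means sigma shrinks by a factor of sqrt(0.6).
reduced = sample_size_per_group(sigma=10 * math.sqrt(0.6), mde=1)
print(baseline, reduced, round(reduced / baseline, 2))
```

The ratio of the two results is about 0.6, matching the 40% variance reduction, up to rounding each sample size up to a whole user.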

What is CUPED and why is it important?

CUPED (Controlled-experiment Using Pre-Experiment Data) is a technique designed to leverage pre-existing data to reduce variance in experiment KPIs, thereby enhancing test sensitivity. By adjusting for variability unrelated to the experiment, CUPED reduces noise in the data, making it easier to detect the true effect of the experiment.

To illustrate, let's consider a business metric \( \overline{Y} \), such as revenue. In a traditional t-test, we would compare the average revenue of the control group with that of the treatment group.

With CUPED, we introduce a new variable, \( \overline{Y}_{\text{CUPED}} \), by adjusting the original metric \( \overline{Y} \) using pre-existing data \( \overline{X} \). This data \( \overline{X} \) is scaled by a constant \( \theta \), which must be determined. Instead of comparing the average revenues in a t-test, we compare the \( \overline{Y}_{\text{CUPED}} \) values using a modified variance.

The intuition behind CUPED is that the total variance of the business metric \( \overline{Y} \) can be decomposed into two components: the portion attributed to the variance of the control variate \( \overline{X} \) and the portion explained by other unknown factors.

[Figure: Effect of CUPED on group mean distributions]

Here's the CUPED formula:

\[ \overline{Y}_{\text{CUPED}} = \overline{Y} - \theta \times \overline{X} \]

Where:
\(\overline{Y}\): The business metric average during the experiment
\(\overline{X}\): Pre-existing data (e.g., business metric average from a pre-experiment period)
\(\theta\): A scaling constant. The variance reduction is maximized when \(\theta = \frac{\text{Cov}(X,Y)}{\text{Var}(X)}\)
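To see where this choice of \(\theta\) comes from, here is a short derivation sketch. It assumes users are independent and identically distributed, so the ratio of covariance to variance is the same for the sample means as for the per-user values. Start from the variance of the adjusted metric:

\[ \text{Var}\left(\overline{Y} - \theta\,\overline{X}\right) = \text{Var}(\overline{Y}) - 2\theta\,\text{Cov}(\overline{X},\overline{Y}) + \theta^2\,\text{Var}(\overline{X}) \]

Setting the derivative with respect to \(\theta\) to zero gives:

\[ \theta = \frac{\text{Cov}(\overline{X},\overline{Y})}{\text{Var}(\overline{X})} = \frac{\text{Cov}(X,Y)}{\text{Var}(X)} \]

Substituting this value back in leaves:

\[ \text{Var}(\overline{Y}) - \frac{\text{Cov}(\overline{X},\overline{Y})^2}{\text{Var}(\overline{X})} = \text{Var}(\overline{Y})\left(1 - \rho^2\right) \]

Here \(\rho\) is the Pearson correlation between \(X\) and \(Y\).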

With the optimal \(\theta\), the variance of \(\overline{Y}_{\text{CUPED}}\) is reduced by a fraction equal to the square of the Pearson correlation between \(\overline{X}\) and \(\overline{Y}\):

\[ \text{Var}\left(\overline{Y}_{\text{CUPED}}\right) = \text{Var}(\overline{Y})\left(1 - \rho^2\right) \]

Where \(\rho\) is the Pearson correlation coefficient. For instance, if \(\rho = 0.9\), variance is reduced by \(0.9^2 = 0.81\), that is, by 81%. Because the required sample size is proportional to variance, the sample size drops by 81% as well. That is a huge achievement, and it does happen in practice. The higher the correlation, the bigger the variance reduction. This technique is powerful, especially when testing on existing users with rich historical data.
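The variance formula can be checked numerically. The following is a minimal sketch on simulated per-user data, not a production implementation: the helper names, the seed, and the data-generating process (standard normal \(X\) with a true correlation of 0.9) are all assumptions made for the example. It works with per-user variances; the variance of the mean is the same quantity divided by the sample size, so the reduction factor is unchanged.

```python
import random


def sample_var(values):
    """Unbiased sample variance."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def sample_cov(xs, ys):
    """Unbiased sample covariance of two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def cuped_variance_check(xs, ys):
    """Return (variance of adjusted metric, Var(Y) * (1 - rho^2), rho^2)."""
    theta = sample_cov(xs, ys) / sample_var(xs)
    adjusted = [y - theta * x for x, y in zip(xs, ys)]
    rho_sq = sample_cov(xs, ys) ** 2 / (sample_var(xs) * sample_var(ys))
    return sample_var(adjusted), sample_var(ys) * (1 - rho_sq), rho_sq


# Simulated data (assumption): X is standard normal, Y correlates with X at 0.9.
rng = random.Random(0)
true_rho = 0.9
xs = [rng.gauss(0, 1) for _ in range(10_000)]
ys = [true_rho * x + (1 - true_rho ** 2) ** 0.5 * rng.gauss(0, 1) for x in xs]

observed, predicted, rho_sq = cuped_variance_check(xs, ys)
print(observed, predicted, rho_sq)
```

When \(\theta\) is estimated from the same sample, the observed and predicted variances agree up to floating-point error, and the estimated \(\rho^2\) lands close to 0.81.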

Example:

In reality, we don't know the true values of the covariance and variance, so we must estimate them from data.

Suppose you have KPI data for a website's page views. In the pre-experiment period, the average page views per user \(\overline{X}\) is 1000. During the experiment period, the average page views per user \(\overline{Y}\) is 1100. The sample covariance between \(X\) and \(Y\) is 200, and the sample variance of \(X\) is 250.

Estimate \(\theta\):

\[ \hat{\theta} = \frac{200}{250} = 0.8 \]

Using CUPED, the adjusted KPI for the experiment period would be:

\[ \overline{Y}_{\text{CUPED}} = 1100 - 0.8 \times 1000 = 300 \]
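The same arithmetic can be written as a small Python sketch. The function name `cuped_adjusted_mean` and its parameter names are hypothetical; the inputs are the summary statistics from the example above.

```python
def cuped_adjusted_mean(y_bar, x_bar, cov_xy, var_x):
    """Return (theta_hat, CUPED-adjusted mean) from summary statistics.

    Hypothetical helper: theta_hat = cov_xy / var_x, and the adjusted mean
    is y_bar - theta_hat * x_bar.
    """
    theta_hat = cov_xy / var_x
    return theta_hat, y_bar - theta_hat * x_bar


# Summary statistics taken from the page views example.
theta_hat, y_cuped = cuped_adjusted_mean(y_bar=1100, x_bar=1000, cov_xy=200, var_x=250)
print(theta_hat, y_cuped)
```

The adjusted value is not meant to be read on its own. What gets compared in the test is the difference between the adjusted means of the control and treatment groups, computed with the same \(\hat{\theta}\).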

CUPED is particularly useful in situations where there is a lot of variability in the business metric that is not related to the treatment effect, but can be explained by pre-existing data. By using pre-existing data to adjust for this variability, CUPED can help to isolate the true effect of the treatment, leading to more reliable and accurate results. This is especially important in A/B testing, where the goal is to detect small differences between treatment and control groups.

There’s no definitive rule for determining when pre-existing data is sufficient for CUPED. In my view, a Pearson correlation above 0.5 (or below -0.5), which corresponds to a sample size reduction of at least 25%, serves as a good rule of thumb.
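The rule of thumb can also be turned around: given a target reduction, how strong does the correlation need to be? Since the reduction equals \(\rho^2\), the threshold is its square root. The helper below is a hypothetical sketch, and the target values are illustrative.

```python
import math


def min_abs_correlation(target_reduction):
    """Smallest |rho| whose rho^2 reaches the target sample size reduction.

    target_reduction is a fraction in [0, 1), e.g. 0.25 for a 25% reduction.
    """
    if not 0 <= target_reduction < 1:
        raise ValueError("target_reduction must be in [0, 1)")
    return math.sqrt(target_reduction)


for target in (0.10, 0.25, 0.50, 0.81):
    print(f"{target:.0%} reduction needs |rho| >= {min_abs_correlation(target):.2f}")
```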

Moreover, CUPED can be used in conjunction with other experimental techniques to further improve the efficiency and accuracy of the test. For example, it can be combined with sequential testing, which also reduces test duration by sequentially analyzing the data and applying an optional stopping rule.

Another advantage of CUPED is that it can be implemented relatively easily in most experimental settings. The required pre-existing data is often readily available, and the calculations needed to implement CUPED are straightforward and can be performed using standard statistical software packages.

How to plan a test using the CUPED technique

Integrating CUPED into test planning involves the following steps:

  1. Calculate the Non-CUPED Sample Size: Use the regular t-test sample size formula.
  2. Determine Pearson Correlation: Check the correlation between pre-experiment data \(X\) and expected experiment data \(Y\) using historical data. For example, compare data from the last two weeks with the two weeks prior.
  3. Adjust Sample Size: Multiply the calculated sample size by \(1 - \rho^2\), which reduces it by a fraction of \(\rho^2\).

Example Procedure:

  • Suppose the non-CUPED sample size is 1000.
  • Historical sampled data shows an estimated Pearson correlation of 0.9 between \(X\) and \(Y\).
  • Calculate the variance reduction factor: \(0.9^2 = 0.81\).
  • Adjust the sample size: \(1000 \times (1 - 0.81) = 190\).

Thus, the required sample size using CUPED is 190.
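The procedure above can be sketched in Python. The function names `pearson` and `cuped_sample_size` are hypothetical, and the two short lists are toy values standing in for per-user metrics from two consecutive historical windows, aligned by user.

```python
import math


def pearson(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def cuped_sample_size(base_n, rho):
    """Scale a non-CUPED sample size by (1 - rho^2) and round up."""
    if not -1 <= rho <= 1:
        raise ValueError("rho must be between -1 and 1")
    # Round away floating-point noise before taking the ceiling.
    return math.ceil(round(base_n * (1 - rho ** 2), 9))


# Step 2 on toy data: the earlier window plays X, the later window plays Y.
earlier_window = [1, 2, 3, 4]
later_window = [1, 3, 2, 4]
print(pearson(earlier_window, later_window))

# Step 3 with the numbers from the example procedure.
print(cuped_sample_size(1000, 0.9))
```

In practice the correlation estimated from history is itself noisy, so it may be worth planning with a slightly conservative value of \(\rho\).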

Takeaways

Planning is essential for accurate and reliable A/B testing, and CUPED is a powerful technique that can significantly reduce test durations by reducing variance. By understanding and applying CUPED during the planning phase, you can maintain statistical power and MDE while requiring fewer samples.

CUPED not only enhances the sensitivity of your tests but also helps in resource optimization by reducing the number of subjects needed for experiments. This makes it an invaluable tool in the arsenal of any data scientist or analyst involved in experimental design and analysis.

Today, CUPED is widely implemented in A/B testing platforms like Eppo and Statsig, further demonstrating its value in streamlining and improving the efficiency of experimental processes.

As you incorporate CUPED into your test planning, you'll find that your experiments become more efficient, reliable, and impactful.

And if you'd like to start implementing CUPED into your own experiments, create a free Statsig account and dive in.
