You understand that failing to plan the test duration can lead to underpowered tests and inflated false positive rates due to peeking. Recently, you've been introduced to CUPED, an advanced statistical method that reduces KPI variance, resulting in more sensitive tests (lower MDE) or shorter test durations (lower sample size). You prefer the latter.
If this description fits you, you’re in the right place 😊
In this post we’ll learn how to use CUPED in the planning phase to reduce sample size while maintaining statistical power and MDE. Pretty wild, right?
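Test planning is crucial in controlled experiments to ensure accurate and reliable results. Proper sample size planning helps avoid underpowered tests, which can fail to detect real effects. It also reduces the risk of inflated false positive rates when you are not using sequential testing methods: without a predetermined sample size, you tend to peek at the results more than once. Here's the formula for calculating sample size for a t-test for two means (normal approximation):
$$n = \frac{2\sigma^2 \left(z_{1-\alpha/2} + z_{1-\beta}\right)^2}{\delta^2}$$
n: Sample size for each group, assuming equal allocation
$\sigma^2$: Variance of the KPI
$\delta$: Minimum detectable effect (MDE), in absolute terms
$z_{1-\alpha/2}$, $z_{1-\beta}$: Standard normal quantiles for a two-sided significance level $\alpha$ and power $1-\beta$
Notice that the variance, $\sigma^2$, appears in the numerator, so the required sample size is directly proportional to it: reduce the variance and you reduce the sample size.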
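To make the sample size calculation concrete, here is a minimal Python sketch of the standard normal-approximation formula, n = 2σ²(z₁₋α/₂ + z₁₋β)² / δ². The function name `sample_size_per_group` and its defaults (α = 0.05, power = 0.8) are illustrative assumptions, not something prescribed by this post.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(variance, mde, alpha=0.05, power=0.8):
    """Per-group sample size for a two-sided, two-sample test of means.

    Uses the normal approximation; `mde` is the absolute minimum
    detectable effect, in the same units as the KPI.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # quantile for significance level
    z_beta = NormalDist().inv_cdf(power)           # quantile for power
    return ceil(2 * variance * (z_alpha + z_beta) ** 2 / mde ** 2)


# Hypothetical inputs: KPI variance of 1.0 and an absolute MDE of 0.5.
n_full = sample_size_per_group(variance=1.0, mde=0.5)
# Halving the variance roughly halves the required sample size.
n_half = sample_size_per_group(variance=0.5, mde=0.5)
```

Because the variance sits in the numerator, any technique that lowers it shrinks the required sample size by the same proportion, which is the lever CUPED pulls.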
CUPED (Controlled-experiment Using Pre-Experiment Data) is a technique designed to leverage pre-experiment data to reduce variance in experiment KPIs, thereby enhancing test sensitivity. By adjusting for variability unrelated to the experiment, CUPED reduces noise in the data, making it easier to detect the true effect of the treatment.
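To illustrate, let's consider a business metric $Y$, measured during the experiment.
With CUPED, we introduce a new variable, $X$: a pre-experiment covariate (typically the same metric measured before the experiment) that is correlated with $Y$ but unaffected by the treatment.
The intuition behind CUPED is that the total variance of the business metric $Y$ can be split into a part explained by $X$ and a part that is not. Removing the explained part leaves a less noisy metric with the same expected value.
Here's the CUPED formula:
$$Y_{cuped} = Y - \theta \left(X - E[X]\right)$$
Where:
$$\theta = \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(X)}$$
The variance of $Y_{cuped}$ is:
$$\mathrm{Var}(Y_{cuped}) = \mathrm{Var}(Y)\left(1 - \rho^2\right)$$
Where $\rho$ is the Pearson correlation between $X$ and $Y$.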
Example:
In reality, we don't know the true values of these quantities, so we must estimate them from the data.
Suppose you have KPI data for a website's page views. In the pre-experiment period, the average page views per user
Estimate
Using CUPED, the adjusted KPI for the experiment period would be:
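The estimation step can be sketched in Python as follows. This is a minimal illustration on synthetic data, not the figures from the example above: the page-view distributions, the seed, and the helper name `cuped_adjust` are all assumptions made for the sketch. It estimates θ as the sample covariance of the pre-experiment and in-experiment values divided by the sample variance of the pre-experiment values, then subtracts θ times the centered pre-experiment value from each observation.

```python
import random
from statistics import mean, variance


def cuped_adjust(y, x):
    """Return (CUPED-adjusted y, estimated theta).

    y: in-experiment KPI per user; x: pre-experiment KPI for the same users.
    """
    x_bar, y_bar = mean(x), mean(y)
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (len(x) - 1)
    theta = cov_xy / variance(x)
    y_cuped = [yi - theta * (xi - x_bar) for xi, yi in zip(x, y)]
    return y_cuped, theta


# Synthetic page views per user (assumed numbers, for illustration only).
rng = random.Random(42)
x = [rng.gauss(10, 3) for _ in range(5000)]    # pre-experiment period
y = [0.8 * xi + rng.gauss(2, 2) for xi in x]   # experiment period

y_cuped, theta = cuped_adjust(y, x)
print(f"theta estimate: {theta:.3f}")
print(f"variance before: {variance(y):.2f}, after: {variance(y_cuped):.2f}")
```

The adjustment leaves the mean of the KPI unchanged and only removes the variance that the pre-experiment data explains, so the treatment effect estimate keeps the same target.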
CUPED is particularly useful in situations where there is a lot of variability in the business metric that is not related to the treatment effect, but can be explained by pre-existing data. By using pre-existing data to adjust for this variability, CUPED can help to isolate the true effect of the treatment, leading to more reliable and accurate results. This is especially important in A/B testing, where the goal is to detect small differences between treatment and control groups.
There’s no definitive rule for determining when pre-existing data is correlated strongly enough with the KPI to make CUPED worthwhile. In my view, a Pearson correlation above 0.5 (or below -0.5), which corresponds to a sample size reduction of at least 25%, serves as a good rule of thumb.
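To see where the 25% figure comes from, recall that the required sample size is proportional to the variance of the KPI, and that CUPED scales the variance by a factor of $1 - \rho^2$, where $\rho$ is the Pearson correlation between the pre-existing data and the KPI. Writing $n_{cuped}$ for the sample size with CUPED (a label introduced here for illustration), a correlation of exactly 0.5 gives:

$$\frac{n_{cuped}}{n} = 1 - \rho^2 = 1 - 0.5^2 = 0.75$$

Because the correlation enters squared, the sign does not matter, and the savings grow quickly as the correlation strengthens: $\rho = 0.7$ gives roughly a 49% reduction, while $\rho = 0.3$ gives only about 9%.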
Moreover, CUPED can be used in conjunction with other experimental techniques to further improve the efficiency and accuracy of the test. For example, it can be combined with sequential testing, which also reduces test duration by sequentially analyzing the data and applying an optional stopping rule.
Another advantage of CUPED is that it can be implemented relatively easily in most experimental settings. The required pre-existing data is often readily available, and the calculations needed to implement CUPED are straightforward and can be performed using standard statistical software packages.
Integrating CUPED into test planning involves the following steps:
Thus, the required sample size using CUPED is 190.
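A minimal sketch of one way to carry out this planning in Python is shown below. It makes several assumptions: the correlation is estimated from two consecutive historical periods for the same users (a stand-in for the pre-experiment and experiment periods), the data is synthetic, and the helper names are hypothetical. It does not reproduce the figure above, since that depends on the specific inputs used.

```python
import random
from math import ceil, sqrt
from statistics import NormalDist, mean, variance


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    x_bar, y_bar = mean(x), mean(y)
    cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / sqrt(variance(x) * variance(y))


def planned_sample_sizes(hist_pre, hist_post, mde, alpha=0.05, power=0.8):
    """Return (n_standard, n_cuped, rho), with sample sizes per group."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    # Step 1: standard sample size from the historical KPI variance.
    n_standard = 2 * variance(hist_post) * z ** 2 / mde ** 2
    # Step 2: correlation between the earlier and later historical periods.
    rho = pearson_r(hist_pre, hist_post)
    # Step 3: shrink the sample size by the CUPED factor (1 - rho^2).
    n_cuped = n_standard * (1 - rho ** 2)
    return ceil(n_standard), ceil(n_cuped), rho


# Synthetic historical KPI for the same users in two consecutive periods.
rng = random.Random(7)
hist_pre = [rng.gauss(20, 5) for _ in range(4000)]
hist_post = [0.7 * p + rng.gauss(6, 4) for p in hist_pre]

n_standard, n_cuped, rho = planned_sample_sizes(hist_pre, hist_post, mde=1.0)
print(f"rho={rho:.2f}, standard n={n_standard}, CUPED n={n_cuped}")
```

The historical correlation is only an estimate of what you will see during the experiment, so it may be prudent to plan with a slightly more conservative value than the one you measure.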
Planning is essential for accurate and reliable A/B testing, and CUPED is a powerful technique that can significantly reduce test durations by reducing variance. By understanding and applying CUPED during the planning phase, you can maintain statistical power and MDE while requiring fewer samples.
CUPED not only enhances the sensitivity of your tests but also helps in resource optimization by reducing the number of subjects needed for experiments. This makes it an invaluable tool in the arsenal of any data scientist or analyst involved in experimental design and analysis.
Today, CUPED is widely implemented in A/B testing platforms like Eppo and Statsig, further demonstrating its value in streamlining and improving the efficiency of experimental processes.
As you incorporate CUPED into your test planning, you'll find that your experiments become more efficient, reliable, and impactful.