
Designing controlled experiments to test correlated metrics

Fri Mar 28 2025

What are correlated metrics?

Correlation occurs when there is a relationship between the values of one variable and the values of another variable. This contrasts with independence, where there is no relationship between the values of two variables.

Correlation can be described as weak or strong depending on how well the value of one variable predicts the value of the other. It is often measured with the Pearson correlation coefficient, which describes the linear relationship between two variables and ranges from -1 (perfect negative linear relationship) to 1 (perfect positive linear relationship). For variables X and Y this measure is:

\[ \textit{corr}(X, Y) = \frac{\textit{cov}(X, Y)}{\sigma_X \sigma_Y} \]
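As a minimal sketch, the formula can be computed directly in Python. The function name `pearson_correlation` and the per-user spend values are hypothetical and only for illustration; libraries such as NumPy (`numpy.corrcoef`) provide equivalent routines.

```python
import math

def pearson_correlation(xs, ys):
    """Return cov(X, Y) / (sigma_X * sigma_Y) for two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Population covariance and standard deviations (the 1/n factors cancel in the ratio)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sigma_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sigma_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if sigma_x == 0 or sigma_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / (sigma_x * sigma_y)

# Hypothetical spend per user: first 7 days vs. the following 6 months
spend_7d = [0, 5, 12, 20, 40]
spend_6m = [3, 30, 55, 90, 210]
print(round(pearson_correlation(spend_7d, spend_6m), 3))  # prints 0.995
```

A value this close to 1 indicates a strong positive linear relationship in this small hypothetical sample.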

Correlated metrics in an experiment

In an experimental context, you might care about several different types of correlated metrics:

Metric families: metrics that measure the same or similar phenomena, e.g. total spend per user, total revenue per buyer, and a 0/1 indicator for purchasing
Surrogate metrics: metrics that are a leading indicator of another metric, e.g. total spend in the first 7 days a user is active may be predictive of total spend in the following 6 months
Intrinsic metrics: metrics that tend to take certain values for a certain “type” of experimental unit, e.g. total spend in the first 7 days may be correlated with the median salary in a user’s zip code

Each of these can serve a different function in an experiment.

Metric families

Experiments help determine not only what to do but also why interventions are effective. Metric families can help you understand the why behind changes in user behavior in your product.

Let’s say that your change increased overall revenue per user. The change could work by increasing the share of users who make a purchase or by increasing how much each buyer spends. These metrics are correlated, but looking at them together helps flesh out the whole picture of what is happening behind a given experimental result.
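The split described here rests on an identity: revenue per user equals the purchase rate (buyers / users) times revenue per buyer, so the relative lift in revenue per user factors into the lifts of its two components. The sketch below assumes you have group-level totals for control and treatment; the class name, function name and all numbers are hypothetical, and it reports point estimates only, with no significance testing.

```python
from dataclasses import dataclass

@dataclass
class GroupTotals:
    users: int      # units exposed in the group
    buyers: int     # users with at least one purchase
    revenue: float  # total revenue from the group

    def revenue_per_user(self) -> float:
        return self.revenue / self.users

    def conversion_rate(self) -> float:
        return self.buyers / self.users

    def revenue_per_buyer(self) -> float:
        return self.revenue / self.buyers

def decompose_lift(control: GroupTotals, treatment: GroupTotals) -> dict:
    """Relative lifts of revenue per user and its two multiplicative components.

    Because revenue/users = (buyers/users) * (revenue/buyers),
    (1 + lift_rpu) == (1 + lift_conversion) * (1 + lift_rpb)."""
    def lift(t: float, c: float) -> float:
        return t / c - 1
    return {
        "revenue_per_user": lift(treatment.revenue_per_user(), control.revenue_per_user()),
        "conversion_rate": lift(treatment.conversion_rate(), control.conversion_rate()),
        "revenue_per_buyer": lift(treatment.revenue_per_buyer(), control.revenue_per_buyer()),
    }

# Hypothetical experiment totals
control = GroupTotals(users=10_000, buyers=500, revenue=25_000.0)
treatment = GroupTotals(users=10_000, buyers=550, revenue=27_500.0)
lifts = decompose_lift(control, treatment)
print({name: round(value, 4) for name, value in lifts.items()})
# prints {'revenue_per_user': 0.1, 'conversion_rate': 0.1, 'revenue_per_buyer': 0.0}
```

In this made-up example the whole 10% lift in revenue per user comes from more users purchasing, while spend per buyer is flat, which is the kind of mechanism a metric family can surface.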

Surrogate metrics

Often, the most important business metrics are measured over a long period of time and are hard to move within the duration of an experiment. Surrogate metrics use data available early in an experiment to predict the impact on these high-level, long-term business metrics.

Let’s say that your change increased user spend in the first 7 days users are active on the platform, which is positively correlated with user spend over the following 6 months. This may be a good predictive metric (or part of a more complicated prediction model) for estimating the longer-term impact.

Note that using surrogate metrics also requires accounting for prediction error when reading experimental results.
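One simple way to build such a surrogate, sketched here under strong assumptions, is to regress the long-term metric on the short-term metric using historical (pre-experiment) users, then scale the experiment's short-term effect by the fitted slope. This treats the historical slope as if it described how the treatment moves long-term spend, which only holds if the treatment affects 6-month spend through 7-day spend. To account for prediction error, the sketch propagates the slope's standard error into the projected effect with a first-order (delta-method) approximation. The function names, the tiny dataset and the experiment result are all hypothetical.

```python
import math

def fit_surrogate(short_term, long_term):
    """Ordinary least squares fit of long_term ~ intercept + slope * short_term.

    Returns (intercept, slope, slope_se), where slope_se is the usual OLS
    standard error of the slope."""
    n = len(short_term)
    if n < 3 or n != len(long_term):
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(short_term) / n
    mean_y = sum(long_term) / n
    sxx = sum((x - mean_x) ** 2 for x in short_term)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(short_term, long_term))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(short_term, long_term)]
    sigma2 = sum(r * r for r in residuals) / (n - 2)  # residual variance
    slope_se = math.sqrt(sigma2 / sxx)
    return intercept, slope, slope_se

def project_effect(slope, slope_se, short_effect, short_effect_se):
    """Translate a short-term treatment effect into a predicted long-term effect.

    First-order (delta-method) standard error; assumes the slope estimate and
    the experiment's effect estimate are independent."""
    point = slope * short_effect
    se = math.sqrt((slope * short_effect_se) ** 2 + (short_effect * slope_se) ** 2)
    return point, se

# Hypothetical historical users: 7-day spend vs. spend over the next 6 months
spend_7d = [1.0, 2.0, 3.0, 4.0]
spend_6m = [8.1, 10.9, 14.2, 16.8]
intercept, slope, slope_se = fit_surrogate(spend_7d, spend_6m)

# Hypothetical experiment result: +0.50 in 7-day spend per user (SE 0.10)
effect, effect_se = project_effect(slope, slope_se, 0.50, 0.10)
print(f"projected 6-month effect: {effect:.2f} (SE {effect_se:.3f})")
```

Dropping the `short_effect * slope_se` term would treat the surrogate model as exact and understate the uncertainty of the projected long-term effect.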

Intrinsic metrics

No one wants to make an experiment decision based on a false positive. Intrinsic metrics, which are correlated with outcome metrics at the unit level but which the treatment should not be able to affect, can serve as guardrails for detecting false positives.

In an A/A test, where no real intervention has taken place, any observed difference is due to chance, and positively correlated metrics tend to show chance differences in the same direction at the same time. This is why intrinsic metrics can make good guardrails.

Let’s say that your change increased overall revenue per user, and a user’s revenue is typically related to the median salary in their zip code. If the treatment group also shows a higher median zip-code salary than the control group, that’s an indication that this result may be a false positive. That’s because there’s no reasonable mechanism for an online experiment treatment to cause a user to move to a new zip code or for their entire zip code’s salaries to increase.
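A small simulation can illustrate why this works. The sketch below assumes each user's revenue is correlated (correlation `rho`) with a standardized intrinsic trait such as zip-code salary, runs many A/A tests with no treatment effect, and checks how often the intrinsic metric moves in the same direction as a revenue "false positive". The data-generating model, parameter values and function names are hypothetical.

```python
import math
import random

def mean_and_var(values):
    """Sample mean and unbiased sample variance."""
    n = len(values)
    m = sum(values) / n
    return m, sum((v - m) ** 2 for v in values) / (n - 1)

def pearson(xs, ys):
    """Pearson correlation: cov(X, Y) / (sigma_X * sigma_Y)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def simulate_aa_tests(n_sims=500, n_per_group=200, rho=0.8, seed=7):
    """Simulate A/A tests where each unit's outcome (e.g. revenue) has
    correlation rho with an intrinsic trait (e.g. standardized zip-code salary).

    Returns per-test differences in means (B - A) for both metrics and the
    z-statistic of the outcome difference."""
    rng = random.Random(seed)
    noise_scale = math.sqrt(1 - rho ** 2)
    intrinsic_diffs, outcome_diffs, outcome_z = [], [], []
    for _ in range(n_sims):
        groups = []
        for _ in range(2):  # groups A and B receive the same (null) treatment
            intrinsic = [rng.gauss(0, 1) for _ in range(n_per_group)]
            outcome = [rho * v + noise_scale * rng.gauss(0, 1) for v in intrinsic]
            groups.append((intrinsic, outcome))
        (int_a, out_a), (int_b, out_b) = groups
        mean_oa, var_oa = mean_and_var(out_a)
        mean_ob, var_ob = mean_and_var(out_b)
        diff = mean_ob - mean_oa
        se = math.sqrt(var_oa / n_per_group + var_ob / n_per_group)
        intrinsic_diffs.append(mean_and_var(int_b)[0] - mean_and_var(int_a)[0])
        outcome_diffs.append(diff)
        outcome_z.append(diff / se)
    return intrinsic_diffs, outcome_diffs, outcome_z

intrinsic_diffs, outcome_diffs, outcome_z = simulate_aa_tests()

# Chance imbalances in the two metrics move together across A/A tests
diff_correlation = pearson(intrinsic_diffs, outcome_diffs)

# Among outcome "false positives" (|z| > 1.96), how often does the
# intrinsic guardrail shift in the same direction?
false_positives = [i for i, z in enumerate(outcome_z) if abs(z) > 1.96]
same_direction = sum(1 for i in false_positives
                     if intrinsic_diffs[i] * outcome_diffs[i] > 0)
print(f"correlation of differences: {diff_correlation:.2f}")
print(f"{same_direction} of {len(false_positives)} false positives also "
      "shifted the intrinsic metric in the same direction")
```

With equal group sizes, the differences in means inherit the unit-level correlation `rho`, so in this setup a significant revenue result paired with a same-direction shift in the intrinsic metric is a warning sign of chance imbalance rather than a real effect.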

Independence assumptions in multiple comparison corrections

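Correlated metrics can also be troublesome when applying a multiple comparison correction. Many common corrections, like the Bonferroni correction and the Benjamini-Hochberg procedure, do not take the correlation between metrics into account. Bonferroni’s guarantee holds under any dependence, while Benjamini-Hochberg’s is proven for independent tests and for certain forms of positive dependence.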

When metrics are positively correlated, these corrections over-penalize adding more metrics. They generally still bound the error rates from above, but they sacrifice more statistical power than necessary.
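The power cost can be seen in a quick simulation. The sketch below assumes every metric has no true effect and every pair of metrics shares the same correlation `rho`, and estimates how often a two-sided Bonferroni correction produces at least one false positive (the family-wise error rate). The function name and parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist

def bonferroni_fwer(n_metrics=10, rho=0.0, alpha=0.05, n_sims=20_000, seed=11):
    """Estimate the family-wise error rate of a two-sided Bonferroni correction
    when all metrics are null and every pair has correlation rho >= 0.

    Equicorrelated z-statistics are built as sqrt(rho) * C + sqrt(1 - rho) * E_i,
    with one shared component C and independent components E_i."""
    rng = random.Random(seed)
    threshold = NormalDist().inv_cdf(1 - alpha / (2 * n_metrics))
    shared_w, own_w = math.sqrt(rho), math.sqrt(1 - rho)
    false_alarms = 0
    for _ in range(n_sims):
        common = rng.gauss(0, 1)
        z_stats = [shared_w * common + own_w * rng.gauss(0, 1) for _ in range(n_metrics)]
        if any(abs(z) > threshold for z in z_stats):
            false_alarms += 1
    return false_alarms / n_sims

fwer_independent = bonferroni_fwer(rho=0.0)
fwer_correlated = bonferroni_fwer(rho=0.9)
print(f"independent metrics: FWER ~ {fwer_independent:.3f}")
print(f"rho = 0.9:           FWER ~ {fwer_correlated:.3f}")
```

With independent metrics the estimated rate lands near the nominal 0.05; with strongly correlated metrics it falls well below it. The guarantee still holds, but the per-metric threshold is stricter than the correlation structure requires, which is the lost power described above.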

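Conversely, when metrics are negatively correlated, some guarantees are no longer assured: the Benjamini-Hochberg procedure’s false discovery rate guarantee assumes independence or positive dependence, so it may not actually be valid, although the Bonferroni bound still holds.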
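For reference, the Benjamini-Hochberg step-up procedure can be sketched in a few lines. The flag `arbitrary_dependence` is a hypothetical parameter that switches to the Benjamini-Yekutieli variant (a separate technique, not discussed above), which divides the target level by the harmonic sum 1 + 1/2 + ... + 1/m and keeps false discovery rate control under any dependence structure, including negative correlation, at the cost of power. The p-values are made up.

```python
def benjamini_hochberg(p_values, q=0.05, arbitrary_dependence=False):
    """Return a list of booleans marking which hypotheses are rejected.

    Standard Benjamini-Hochberg step-up procedure at FDR level q. If
    arbitrary_dependence is True, apply the Benjamini-Yekutieli adjustment,
    dividing q by the harmonic number c(m) = 1 + 1/2 + ... + 1/m."""
    m = len(p_values)
    c_m = sum(1 / i for i in range(1, m + 1)) if arbitrary_dependence else 1.0
    # Indices of the p-values from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up: find the largest rank k with p_(k) <= k * q / (m * c_m)
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / (m * c_m):
            k_max = rank
    rejected = set(order[:k_max])
    return [i in rejected for i in range(m)]

# Hypothetical p-values for five metrics
p_values = [0.3, 0.02, 0.001, 0.035, 0.008]
print(benjamini_hochberg(p_values))
# prints [False, True, True, True, True]
print(benjamini_hochberg(p_values, arbitrary_dependence=True))
# prints [False, False, True, False, True]
```

The adjusted variant rejects fewer hypotheses here, illustrating the trade-off between robustness to arbitrary correlation and statistical power.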


Permalink: https://www.statsig.com/blog/designing-controlled-experiments-correlated-metrics

Liz Obermaier