Calculate exact relative metric deltas with Fieller intervals

Tue Jun 10 2025

Fieller intervals give more accurate confidence intervals for relative metric deltas than the Delta Method, and they're now available in Statsig.

When you're interpreting experimental results, it's often more intuitive to look at relative changes than absolute ones. For example, instead of saying, "This experiment improved page load latency by 100 ms," it's usually more helpful to compare the change to the baseline and say, "This experiment decreased page load latency by 15%."

This is because relative metrics abstract away raw units, which makes it easier to understand the practical impact of product changes. To calculate this relative lift, we have to work with a ratio of the test and control group means, \( X_T \) and \( X_C \):

\[ \textit{RelativeLift} = \frac{X_T - X_C}{X_C} \]

The challenge is that both the numerator and denominator are random variables with their own variance, so the standard confidence interval methodology doesn't apply directly.

That's where Fieller intervals come in. Fieller intervals are an exact, analytic solution for the lower and upper bounds of a ratio’s confidence interval, and they're now available for relative metric deltas in Statsig.

Introducing Fieller intervals in Statsig

Statsig now supports Fieller intervals for calculating confidence intervals around relative metric deltas. Compared to the Delta Method, Fieller intervals are a more accurate representation of your experiment results. They're especially useful when the numerator and denominator are noisy, or if the denominator is small.

We recommend enabling Fieller Method Calculations in your organization settings for new experiments going forward. When you turn on this setting, it will only impact experiments created after the setting change. Read our docs for more details about this change.

What's the difference between Fieller intervals and the Delta Method?

The Delta Method approximates the variance of a ratio between two variables, and that approximate variance is then used to build a confidence interval. Fieller intervals, on the other hand, are an exact solution for the confidence interval itself, which gives you more accurate information about your experiment results. When your sample size is sufficiently large, the two methods produce effectively the same confidence interval.
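To make the comparison concrete, here is a minimal sketch of a Delta Method interval for relative lift. It assumes independent test and control groups and uses the standard first-order approximation for the variance of a ratio of means. The function name `delta_method_ci` and its inputs are hypothetical, and this is not Statsig's production code.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def delta_method_ci(test, control, alpha=0.05):
    """Approximate CI for relative lift (mean_T / mean_C - 1) via the Delta Method."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    m_t, m_c = mean(test), mean(control)

    # Squared standard errors of the two sample means
    se2_t = variance(test) / len(test)
    se2_c = variance(control) / len(control)

    # First-order Taylor approximation of var(mean_T / mean_C),
    # assuming the two groups are independent
    var_ratio = se2_t / m_c**2 + (m_t**2) * se2_c / m_c**4

    lift = m_t / m_c - 1
    half_width = z * sqrt(var_ratio)
    # The interval is symmetric around the observed lift by construction
    return lift - half_width, lift + half_width


# Illustrative data only
low, high = delta_method_ci([12, 11, 13, 12], [10, 9, 11, 10])
print(f"[{low:+.1%}, {high:+.1%}]")
```

Because the half-width is simply added to and subtracted from the observed lift, this interval is always symmetric, which is the property that Fieller intervals relax.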

Another benefit of Fieller intervals is that they address an edge case that can happen with the Delta Method: the absolute lift's confidence interval doesn't cross zero, but the relative lift's interval does, which makes the result confusing. It's fairly rare, but it occurs when:

  • the number of units in the control group is relatively small, and

  • the denominator is relatively noisy (but still statistically distinct from 0)

When this happens, the magnitude of the percent change is unstable: we're confident there's a real effect, but we're less certain about the relative size of that effect. We recommend relying on the absolute lift's confidence interval to conclude statistical significance. You can also compare the p-value to the specified α to understand whether or not a product change is statistically significant.

In most cases, though, Fieller interval results are very similar to results from the Delta Method. Since Fieller intervals are more accurate, we recommend that you opt in to this methodology. New Statsig customers will automatically be opted in to this setting.

How are Fieller intervals calculated?

1. Determine if a Fieller interval is well-defined

Before proceeding to apply Fieller's Theorem, we need to check that the denominator of the relative lift metric is significantly distinct from 0. We do this by calculating the parameter g:

\[ g = \frac{Z_{\alpha/2}^2 \cdot \mathrm{var}(X_C)}{(n_C - 1) \cdot \overline{X_C}^2} \]

Where:

  • \( Z_{\alpha/2} \) is the critical value associated with the desired confidence level
  • \( \mathrm{var}(X_C) \) is the variance of the control group metric values
  • \( n_C \) is the number of units in the control group
  • \( \overline{X_C} \) is the mean of the control group metric values

When g < 1, the control mean is significantly different from 0, and we can use Fieller intervals.
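The check above can be sketched in a few lines. This example follows the formula for g as written and assumes that var(X_C) is the population variance (dividing by n_C), so that var(X_C) / (n_C − 1) equals the squared standard error of the control mean. That reading is an assumption, and the function name `fieller_g` is hypothetical.

```python
from statistics import NormalDist, mean, pvariance


def fieller_g(control, alpha=0.05):
    """Compute g for the control group; g < 1 means the mean is distinct from 0."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value Z_{alpha/2}
    n_c = len(control)
    m_c = mean(control)
    if m_c == 0:
        # A control mean of exactly 0 can never be distinguished from 0
        return float("inf")
    # Assumption: var(X_C) is the population variance (divides by n_C)
    return z**2 * pvariance(control) / ((n_c - 1) * m_c**2)


# Illustrative data only
stable = fieller_g([10, 9, 11, 10])    # tight values far from 0
unstable = fieller_g([-5, 6, -4, 5])   # noisy values centered near 0
print(stable < 1, unstable < 1)
```

Intuitively, g is the squared ratio of the critical value to the control mean's z-score, so it shrinks as the control mean moves further from 0 relative to its standard error.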

2a. Apply Fieller's interval formula

Since the control and test group results are independent of each other, covariance terms in Fieller's Theorem can be dropped.

\[ CI(\%\Delta \overline{X}) = \frac{1}{1-g} \left( \frac{\overline{X_T}}{\overline{X_C}} - 1 \pm \frac{Z_{\alpha/2}}{\sqrt{n_C} \cdot \overline{X_C}} \sqrt{(1-g) \cdot \frac{\mathrm{var}(X_T)}{n_T(n_T-1)} + \frac{\overline{X_T} \cdot \mathrm{var}(X_C)}{\overline{X_C} \cdot n_C (n_C - 1)}} \right) \]
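As an illustration, the sketch below implements the textbook form of Fieller's theorem for a ratio of two independent sample means, then subtracts 1 to express the result as relative lift. It works with squared standard errors (sample variance divided by n) rather than reproducing the exact normalization of the formula above, so treat it as an approximation of the idea and not as Statsig's computation. The function name `fieller_relative_lift_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def fieller_relative_lift_ci(test, control, alpha=0.05):
    """Textbook Fieller CI for mean_T / mean_C - 1, or None if it is unbounded."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    m_t, m_c = mean(test), mean(control)
    if m_c == 0:
        return None  # ratio undefined

    # Squared standard errors of the two sample means
    se2_t = variance(test) / len(test)
    se2_c = variance(control) / len(control)

    g = z**2 * se2_c / m_c**2
    if g >= 1:
        return None  # control mean not distinguishable from 0

    ratio = m_t / m_c
    # Independent groups, so the covariance terms are dropped
    half_width = (z / abs(m_c)) * sqrt((1 - g) * se2_t + ratio**2 * se2_c)

    lower = (ratio - half_width) / (1 - g) - 1
    upper = (ratio + half_width) / (1 - g) - 1
    return lower, upper


# Illustrative data only
print(fieller_relative_lift_ci([12, 11, 13, 12], [10, 9, 11, 10]))
```

Note that dividing by (1 − g) shifts the center of the interval away from the observed lift, so the bounds are not equidistant from the point estimate.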

2b. For edge cases

In rare cases (<5% of observed metric comparisons on Statsig), g ≥ 1, which means that the control group’s mean is not statistically distinguishable from 0.

When the control group’s mean is not statistically different from zero, the denominator of our relative lift calculation is unstable. This means that the confidence interval for the percent difference between test and control is unbounded.

When this happens, we surface the relative lift observed during the experiment:

\[ \%\Delta \overline{X} = \frac{\overline{X_T} - \overline{X_C}}{\overline{X_C}} \]
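A small sketch of this fallback logic is shown below. It assumes g and the interval have already been computed elsewhere. The function name `describe_relative_lift` and the wording of the output strings are hypothetical and do not reflect how the Statsig console labels results.

```python
def describe_relative_lift(mean_t, mean_c, g, interval=None):
    """Format the observed relative lift, with its interval when one is defined."""
    # Observed relative lift (assumes mean_c is not exactly 0)
    lift = (mean_t - mean_c) / mean_c
    if g >= 1 or interval is None:
        # Control mean is not distinguishable from 0: report the point estimate only
        return f"{lift:+.1%} (confidence interval unbounded)"
    lower, upper = interval
    return f"{lift:+.1%} [{lower:+.1%}, {upper:+.1%}]"


# Illustrative values only
print(describe_relative_lift(12, 10, g=1.5))
print(describe_relative_lift(12, 10, g=0.01, interval=(0.082, 0.333)))
```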

What does this change mean for existing Statsig users?

Notation and reporting style changes

The main difference is that Fieller intervals are asymmetrical, whereas the Delta Method produces symmetrical confidence intervals. The asymmetry reflects the skewed sampling distribution of a ratio, which a symmetric approximation can't capture.

For example, this is what a result will look like with Fieller intervals:

[Image: fieller1]

With the Delta Method, a change is commonly reported as "X% ± Y%". Fieller intervals are instead reported as the exact interval, such as [-75.2%, +8.2%]. We believe the added accuracy is worth this change in reporting style.

That said, if you prefer the old reporting style, you can always switch to the absolute delta view using the dropdown on your experiment scorecard:

[Image: fieller4]

Edge case reporting

As mentioned in our calculations section, there are some cases where the control group's mean is not statistically distinguishable from 0. When this occurs, we'll still use the usual color coding for a result that is not statistically significant, statistically significant and positive, or statistically significant and negative, depending on the p-value associated with the change and the direction of the difference. For example:

[Image: fieller3]

We strongly recommend that customers use Fieller intervals because they're more accurate. In many cases the results will be effectively the same, but if you're running experiments with small sample sizes or noisy denominators, Fieller intervals give you more reliable reporting.
