Hochberg procedure: controlling false discoveries in experiments

Sat Aug 31 2024

Have you ever run multiple experiments at once and found yourself overwhelmed by conflicting results? Or maybe you've wondered if some of those "significant" findings are actually just false alarms. You're not alone. When testing multiple hypotheses simultaneously, the risk of false positives can skyrocket, making it tricky to trust your data.

At Statsig, we're passionate about helping teams make confident, data-driven decisions. That's why we're diving into the challenges of multiple comparisons in experiments and exploring how methods like the Hochberg procedure can help control false discoveries without stifling innovation.

The challenge of multiple comparisons in experiments

In the world of experimentation, testing several hypotheses at once can increase the risk of false positives, also known as Type I errors. This happens when a null hypothesis is incorrectly rejected, suggesting a significant effect where none actually exists. False discoveries can mislead product development and data-driven decision-making, potentially leading teams astray and wasting valuable resources.

Controlling false discoveries is crucial to ensure your experimental results are reliable and actionable. Without proper control, the likelihood of encountering false positives rises with the number of hypotheses tested. This phenomenon is known as the multiple comparisons problem and is a common concern in various fields, from online experiments to genomics research.

Traditional methods for controlling Type I errors, like the Bonferroni correction, can be overly conservative, especially when dealing with a large number of hypotheses. This approach may lead to missed opportunities for improvement because it reduces the power to detect true effects. Alternatively, the Benjamini-Hochberg procedure offers a more balanced approach by controlling the False Discovery Rate (FDR) instead of the Family-Wise Error Rate (FWER).

At Statsig, we've integrated these methodologies into our platform, allowing you to choose the method that best fits your experimentation needs.

Introducing the Hochberg procedure for controlling false discoveries

Another powerful method for managing false discoveries in multiple hypothesis testing is the Hochberg procedure. It controls the family-wise error rate (FWER) using a step-up approach, sequentially comparing ordered p-values to adjusted significance levels. This allows you to identify true effects while maintaining stringent control over Type I errors.

Compared to the more conservative Bonferroni correction, the Hochberg procedure offers increased statistical power. By reducing the risk of missing real discoveries (Type II errors), it provides a valuable tool for balancing error control with the detection of genuine effects. This is particularly relevant in fields like genomics or neuroimaging, where large datasets are common and the cost of false negatives can be high.

To apply the Hochberg procedure, you:

  1. Order your p-values from smallest to largest.

  2. Starting from the largest p-value, compare each p-value to an adjusted significance level.

  3. Reject hypotheses based on this comparison to control the FWER.

This step-up approach adapts the stringency of the adjustment based on the strength of the evidence against the null hypothesis. By considering the rank of each p-value, it often leads to a higher number of true discoveries while still maintaining strict control over false positives.

When deciding between the Hochberg procedure and other methods like the Benjamini-Hochberg procedure, consider your goals and the nature of your data. If controlling the FWER is a top priority and you have a moderate number of tests, the Hochberg procedure might be the way to go. However, if you're willing to tolerate a slightly higher rate of false positives in exchange for increased power, the Benjamini-Hochberg procedure’s control of the FDR could be more appropriate.

Applying the Hochberg procedure in practice

So, how do you actually apply the Hochberg procedure? Here's a step-by-step guide:

  1. Order your p-values from smallest to largest: p(1), p(2), ..., p(m).

  2. Starting from the largest p-value, for each p-value p(i):

    • Calculate the adjusted significance level: α / (m - i + 1), where:

      • α = desired significance level (e.g., 0.05)

      • m = total number of hypotheses

      • i = rank of the p-value

    • Compare p(i) to the adjusted α.

    • If p(i) ≤ adjusted α, reject the null hypotheses for p(i) and all smaller p-values, then stop.
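The step-up logic above can be sketched in a few lines of Python. This is a minimal illustration of the procedure, not a production implementation:

```python
def hochberg(pvals, alpha=0.05):
    """Hochberg step-up procedure: returns a boolean reject flag per p-value."""
    m = len(pvals)
    # Sort p-values ascending, remembering their original positions.
    order = sorted(range(m), key=lambda j: pvals[j])
    reject = [False] * m
    # Step up from the largest p-value (rank i = m) toward the smallest.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        if pvals[idx] <= alpha / (m - rank + 1):
            # Reject this hypothesis and all with smaller p-values, then stop.
            for j in order[:rank]:
                reject[j] = True
            break
    return reject

print(hochberg([0.01, 0.03, 0.05, 0.07, 0.10]))
# -> [True, False, False, False, False]
```

Note how the rule differs from Bonferroni, which would apply a flat threshold of α / m to every test: with p-values [0.04, 0.04], Bonferroni's threshold of 0.025 rejects nothing, while Hochberg rejects both (0.04 ≤ 0.05 / 1 at rank 2).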

Let's walk through an example. Suppose you have 5 hypotheses with the following ordered p-values: 0.01, 0.03, 0.05, 0.07, and 0.10. Using the Hochberg procedure with α = 0.05:

Rank (i) | P-value | Adjusted α
---------|---------|----------------------------
5        | 0.10    | 0.05 / (5 - 5 + 1) = 0.05
4        | 0.07    | 0.05 / (5 - 4 + 1) = 0.025
3        | 0.05    | 0.05 / (5 - 3 + 1) = 0.0167
2        | 0.03    | 0.05 / (5 - 2 + 1) = 0.0125
1        | 0.01    | 0.05 / (5 - 1 + 1) = 0.01

Starting from the largest p-value (0.10):

  • p(5) = 0.10 > adjusted α = 0.05 → Do not reject

  • p(4) = 0.07 > adjusted α = 0.025 → Do not reject

  • p(3) = 0.05 > adjusted α = 0.0167 → Do not reject

  • p(2) = 0.03 > adjusted α = 0.0125 → Do not reject

  • p(1) = 0.01 ≤ adjusted α = 0.01 → Reject the null hypothesis for p(1)

In this case, we only reject the null hypothesis for the first p-value (0.01). This process ensures strict control over the family-wise error rate, making the Hochberg procedure more powerful than the Bonferroni correction while still limiting false positives.

By understanding and correctly applying the Hochberg procedure, you can make informed decisions based on your experimental data. It helps balance the risk of false positives with the ability to detect true effects. This is particularly useful in scenarios with multiple comparisons, like online experiments or multi-outcome analyses.

At Statsig, we're committed to helping you navigate these statistical challenges, so you can focus on building great products.

Comparing Hochberg with other methods and best practices

The Hochberg procedure is less conservative than the Bonferroni correction but more stringent than the Benjamini-Hochberg procedure. It offers more power while still maintaining strong control over Type I errors, though its guarantee formally relies on the tests being independent or positively dependent, whereas Bonferroni holds under any dependence. In contrast, the Benjamini-Hochberg procedure controls the False Discovery Rate (FDR), allowing a higher proportion of false positives among significant results in exchange for even more power.

When choosing a method, consider your experimental goals and the consequences of errors. If you prioritize avoiding false positives, the Bonferroni correction or Hochberg procedure might be more appropriate. But if you aim to maximize discoveries while tolerating some false positives, the Benjamini-Hochberg procedure could be a better fit.
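To make that trade-off concrete, here is a small self-contained comparison of how many hypotheses each method rejects on one illustrative set of sorted p-values. The numbers are invented for illustration, and the functions are simplified sketches of each rule:

```python
def rejections(sorted_pvals, alpha=0.05):
    """Count rejections under three corrections; assumes p-values sorted ascending."""
    m = len(sorted_pvals)
    # Bonferroni: fixed threshold alpha / m for every test.
    bonf = sum(p <= alpha / m for p in sorted_pvals)
    # Hochberg (step-up, FWER): largest rank i with p(i) <= alpha / (m - i + 1).
    hoch = max((i for i in range(1, m + 1)
                if sorted_pvals[i - 1] <= alpha / (m - i + 1)), default=0)
    # Benjamini-Hochberg (step-up, FDR): largest rank i with p(i) <= alpha * i / m.
    bh = max((i for i in range(1, m + 1)
              if sorted_pvals[i - 1] <= alpha * i / m), default=0)
    return {"bonferroni": bonf, "hochberg": hoch, "benjamini-hochberg": bh}

print(rejections([0.001, 0.011, 0.020, 0.040, 0.120]))
# -> {'bonferroni': 1, 'hochberg': 2, 'benjamini-hochberg': 4}
```

The ordering in the output reflects the general pattern: Bonferroni rejects the fewest hypotheses, Hochberg at least as many, and Benjamini-Hochberg the most, at the cost of the weaker FDR guarantee.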

Balancing discovery and reliability in multiple hypothesis testing requires a clear understanding of your objectives and the implications of errors. Designing and executing controlled tests properly is crucial for maintaining the integrity of your results. This involves ensuring data quality, identifying outliers, and validating assumptions.

Interpreting results accurately is key to making informed decisions. Be cautious of heterogeneous treatment effects and potential biases that can skew outcomes. Visualizing p-value distributions can help identify issues with test assumptions or data characteristics. When faced with complex issues, consulting a statistician can provide valuable insights.

At Statsig, we provide tools and expertise to help you design robust experiments and interpret results thoughtfully. We're here to help you effectively manage Type I errors while maximizing the impact of your findings.

Closing thoughts

Navigating the complexities of multiple hypothesis testing doesn't have to be daunting. The Hochberg procedure offers a balanced way to control false positives without sacrificing too much statistical power. Whether you choose the Hochberg procedure, the Bonferroni correction, or the Benjamini-Hochberg procedure, understanding these methods helps you make more confident decisions based on your data.

At Statsig, we're here to support you in making sense of your experimental results, so you can confidently iterate and scale your experimentation programs. If you're interested in learning more, check out our resources on controlling Type I errors and the Benjamini-Hochberg procedure.

Happy experimenting!
