What are hold-out groups? How to use them in A/B testing

Wed Dec 04 2024

Ever wondered how companies decide which new features to roll out and which ones to ditch? A lot of it comes down to A/B testing, but there's a secret weapon that often flies under the radar—hold-out groups. These groups help teams understand the true impact of their product changes over time.

In this post, we'll dive into what hold-out groups are, why they matter, and how to set them up effectively. By the end, you'll have a solid grasp of how to use hold-out groups to make smarter, data-driven decisions for your product.

Understanding hold-out groups in A/B testing

In A/B testing, hold-out groups are users who are intentionally kept away from any product changes. This approach lets us clearly compare the behavior of users who experience updates with those who don't. Hold-out groups act as a global control, providing a baseline to measure the long-term impact of product experimentation.

Unlike traditional control groups, which are only excluded from specific changes, hold-out users don't experience any new updates at all. This isolation is crucial for spotting negative side effects or user fatigue that might develop over time. By keeping an eye on key metrics like engagement, retention, and revenue, we can identify trends and interactions between experiments that might not show up in short-term results.

To maintain the integrity of hold-out groups, it's essential to keep them completely separate. Using feature flags and consistent user segmentation ensures that hold-out users don't accidentally experience changes. When running multiple experiments, you can choose between mutually exclusive or shared hold-out groups, depending on your strategy and goals.

Setting up a hold-out test involves randomly selecting a small slice of your user base—typically around 5-10%—and excluding them from any product changes. By comparing the performance of hold-out and treatment groups, you can evaluate the cumulative impact of your experiments over time. Statistical analysis and techniques like Monte Carlo simulations help estimate the true effect of changes, balancing short-term gains with long-term considerations.
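To make the Monte Carlo idea a bit more concrete, here's a minimal sketch of one common resampling technique, the percentile bootstrap, applied to a per-user metric from treatment and hold-out users. It assumes you already have one numeric value per user (say, sessions per user) and a positive hold-out mean. The function name `bootstrap_lift` is hypothetical, and this isn't necessarily how Statsig computes its estimates.

```python
import random
from statistics import mean


def bootstrap_lift(treatment, holdout, n_resamples=2000, seed=0):
    """Estimate relative lift of treatment over hold-out with a 95% percentile bootstrap.

    Assumes one numeric metric value per user and a positive hold-out mean.
    Returns (point_estimate, (ci_low, ci_high)).
    """
    rng = random.Random(seed)
    point = mean(treatment) / mean(holdout) - 1

    lifts = []
    for _ in range(n_resamples):
        # Resample each group with replacement, then recompute the lift.
        t = rng.choices(treatment, k=len(treatment))
        h = rng.choices(holdout, k=len(holdout))
        h_mean = mean(h)
        if h_mean > 0:  # skip degenerate resamples with a zero baseline
            lifts.append(mean(t) / h_mean - 1)

    lifts.sort()
    ci_low = lifts[int(0.025 * len(lifts))]
    ci_high = lifts[int(0.975 * len(lifts)) - 1]
    return point, (ci_low, ci_high)


# Usage with synthetic data: treatment users average about 5% more sessions.
gen = random.Random(1)
holdout = [gen.gauss(10.0, 2.0) for _ in range(500)]
treatment = [gen.gauss(10.5, 2.0) for _ in range(500)]
lift, (low, high) = bootstrap_lift(treatment, holdout)
print(f"lift={lift:.3f}, 95% CI=({low:.3f}, {high:.3f})")
```

If the interval comfortably excludes zero after the hold-out has been running for a while, that's evidence the accumulated changes are moving the metric; a wide interval straddling zero says you need more time or more users.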

Tools like Statsig can simplify this process, making it easier to manage hold-out groups and analyze results effectively.

Setting up effective hold-out tests

So, how do you set up a successful hold-out test? Start by creating three groups: control, test, and hold-out. A common allocation might be 45% control, 45% test, and 10% hold-out. This setup ensures a balanced comparison while minimizing the number of users who miss out on new features.
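One way to implement that 45/45/10 split is to hash a stable user ID into 100 buckets and carve the buckets up by percentage. This is a minimal sketch; the `ALLOCATION` table, `assign_group` and the salt string are hypothetical names for illustration, not part of any particular SDK.

```python
import hashlib

# Hypothetical allocation mirroring the 45/45/10 split described above.
ALLOCATION = [("control", 45), ("test", 45), ("holdout", 10)]


def assign_group(user_id: str, salt: str = "holdout-study-1") -> str:
    """Map a stable user ID to a group; the same ID always lands in the same group."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # bucket in [0, 100)
    upper = 0
    for group, percent in ALLOCATION:
        upper += percent
        if bucket < upper:
            return group
    raise ValueError("allocation percentages must sum to 100")


counts = {"control": 0, "test": 0, "holdout": 0}
for i in range(100_000):
    counts[assign_group(f"user-{i}")] += 1
print(counts)  # expect roughly 45k / 45k / 10k
```

Because the bucket depends only on the salt and the user ID, assignments stay put across sessions and devices. Changing the salt reshuffles everyone, so it should stay fixed for the life of the hold-out.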

Feature flags are crucial here. Gating every new feature behind a flag lets you make sure hold-out users never receive it, which keeps your assessment of the cumulative impact accurate over time.
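Here's a minimal sketch of a flag check that consults the hold-out before any flag-specific rollout, so hold-out users stay excluded even when a feature is rolled out to 100%. Everything here (`in_holdout`, `is_feature_enabled`, the `rollout` dict, the `new_checkout` flag) is hypothetical and is not Statsig's SDK API.

```python
import hashlib

HOLDOUT_PERCENT = 10  # hypothetical size of the global hold-out


def _bucket(key: str) -> int:
    """Hash a string key into a bucket in [0, 100)."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16) % 100


def in_holdout(user_id: str) -> bool:
    return _bucket(f"global-holdout:{user_id}") < HOLDOUT_PERCENT


def is_feature_enabled(user_id: str, flag_name: str, rollout: dict) -> bool:
    """Return True if the flag is on for this user; hold-out users are always excluded."""
    if in_holdout(user_id):
        return False  # the hold-out check runs before any flag-specific rollout
    percent = rollout.get(flag_name, 0)  # unknown flags default to off
    return _bucket(f"{flag_name}:{user_id}") < percent


rollout = {"new_checkout": 100}  # fully rolled out to everyone outside the hold-out
users = [f"user-{i}" for i in range(1_000)]
held = [u for u in users if in_holdout(u)]
print(len(held), "hold-out users never see new_checkout")
```

Putting the hold-out check in one shared gate, rather than in each feature's code, is what makes it hard for a hold-out user to see a change by accident.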

Consistent user segmentation is key to avoiding biases and data pollution. Assign users to specific groups based on stable identifiers like user IDs, and maintain these assignments throughout the test period. This practice helps prevent issues like cookie churn or selection bias that can undermine your results.

When running multiple experiments simultaneously, consider whether to use mutually exclusive or shared hold-out groups. Mutually exclusive hold-outs provide a clean baseline for each experiment, while shared hold-outs allow you to measure the combined impact of multiple changes. Choose the approach that best aligns with your testing goals and resources.
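The difference between the two options can be sketched with a single hashed bucket per user. In shared mode, one 10% slice is held back from every experiment. In exclusive mode, each experiment owns its own disjoint 5% slice, so a user misses at most one change. The experiment names and bucket ranges below are hypothetical.

```python
import hashlib


def bucket(user_id: str, salt: str = "holdout-layer", buckets: int = 1000) -> int:
    """Deterministically hash a user into one of `buckets` buckets."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


# Shared: buckets 0-99 (10%) are held back from every experiment.
SHARED_HOLDOUT = range(0, 100)

# Mutually exclusive: each experiment owns a disjoint 5% slice (hypothetical names).
EXCLUSIVE_HOLDOUTS = {
    "exp_search": range(0, 50),
    "exp_onboarding": range(50, 100),
}


def is_held_out(user_id: str, experiment: str, mode: str) -> bool:
    b = bucket(user_id)
    if mode == "shared":
        return b in SHARED_HOLDOUT
    if mode == "exclusive":
        return b in EXCLUSIVE_HOLDOUTS.get(experiment, range(0))
    raise ValueError(f"unknown mode: {mode}")


users = [f"user-{i}" for i in range(5_000)]
both = sum(
    is_held_out(u, "exp_search", "exclusive") and is_held_out(u, "exp_onboarding", "exclusive")
    for u in users
)
print(both)  # 0: exclusive hold-outs never overlap
```

The shared slice gives you one baseline for the combined effect of everything shipped; the exclusive slices give each experiment its own baseline at the cost of a smaller group per experiment.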

Analyzing and interpreting hold-out test results

Once your hold-out test is up and running, it's time to dive into the data. Monitor key metrics like engagement, retention, and revenue over time. This analysis can uncover the real impact of product changes and any unintended consequences. Techniques like stratified sampling help ensure your hold-out group accurately represents your user base, which supports more accurate impact estimates, even when results aren't statistically significant.
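If you have the full user list at setup time, stratified sampling can be sketched as drawing the same fraction from each segment (here a hypothetical platform field), so the hold-out's mix matches the overall user base. This is one way to apply the idea under that assumption; `stratified_holdout` is a hypothetical helper, and hashing users within each stratum would work too.

```python
import random
from collections import defaultdict


def stratified_holdout(users, stratum_of, fraction=0.10, seed=7):
    """Sample `fraction` of users from each stratum so the hold-out mirrors the population mix."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for user in users:
        by_stratum[stratum_of(user)].append(user)

    holdout = set()
    for stratum in sorted(by_stratum):  # sorted so a fixed seed gives reproducible picks
        members = sorted(by_stratum[stratum])
        k = round(len(members) * fraction)
        holdout.update(rng.sample(members, k))
    return holdout


# Hypothetical population: 25% iOS, 75% web.
users = [f"u{i:04d}" for i in range(1_000)]
platform = {u: ("ios" if i % 4 == 0 else "web") for i, u in enumerate(users)}
holdout = stratified_holdout(users, platform.get)
ios_share = sum(platform[u] == "ios" for u in holdout) / len(holdout)
print(len(holdout), ios_share)  # 100 0.25
```

With plain random sampling the iOS share of a 100-user hold-out would wander around 25%; stratifying pins it there, which removes one source of noise when comparing groups.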

Interpreting the results requires a keen eye for trends and patterns. If the hold-out group consistently outperforms the exposed group, it might indicate that the changes are having negative long-term effects on user behavior. On the other hand, if the exposed group shows sustained improvements over the hold-out group, it suggests that the changes are making a positive impact.
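The reading described above can be sketched as a weekly lift series plus a simple rule over the most recent weeks. This is deliberately rough: the sign-only rule, the four-week window and the function names are hypothetical, and a real analysis should look at confidence intervals rather than raw signs.

```python
from statistics import mean


def weekly_lift(exposed_by_week, holdout_by_week):
    """Relative lift of the exposed group over the hold-out, one value per week."""
    return [mean(e) / mean(h) - 1 for e, h in zip(exposed_by_week, holdout_by_week)]


def interpret(lifts, window=4):
    """Rough sign-based reading of the most recent `window` weeks of lift."""
    recent = lifts[-window:]
    if len(recent) < window:
        return "insufficient data"
    if all(x > 0 for x in recent):
        return "sustained improvement"
    if all(x < 0 for x in recent):
        return "hold-out outperforming: possible negative long-term effect"
    return "mixed: keep monitoring"


print(interpret([0.06, 0.05, 0.04, 0.04, 0.03]))  # sustained improvement
```

In the usage line the lift stays positive but shrinks from 6% to 3%, which is exactly the kind of drift worth watching, since early gains don't always last.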

However, be cautious of factors such as the novelty effect, which can cause initial gains to fade over time. That's why ongoing monitoring is so important. Platforms like Statsig provide these statistical tools out of the box, simplifying the analysis process.

Collaborating with cross-functional teams is essential when analyzing hold-out test results. Product managers, designers, and engineers should work together to identify the root causes of any observed differences and determine the best course of action. This collaboration might involve tweaking the changes, rolling them back, or exploring alternative solutions.

Integrating hold-out testing into product development

Bringing hold-out testing into your agile development cycles ensures that test results inform sprint planning and product roadmaps. This practice fosters a data-driven culture and mitigates risks when deploying new features. Coordinating hold-out tests across teams requires clear communication and shared objectives.

To effectively integrate hold-out testing, consider the following:

Align hold-out tests with sprint goals and feature releases
Use feature flags to manage user exposure to changes
Regularly review hold-out test results during sprint retrospectives

By making hold-out testing a core part of your development process, you can validate product decisions and reduce the risk of unintended long-term consequences. Collaboration between product, engineering, and data teams is crucial for designing effective hold-out tests and interpreting results.

Fostering a culture of experimentation and data-driven decision-making takes buy-in from stakeholders across the organization. Regularly communicating the value of hold-out testing and sharing insights from experiments can help build support for this approach. Tools like Statsig can streamline the implementation and management of hold-out tests, making it easier to adopt this practice within your workflow.

Closing thoughts

Hold-out groups are a powerful tool in A/B testing, offering a clear lens to understand the true impact of your product changes over time. By carefully setting up and analyzing hold-out tests, you can make more informed, data-driven decisions that enhance your product and delight your users.

If you're keen to learn more, check out Statsig's resources on hold-out testing or explore this insightful article on online experiments. Embracing hold-out groups can truly transform your approach to product development.

Hope you found this helpful!

Permalink: https://www.statsig.com/perspectives/holdout-groups-ab-testing

The Statsig Team
