Holdout testing: The key to validating product changes

Mon Jul 01 2024

Holdout testing is a powerful experimentation technique that enables you to validate the effectiveness of your product changes over an extended period.

By withholding a small group of users from experiencing the changes, you can compare their behavior and metrics against those who received the updates. This approach provides valuable insights into the long-term effects of your product decisions.

What is holdout testing?

Holdout testing is a type of experimentation that involves excluding a small subset of users from receiving a product change or feature update. The purpose of holdout testing is to measure the long-term impact of the change by comparing the behavior and metrics of the holdout group against the group that received the update.

Unlike traditional A/B testing, which focuses on short-term metrics and immediate user reactions, holdout testing allows you to assess the long-term effects of your product changes. By monitoring the holdout group over an extended period, typically weeks or months, you can gain a more comprehensive understanding of how the change influences user behavior and key metrics.

One of the key benefits of holdout testing is its ability to detect any negative long-term consequences that may not be apparent in short-term A/B tests. For example, a new feature might show positive results in an A/B test but lead to user fatigue or disengagement over time. Holdout testing helps you identify such issues and make informed decisions about whether to fully roll out the change or make further iterations.

Additionally, holdout testing is particularly useful when you have multiple experiments running concurrently. By maintaining a consistent holdout group that is excluded from all of them, you can measure their combined impact against a clean baseline and account for interactions between experiments that individual A/B tests cannot capture. This helps ensure that your product decisions are based on reliable and accurate data.

Setting up a holdout test

To set up a holdout test, start by creating an experiment in your experimentation platform. Name the experiment clearly to indicate it's a holdout test. Add a third "holdout" variant to the standard "control" and "treatment" variants.

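Next, determine the holdout group size. A common rule of thumb is to allocate 10% of users to the holdout group. Whether that is enough for statistical significance depends on your traffic and the size of the effect you expect, so treat it as a starting point: a larger holdout detects smaller differences but keeps more users on the old experience.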
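To get a feel for how holdout size affects what you can detect, here is a rough sketch using the standard normal approximation for a two-proportion z-test. The traffic figure, baseline conversion rate, alpha, and power are illustrative assumptions, not recommendations.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_effect(baseline_rate, n_exposed, n_holdout,
                              alpha=0.05, power=0.80):
    """Approximate smallest absolute difference in a conversion rate that a
    two-sided two-proportion z-test can detect at the given alpha and power."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    # Standard error of the difference, using the baseline rate for both groups.
    se = sqrt(baseline_rate * (1 - baseline_rate)
              * (1 / n_exposed + 1 / n_holdout))
    return (z_alpha + z_power) * se


total_users = 200_000   # hypothetical traffic over the test window
baseline = 0.05         # hypothetical 5% baseline conversion rate
for share in (0.01, 0.05, 0.10):
    n_holdout = round(total_users * share)
    mde = minimum_detectable_effect(baseline, total_users - n_holdout, n_holdout)
    print(f"holdout {share:.0%}: detectable difference ~ {mde:.4f}")
```

The smaller group dominates the standard error, so shrinking the holdout quickly raises the smallest effect you can reliably see.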

Implement the experiment in your product by adding the necessary logic to show the treatment only to the "treatment" group. The "control" and "holdout" groups should see the unchanged version. Launch the experiment and let it run until it has collected its planned sample and reached statistical significance.
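If you are wiring this up yourself rather than relying on a platform's SDK, the assignment logic might look something like the sketch below. It uses deterministic hashing so a user always lands in the same group. The experiment name, the salts, and the function names are hypothetical, and the 10% figure is the holdout share discussed above.

```python
import hashlib

HOLDOUT_PERCENT = 10  # assumed holdout share


def bucket(user_id: str, salt: str) -> int:
    """Map a user to a stable bucket in [0, 100) by hashing salt + user ID."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def assign_variant(user_id: str, experiment: str = "new_checkout_holdout") -> str:
    """Return 'holdout', 'control', or 'treatment' for this user."""
    if bucket(user_id, experiment + ":holdout") < HOLDOUT_PERCENT:
        return "holdout"
    # Remaining users are split evenly between control and treatment.
    if bucket(user_id, experiment + ":split") < 50:
        return "treatment"
    return "control"


def sees_new_experience(user_id: str) -> bool:
    # Only the treatment group gets the change; control and holdout do not.
    return assign_variant(user_id) == "treatment"


print(assign_variant("user-123"))
```

Using a separate salt for the holdout check keeps holdout membership independent of the control/treatment split, so the same holdout users stay excluded if the rest of the traffic is later moved to the treatment.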

Once the experiment reaches significance, the real holdout test begins. If the "treatment" wins, roll it out to 90% of users while keeping the "holdout" at 10%. Continue monitoring key metrics for several weeks to ensure no long-term negative effects emerge.

Filter your analytics tools to compare the "treatment" and "holdout" groups. Look at metrics like:

  • Pages per session and errors per session

  • Average session duration and conversion rate

  • Real user behavior via session replays
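For a rate metric such as conversion, the comparison between the "treatment" and "holdout" groups can be sketched as a two-proportion z-test. This is one common approach, not necessarily what your analytics tool runs, and the counts below are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rate between two groups.
    Returns (difference in rates, z statistic, p-value)."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return rate_a - rate_b, z, p_value


# Made-up counts: 90,000 treatment users and 10,000 holdout users.
diff, z, p = two_proportion_z_test(4_860, 90_000, 500, 10_000)
print(f"difference={diff:+.4f}, z={z:.2f}, p={p:.3f}")
```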

If no issues arise after a few weeks, you can confidently roll out the "treatment" to all users. If problems do appear, you can quickly roll back by turning off the feature flag that gates the change. The holdout group enables this safe rollout process.

Holdout tests are a powerful tool for validating product changes. By setting aside a small group of users, you can monitor the long-term effects of your experiments. This ensures your improvements truly move the needle without unintended consequences.

Measuring long-term effects with holdouts

Holdout tests enable you to measure the long-term impact of product changes. By keeping a small group unexposed to the change, you can compare their behavior to the larger group that did see it. This comparison helps identify any delayed or unintended consequences.

To analyze holdout data, track key metrics like engagement, retention, and revenue over an extended period. Look for divergences between the holdout and exposed groups that may indicate a lagging effect of the change. For example, a new feature might boost short-term engagement but lead to higher churn rates months later.
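One simple way to look for a lagging effect is to compute the relative difference between the two groups week by week and flag a sustained decline. The sketch below uses invented retention numbers, and the threshold and run length are arbitrary assumptions you would tune for your own metrics.

```python
# Hypothetical weekly retention (share of each cohort still active).
weekly_retention = {
    "exposed": [0.62, 0.58, 0.55, 0.51, 0.47, 0.43],
    "holdout": [0.60, 0.57, 0.55, 0.53, 0.51, 0.50],
}


def relative_lift_by_week(exposed, holdout):
    """Relative difference of exposed vs. holdout for each week."""
    return [(e - h) / h for e, h in zip(exposed, holdout)]


def first_sustained_drop(lifts, threshold=-0.02, weeks=2):
    """Zero-based index of the first week that starts `weeks` consecutive
    lifts below `threshold`, or None if no such run exists."""
    run = 0
    for i, lift in enumerate(lifts):
        run = run + 1 if lift < threshold else 0
        if run == weeks:
            return i - weeks + 1
    return None


lifts = relative_lift_by_week(weekly_retention["exposed"],
                              weekly_retention["holdout"])
for week, lift in enumerate(lifts, start=1):
    print(f"week {week}: {lift:+.1%}")
print("sustained drop starts at week index:", first_sustained_drop(lifts))
```

In this invented data the exposed group starts ahead and then falls behind, which is the kind of pattern a short A/B test would miss.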

Session-level metrics like pages per visit, time spent, and conversion rate are particularly useful for spotting these trends. By comparing how these metrics evolve differently for holdout users, you can assess the full impact of your experiment. Be sure to also monitor error rates and customer support inquiries to catch any issues that arise post-launch.

Interpreting holdout test results requires patience and nuance. Some effects may take weeks or months to materialize, so avoid drawing conclusions prematurely. When you do spot a significant divergence, dig deeper to understand the underlying cause. Analyze user segments, review session replays, and gather qualitative feedback to inform your next steps.

Holdout tests are a powerful tool for validating product changes, but they require careful planning and analysis. By tracking the right metrics over a sufficient timeframe, you can gain confidence that your experiments truly improve the user experience. And if you do uncover negative long-term effects, holdouts give you a built-in rollback mechanism to mitigate the impact.

Managing user expectations is crucial in holdout testing. Users in the holdout group may feel left out or frustrated if they don't receive the new features. Clearly communicate the purpose and duration of the holdout test to mitigate negative reactions.

Balancing statistical significance and business goals can be tricky in long-term holdout tests. While achieving statistical significance is important, it's equally vital to consider the practical implications of the test results. Regularly assess the test's progress and make data-driven decisions that align with your business objectives.

To maintain test integrity over extended periods, implement robust monitoring and alert systems. These systems should detect any anomalies or issues that may compromise the holdout test's validity. Regularly review the test setup and data to ensure the experiment remains unbiased and representative of your user base.
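One integrity check that is easy to automate is a sample ratio mismatch test: if the observed group sizes drift away from the intended split, assignment or logging may be broken. The sketch below uses a chi-square goodness-of-fit test with one degree of freedom. The 10% expected share and the strict alpha of 0.001 are assumptions, not fixed rules.

```python
from math import erfc, sqrt


def sample_ratio_mismatch(n_exposed, n_holdout, expected_holdout_share=0.10,
                          alpha=0.001):
    """Chi-square goodness-of-fit test (1 degree of freedom) on group sizes.
    Returns (p_value, mismatch_detected)."""
    total = n_exposed + n_holdout
    expected_holdout = total * expected_holdout_share
    expected_exposed = total - expected_holdout
    chi2 = ((n_holdout - expected_holdout) ** 2 / expected_holdout
            + (n_exposed - expected_exposed) ** 2 / expected_exposed)
    # For 1 degree of freedom, the chi-square survival function
    # equals erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return p_value, p_value < alpha


print(sample_ratio_mismatch(90_100, 9_900))   # small deviation
print(sample_ratio_mismatch(91_000, 9_000))   # large deviation
```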

Holdout test duration is another critical factor to consider. The length of the test should be sufficient to capture long-term effects but not so long that it becomes impractical or hinders product development. Work closely with your data science and product teams to determine the optimal duration for your specific holdout test.

When analyzing holdout test results, be cautious of confounding factors that may skew the data. External events, seasonality, or other experiments running concurrently can influence user behavior and impact the holdout test's outcomes. Use advanced statistical techniques, such as multivariate analysis or regression models, to isolate the true impact of the tested feature.
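As a simpler stand-in for a full regression model, you can stratify by a suspected confounder (for example, a holiday period versus a normal period) and compare groups within each stratum. The sketch below is only an illustration of that idea, with invented counts in which the share of exposed users differs between periods.

```python
def stratified_difference(strata):
    """Weighted average of per-stratum differences in conversion rate.

    Each stratum is a tuple:
    (exposed_conversions, exposed_users, holdout_conversions, holdout_users).
    Strata are weighted by their total user count."""
    total_users = sum(n_e + n_h for _, n_e, _, n_h in strata)
    estimate = 0.0
    for c_e, n_e, c_h, n_h in strata:
        diff = c_e / n_e - c_h / n_h
        estimate += diff * (n_e + n_h) / total_users
    return estimate


# Invented data: conversion is higher for everyone during the holiday period,
# and more of the exposed traffic happens to fall in that period.
strata = [
    (450, 9_000, 50, 1_000),     # normal period: 5% in both groups
    (1_800, 18_000, 100, 1_000), # holiday period: 10% in both groups
]

pooled = (450 + 1_800) / (9_000 + 18_000) - (50 + 100) / (1_000 + 1_000)
print(f"pooled difference:     {pooled:+.4f}")
print(f"stratified difference: {stratified_difference(strata):+.4f}")
```

The pooled comparison suggests a lift that disappears once each period is compared separately, which is the kind of distortion seasonality can introduce.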

Iterative holdout testing can be an effective approach for complex features or significant product changes. Instead of running a single, lengthy holdout test, consider breaking it down into smaller, iterative tests. This allows you to gather feedback, make adjustments, and refine the feature before exposing it to the entire user base.

Finally, communicate the holdout test results clearly and transparently to stakeholders. Provide context around the test's objectives, methodology, and limitations. Use visualizations and storytelling techniques to make the insights accessible and actionable for non-technical audiences. By fostering a culture of experimentation and data-driven decision-making, you can maximize the value of holdout testing in your organization.

Integrating holdout testing into your product development cycle

Incorporating holdout tests into agile development is straightforward. Include them as part of your regular experimentation cadence, running alongside other A/B tests. Treat holdout test results as a key input during sprint planning and retrospectives.

Use holdout test insights to validate product decisions and guide your roadmap. If a holdout test reveals issues, prioritize fixes in upcoming sprints. Positive results from holdout testing can greenlight further iteration on successful features.

Foster a data-driven culture by evangelizing holdout testing across your organization. Educate teams on the importance of validating changes with long-term holdout tests. Celebrate wins uncovered through holdout testing to reinforce the practice.

Holdout tests are a powerful tool for mitigating risk when shipping new features. By keeping a small user segment on the old experience, you maintain a clean baseline for comparison. This allows you to confidently assess the full impact of your changes.

Regularly monitor holdout test results throughout your development cycle. Set up automated alerts to notify you of any significant divergences between holdout and treatment groups. This enables proactive identification and resolution of potential issues.
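An automated alert can be as simple as a scheduled job that compares the latest metrics for each group against per-metric tolerances. The sketch below assumes you already have aggregated metrics for both groups. The metric names, values, and tolerances are hypothetical, and a production version would also account for statistical noise.

```python
def divergence_alerts(holdout, treatment, tolerances):
    """Return alert messages for metrics where treatment is worse than holdout
    by more than the allowed relative tolerance.

    `tolerances` maps metric name -> (direction, max_relative_change), where
    direction is 'higher_is_better' or 'lower_is_better'."""
    alerts = []
    for metric, (direction, limit) in tolerances.items():
        base, value = holdout[metric], treatment[metric]
        change = (value - base) / base
        worse_by = -change if direction == "higher_is_better" else change
        if worse_by > limit:
            alerts.append(f"{metric}: treatment {change:+.1%} vs holdout")
    return alerts


holdout_metrics = {"conversion_rate": 0.050, "errors_per_session": 0.020,
                   "pages_per_session": 4.0}
treatment_metrics = {"conversion_rate": 0.051, "errors_per_session": 0.026,
                     "pages_per_session": 3.9}
tolerances = {
    "conversion_rate": ("higher_is_better", 0.05),
    "errors_per_session": ("lower_is_better", 0.10),
    "pages_per_session": ("higher_is_better", 0.05),
}

for message in divergence_alerts(holdout_metrics, treatment_metrics, tolerances):
    print("ALERT:", message)
```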

When planning new features, consider how you'll measure success with holdout testing. Define clear metrics and goals upfront, aligning with your overall product strategy. This sets the stage for effective holdout tests down the line.

Effective communication is key to getting buy-in for holdout testing. Help stakeholders understand how holdout tests contribute to a high-quality, data-informed product. Share results broadly to build trust in the process.

Remember, holdout testing is an ongoing practice—not a one-time event. Continuously assess your experimentation program to ensure you're running impactful holdout tests. Iterate on your approach based on learnings and feedback.

By weaving holdout testing into the fabric of your development process, you create a culture of continuous improvement. Teams adopt an experimentation mindset, embracing data as a guide for product decisions. This unlocks innovation while minimizing risk.
