Group testing metrics are the measurements you use to monitor and evaluate A/B tests. They show whether an experiment is both safe and effective, so that you ship a change only when it delivers the desired result without negatively impacting your users or business.
There are three main types of group testing metrics: goal metrics, guardrail metrics, and secondary metrics. Goal metrics (also called success metrics) are the primary measure of whether your experiment succeeded, and they should follow directly from your business objectives and experiment hypothesis. For example, if you're testing a new checkout flow, your goal metric might be the conversion rate.
Guardrail metrics, on the other hand, are used to monitor for any negative impacts on critical business metrics. These metrics act as an early warning system, alerting you to potential issues that could harm your product or user experience. Examples of guardrail metrics include revenue, user retention, and customer satisfaction scores.
Secondary metrics are additional metrics that provide a more comprehensive view of your experiment's impact. While not the primary focus, these metrics can offer valuable insights into user behavior and help you better understand the effects of your A/B test. Examples of secondary metrics include engagement rates, time spent on site, and user feedback.
Tracking goal, guardrail, and secondary metrics together shows both whether an A/B test achieved its aim and what it cost elsewhere, which is what you need to make an informed decision.
Secondary metrics:
Additional metrics tracked to gain deeper insights into experiment impact
Include user experience, engagement, and revenue metrics
Provide a more holistic view of the experiment's effects
Deterioration metrics:
Subset of secondary metrics monitored for significant negative impact
Inferiority tests applied to detect if treatment is worse than control
Crucial for identifying regressions that could undermine experiment success
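As an illustration of an inferiority test, here is a minimal Python sketch for a conversion-style metric where higher is better. It assumes a one-sided two-proportion z-test with an unpooled standard error; the function name, the counts, and the γ value of 0.05 are hypothetical and chosen only for the example.

```python
from math import sqrt
from statistics import NormalDist


def inferiority_p_value(conv_c, n_c, conv_t, n_t):
    """One-sided p-value for the alternative 'treatment rate < control rate'."""
    p_c = conv_c / n_c
    p_t = conv_t / n_t
    # Unpooled standard error of the difference in proportions.
    se = sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
    if se == 0:
        raise ValueError("standard error is zero; rates are all 0 or all 1")
    z = (p_t - p_c) / se
    # A small p-value is evidence that treatment is worse than control.
    return NormalDist().cdf(z)


gamma = 0.05  # hypothetical false-positive rate for deterioration tests
p = inferiority_p_value(conv_c=1200, n_c=10_000, conv_t=1100, n_t=10_000)
deteriorated = p < gamma
```

For metrics where lower is better (latency, error rate), the sign of the comparison would flip.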
Quality metrics:
Metrics that validate the experiment's quality and integrity
Examples: sample ratio mismatch tests, pre-exposure bias tests
Ensure the experiment results are reliable and trustworthy
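A sample ratio mismatch test can be sketched in a few lines of Python. This version assumes a two-arm experiment and uses a chi-square goodness-of-fit test with one degree of freedom, computed with the standard library only. The function name, the 50/50 default split, and the threshold are illustrative assumptions.

```python
from math import erfc, sqrt


def srm_p_value(n_control, n_treatment, expected_treatment_share=0.5):
    """p-value of a chi-square test that observed counts match the planned split."""
    total = n_control + n_treatment
    expected_t = total * expected_treatment_share
    expected_c = total - expected_t
    chi2 = ((n_control - expected_c) ** 2 / expected_c
            + (n_treatment - expected_t) ** 2 / expected_t)
    # For 1 degree of freedom, the chi-square tail probability is erfc(sqrt(x / 2)).
    return erfc(sqrt(chi2 / 2))


gamma = 0.01  # hypothetical false-positive rate for quality tests
has_srm = srm_p_value(50_000, 51_500) < gamma
```

A flagged mismatch usually points to a problem in assignment or logging, so the experiment's other results should not be trusted until the cause is found.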
To create a robust group testing metrics framework, follow these steps:
Define your decision rule:
Treatment must be superior on at least one success metric
Treatment must be non-inferior on all guardrail metrics
No success, guardrail, or deterioration metrics should show significant deterioration
Quality tests should not invalidate the experiment's integrity
Determine the number of metrics:
Let S = number of success metrics
Let G = number of guardrail metrics
Let D = number of additional deterioration metrics
Let Q = number of quality tests
Set your risk levels:
α = intended false-positive rate for the overall decision
β = intended false-negative rate for the overall decision
γ = intended false-positive rate for deterioration and quality tests
Apply corrections to control false-positive and false-negative risks:
Use γ for all deterioration and quality tests
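Use α/S for superiority tests on success metrics, because a win on any one of the S metrics is enough and each extra test adds a chance of a false positive
Use α, uncorrected, for non-inferiority tests on guardrail metrics, because requiring all G guardrails to pass does not inflate the false-positive rate
Use β/(G+1) for all non-inferiority and superiority tests, because every guardrail test plus a success test must succeed together, which erodes power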
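The steps above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation, and all names are hypothetical. The corrections it applies (α/S for superiority, uncorrected α for guardrail non-inferiority, β/(G+1) for power) are an assumption that follows from a rule requiring at least one success win and all guardrails to pass; adjust them if your framework uses different corrections.

```python
from dataclasses import dataclass


@dataclass
class Levels:
    superiority_alpha: float      # per success-metric superiority test
    non_inferiority_alpha: float  # per guardrail non-inferiority test
    per_test_beta: float          # false-negative rate to power each test for
    deterioration_alpha: float    # per deterioration or quality test


def corrected_levels(S, G, alpha, beta, gamma):
    """Per-test levels for S success metrics and G guardrail metrics (assumed corrections)."""
    if S < 1 or G < 0:
        raise ValueError("need at least one success metric and G >= 0")
    return Levels(
        superiority_alpha=alpha / S,
        non_inferiority_alpha=alpha,
        per_test_beta=beta / (G + 1),
        deterioration_alpha=gamma,
    )


def should_ship(success_superior, guardrail_non_inferior,
                any_deterioration, quality_ok):
    """Apply the decision rule to per-metric test outcomes (lists of booleans)."""
    return (any(success_superior)
            and all(guardrail_non_inferior)
            and not any_deterioration
            and quality_ok)


levels = corrected_levels(S=2, G=3, alpha=0.05, beta=0.2, gamma=0.01)
decision = should_ship([True, False], [True, True, True],
                       any_deterioration=False, quality_ok=True)
```

The per-test levels feed into sample size planning, while `should_ship` is evaluated once the tests have run at those levels.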
Following this approach gives you group testing metrics that align with your goals, flag potential negative impacts, and confirm that your experiment results are reliable, so you can make data-driven decisions with confidence.
Align metrics with business goals and user journey stages. This ensures you track indicators that matter for overall success. Consider metrics reflecting the impact on specific user journey stages, from acquisition to retention.
Balance primary and secondary metrics. Identify a main success indicator while supplementing with secondary metrics for a comprehensive view. Examples of primary metrics include revenue and conversion rate; secondary metrics may include click-through rate and average session duration.
Define and configure metrics in your chosen platform. Platforms like Eppo allow you to design and track crucial metrics for your group testing. This includes revenue, profit margins, conversion rate, and more.
Integrate your experimentation platform with data sources for accurate tracking. Eppo integrates with your existing data sources, bringing all group testing metrics into a single platform. This enables monitoring metrics over time and analyzing across user segments.
Regularly review experiment results and metrics. Dedicate time to analyze the impact of your group tests on key metrics. Identify winning variations and areas for improvement.
Make data-driven decisions based on group testing insights. Use the evidence from your experiments to inform product decisions with confidence. Implement winning variations and continue iterating based on data.
Monitor guardrail metrics closely. Keep a watchful eye on your guardrail metrics throughout the group testing process. If metrics hit concerning thresholds, pause tests and investigate potential issues.
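One simple way to express "concerning thresholds" is as a maximum tolerated relative drop per guardrail metric. The Python sketch below assumes higher values are better for every metric; the metric names, values, and limits are made up for illustration. A point-estimate check like this is noisy, so it complements the statistical tests rather than replacing them.

```python
def guardrails_breached(control, treatment, max_relative_drop):
    """Return the guardrail metrics whose relative drop exceeds the allowed limit."""
    breached = []
    for name, limit in max_relative_drop.items():
        # Relative change of treatment versus control; negative means a drop.
        change = (treatment[name] - control[name]) / control[name]
        if change < -limit:
            breached.append(name)
    return breached


# Hypothetical observed values and limits.
control = {"retention": 0.40, "revenue_per_user": 5.00}
treatment = {"retention": 0.37, "revenue_per_user": 4.98}
limits = {"retention": 0.05, "revenue_per_user": 0.02}

to_investigate = guardrails_breached(control, treatment, limits)
pause_test = len(to_investigate) > 0
```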