Top 8 common experimentation mistakes and how to fix them

Thu Jul 18 2024

Skye Scofield

Head of Business Operations, Statsig

I recently sat down with Allon Korem, CEO of Bell Statistics, and Tyler VanHaren, Software Engineer at Statsig, to discuss some of the most frequent mistakes companies make in A/B testing and experimentation. I've summarized the discussion below, outlining the 8 common experimentation mistakes and how to fix them.

1. Data integrity: Ensure that your allocation point is consistent and verify your distributions using chi-squared tests to detect sample ratio mismatches. 

Data integrity is crucial for accurate A/B testing, but it’s often mishandled. Tyler pointed out a common mistake in the setup phase, where inconsistencies in recording user experiences lead to sample ratio mismatch (SRM). This happens when the intended 50/50 test shows a 60/40 distribution due to underreporting or technical issues. 

See our blog on Sample Ratio Mismatch 
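As a concrete illustration, here is a minimal, stdlib-only sketch of an SRM check using a chi-squared goodness-of-fit test with one degree of freedom. The function name and the strict 0.001 alpha are illustrative choices, not Statsig's implementation:

```python
import math

def srm_check(control_n, treatment_n, expected_ratio=0.5, alpha=0.001):
    """Chi-squared goodness-of-fit test (df=1) for sample ratio mismatch.

    A strict alpha (0.001 here) is a common choice for SRM checks, since
    flagging a broken experiment is worth a few extra false alarms.
    """
    total = control_n + treatment_n
    observed = [control_n, treatment_n]
    expected = [total * expected_ratio, total * (1 - expected_ratio)]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For df=1, the chi-squared survival function is erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, p_value < alpha

# The 60/40 split from the example above, on 100k users: unmistakable SRM
chi2, p, is_srm = srm_check(60_000, 40_000)
```

With healthy 50/50 data the p-value stays large; a 60/40 split on any meaningful sample size drives it to effectively zero, which tells you to stop and debug the assignment pipeline rather than read the results.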

2. Skepticism and Vigilance: Regularly check data integrity over different segments and time periods to identify inconsistencies early. 

Allon emphasized the importance of being skeptical about data integrity. He recounted an instance where a friend's test results seemed suspicious, showing no initial difference between groups, followed by a sudden gap. This highlights the necessity of continuously monitoring data over time. 

3. Proper Metrics: Collaborate with data science teams to ensure metrics are correctly defined and measured, focusing on meaningful business-driven KPIs. 

Choosing and accurately measuring the right metrics is essential. Tyler mentioned cases where specific user groups, like logged-out users, skew results because they aren't represented correctly in the metric definition.

4. Statistical Methods: Use t-tests for means and z-tests for proportions in most cases. Ensure your statistical tests are relevant to your hypotheses. 

Using improper statistical methods can lead to misleading results. Allon discussed the pitfalls of not performing statistical tests or using inappropriate tests like the Mann-Whitney U test for mean comparisons. 
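To make the "right test for the right hypothesis" point concrete, here is a stdlib-only sketch of a two-proportion z-test for comparing conversion rates. The function name and example numbers are illustrative; for comparing means you would reach for a t-test instead:

```python
import math

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (proportions)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)        # pooled rate under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))      # two-sided normal tail
    return z, p_value

# 5.2% vs 4.5% conversion on 10k users per arm
z, p = two_proportion_ztest(520, 10_000, 450, 10_000)
```

The key discipline is matching the test to the hypothesis: z-tests for proportions, t-tests for means, and rank-based tests like Mann-Whitney only when your hypothesis is actually about stochastic ordering, not about means.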

5. Peeking: Use sequential testing approaches to manage peeking. Tools like Statsig provide inflated confidence intervals for early data to mitigate premature conclusions. 

Peeking at data during a test inflates the false positive rate. Tyler highlighted the human temptation to peek, driven by curiosity or early signs of performance changes.

See our blog on mitigating the impact of data peeking in double-blind experimentation
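To see why unchecked peeking is costly, here is a stdlib-only Monte Carlo sketch (function name and parameters are illustrative). It runs simulated A/A tests where nothing changed, peeks after every batch of users, and declares a "win" if any peek crosses the naive 5% significance threshold:

```python
import math
import random

def peeking_false_positive_rate(trials=1000, peeks=10, batch=100, seed=7):
    """Fraction of A/A tests declared 'significant' at ANY of several peeks,
    using a naive fixed z > 1.96 threshold at each look."""
    rng = random.Random(seed)
    z_crit = 1.96  # two-sided 5% critical value, reused at every peek
    false_positives = 0
    for _ in range(trials):
        sum_a = sum_b = sumsq_a = sumsq_b = 0.0
        n = 0
        for _ in range(peeks):
            for _ in range(batch):
                a, b = rng.gauss(0, 1), rng.gauss(0, 1)
                sum_a += a; sumsq_a += a * a
                sum_b += b; sumsq_b += b * b
            n += batch
            mean_a, mean_b = sum_a / n, sum_b / n
            var_a = sumsq_a / n - mean_a ** 2
            var_b = sumsq_b / n - mean_b ** 2
            se = math.sqrt(var_a / n + var_b / n)
            if abs(mean_a - mean_b) / se > z_crit:
                false_positives += 1
                break  # the experimenter stops and ships the "winner"
        n = 0
    return false_positives / trials
```

With ten peeks, the realized false positive rate lands far above the nominal 5%, which is exactly the problem sequential testing solves by widening early confidence intervals.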

6. Underpowered Tests: Plan tests meticulously using power analysis calculators to ensure you have sufficient data to detect the expected changes. 

Running underpowered tests is common due to insufficient sample sizes. Allon noted that improper planning often leads to tests that can't detect meaningful changes. 
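The planning step can be sketched with a standard sample-size formula for comparing two proportions. This is a stdlib-only illustration (the function name and example rates are assumptions, not a specific vendor's calculator):

```python
import math
from statistics import NormalDist

def sample_size_per_group(p_base, mde, alpha=0.05, power=0.8):
    """Users needed per group to detect an absolute lift `mde`
    on a baseline conversion rate `p_base` (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p1, p2 = p_base, p_base + mde
    p_bar = (p1 + p2) / 2
    n = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
         + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / mde ** 2
    return math.ceil(n)

# Detecting a 1-point lift on a 5% baseline needs ~8k users per arm
n = sample_size_per_group(p_base=0.05, mde=0.01)
```

Running the numbers before launch, rather than after, is the whole point: if the required sample is larger than your traffic allows, either test a bigger change or accept a longer runtime.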

7. Handling Outliers: Use winsorization to cap extreme values rather than removing outliers entirely, maintaining the integrity of your data. 

Outliers can distort test results. While it's important to manage outliers to avoid false positives, Allon advised against removing them outright. 
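Winsorization can be sketched in a few lines of stdlib Python. This version caps values at sorted-index percentiles (no interpolation), which is a simplification; the function name and the 99th-percentile cutoff are illustrative:

```python
def winsorize(values, lower_pct=0.0, upper_pct=0.99):
    """Cap values at the given percentiles instead of dropping them,
    so extreme users still count but can't dominate the mean."""
    s = sorted(values)
    lo = s[int(lower_pct * (len(s) - 1))]
    hi = s[int(upper_pct * (len(s) - 1))]
    return [min(max(v, lo), hi) for v in values]

# One whale's $5,000 order is capped, not deleted
revenue = [5, 7, 6, 8, 9, 5000]
capped = winsorize(revenue)
```

The sample size and per-user attribution stay intact, unlike outright removal, which silently changes who is in the experiment.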

8. Cultural Challenges: Foster a culture that encourages upfront hypothesis formulation and continuous learning from experimentation. 

Beyond technical issues, cultural challenges can hinder effective experimentation. Tyler stressed the importance of building a culture of hypothesis-driven testing and quick, consistent execution. 

By addressing these common testing mistakes, companies can significantly improve the accuracy and reliability of their A/B tests. These steps will help you make more informed decisions and drive better business outcomes. Feel free to reach out with any questions or comments. Let's continue the conversation on how to enhance your testing strategies! 
