We recently hosted a virtual meetup featuring Allon Korem, CEO of Bell, and Ronny Kohavi, a widely respected thought leader and expert in experimentation.
The virtual meetup was conducted in Hebrew, so for those who aren’t fluent in the language, we have summarized the conversation in English below. It focused on three key areas essential to effective A/B testing in organizations:
Infrastructure: Building the foundation for effective experimentation is crucial. This involves setting up the right tools, processes, and systems that enable teams to run experiments seamlessly.
Experimentation culture: Establishing an organizational culture that embraces experimentation is vital for continuous growth and improvement. This involves encouraging teams to take calculated risks, learn from outcomes, and use data to drive decisions.
Learning from failures: One of the critical aspects of successful experimentation is the ability to learn from failures. Analyzing failed experiments and understanding the root causes can provide valuable insights that lead to better decision-making and future success.
Ronny shared several insights from his experience, especially during his tenure at Microsoft:
Shipping faster: He highlighted that one of the biggest factors in shipping faster was having strong PMs who could effectively manage and prioritize experiments and features.
Success rate of ideas: Only 33% of ideas were successful, and overcoming the cultural challenge of accepting this low success rate was crucial. Organizations must understand that failure is an integral part of the experimentation process.
Crawl, walk, run, fly model: Organizations can be at different levels of maturity in their experimentation journey. It's essential to recognize where you are and focus on progressing rather than aiming to achieve the highest level immediately.
Balancing short- and long-term goals: Optimizing solely for revenue can be short-sighted. It is important to balance short-term wins with long-term strategic goals to ensure sustainable growth.
Ronny emphasized that not every organization needs to aim for the "Fly" level in every category. For smaller organizations, the focus should be on making consistent progress rather than reaching the highest level right away.
Large organizations may have the resources to achieve "Fly" in multiple areas, but this is not feasible for everyone.
They also addressed common misconceptions and pitfalls in experimentation:
Misinterpreting p-values: A common mistake is reading a p-value as the probability of success. A p-value of 0.01 does not mean there is a 99% chance the change is a winner. Understanding what statistical significance actually measures is crucial for making informed decisions; see the worked example after this list.
Case study: Only 12% of 1,000 experiments succeeded, illustrating the harsh reality that many organizations must accept—a high failure rate is normal.
Building features with a high failure expectation: Given the low success rate, it's advisable to build features on a small scale first, such as for one platform (desktop, Android, or iOS), and then expand based on positive results.
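To make the p-value point concrete, here is a minimal sketch (not from the talk) that applies Bayes' rule using the roughly one-in-three prior success rate cited above, the conventional 0.05 significance level, and an assumed 80% power. Even a statistically significant result is far from a guaranteed win:

```python
# Hypothetical illustration: probability an idea is truly a winner,
# given a statistically significant result. Assumes the ~1/3 prior
# success rate cited above, alpha = 0.05, and 80% power.
prior_success = 1 / 3      # P(idea actually works)
alpha = 0.05               # false positive rate under the null
power = 0.80               # P(significant | idea actually works)

p_significant = power * prior_success + alpha * (1 - prior_success)
p_win_given_significant = (power * prior_success) / p_significant

print(f"P(real win | p < 0.05) = {p_win_given_significant:.2f}")
# => roughly 0.89, not 0.95 -- and a p-value of 0.01 certainly
# does not mean a 99% chance of success.
```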
The discussion highlighted that A/B testing is widely adopted in Israel as the most scientific way to evaluate product changes, with many companies testing every feature they release. However, the focus should not rest solely on defensive testing (non-inferiority testing), which only checks that no metric is harmed beyond a set threshold.
While useful in certain situations, such as code refactoring, this defensive approach is not generally recommended for growth-driven experimentation.
Ronny discussed the issue of sample ratio mismatch (SRM): when the observed split of users between variants deviates from the configured allocation, and a goodness-of-fit test returns a very low p-value (below 1/1000), the mismatch is almost certainly not chance and the experiment's results are unreliable. Potential causes include bot traffic, and simply rerunning the experiment is unlikely to fix an SRM. It's crucial to investigate the underlying cause to ensure accurate and reliable results; a sketch of such a check follows.
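As a rough sketch of how such a check might look (the counts below are made up, and experimentation platforms typically run this automatically), a chi-squared goodness-of-fit test compares observed assignment counts against the configured split:

```python
from scipy.stats import chisquare

# Hypothetical counts for an experiment configured as a 50/50 split.
observed = [50_000, 51_200]           # users actually assigned to A and B
total = sum(observed)
expected = [total / 2, total / 2]     # what a true 50/50 split predicts

stat, p_value = chisquare(observed, f_exp=expected)
print(f"SRM check p-value: {p_value:.2e}")

# Following the 1/1000 threshold mentioned above: a tiny p-value means
# the imbalance is very unlikely to be chance, so investigate the cause
# (e.g., bot traffic) instead of trusting or rerunning the experiment.
if p_value < 0.001:
    print("Likely SRM -- results are unreliable.")
```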
Ronny encouraged organizations to use experimentation tools to supplement existing processes. Setting up effective experimentation requires a significant investment, and having the right tools can streamline the process and improve outcomes.
During the Q&A session, several important topics were discussed:
Bayesian vs. frequentist approaches: They covered the differences between these two statistical approaches and their applications in experimentation; a minimal side-by-side sketch appears after this list.
Multivariable vs. multivariate testing: Allon mentioned that most companies don't engage in multivariable testing. Ronny added that multivariable testing is more suitable for offline, one-time scenarios, whereas multivariate testing is better for quickly testing multiple variables in a live environment.
A/A tests as best practice: Running A/A tests is strongly recommended to validate experimentation setups and ensure the reliability of testing infrastructure (see the simulation sketch after this list).
Asymmetric allocation: Splitting traffic unevenly between variants can serve specific goals, for example keeping most users on the variant believed to be better while still collecting data, though the smaller group accumulates evidence more slowly.
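As a minimal illustration of the Bayesian-vs-frequentist contrast (the conversion counts are invented for this sketch), the same data can be summarized as a frequentist p-value or as a Bayesian posterior probability that B beats A:

```python
import numpy as np
from scipy.stats import beta, norm

# Hypothetical results: conversions / users in each variant.
conv_a, n_a = 1_000, 20_000
conv_b, n_b = 1_100, 20_000

# Frequentist view: two-proportion z-test yielding a p-value.
p_pool = (conv_a + conv_b) / (n_a + n_b)
se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (conv_b / n_b - conv_a / n_a) / se
p_value = 2 * norm.sf(abs(z))

# Bayesian view: Beta(1, 1) priors updated with the data, then sample
# the posteriors to estimate P(rate_B > rate_A).
rng = np.random.default_rng(0)
post_a = beta.rvs(1 + conv_a, 1 + n_a - conv_a, size=100_000, random_state=rng)
post_b = beta.rvs(1 + conv_b, 1 + n_b - conv_b, size=100_000, random_state=rng)

print(f"frequentist p-value: {p_value:.3f}")
print(f"P(B > A | data):     {(post_b > post_a).mean():.3f}")
```

The two numbers answer different questions: the p-value is the chance of seeing data this extreme if there were truly no difference, while the posterior directly estimates the chance that B is better, given the stated priors.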
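And to show why A/A tests are such a useful validation step, here is a hedged sketch (synthetic data, a standard t-test) that runs many A/A comparisons. With a healthy setup, roughly 5% of them should come out "significant" at the 0.05 level:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(42)
n_tests, n_users = 1_000, 2_000

false_positives = 0
for _ in range(n_tests):
    # Both "variants" draw from the same distribution: no real effect.
    a = rng.normal(loc=10.0, scale=3.0, size=n_users)
    b = rng.normal(loc=10.0, scale=3.0, size=n_users)
    _, p = ttest_ind(a, b)
    false_positives += p < 0.05

print(f"significant A/A tests: {false_positives / n_tests:.1%}")
# Should land near 5%. A much higher rate points to a problem in the
# assignment, logging, or stats engine -- catch it before real A/B tests.
```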
The discussion provided a comprehensive overview of best practices and insights for effective A/B testing and experimentation. By embracing a culture of experimentation, understanding the value of learning from failures, and leveraging the right tools and methodologies, organizations can optimize their decision-making processes and drive continuous growth.
Catch the full conversation in Hebrew, below: