We recently hosted a virtual meetup featuring Allon Korem, CEO of Bell, and Ronny Kohavi, a widely respected thought leader and expert in experimentation.
The virtual meetup was conducted in Hebrew, so for those who aren’t fluent in the language, we have summarized the conversation in English below. It focused on three key areas essential to effective A/B testing in organizations:
Infrastructure: Building the foundation for effective experimentation is crucial. This involves setting up the right tools, processes, and systems that enable teams to run experiments seamlessly.
Experimentation culture: Establishing an organizational culture that embraces experimentation is vital for continuous growth and improvement. This involves encouraging teams to take calculated risks, learn from outcomes, and use data to drive decisions.
Learning from failures: One of the critical aspects of successful experimentation is the ability to learn from failures. Analyzing failed experiments and understanding the root causes can provide valuable insights that lead to better decision-making and future success.
Ronny shared several insights from his experience, especially during his tenure at Microsoft:
Shipping faster: He highlighted that one of the biggest factors in shipping faster was having strong PMs who could effectively manage and prioritize experiments and features.
Success rate of ideas: Only 33% of ideas were successful, and overcoming the cultural challenge of accepting this low success rate was crucial. Organizations must understand that failure is an integral part of the experimentation process.
Crawl, walk, run, fly model: Organizations can be at different levels of maturity in their experimentation journey. It's essential to recognize where you are and focus on progressing rather than aiming to achieve the highest level immediately.
Balancing short- and long-term goals: Optimizing solely for revenue can be short-sighted. It is important to balance short-term wins with long-term strategic goals to ensure sustainable growth.
Ronny emphasized that not every organization needs to aim for the "Fly" level in every category. For smaller organizations, the focus should be on making consistent progress rather than reaching the highest level right away.
Large organizations may have the resources to achieve "Fly" in multiple areas, but this is not feasible for everyone.
They also addressed common misconceptions and pitfalls in experimentation:
Misinterpreting p-values: A common mistake is reading a p-value as the probability of success. A p-value of 0.01 does not mean there is a 99% chance the change is a winner. Understanding what statistical significance actually measures is crucial for making informed decisions; see the worked example after this list.
Case study: Only 12% of 1,000 experiments succeeded, illustrating the harsh reality that many organizations must accept—a high failure rate is normal.
Building features with a high failure expectation: Given the low success rate, it's advisable to build features on a small scale first, such as for one platform (desktop, Android, or iOS), and then expand based on positive results.
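To make the p-value point concrete, here is a minimal sketch (not from the talk) that applies Bayes' rule using the roughly one-in-three prior success rate cited above, the conventional 0.05 significance level, and an assumed 80% power. Even a statistically significant result is far from a guaranteed win:

```python
# Hypothetical illustration: probability an idea is truly a winner,
# given a statistically significant result. Assumes the ~1/3 prior
# success rate cited above, alpha = 0.05, and 80% power.
prior_success = 1 / 3      # P(idea actually works)
alpha = 0.05               # false positive rate under the null
power = 0.80               # P(significant | idea actually works)

p_significant = power * prior_success + alpha * (1 - prior_success)
p_win_given_significant = (power * prior_success) / p_significant

print(f"P(real win | p < 0.05) = {p_win_given_significant:.2f}")
# => roughly 0.89, not 0.95 -- and a p-value of 0.01 certainly
# does not mean a 99% chance of success.
```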
The discussion highlighted that A/B testing is widely adopted in Israel as the most scientific way to evaluate product changes, with many companies testing every feature they release. However, the focus should not rest solely on defensive testing (non-inferiority testing), which only checks that no metric is harmed beyond a set threshold.
While useful in certain situations, such as code refactoring, this defensive approach is not generally recommended for growth-driven experimentation.
Ronny discussed the issue of sample ratio mismatch (SRM): when the observed split of users between variants deviates from the configured allocation, and a goodness-of-fit test returns a very low p-value (below 1/1000), the mismatch is almost certainly not chance and the experiment's results are unreliable. Potential causes include bot traffic, and simply rerunning the experiment is unlikely to fix an SRM. It's crucial to investigate the underlying cause to ensure accurate and reliable results; a sketch of such a check follows.
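As a rough sketch of how such a check might look (the counts below are made up, and experimentation platforms typically run this automatically), a chi-squared goodness-of-fit test compares observed assignment counts against the configured split:

```python
from scipy.stats import chisquare

# Hypothetical counts for an experiment configured as a 50/50 split.
observed = [50_000, 51_200]           # users actually assigned to A and B
total = sum(observed)
expected = [total / 2, total / 2]     # what a true 50/50 split predicts

stat, p_value = chisquare(observed, f_exp=expected)
print(f"SRM check p-value: {p_value:.2e}")

# Following the 1/1000 threshold mentioned above: a tiny p-value means
# the imbalance is very unlikely to be chance, so investigate the cause
# (e.g., bot traffic) instead of trusting or rerunning the experiment.
if p_value < 0.001:
    print("Likely SRM -- results are unreliable.")
```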
Ronny encouraged organizations to use experimentation tools to supplement existing processes. Setting up effective experimentation requires a significant investment, and having the right tools can streamline the process and improve outcomes.
During the Q&A session, several important topics were discussed:
Bayesian vs. frequentist approaches: They covered the differences between these two statistical approaches and their applications in experimentation; a minimal side-by-side sketch appears after this list.
Multivariable vs. multivariate testing: Allon mentioned that most companies don't engage in multivariable testing. Ronny added that multivariable testing is more suitable for offline, one-time scenarios, whereas multivariate testing is better for quickly testing multiple variables in a live environment.
A/A tests as best practice: Running A/A tests is strongly recommended to validate experimentation setups and ensure the reliability of testing infrastructure (see the simulation sketch after this list).
Asymmetric allocation: Splitting traffic unevenly between variants can serve specific goals, for example keeping most users on the variant believed to be better while still collecting data, though the smaller group accumulates evidence more slowly.
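As a minimal illustration of the Bayesian-vs-frequentist contrast (the conversion counts are invented for this sketch), the same data can be summarized as a frequentist p-value or as a Bayesian posterior probability that B beats A:

```python
import numpy as np
from scipy.stats import beta, norm

# Hypothetical results: conversions / users in each variant.
conv_a, n_a = 1_000, 20_000
conv_b, n_b = 1_100, 20_000

# Frequentist view: two-proportion z-test yielding a p-value.
p_pool = (conv_a + conv_b) / (n_a + n_b)
se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (conv_b / n_b - conv_a / n_a) / se
p_value = 2 * norm.sf(abs(z))

# Bayesian view: Beta(1, 1) priors updated with the data, then sample
# the posteriors to estimate P(rate_B > rate_A).
rng = np.random.default_rng(0)
post_a = beta.rvs(1 + conv_a, 1 + n_a - conv_a, size=100_000, random_state=rng)
post_b = beta.rvs(1 + conv_b, 1 + n_b - conv_b, size=100_000, random_state=rng)

print(f"frequentist p-value: {p_value:.3f}")
print(f"P(B > A | data):     {(post_b > post_a).mean():.3f}")
```

The two numbers answer different questions: the p-value is the chance of seeing data this extreme if there were truly no difference, while the posterior directly estimates the chance that B is better, given the stated priors.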
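And to show why A/A tests are such a useful validation step, here is a hedged sketch (synthetic data, a standard t-test) that runs many A/A comparisons. With a healthy setup, roughly 5% of them should come out "significant" at the 0.05 level:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(42)
n_tests, n_users = 1_000, 2_000

false_positives = 0
for _ in range(n_tests):
    # Both "variants" draw from the same distribution: no real effect.
    a = rng.normal(loc=10.0, scale=3.0, size=n_users)
    b = rng.normal(loc=10.0, scale=3.0, size=n_users)
    _, p = ttest_ind(a, b)
    false_positives += p < 0.05

print(f"significant A/A tests: {false_positives / n_tests:.1%}")
# Should land near 5%. A much higher rate points to a problem in the
# assignment, logging, or stats engine -- catch it before real A/B tests.
```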
The discussion provided a comprehensive overview of best practices and insights for effective A/B testing and experimentation. By embracing a culture of experimentation, understanding the value of learning from failures, and leveraging the right tools and methodologies, organizations can optimize their decision-making processes and drive continuous growth.
Catch the full conversation in Hebrew, below: