A/B testing platforms have become highly efficient, automating much of the mundane work involved in setting up, calculating, and reporting on experiments.
This shift cuts both ways. It has freed data scientists from repetitive tasks, but it has also left many of us asking: what should we focus on now to provide more value and advance our careers?
That’s why we invited Ronny Kohavi, one of the most respected experts in experimentation, to host a webinar on why he believes data science jobs have become even more interesting with the rise of A/B testing platforms.
While the full video is available below, I’d like to share some key takeaways from the event in this post.
Ronny's talk addressed four questions:
What’s the current state of experimentation automation since the release of Trustworthy Online Controlled Experiments (the "Hippo" book)?
How does this automation impact the role of data scientists?
What should data scientists focus on moving forward?
How can we secure leadership buy-in for experimentation efforts?
If these topics interest you, feel free to check out his deck. I won’t repeat his full answers here; instead, I’ll summarize the top three takeaways and share a few noteworthy excerpts and their context.
The most obvious benefit of automation is that it frees data scientists from low-value, repetitive tasks. With A/B testing largely automated, they can focus on higher-level responsibilities, like developing business insights and strategies.
The indirect benefit, however, is even more impactful. Once an organization adopts an experimentation culture, teams are humbled by real data and become more open to accepting evidence that disproves their assumptions.
Data scientists often complain that leaders only value data that supports their preconceived notions, but the true power of data lies in changing decisions.
Building this kind of data-driven culture requires running more experiments.
Ronny recounted a revealing experience from his time at Amazon and Microsoft, one he also described in the Acquired Minisode.
At Amazon, he discovered that over 50% of experiments "failed"—meaning they didn’t produce the desired outcome. When he joined Microsoft, he noticed there was a strong belief that all ideas would succeed. When asked why, the response was, "We have better PMs."
Today, most people running large-scale experiments know that about 80% of hypotheses won’t succeed. However, this reality is tough for people who aren’t directly involved in experimentation, as they prefer to live in a bubble where their ideas are always right. Breaking through this mindset is key to implementing scalable experimentation.
When Ronny published Trustworthy Online Controlled Experiments in 2020, the level of automation available today didn’t exist.
Now, platforms like Statsig offer comprehensive solutions, covering everything from assignment and metrics calculation to advanced features like sequential testing, Sample Ratio Mismatch (SRM) monitoring, CUPED, and differential impact detection.
In the past, companies needed to build these tools in-house for reliable and sophisticated testing. Today, platforms like Statsig are often more powerful and cost-effective than most internal solutions, making experimentation automation more accessible than ever.
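To make one of these automated checks concrete, here is a minimal sketch of a Sample Ratio Mismatch test for a two-arm experiment. It is an illustration under stated assumptions, not Statsig's implementation: the function names, the example counts, and the 0.001 alert threshold are all hypothetical choices made for this example. The idea is to compare the observed user counts against the designed split with a chi-square goodness-of-fit test and flag the experiment when the p-value is very small.

```python
import math


def srm_p_value(control_count, treatment_count, expected_control_share=0.5):
    """Chi-square goodness-of-fit p-value (1 degree of freedom) for a two-arm split.

    All names and defaults here are hypothetical, chosen for illustration.
    """
    total = control_count + treatment_count
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control
    chi2 = (
        (control_count - expected_control) ** 2 / expected_control
        + (treatment_count - expected_treatment) ** 2 / expected_treatment
    )
    # With 1 degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(chi2 / 2)), so no external library is needed.
    return math.erfc(math.sqrt(chi2 / 2))


def has_srm(control_count, treatment_count, expected_control_share=0.5, alpha=0.001):
    """Flag a likely SRM when the p-value falls below alpha (an assumed threshold)."""
    return srm_p_value(control_count, treatment_count, expected_control_share) < alpha


# Hypothetical counts for an experiment designed as a 50/50 split.
print(has_srm(10_000, 10_100))  # small imbalance, plausibly chance
print(has_srm(50_000, 52_000))  # large imbalance, worth investigating
```

A platform would typically run a check like this continuously and across user segments. The sketch only shows the core comparison for a single snapshot of counts.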
"Most experiments fail to improve the metrics they were designed to improve."
Kohavi highlighted a counterintuitive reality: the majority of experiments don't yield the expected positive results.
Rather than viewing this as a setback, he emphasized that this failure rate is a natural part of innovation. It reflects the complexity of predicting user behavior and underscores the importance of experimentation as a tool for learning and discovery.
"If the people at the top don't respect the data, things are gonna be tough."
Addressing organizational culture, Kohavi pointed out the critical need for leadership to embrace data-driven decision-making. He referenced the Semmelweis reflex—the tendency to reject new information that contradicts established beliefs.
Without executive buy-in, even the most robust data can fail to drive meaningful change.
"Design for multiple paths. Evaluate them, learn to fail fast on some of them, learn to double down on the others."
Advocating for agility, Kohavi suggested moving away from rigid long-term plans. Instead, organizations should explore various avenues, rapidly test ideas, and focus resources on those that show real promise. This approach accelerates innovation while also minimizing the risks associated with untested assumptions.
"One of the more interesting tasks for data scientists is to translate the results—the numbers and statistics—into a story."
As automation handles more routine tasks, the role of data scientists is evolving.
Kohavi stressed the importance of storytelling in making data accessible and actionable for stakeholders. By crafting narratives around data, scientists can better influence decisions and drive strategic initiatives.
"An engineer that improves server performance by ten milliseconds more than pays for his or her fully loaded annual costs."
Kohavi shared this example to illustrate the tangible impact of small improvements. Even a minor performance gain can be worth more than the full cost of the engineer who delivers it, which underscores the value of continuous optimization and experimentation.
The bottom line is that experimentation automation is great for data scientists, and we should embrace it.
If you want to get started, here is the slide deck that speaks to how to design a scalable experimentation system and culture within your organization.
Connect with me on LinkedIn and follow me on YouTube. Let’s fight this together!