When Amazon launched Home Services, the team was convinced that most people wanted to schedule home installations in the mornings, evenings, or on weekends. This naturally constrained the number of available time slots and stretched delivery dates months into the future. Customers weren’t pleased, and happiness was far from guaranteed.
Frustrated and out of ideas, the team decided to experiment by offering the next available time slot by default (customers could still choose another slot if they wanted to). It turned out that most customers wanted the installation as soon as possible and took the first available slot. As orders picked up, technicians completed more installations and delivered higher-quality service, as measured by customer satisfaction. Happiness began to trend upwards.
Experiments enable us to form a deeper understanding of a problem. While the desire to understand is universal, using experimentation to understand a problem is not a natural human instinct.
Because it means intentionally seeking data that disproves one’s deepest convictions, experimentation defines the culture of the team where it takes hold. From the scientific revolution in Europe to the rise of Netflix, Amazon, and Facebook, experimentation has enabled unparalleled growth. Large companies such as Amazon and Facebook didn’t become experimental after they grew big. They grew big because they tested their convictions every step of the way.
This mindset of testing one’s convictions, even when they are deeply rooted in instinct, is valuable at every stage of a company, whether or not it has found product-market fit. It’s essential before product-market fit, when we’re still learning, assimilating, and applying our understanding to push deeper into a problem. We might hit upon a great idea by sheer luck, but usually it’s because we’ve spent countless iterations trying different things.
Tristan, founder and CEO of dbt, rejects experimentation maximalism as a product creator, and his description of experimentation today is 100% on the money. A/B testing sucks today. The tooling is painful and error-prone, leading teams to spend countless hours and sweat on experimentation without revealing any earth-shattering insight. A successful A/B test might reveal one bit of information if you’re lucky, taking you from ‘being 50/50 on which variant to being sure of one’.
What big tech companies have solved is running experiments at industrial scale: they have reduced the marginal economic and human cost of running an experiment to nearly zero. Amazon is huge today, but this is Jeff Bezos speaking in 2005, a week after Amazon launched Prime, on how to maximize experimentation by reducing its cost.
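The ‘one bit’ can be read literally in Shannon’s sense. A yes/no question held at 50/50 carries exactly one bit of uncertainty, and becoming sure of the answer removes all of it. The minimal Python sketch below assumes a perfectly balanced prior and a test that ends in full certainty; real tests rarely reach certainty, so in practice they yield less than one bit. The function name `binary_entropy` is our own illustration.

```python
import math

def binary_entropy(p: float) -> float:
    """Shannon entropy, in bits, of a yes/no belief held with probability p."""
    if p in (0.0, 1.0):
        return 0.0  # no uncertainty left
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

before = binary_entropy(0.5)  # 50/50 on which variant wins
after = binary_entropy(1.0)   # sure of one variant
print(f"information gained: {before - after:.1f} bit")  # information gained: 1.0 bit
```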
Ubiquitous, low-cost experimentation enables every engineer to ask, ‘Why shouldn’t I A/B test this? Maybe I’ll learn something new.’¹ Learning how users (or complex systems) respond to successive software updates compounds the engineer’s ability to detect signals and fix issues as early as possible. Without this feedback loop, mistakes and poor assumptions morph beyond recognition over time and become progressively harder to root cause.
Good experimentation infrastructure compounds learning and discovery throughout the company, arming everyone with data to make smarter decisions. Every engineer on the team aligns with and owns the vision instead of blindly taking instructions from the roadmap.
While I violently agree with Tristan that A/B testing sucks today, I disagree that it is overused. If anything, the lack of good tools limits use of experimentation in most companies beyond big tech.
People often trip over the idea that big companies enjoy the benefits of experimentation only because they can simply optimize what they’ve already built. I have nothing against optimization, but if people at Amazon thought this way, they’d never have created any new products or businesses.
Amazon launched Amazon Auctions in 1999, zShops later the same year, and Amazon Marketplace in late 2000, all iterations toward building a marketplace. In helping Target and Marks & Spencer build on top of Amazon’s e-commerce engine, Amazon recognized the need to untangle the mess that was its e-commerce platform into well-documented APIs. “So very quietly around 2000, we became a services company with really no fanfare,” Andy Jassy said in an interview. At a summer retreat in 2003, the team gradually began to recognize what they’d become good at: running reliable, scalable, cost-effective data centers. “In retrospect it seems fairly obvious, but at the time I don’t think we had ever really internalized that,” Jassy said of the first steps Amazon took toward web services.
No one sits in a corner to come up with breakthrough product insights. Similarly, no one relies on experimentation alone to come up with fundamentally new and great products. However, the path to invention that combines experimentation, learning, and reflection is often lost in the prevailing folklore.
Henry Ford is famously quoted as saying, “If I had asked people what they wanted, they would have said faster horses.” Yet, Ford didn’t invent cars to replace horses. He experimented relentlessly to make cars more cost-effective.
Cars had been around for decades before the Model T. In fact, Ford’s 1908 Model T was the 20th iteration over a five-year period that began with the production of the Model A in 1903. Ford’s vision was mass production. Introducing consistently interchangeable auto parts was the first big step. The moving assembly line was a further ‘optimization’ to reduce the time workers spent walking around the shop floor to procure and fit components into a car. “Progress through cautious, well-founded experiments,” was Ford’s motto.
There’s no rule that live experiments are the only way to create great products. On a lucky day, our vision and planned roadmap are outcomes of prior exploration and learning. Focus is essential to solve most problems, and roadmaps help us focus further exploration in a given space. While a series of experiments may be structured to focus on a specific problem, occasionally these experiments break open a new dimension of the problem and send us back to the drawing board.
It’s great to have a vision, but it’s essential to keep your eyes open. Experimentation is a critical tool for remaining open-minded even as the world shifts around you.
There’s also no rule that experiments must move an existing KPI. As our understanding of the problem space evolves, we must rewrite the metrics to best reflect our current understanding of the problem.
When I was on the EC2 team, we tracked normalized instance hours as the primary usage metric, but in EC2 Spot we tracked utilization of otherwise unused instances as the primary metric.² Rather than framing Spot as an auction-based pricing model designed to maximize revenue, we flipped the problem around to reduce waste. We then ran experiments to solve the problem as we had defined it, and described our vision in terms of metrics we had created. Today, all three major cloud providers offer ‘Spot’ instances.
You’re a believer in the culture of experimentation and you have appropriate tooling. What does experimentation look like in practice?
Start by measuring: Turn every new feature into an A/B test. With feature gates, this should be a no-brainer for any engineering team.
Build shared context: Train your team to recognize unexpected patterns and go deeper into these ‘issues’ jointly as a team.
Take calculated risks: Break down large roadmap projects into small, incremental experiments. Roll back effortlessly when things don’t pan out.
Invest in timely and accurate data: No engineer should need to be told that her work isn’t showing results. Pave the way for everyone on the team to arrive at the same conclusions about the product.
Iterate fast: Try out more ideas to improve the rate at which you generate better ideas.
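The ‘start by measuring’ and ‘take calculated risks’ practices can be sketched in a few lines. The Python below is a minimal illustration, not Statsig’s SDK or API: a feature gate that doubles as an A/B test by hashing each user into a stable bucket, enrolling only a configurable percentage of users, splitting them into test and control, and exposing a kill switch for rollback. The names `bucket`, `assign`, `rollout_pct`, `enabled`, and the experiment name `next_slot_default` are all hypothetical.

```python
import hashlib

def bucket(user_id: str, experiment: str, buckets: int = 10_000) -> int:
    """Hash the experiment name and user ID into a stable bucket in [0, buckets)."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return int(digest[:15], 16) % buckets

def assign(user_id: str, experiment: str, rollout_pct: float, enabled: bool = True) -> str:
    """Return 'test', 'control', or 'excluded' for this user.

    rollout_pct: share of users (0-100) enrolled in the experiment.
    enabled: hypothetical kill switch; False sends everyone back to existing behavior.
    """
    if not enabled:
        return "excluded"
    b = bucket(user_id, experiment)
    if b >= rollout_pct * 100:  # 10,000 buckets, so 1% == 100 buckets
        return "excluded"
    return "test" if b % 2 == 0 else "control"  # split enrolled users roughly 50/50

# Usage: start small with a 20% rollout of a hypothetical experiment.
counts = {"test": 0, "control": 0, "excluded": 0}
for i in range(10_000):
    counts[assign(f"user-{i}", "next_slot_default", rollout_pct=20)] += 1
print(counts)
```

With a 20% rollout, roughly a tenth of users land in each of test and control and the rest keep the existing behavior. Salting the hash with the experiment name keeps each user’s assignment stable across sessions while keeping assignments largely independent across experiments, and rolling back becomes a configuration change (`enabled=False`) rather than a redeploy.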
A previous post goes into more detail on these. As a product creator, I also exchange notes with other creators on what’s working for them³. Tim, our data science lead, writes extensively about not needing large sample sets for A/B tests. My colleague Vineeth writes often about democratizing experimentation with best practices.
At Statsig, we’re doing our small bit to make experimentation simple and rewarding for every team, big or small. We focus on the infrastructure and tooling so you can focus on learning and iterating. Building a culture of experimentation is in your hands and we’re behind you every step of the way.
[1] This forward reasoning is the basis of the paper Why ask Why? by Andrew Gelman and Guido Imbens: ‘We do not try to answer “Why” questions; rather, “Why” questions motivate “What if” questions that can be studied using standard statistical tools such as experiments, observational studies, and structural equation models.’
[2] If somehow EC2 got better at ordering exactly what was needed, we’d be out of business.
[3] One way to find practitioners from teams of all sizes is in the growing Statsig Experimentation Community.