You don't need large sample sizes to run A/B tests

Thu Dec 12 2024

Small companies’ secret advantage in experimentation

A/B testing is a well-established and powerful product development tool that has become best practice among big tech companies. Yet many small and medium-sized companies aren’t running A/B tests. When asked why, they say, “We aren’t Facebook/Google/Amazon; we just don’t have enough users.” Sadly, this oversimplification and misunderstanding of statistics is keeping a lot of companies from even trying the industry’s most powerful product growth tool.

Companies like Facebook, Microsoft, Airbnb, Spotify, and Netflix are the face of rapid product growth through experimentation. They run thousands of simultaneous experiments on millions of users, where microscopic wins are worth big money (see Bing’s “shades of blue” experiment). For the rest of us, A/B testing can seem like an amazing but unrelatable academic exercise, far removed from the world of startups.

What’s often ignored is that while big companies use A/B testing to optimize billion-user products like Google Search and Facebook News Feed, they use the same tools to build zero-to-one products, starting with as few as a thousand users and growing them into millions. I spent 5 years as a Data Scientist at Facebook overseeing A/B tests on a wide range of features and products, and I helped launch and grow many new products, each starting with tests on mere “thousands” of users. A/B testing is a pillar of data-driven product development irrespective of the product’s size. And despite what you may hear, small products actually have a huge statistical advantage that’s rarely discussed.

Startups’ secret weapon in A/B testing

Startup companies have a major advantage in experimentation: huge potential upside. Startups don’t play the micro-optimization game; they don’t care about a +0.2% increase in click-through rate. Instead, they hunt for big wins like a 40% increase in feature adoption or a 15% increase in signup rate. Startups have plenty of low-hanging fruit and huge opportunities. The statistical term for the size of a win is “effect size,” and it’s actually more important than sample size in determining an experiment’s statistical power: the test statistic grows in direct proportion to the effect size but only with the square root of the sample size, so doubling the effect you’re hunting for is worth as much as quadrupling your users.
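
To make the effect-size point concrete, here’s a minimal sketch of the textbook sample-size formula for comparing two conversion rates. The function name and defaults are illustrative assumptions, not Statsig’s calculator: a 5% baseline, one-sided 5% significance and 80% power (matching the chart caption), with a normal-approximation z-test standing in for the caption’s t-test, which is nearly identical at these sample sizes.

```python
import math
from statistics import NormalDist


def required_n_per_group(baseline: float, relative_lift: float,
                         alpha: float = 0.05, power: float = 0.80,
                         one_sided: bool = True) -> int:
    """Approximate users needed per group to detect a relative lift in a conversion rate.

    Normal-approximation formula for a two-proportion z-test:
        n = (z_alpha + z_beta)^2 * (p1(1 - p1) + p2(1 - p2)) / (p2 - p1)^2
    """
    p1 = baseline                        # control conversion rate
    p2 = baseline * (1 + relative_lift)  # hypothesized test conversion rate
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha if one_sided else 1 - alpha / 2)
    z_beta = std_normal.inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)


# Bigger effects need dramatically fewer users: n scales roughly with 1 / lift^2.
for lift in (0.05, 0.10, 0.20, 0.40):
    print(f"+{lift:.0%} lift: ~{required_n_per_group(0.05, lift):,} users per group")
```

Because the lift sits squared in the denominator, doubling it cuts the required sample by roughly a factor of four (not exactly four, since the test group’s variance shifts slightly).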

Figure: A standard A/B test on a 5% baseline conversion rate.

Data generated for a standard A/B test over 7 days on a 5% baseline conversion rate metric using a one-sided t-test (5% significance level and 80% power).

The above chart shows experiments with equal statistical power. Your results will vary depending on your specific experiment, but it’s clear you do not need millions of users to measure meaningful metric lifts as small as 5%. And that’s before applying any of the many tricks you can use to boost statistical power (Bayesian statistics, CUPED variance reduction, or extending the duration of your test).

Bayesian A/B test calculator

Statsig's Bayesian calculator is a quick way to determine the chances that a test variant beats a control variant.
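
The calculator’s internals aren’t described here, so treat this as a generic sketch of how such a number can be computed: give each variant’s conversion rate a Beta posterior (a uniform Beta(1, 1) prior updated with the observed conversions), draw from both posteriors many times, and count how often the test variant comes out ahead. The function name, prior, and example counts are illustrative assumptions.

```python
import random


def prob_test_beats_control(conv_c: int, n_c: int, conv_t: int, n_t: int,
                            draws: int = 50_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(p_test > p_control) under Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Beta-Binomial conjugacy: posterior is Beta(1 + conversions, 1 + non-conversions).
        p_control = rng.betavariate(1 + conv_c, 1 + n_c - conv_c)
        p_test = rng.betavariate(1 + conv_t, 1 + n_t - conv_t)
        if p_test > p_control:
            wins += 1
    return wins / draws


# Hypothetical results: 50/1,000 conversions in control vs 65/1,000 in test.
print(f"P(test beats control) ≈ {prob_test_beats_control(50, 1_000, 65, 1_000):.1%}")
```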

Google vs a startup: Who has more statistical power?

Google routinely runs search tests at massive scale, for example on 100,000,000 users, where a +0.1% effect on topline metrics is a big win. Meanwhile, a typical startup with just 10,000 users may be hunting for a +15% win. Which experiment has more statistical power? You might think that with 10,000 times fewer users, the startup has no chance. But the Z-score, which measures statistical significance, scales with the square root of the sample size: 10,000 times fewer users means only a 100 times smaller Z-score. Meanwhile, the startup is looking for a 150 times bigger effect (15% vs 0.1%), so, assuming comparably noisy metrics, the net result is a Z-score about 50% larger, and therefore more statistical power! Contrary to popular belief, startups typically have a better chance of running a successful A/B test.

Z-scores determine whether experimental results are statistically significant.
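
Here is the back-of-the-envelope Google vs startup comparison as a few lines of Python. For a two-sample test with n users per group, Z = Δ / (σ·sqrt(2/n)), so if both companies’ metrics have similar noise relative to their baseline (an assumption, not a given), the Z-score is proportional to the relative effect times sqrt(n). The helper name is hypothetical.

```python
import math


def z_score_up_to_constant(n_users: int, relative_effect: float) -> float:
    """Z-score with the constant factor dropped.

    Z = delta / (sigma * sqrt(2 / n)). Writing delta = relative_effect * baseline and
    assuming sigma / baseline is the same for both companies, Z is proportional to
    relative_effect * sqrt(n). n can be per-group or total users (with a 50/50 split);
    only the ratio between the two experiments matters here.
    """
    return relative_effect * math.sqrt(n_users)


google = z_score_up_to_constant(100_000_000, 0.001)  # +0.1% on 100M users
startup = z_score_up_to_constant(10_000, 0.15)       # +15% on 10k users

# 10,000x fewer users -> sqrt gives a 100x smaller Z; a 150x bigger effect -> 150x larger Z.
print(f"startup Z / Google Z ≈ {startup / google:.2f}")
```

Note that the result is a ratio of Z-scores: a larger Z means higher power, but power is a nonlinear function of Z.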

Big effects are seldom obvious

Small companies chasing big changes may be tempted to take a shortcut and skip A/B testing to save time. After all, low-hanging fruit seems obvious, and we should be able to accurately measure top-line effects on our dashboards, right? This is a mistake for several reasons:

Changes are unpredictable in direction and magnitude. A/B testing has a way of humbling product builders and producing surprises. At Microsoft, roughly a third of all experiments had negative results. Improvements to early-stage products tend to be high risk and high reward, and it’s critical to have a rigorous way to measure and evaluate each of them rather than having them killed by a highly paid executive. One of our customers with fewer than 100 daily active users had ignored straightforward improvements, thinking they were just +10% wins, but when tested they turned out to be +300% wins (±100%).
Ecosystem effects are complex. While the topline impact might be anticipated by the experiment’s hypothesis, secondary and ecosystem effects are far less obvious. One of our clients with ~1k DAU launched a user badging experiment, a classic strategy for mobile apps. The badges did increase discoverability of a couple of new features, but what blew their minds were dozens of unexpected side effects, including +50% engagement wins on indirectly related features. The company immediately learned a lot about its users and came up with dozens of additional ideas.
Dashboards are inferior tools for measuring causality. We’ve all stared at dashboard movements, trying to “read the tea leaves” and identify what caused what. Everyone does it, and while you may expect a big feature launch to show up immediately on a dashboard, in practice the root cause of any movement is far less certain. Was that boost due to our marketing campaign? A competitor’s announcement? Is seasonality masking something? Product builders are excellent at coming up with plausible explanations for why metrics moved, but they’d be fools to bet the house on them.
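
A figure like “300% ± 100%” is a point estimate plus a confidence interval. As a rough sketch of where such numbers come from, the function below computes a relative lift and a delta-method 95% interval for two conversion rates. The delta method is just one common choice (the method behind that customer’s result isn’t stated), and the counts in the example are hypothetical, not the customer’s data.

```python
import math
from statistics import NormalDist


def relative_lift_ci(conv_c: int, n_c: int, conv_t: int, n_t: int,
                     confidence: float = 0.95) -> tuple[float, float, float]:
    """Relative lift (p_t / p_c - 1) with a delta-method confidence interval.

    Delta method: Var(p_t / p_c) ≈ ratio^2 * (Var(p_t) / p_t^2 + Var(p_c) / p_c^2),
    where Var(p) = p(1 - p) / n for a binomial proportion.
    """
    p_c, p_t = conv_c / n_c, conv_t / n_t
    ratio = p_t / p_c
    rel_var = (1 - p_t) / (n_t * p_t) + (1 - p_c) / (n_c * p_c)
    half_width = NormalDist().inv_cdf(0.5 + confidence / 2) * ratio * math.sqrt(rel_var)
    lift = ratio - 1
    return lift, lift - half_width, lift + half_width


# Hypothetical small-app numbers: 10/200 users convert in control, 40/200 in test.
lift, low, high = relative_lift_ci(10, 200, 40, 200)
print(f"lift {lift:+.0%}, 95% CI [{low:+.0%}, {high:+.0%}]")
```

Even with only a few hundred users per arm, an effect this large yields an interval that stays above zero, which is exactly the small-sample advantage described earlier.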

A/B testing is the gold standard for measuring causality and bringing evidence and numbers to the forefront. While it’s true that collecting and analyzing the results takes time, knowing (rather than hoping for) the exact impact of your improvements is critical to making your product successful.

Implementing A/B testing early in a company’s life anchors and establishes a data-driven culture. People focus less on debates and more on evaluating and interpreting data. Egos take a back seat, and ideas flow freely instead of getting squashed.

Experimentation for small businesses

Statsig is the leading experimentation platform for small-to-medium businesses that are new to experimentation.

Advanced approaches to increase statistical power and enrich results

For folks pushing the limits of experimentation, a variety of advanced approaches can overcome the traditional limitations of A/B testing.

CUPED (Controlled Experiment Using Pre-Experiment Data) leverages historical metrics to reduce noise and refine your estimates.
Stratified Sampling ensures balanced and reliable comparisons across small, skewed user segments.
Differential Impact detection looks for user segments where the treatment effect differs, which can reveal directional and qualitative findings and add a little more color to your experiment’s results.
Session replays, user surveys, custom queries, and product analytics can give a more holistic picture than experiments alone. Combining some of these tools with experimentation can reveal not just what happened, but why.
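
For the stratification idea in the list above, here’s one simple way to do it at assignment time (sometimes called blocked or stratified randomization): group users by a segment label, shuffle within each segment, and alternate arms so every segment splits as evenly as possible. This is an illustrative sketch with hypothetical names, not a description of how any particular platform implements stratified sampling.

```python
import random
from collections import defaultdict


def stratified_assign(user_strata: dict[str, str], seed: int = 0) -> dict[str, str]:
    """Assign each user to 'control' or 'test', balancing arms within every stratum.

    user_strata maps user_id -> stratum label (e.g. a plan tier or platform).
    """
    rng = random.Random(seed)
    by_stratum: dict[str, list[str]] = defaultdict(list)
    for user_id, stratum in user_strata.items():
        by_stratum[stratum].append(user_id)

    assignment: dict[str, str] = {}
    for stratum in sorted(by_stratum):          # sorted for reproducible results
        members = sorted(by_stratum[stratum])
        rng.shuffle(members)
        arms = ["control", "test"]
        rng.shuffle(arms)                       # randomize which arm gets an odd leftover user
        for i, user_id in enumerate(members):
            assignment[user_id] = arms[i % 2]   # alternate after shuffling: 50/50 split (±1)
    return assignment


# Hypothetical skewed population: 7 enterprise accounts among 100 users.
users = {f"user{i}": "enterprise" if i < 7 else "free" for i in range(100)}
groups = stratified_assign(users)
enterprise_arms = [groups[u] for u, s in users.items() if s == "enterprise"]
print(enterprise_arms.count("control"), enterprise_arms.count("test"))
```

With plain coin-flip assignment, those 7 enterprise users could easily land 6 to 1 in one arm; stratifying guarantees a 4-3 split.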

A/B testing is more accessible than ever

While A/B testing used to require highly specialized expertise and dedicated teams, it’s far more accessible now than it was a decade ago. Feature gating (also called flagging or toggling) is now a standard tool in modern software development, letting you control the rollout of features and tests. There are blogs and books devoted to A/B testing, as well as online communities of analytics-minded folks. While it’s nearly impossible to cover everything that’s out there, Trustworthy Online Controlled Experiments by Ron Kohavi, Diane Tang, and Ya Xu is a good place to start.
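
Much of what makes feature flags usable for experiments is deterministic assignment: the same user should get the same answer every time, without storing a lookup table. A common approach, sketched below with hypothetical names, hashes the flag name together with the user ID into a stable bucket and compares it to the rollout percentage. Real flagging tools differ in their hash functions and details.

```python
import hashlib


def in_rollout(user_id: str, flag_name: str, rollout_percent: float) -> bool:
    """Return True if user_id falls inside rollout_percent (0-100) for flag_name.

    Hashing "flag_name:user_id" gives each user a stable bucket in [0, 10000), so the
    answer never changes between calls, and different flags bucket users independently.
    """
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 10_000
    return bucket < rollout_percent * 100


users = [f"user{i}" for i in range(10_000)]
exposed = sum(in_rollout(u, "new_onboarding_flow", 20) for u in users)
print(f"{exposed / len(users):.1%} of users see the feature")
```

Because each user’s bucket is fixed, raising the percentage from 20 to 50 only adds users; nobody who already had the feature loses it.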

Request a demo

Statsig's experts are on standby to answer any questions about experimentation at your organization.

Permalink: https://www.statsig.com/blog/you-dont-need-large-sample-sizes-ab-tests

Author: Tim Chan

