5 features to 10x experiment velocity

Thu Sep 15 2022

Big tech techniques for running experiments at industrial scale

Many companies want to 10x their experimentation velocity. Here are 5 techniques from sophisticated experimenters that help you do this:

Feature Rollouts: auto-measure new feature impact with an A/B test
Parameters: remove experiment variants in code to iterate faster
Layers: remove hardcoded experiment references from code
CUPED: use statistical techniques to get results faster
Holdouts: measure cumulative impact/progress without grunt work

Image: Jeff Bezos quote, "double your experiments to double your inventiveness"

1. Feature Rollouts automatically turned into A/B tests

Most tooling is painful and error-prone. Teams end up spending countless hours on experimentation, which limits what gets tested. Companies in this position understand the value of experimenting but get only a fraction of the value they should.

Image: an iceberg diagram symbolizing product building

Many of the largest and most successful tech companies have figured out how to run experiments at an industrial scale. They make it easy for individual teams to measure the impact of each new feature on the company or organization's KPIs. This superpower brings data into the decision-making process, preventing endless debates and meetings.

Modern products ship new features behind a feature gate so they can control who sees features.

When there’s a partial rollout to a set of equivalent users, that is enough for Statsig to turn the rollout into an A/B test. In this example, Statsig compares metrics for users passing the gate (10% rollout, Test) with those failing it (90% not yet rolled out, Control).

Rollout to 10% of users in the US and Canada

The image below shows an example of a Pulse Report that shows a lift in metrics between Control and Test.

Statsig "Pulse" metrics showing lift between Control and Test

Using Statsig feature gates to roll out new features removes the cognitive load of turning every rollout into an experiment, while still giving you observability into the impact the rollout has.
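To make the idea concrete, here is a minimal sketch of how a partial rollout can double as an A/B test. It is not Statsig's implementation: the hashing scheme, the `passes_gate` and `relative_lift` helpers, and the `new_checkout` gate name are all assumptions made for illustration.

```python
import hashlib
from statistics import mean


def passes_gate(user_id: str, gate_name: str, rollout_pct: float) -> bool:
    """Hypothetical gate check: hash the user into one of 10,000 buckets
    and pass the lowest `rollout_pct` percent of them."""
    digest = hashlib.sha256(f"{gate_name}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < rollout_pct * 100


def relative_lift(test_values, control_values) -> float:
    """Relative difference in the metric mean, Test vs. Control."""
    control_mean = mean(control_values)
    return (mean(test_values) - control_mean) / control_mean


# A 10% rollout splits users into Test (passing) and Control (failing).
users = [f"user-{i}" for i in range(20_000)]
test_group = [u for u in users if passes_gate(u, "new_checkout", 10)]
control_group = [u for u in users if not passes_gate(u, "new_checkout", 10)]
```

Because assignment depends only on the user ID and gate name, a user stays in the same group across sessions, and any per-user metric can be compared between the two groups with `relative_lift`.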

2. Parameters, not experiment variants in code

The legacy way to implement experiments is to have a bunch of if-then-else blocks in your code to handle each variant.
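As a rough illustration of the legacy pattern, the sketch below hardcodes each variant as a branch. The function name, variant names and colors are hypothetical.

```python
def button_color_legacy(variant: str) -> str:
    """Legacy style: every variant is a branch compiled into the app.

    Adding or changing a variant means editing this function and
    shipping a new release.
    """
    if variant == "control":
        return "blue"
    elif variant == "test_green":
        return "green"
    else:
        # Unknown variants fall back to the control experience.
        return "blue"
```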

A more agile way to implement an experiment is to simply retrieve the button color from the experiment in Statsig.
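A parameter-driven version might look like the sketch below. `ExperimentConfig` is a hypothetical stand-in for whatever object your experimentation SDK returns for the user's assigned group; it is not the Statsig SDK API.

```python
from dataclasses import dataclass, field


@dataclass
class ExperimentConfig:
    """Hypothetical container for the parameter values served to one user."""
    params: dict = field(default_factory=dict)

    def get(self, name: str, default):
        # Fall back to the default when the experiment does not set the parameter.
        return self.params.get(name, default)


def button_color(config: ExperimentConfig) -> str:
    # The app only knows the parameter name, not the list of variants.
    return config.get("button_color", "blue")


# Variants live in configuration, so a third color needs no code change.
variants = [
    ExperimentConfig({"button_color": "red"}),
    ExperimentConfig({"button_color": "green"}),
    ExperimentConfig({"button_color": "orange"}),
]
colors = [button_color(v) for v in variants]
```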

You can now restart the experiment with a new set of colors to test, without touching shipped code. You can even increase the number of variants—test three colors instead of two—just by changing the config in Statsig.

When you’re working with mobile apps, the difference between the two approaches is night and day. You can rerun experiments even on older app versions without waiting for new code with a new if-then-else statement to hit the app stores. No more waiting for users to upgrade to the latest version of the app before you start to collect data!

The best in-house, next-gen experimentation systems use similar approaches. Read how Uber does something similar to unlock agility with their experimentation (see the Architecture section).

3. Layers

Experiment Parameters help you move faster. When you want to move even faster, hard-coded Experiment names become a bottleneck. What if you could ship another experiment without updating your code?

Layers enable this. Layers are typically used to run mutually exclusive experiments. They are also used to remove direct references to experiment names in code.

In the example below, elements on the app’s home screen are set up as parameters on the “Home Screen” layer—button_color, button_text and button_icon. The app simply retrieves parameters from this layer, without any awareness of experiments on the home screen.

If there are no experiments active in the layer, the default layer parameters apply. In the example below, there are three experiments active — with users split between them (mutual isolation). These experiments can control all or a subset of the layer parameters.
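The sketch below shows one way a layer like this could resolve parameters. It is an assumption-laden illustration, not Statsig's implementation: the bucket counts, experiment names, default values and the `get_layer_params` helper are all hypothetical.

```python
import hashlib

LAYER_NAME = "home_screen"
LAYER_DEFAULTS = {"button_color": "blue", "button_text": "Buy", "button_icon": "cart"}

# Each experiment owns a disjoint slice of the layer's 1,000 buckets, so a
# user can be in at most one of them. Buckets 900-999 are unallocated.
EXPERIMENTS = [
    {"name": "color_test", "buckets": range(0, 300),
     "groups": [{}, {"button_color": "green"}]},
    {"name": "copy_test", "buckets": range(300, 600),
     "groups": [{}, {"button_text": "Buy now"}]},
    {"name": "icon_and_color_test", "buckets": range(600, 900),
     "groups": [{}, {"button_icon": "bag", "button_color": "orange"}]},
]


def _bucket(salt: str, user_id: str, modulus: int) -> int:
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % modulus


def active_experiment(user_id: str):
    """Return the experiment that owns this user's layer bucket, if any."""
    layer_bucket = _bucket(LAYER_NAME, user_id, 1000)
    for exp in EXPERIMENTS:
        if layer_bucket in exp["buckets"]:
            return exp
    return None


def get_layer_params(user_id: str) -> dict:
    """Start from the layer defaults, then apply the user's group overrides."""
    params = dict(LAYER_DEFAULTS)
    exp = active_experiment(user_id)
    if exp is not None:
        groups = exp["groups"]
        params.update(groups[_bucket(exp["name"], user_id, len(groups))])
    return params
```

The app only ever calls `get_layer_params`, so experiments can be added to or removed from `EXPERIMENTS` (which would live in server-side configuration) without the client referencing any experiment name.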

You can complete old experiments and start new experiments without touching the client app at all.

4. CUPED

CUPED (Controlled-experiment Using Pre-Experiment Data) is a technique to reduce variance and bias in results. Think of it as noise reduction: we look at noise in metrics before the experiment started to reduce noise in results.
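Here is a minimal sketch of the standard CUPED adjustment on simulated data. It is the textbook form, not necessarily how Statsig implements it; the `cuped_adjust` helper and the simulated metric are assumptions. In a real experiment, theta would typically be estimated on Test and Control pooled together.

```python
import random
from statistics import mean, pvariance


def cuped_adjust(pre, post):
    """Adjust each user's in-experiment value by their pre-experiment value.

    adjusted_i = post_i - theta * (pre_i - mean(pre)),
    with theta = cov(pre, post) / var(pre).
    """
    pre_mean, post_mean = mean(pre), mean(post)
    cov = sum((x - pre_mean) * (y - post_mean) for x, y in zip(pre, post)) / len(pre)
    theta = cov / pvariance(pre)
    return [y - theta * (x - pre_mean) for x, y in zip(pre, post)]


# Simulated metric: each user's in-experiment value tracks their
# pre-experiment value, plus some fresh noise.
rng = random.Random(0)
pre = [rng.gauss(100, 20) for _ in range(5_000)]
post = [x + rng.gauss(0, 5) for x in pre]
adjusted = cuped_adjust(pre, post)

print(f"variance before: {pvariance(post):.1f}, after: {pvariance(adjusted):.1f}")
```

The adjustment leaves the metric mean unchanged and removes the part of the variance that pre-experiment behavior already explains, which is why the same effect can be detected with fewer users or in less time.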

Looking across hundreds of customers, CUPED reduces the required sample sizes and durations for over half the key metrics measured in experiments. Learn more about our CUPED implementation. Other statistical techniques, including winsorization (limiting outlier values), are also applied, but they typically don’t have as big an impact.
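For reference, winsorization can be sketched in a few lines. The percentile cutoffs and the nearest-rank rule below are assumptions chosen for illustration, not Statsig's settings.

```python
def winsorize(values, lower_pct: int = 0, upper_pct: int = 99):
    """Clip values to the given lower and upper percentiles (nearest rank)."""
    ordered = sorted(values)
    n = len(ordered)
    low = ordered[(n - 1) * lower_pct // 100]
    high = ordered[(n - 1) * upper_pct // 100]
    return [min(max(v, low), high) for v in values]


# One extreme value (10,000) among otherwise ordinary ones.
purchases = list(range(1, 101)) + [10_000]
clipped = winsorize(purchases)
```

Here the single outlier is pulled down to the 99th-percentile value, so it no longer dominates the mean and variance of the metric.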

5. Automatic Holdouts

Team or product-level holdouts are powerful tools to measure the cumulative impact of features and experiments you’ve shipped over a holdout period (often ~6 months). You can tease apart the impact of external factors (e.g. your competitor going out of business) and seasonality (atypical events including holidays, unusual news cycles or weather) from the impact driven by your feature launches. You can also measure long-term effects and quantify subtle ecosystem changes.

Mature product teams use long-term holdouts. These can be expensive for engineers to set up—everyone creating a feature or an experiment needs to be aware of and respect this holdout.

On Statsig, creating a global holdout automatically applies it to new feature gates and experiments. People creating them don’t have to do any manual work to check the holdout.
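Conceptually, an automatically applied holdout behaves like the sketch below: the holdout check runs inside the shared gate evaluation, so individual feature authors never write it themselves. The 5% holdout size, the hashing scheme and the helper names are assumptions, not Statsig's implementation.

```python
import hashlib

GLOBAL_HOLDOUT_PCT = 5  # hypothetical holdout size, in percent


def _bucket(salt: str, user_id: str, modulus: int = 100) -> int:
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % modulus


def in_global_holdout(user_id: str) -> bool:
    return _bucket("global_holdout", user_id) < GLOBAL_HOLDOUT_PCT


def check_gate(user_id: str, gate_name: str, rollout_pct: int) -> bool:
    """Evaluate a gate; held-out users never see new features."""
    if in_global_holdout(user_id):
        return False
    return _bucket(gate_name, user_id) < rollout_pct
```

At the end of the holdout period, comparing held-out users with everyone else gives the cumulative impact of everything shipped in between.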

Learn more about how the feature works and best practices around managing holdouts.

Questions?

This isn’t an exhaustive list. For example:

6. Want to run hundreds of multi-armed bandits where you trust the system to pick a winner based on an optimization function? There’s Autotune.
7. Want to look at key metrics in near real-time? There’s Event Explorer.
8. Want to spin up a quick new metric the same day, for a new feature you’re building? We’ve got you.
9. Want to reuse the data team-approved canonical metrics for your company from your warehouse? You can do that.
10. Want feature teams to self-serve slicing data by OS, Country, Free vs Paid or another dimension you choose, so they’re not blocked behind a data team crafting manual queries? Yes.

There are many more of these…

We created Statsig to close the experimentation gap between sophisticated experimenters and others. Feel free to reach out to talk about other ideas that accelerate experimentation!

Thanks to Tore

Permalink: https://www.statsig.com/blog/features-to-10x-experiment-velocity

Vineeth Madhusudanan
