A common challenge in A/B testing arises when we want to track customers from logged-out to logged-in states. Consider, for example, testing a new variant of the account creation flow. This begins with logged-out customers, but we also want to monitor key metrics and events once they complete the flow and log in. Later on, we may even want to place them in new experiments as logged-in users. All of this is possible by selecting the right type of Unit ID to fit the needs of each experiment.
Proper Unit ID logging is crucial for obtaining valid results from online experiments. A Unit ID is a unique identifier for every entity in our experiment, which we use to track both the group that entity belongs to and the metrics of interest. A common example is User ID, often tied to an account created by the customer. When a User ID doesn’t exist or isn’t available for all relevant metrics, a Stable Device ID can be a good alternative. This post explains how to select the right Unit ID type for your experiment.
Comparing key metrics across test and control groups requires joining two different sets of logs: exposures and events.
Exposures: Every time we run a check to determine which experience is delivered to a user (e.g., test or control), we create a log containing the Unit ID and the group the user was assigned to.
Events: Any user actions and events that an app or website owner chooses to log: Purchase, Logout, Page Loaded, etc. Typically, these logs include the event name, the Unit ID of the user who performed the action, and any relevant metadata.
To gain insights, we need to combine these two data sets and compare the event logs across the different experiment groups. This is only possible if the exposures and events contain the same type of Unit ID.
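As a minimal sketch of that join, assuming hypothetical log records (the field names here are illustrative, not any real logging schema), we can map each Unit ID to its group from the exposure logs and then attribute events accordingly:

```python
from collections import defaultdict

# Hypothetical exposure and event logs; field names are illustrative.
exposures = [
    {"unit_id": "user_1", "group": "test"},
    {"unit_id": "user_2", "group": "control"},
    {"unit_id": "user_3", "group": "test"},
]
events = [
    {"unit_id": "user_1", "event": "purchase"},
    {"unit_id": "user_3", "event": "purchase"},
]

# The join only works because both logs carry the same type of Unit ID.
group_of = {e["unit_id"]: e["group"] for e in exposures}

purchases = defaultdict(int)
for ev in events:
    if ev["event"] == "purchase" and ev["unit_id"] in group_of:
        purchases[group_of[ev["unit_id"]]] += 1

print(dict(purchases))  # {'test': 2}
```

If an event’s Unit ID never appears in the exposure logs (or is a different ID type entirely), it simply cannot be attributed to any group, which is exactly the failure mode mismatched Unit IDs produce.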
Consider an experiment aimed at increasing upgrades to a premium tier of our product. Since we’re targeting existing customers, we already have a well-defined User ID for each one. The logging works like this: the exposure check and every subsequent event, such as Upgrade, are recorded with the same User ID, making the join straightforward.
Now consider an experiment with a different objective: we want to test some changes to our landing page, aiming to increase the number of users who create an account. We must decide which version of the landing page to show a user before they have a chance to create an account, meaning we don’t yet have a User ID at the time of the exposure check.
Instead, we create a Stable Device ID for them, which we use to log their experiment exposure. Conveniently, we can also log this same ID with any client-side events from this device, such as Create Account. This now looks very similar to the example above: we can tell exactly how many devices in each group created an account after seeing the landing page.
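A minimal sketch of the device-level flow, assuming a hypothetical client that persists an identifier (the function and field names are illustrative): generate a Stable Device ID on first visit, log the exposure with it, and reuse the same ID for later client-side events.

```python
import uuid

def get_or_create_device_id(storage: dict) -> str:
    """Return the device's stable ID, creating one on first visit."""
    # A real client would persist this in a cookie or local storage.
    if "device_id" not in storage:
        storage["device_id"] = str(uuid.uuid4())
    return storage["device_id"]

storage = {}  # stands in for the browser's persistent storage
exposure_log, event_log = [], []

# Logged-out exposure check: no User ID exists yet, so we use the Device ID.
device_id = get_or_create_device_id(storage)
exposure_log.append(
    {"unit_id": device_id, "experiment": "landing_page", "group": "test"}
)

# Later, the same device creates an account; the event carries the same ID.
event_log.append(
    {"unit_id": get_or_create_device_id(storage), "event": "create_account"}
)

# Exposure and event share a join key, so per-group conversion can be computed.
assert exposure_log[0]["unit_id"] == event_log[0]["unit_id"]
```

The important property is stability: as long as the stored ID survives between the exposure check and the Create Account event, the two logs join cleanly.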
We can go even further. If our treatment is great for increasing sign-ups but gains us less engaged customers, we want to know that. Beyond account creation, we care about their engagement during their first week: How many times did they log in? Which features do they use?
All of these logged-in actions will contain the User ID for the new account, but keep in mind our exposures are logged with Device ID only. We anticipated this and continue logging the Stable Device ID for all client events even after User ID becomes available. Since we set this up as a device-level experiment, we can produce a full suite of metrics for logged-out and logged-in events based on Stable Device ID.
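To illustrate this dual-ID logging (again with illustrative field names, not a real schema): after sign-up, client events carry both the new User ID and the original Stable Device ID, so logged-in metrics still join back to the device-level exposure.

```python
# Device-level exposure, logged before any account existed.
exposures = [{"unit_id": "device_abc", "group": "test"}]

# Client events keep logging the Stable Device ID even after a User ID exists.
events = [
    {"device_id": "device_abc", "user_id": None, "event": "create_account"},
    {"device_id": "device_abc", "user_id": "user_42", "event": "login"},
    {"device_id": "device_abc", "user_id": "user_42", "event": "feature_used"},
]

group_of = {e["unit_id"]: e["group"] for e in exposures}

# Every event, logged-out or logged-in, attributes to the device's group.
attributed = [(ev["event"], group_of[ev["device_id"]]) for ev in events]
```

Had the logged-in events carried only the User ID, the `login` and `feature_used` rows would have no key in common with the exposure log, and the first-week engagement metrics could not be computed for this experiment.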
Placing these customers in new experiments as logged-in users is completely doable, as long as we decide up front which unit type to use for each experiment. In Experiment 2 above, customers who were targeted based on Stable Device ID will also have a User ID associated with their logged-in events. At that point, they could become part of Experiment 1 and produce valid metrics for our upgrade experiment based on User ID.
User ID is often the preferred Unit ID type, but there are some scenarios where device-level experiments are a better option: when no User ID exists at the time of the exposure check (as with logged-out or brand-new users), or when key metrics span both logged-out and logged-in activity.
Drawbacks and limitations of device-level experiments: a single person using multiple devices is counted as multiple units; clearing cookies or reinstalling an app can reset a Stable Device ID; and several people sharing one device are treated as a single unit.
Explore Statsig’s smart feature gates with built-in A/B tests, or create an account instantly and start optimizing your web and mobile applications. You can also schedule a live demo or chat with us to design a custom package for your business.