Statsig ID Resolution: Unlock the power of new user experiments

Thu Oct 19 2023

Craig Sexauer

Data Scientist, Statsig

The following describes a feature available to Statsig Warehouse Native users.

Businesses live or die based on their ability to get new users to start using, and keep using their product.

Whether this comes from organic traffic, word of mouth, or marketing campaigns, being able to understand how well acquisition is going is of the utmost importance for any business.

Many platforms offer solutions for measuring conversion, but conversion paints a very incomplete picture. At the end of the day, businesses care about metrics that are tied to a logged-in ID, such as revenue, LTV predictions, or retention. Unfortunately, tying these metrics to logged-out IDs for experiment analysis is often manual, inconsistent, and time-consuming.

We’re pleased to announce a new feature in Statsig Warehouse Native which allows you to handle this case without any extra code or analysis.

You can simply configure a secondary ID for your experiments, and you can analyze experiments across both ID types—even making metrics like funnels that start on a logged-out event like a page visit, and finish on a logged-in event like completing a trial and purchasing a subscription.

The challenge of dual identifiers

When users engage with a platform, they often do so in two capacities:

  1. Logged-out: new users will usually be logged out when they land on your website or buy-flow; they’re usually tagged with a generic identifier, such as a cookie or a Statsig StableID.

  2. Logged-in: Once a user signs up, they receive a specific userID. This ID is usually used for metrics like revenue, retention, and product engagement

For businesses, key metrics are usually calculated based on the Logged-in ID. Therefore, to run an impactful experiment, experimenters often need to analyze using the logged-out identifier but evaluate based on associated logged-in metrics.

Historically, bridging this gap has been a manual and tedious process, often involving ad-hoc queries, joins, deduplications, and expensive pipelines.

Introducing ID Resolution

Statsig's ID Resolution is a robust solution to the aforementioned problem. It offers a simple, centralized, and consistent approach to connecting identifiers across the boundary of user states. If your company runs signup experiments, this feature can make measuring your impact both easy and trustworthy.

How does Statsig ID Resolution work?

Simplicity at its core

Setting up identity resolution is straightforward in Statsig.

When setting up an assignment source, input a column for each ID type. Your 'Primary ID' (typically the logged-out identifier) should always be present. The secondary ID (your logged-in userID) can be null at times, especially for users who haven't signed up. In such cases, Statsig intelligently back-attributes any identified secondary ID to records with the same Primary ID.

Analysis power unleashed

Upon setting up an experiment, you can optionally select a Secondary ID. This allows the integration of metrics from both ID types into your analysis. Behind the scenes, Statsig handles all the heavy lifting:

  • It ensures a 1:1 mapping by deduplicating records with multiple mappings.

  • Metrics with the primary ID are joined based on that ID.

  • Metrics with only the secondary ID are integrated based on that Secondary ID.

Furthermore, the feature supports Metric Sources seamlessly, allowing users to establish funnels or ratio metrics spanning the two ID types.

Why ID Resolution matters: Bridging the user journey

Understanding the entire user journey, from the initial entry point to a completed action like purchasing a subscription, is essential for businesses to optimize user experiences and drive long-term value. Here's why ID Resolution is a game-changer in this respect:

Avoid common pitfalls

Even with an approach for joining IDs in place, it’s common for different teams to use their own ad-hoc methodologies, or—without knowing—fall into numerical traps and report erroneous results. Some common issues we’ve seen:

  • Measuring impact per Logged-in User

    • Often, businesses will tag Logged-in data with experiment assignments from Logged-out experiments. This makes it easy for stakeholders to quickly group results by experiment arms and get an estimate of impact.

    • However, this often leads to incorrect interpretations. For example, consider an experiment that drops signup rate by 20%. The signups that do still sign up will tend to be higher quality - they might have 10% higher average revenue.

    • This hypothetical experiment is a loser: Overall, it drops revenue by 12%, but the naive group by would say it lifted revenue by 10%

  • Duplicate mappings

    • It’s pretty common that multiple Logged-out IDs will be associated with the same signup - imagine a user visiting from both web and mobile, and then signing up. Conversely - though it’s rarer - the same Logged-out visitor might generate multiple signups

    • When this happens, it can lead to inflations in metric results, or the same Logged-in user’s metrics being attributed to multiple experiment arms

    • Generally this is handled by dropping duplicates or using some kind of deduplication Strategy like first-touch attribution. This will vary by teams

Differences in how these issues are solved, or people not knowing that these issues exist, lead to inconsistent results and erroneous reporting of results.

Create a free account

You're invited to create a free Statsig account! Get started today with 2M free events. No credit card required, of course.
an enter key that says "free account"

Going beyond surface metrics

Many marketing platforms offer surface-level analysis on conversion and CTR. With the ID Resolution feature, it’s just as easy to understand the whole picture.

  • Impact on All Exposed Users: Using ID stitching lets you get downstream user metrics at a per-visitor grain. This lets you directly measure the long-term value of each lead, and how your experiment impacts that value

  • Understanding Impact Across IDs: For experiments like offering discounts to new signups, you can use ID Resolution to measure both the overall impact on “Revenue” as well as the downstream “Revenue Per Signup." This means you can understand both the overall impact, but also the change of the typical behavior in your logged-in population

By bridging the logged-out and logged-in experiences, ID Resolution provides a fuller picture of user behavior. This allows you to make more informed decisions, optimize your marketing strategies, and ensure that you’re not just driving conversions, but also cultivating high-value, long-term customers.

Visualizing funnels:

Consider a user's path through the following stages:

  1. Landing Page View: A user lands on your website as a logged-out visitor

  2. Exploration: They browse content, view products, or go through a buy-flow

  3. Account Creation/Signup: Some users will decide to create an account and log in

  4. Trial: After signup, users are prompted to start a free trial

  5. Subscription: Some portion of users will choose to pay, whereas some will not

Without ID Resolution, a major gap exists between the logged-out experience (stages 1 & 2) and the logged-in experience (stages 3, 4, & 5). This makes it challenging to understand how the early stages of interaction influence final outcomes, such as purchases.

With ID resolution, you can easily define a funnel metric that goes across these ID types and get insights on tradeoffs in product design - for example, increasing the length of your buyflow may lead to less account creations, but higher subscription rates.

Manage data risk

Another advantage of using Statsig for ID resolution is having consistency and risk management built into your platform.

  • Deduplication Health Checks: Statsig will monitor your deduplication rate, alerting you if it detects unusually high rates. While some duplication is expected (for instance, due to users hopping between devices), it's essential to keep an eye on this metric.

  • Balanced Deduplication Across Experiments: A chi-squared test ensures that the deduplication rate remains consistent across different arms of an experiment. This is crucial as some experiments may naturally cause a higher return rate of users, leading to more duplicates in a particular segment.

  • Consistent Methodology: By using Statsig for ID resolution, you know that the methodology will be the same across all of your experiments, and experiment results will be apples to apples.

Using this tool, your approach to these experiments will be systematic, consistent, and robust—you no longer need to be concerned that someone is running a logged-out user experiment, hurting conversion and topline revenue, and then shipping the feature because the per-signup revenue is positive.

Wrapping Up

Statsig's ID Resolution is a game-changer for our customers who run new-user experiments. By simplifying the process of bridging the identifier gap and adding a layer of data quality checks, we’re aiming to simplify and democratize the analysis of the new user experience.

Join the Slack community

Connect with our data scientists and engineers, and ask questions or just hang out with other cool folks that believe in an experimentation culture!
join slack community cta image


Try Statsig Today

Get started for free. Add your whole team!
We use cookies to ensure you get the best experience on our website.
Privacy Policy