In most analytics platforms, funnels are a table-stakes feature and can offer rich insight into how a product's users behave and where people drop off in their usage.
Unfortunately, funnels haven't been heavily used or invested in by experimentation platforms.
This is due to some mix of complexity, actual limitations, and perceived limitations. We recently invested more in the funnel functionality at Statsig, and I'd like to explain why, and to make the case for funnels as a core part of the modern experimenter's toolkit.
Funnels allow you to measure complex relationships with a higher degree of clarity. For example, say you see revenue flatten while product page views go up. You can infer that conversion has gone down, but at what stage?
You can make a good guess by looking at how intermediate steps have changed, but this process is prone to leakage between steps (independent metrics don't record whether the same users completed each step), and the results are usually fuzzy at best.
Funnels streamline this process and add clarity by specifying the order in which events have to take place, per user or session, and measuring "how far they make it into the funnel." You can also measure stepwise conversion, making it easy to understand exactly where users dropped out of a product.
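A minimal Python sketch of the ordering logic described above, assuming a hypothetical event log of `(user_id, timestamp, event_name)` tuples and hypothetical event names. It is not how Statsig computes funnels internally; it only shows the core idea of counting how far each user gets through the steps in sequence and deriving stepwise conversion from those counts.

```python
from collections import defaultdict

# Hypothetical event log: (user_id, timestamp_seconds, event_name)
EVENTS = [
    ("u1", 0, "view_product"), ("u1", 30, "add_to_cart"), ("u1", 90, "purchase"),
    ("u2", 0, "view_product"), ("u2", 45, "add_to_cart"),
    ("u3", 0, "add_to_cart"), ("u3", 10, "view_product"),  # steps out of order
    ("u4", 0, "view_product"),
]
STEPS = ["view_product", "add_to_cart", "purchase"]


def furthest_step(user_events, steps):
    """Number of funnel steps a user completed in order (0..len(steps))."""
    reached = 0
    # Chronological order (ties broken by event name)
    for _, name in sorted(user_events):
        if reached < len(steps) and name == steps[reached]:
            reached += 1
    return reached


def ordered_funnel(events, steps):
    """counts[i] = users who completed step i after completing every earlier step."""
    by_user = defaultdict(list)
    for user, ts, name in events:
        by_user[user].append((ts, name))
    depths = [furthest_step(evts, steps) for evts in by_user.values()]
    return [sum(d > i for d in depths) for i in range(len(steps))]


def stepwise_conversion(counts):
    """Conversion rate from each step to the next (0.0 if nobody reached a step)."""
    return [after / before if before else 0.0 for before, after in zip(counts, counts[1:])]


counts = ordered_funnel(EVENTS, STEPS)
print(counts)                       # [4, 2, 1]
print(stepwise_conversion(counts))  # [0.5, 0.5]
print(counts[-1] / counts[0])       # 0.25 overall
```

Note that `u3` only counts toward the first step: its `add_to_cart` was logged before `view_product`, so an unordered count would have overstated the second step.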
This is true for most products, but especially for those with buyflows, subscriptions, or "daily habits" users should have. These behaviors have a clear set of steps for users to complete, and growth teams usually already think in terms of a "funnel" from landing-page users to successful conversions.
When we introduced our initial funnel product to Statsig Warehouse Native, I expected it to be a fairly niche tool. However, as we've added more advanced settings, we've seen broad adoption, and our most sophisticated users have adopted funnels as a way to explore results more deeply and to share a more intuitive view of what's happening inside the data with their teammates.
In my career as a Data Scientist, I've seen and run experiments across gaming, social media, feed ranking, growth buyflows, media marketing, and more.
Only in growth did I see regular usage of "funnel metrics," and it was often done poorly: grouping funnels by test variant in Mixpanel and eyeballing the per-unit "CI" provided in the UI.
Based on this, I was initially fairly cynical about funnels. It's true that funnels do have some fundamental limitations:
A funnel rate in the context of an experiment can be tricky (or impossible) to extrapolate out to "topline impact" after launch.
Funnels can become fairly complex to calculate, and simple changes to how you're treating them (count unique users? count sessions? does order matter?) can make two analyses of the same dataset quite different.
Funnels are rarely treated with statistical rigor: In my previous experience working with growth teams, the data team spent a lot of cycles trying to appropriately qualify the team's funnel-based ship decisions, since low-volume, high-noise funnels were being used as decision criteria.
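To see how much the calculation choices in the second limitation matter, here is a toy Python example that computes a two-step checkout funnel four ways (user vs. session level, ordered vs. unordered) on exactly the same data. The sessions, event names, and helper functions are all hypothetical, and user-level events are built by concatenating sessions, which assumes sessions are listed chronologically.

```python
from collections import defaultdict

# Hypothetical data: (user_id, session_id, events in the order they were logged).
# Assumption: sessions are listed chronologically within each user.
SESSIONS = [
    ("u1", "s1", ["checkout_start"]),
    ("u1", "s2", ["checkout_start"]),
    ("u1", "s3", ["checkout_start", "purchase"]),
    ("u2", "s4", ["purchase", "checkout_start"]),
    ("u3", "s5", ["checkout_start"]),
]


def converted(events, ordered):
    """Did 'purchase' happen (after the first 'checkout_start' if ordered)?"""
    if "checkout_start" not in events:
        return False
    if ordered:
        return "purchase" in events[events.index("checkout_start"):]
    return "purchase" in events


def rate(units, ordered):
    """Share of units that started checkout and went on to convert."""
    tries = [evts for evts in units if "checkout_start" in evts]
    return sum(converted(evts, ordered) for evts in tries) / len(tries)


def by_session(sessions):
    """One unit per session: successes / tries."""
    return [evts for _, _, evts in sessions]


def by_user(sessions):
    """One unit per user: successful users / users who tried."""
    users = defaultdict(list)
    for user, _, evts in sessions:
        users[user].extend(evts)
    return list(users.values())


for level, units in [("session", by_session(SESSIONS)), ("user", by_user(SESSIONS))]:
    for ordered in (True, False):
        print(level, "ordered" if ordered else "unordered", round(rate(units, ordered), 3))
# session ordered 0.2
# session unordered 0.4
# user ordered 0.333
# user unordered 0.667
```

Four defensible definitions produce four different conversion rates, which is why the definition needs to be explicit and standardized.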
Based on the valid concerns above, I've seen experimentation teams treat funnels with a fairly dismissive attitude.
As a result, data teams relegate funnels, and all of their rich complexity, to the land of product analytics, as not worthy of inclusion in experiment readouts.
Funnels provide a super-rich and intuitive readout of "what happened with our users?" Using them well is mostly an exercise in risk mitigation. My (verbose) guide for healthy funnel usage is:
Funnels should never be your primary success metric. The bottom of the funnel is what experiments should aim to move. Funnels are best used as powerful diagnostic tools that help you understand what drove that target behavior (or what didn't)!
Use funnels as a powerful post-hoc tool. If you don't understand the relationship between two related observations (e.g., click notification, send message) in an experiment readout, I'd try creating a local funnel metric between the two to see if there's an obvious drop-off.
Carefully scope funnel metrics. Pick an appropriate "time to complete" window, whether for the full funnel or between steps. One common pitfall is an unbounded user-level funnel: over a long experiment, the success rate trends toward 1 because users get more "chances" to complete it.
Consider when a user- vs. session-level funnel is appropriate. If you want to measure a user's journey to subscription, you care about the user-level data. If you care about improving your checkout flow for products, tracking this data at the session level is more powerful, since it measures (successes / tries) instead of (successful users / users who tried).
Make sure your experiment tool makes it clear how a funnel is being calculated, and that settings can be standardized across the organization. Don't calculate funnels in ad-hoc notebooks or reports unless there's a clear standard or process for doing so.
Make sure your experiment tool correctly treats funnels as ratio metrics, applying the appropriate variance corrections. Measure the relative change in conversion rigorously, rather than just comparing two conversion rates and eyeballing them.
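For the ratio-metric point, here is a minimal sketch of the delta method applied to a session-level funnel, where each user contributes a count of successful checkouts (numerator) and checkout attempts (denominator). Working from per-user totals matters because attempts from the same user are correlated, so treating every attempt as an independent trial understates variance. The data is made up, the two arms are assumed independent, and the 1.96 multiplier assumes a normal approximation at the 95% level; this isn't necessarily how Statsig implements it.

```python
import math


def ratio_estimate(num, den):
    """Delta-method estimate of R = mean(num) / mean(den) and its variance.

    num, den: per-user totals (e.g., successful checkouts, checkout attempts).
    """
    n = len(num)
    mx, my = sum(num) / n, sum(den) / n
    vx = sum((x - mx) ** 2 for x in num) / (n - 1)
    vy = sum((y - my) ** 2 for y in den) / (n - 1)
    cxy = sum((x - mx) * (y - my) for x, y in zip(num, den)) / (n - 1)
    r = mx / my
    # Var(R) ~= (Var(X) - 2R Cov(X, Y) + R^2 Var(Y)) / (n * mean(Y)^2)
    var_r = (vx - 2 * r * cxy + r * r * vy) / (n * my * my)
    return r, var_r


def relative_lift(test, control, z=1.96):
    """Relative change in conversion with a normal-approximation interval."""
    rt, vt = ratio_estimate(*test)
    rc, vc = ratio_estimate(*control)
    ratio = rt / rc
    # Delta method for the ratio of two independent estimates
    se = ratio * math.sqrt(vt / rt**2 + vc / rc**2)
    lift = ratio - 1
    return lift, (lift - z * se, lift + z * se)


# Hypothetical per-user data: ([successes], [attempts])
control = ([1, 0, 2, 1, 0, 1], [2, 1, 3, 1, 2, 2])
test = ([2, 1, 2, 1, 1, 1], [2, 2, 3, 1, 2, 2])
lift, ci = relative_lift(test, control)
print(round(lift, 3), ci)  # 0.467, but with six users per arm the interval spans zero
```

A 47% observed lift that is statistically indistinguishable from zero is exactly the kind of result that eyeballing two conversion rates would miss.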
With these steps in place, the main downsides of funnels are addressed, and suddenly you have a very flexible and very powerful tool for digging deeper into experiment results.
Funnel quality varies drastically across platforms: some experimentation platforms only offer "one-step," unordered funnels (which are basically conversion metrics), and others offer basic ordering capabilities but not much else.
Here's what we recommend looking for:
Statistical rigor: Make sure funnel conversion rates have the delta method applied and that the platform handles event ordering soundly.
Ordered events: For funnels to be really useful, you should be able to specify that users perform events in a specific sequence over time.
Multiple-step funnels: Two-step funnels can be useful, but the ability to add intermediate steps as needed for richer understanding is critical.
Step-level and overall conversion changes: This is how you can identify where drop-offs happen.
Calculation windows: Being able to specify the maximum duration a user has to finish a funnel is critical to running longer experiments.
Session breakdowns: Being able to specify session keys or a sessionization method and count multiple funnels per user allows you to examine a much wider variety of product use cases, particularly checkout flows, daily tasks, or other recurring flows.
Step-level conversion windows: Being able to say "Step B needs to happen within an hour of step A" meaningfully cuts down on noise and reduces confusion about how a funnel conversion came to be.
Time-to-convert functionality: Being able to measure whether your changes made the funnel take longer or shorter to complete can help you avoid buyflow bloat or slim down your user journeys. The platform should also give context on success rate: Making the funnel shorter by eliminating the slow users is usually not good!
Timestamp management: Being able to specify whether events can occur simultaneously, and whether to allow for clock skew that logs events slightly out of order, can be very important in system and performance use cases.
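Several of these settings interact, so here is a brute-force Python sketch, for one user's or one session's events, of an ordered funnel with a step-level window, an overall window, a time-to-convert output, and a switch for whether consecutive steps may share a timestamp. The function and parameter names are hypothetical rather than any platform's actual settings, and the exhaustive search is only meant for short per-unit event lists. It tries every possible starting event because greedily matching the earliest start can miss a valid conversion.

```python
def windowed_funnel(events, steps, step_window, total_window, allow_ties=False):
    """Deepest step reached by one unit, and its time-to-convert if it finished.

    events: (timestamp_seconds, event_name) pairs for one user or session.
    step_window: max seconds between consecutive steps (e.g., B within 1h of A).
    total_window: max seconds from the first step to the last.
    allow_ties: whether a step may share the previous step's timestamp.
    """
    events = sorted(events)
    best = {"depth": 0, "ttc": None}

    def search(step, start_ts, prev_ts, used):
        if step == len(steps):
            ttc = prev_ts - start_ts  # fastest full completion wins
            if best["ttc"] is None or ttc < best["ttc"]:
                best["ttc"] = ttc
            return
        for j, (ts, name) in enumerate(events):
            if name != steps[step] or j in used:
                continue
            if step > 0:
                gap = ts - prev_ts
                if gap < 0 or (gap == 0 and not allow_ties):
                    continue  # wrong order, or a disallowed tie
                if gap > step_window or ts - start_ts > total_window:
                    continue  # outside the step-level or overall window
            best["depth"] = max(best["depth"], step + 1)
            search(step + 1, ts if step == 0 else start_ts, ts, used | {j})

    search(0, None, None, frozenset())
    return best["depth"], best["ttc"]


steps = ["view", "cart", "buy"]
events = [(0, "view"), (10_000, "view"), (10_100, "cart"), (12_000, "buy")]
print(windowed_funnel(events, steps, step_window=3_600, total_window=86_400))
# (3, 2000): the first "view" is too early for "cart", but the second one converts
```

Because the time-to-convert is `None` for units that never finish, compare it alongside the conversion rate rather than on its own, for the reason given in the time-to-convert item above.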
Documentation: Funnel overview in Statsig
Guide: How to create user funnels