A more general definition of retention looks like this:
\[ \text{Retention} = \frac{\text{Users who did action A during } T_0 \text{ and did action B during } T_1}{\text{Users who did action A during } T_0} \]
Where A and B can be any actions, and T0 and T1 can be any time periods. The most familiar use of retention metrics, where A and B are the same action measured over different time periods T0 and T1, is just a special case of this more general definition.
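To make the definition concrete, here’s a minimal sketch in Python. It assumes your event log is a list of (user_id, action, timestamp) tuples; the function and event names are hypothetical, not a Statsig API.

```python
from datetime import datetime

def retention(events, action_a, action_b, t0, t1):
    """Share of users who did action_a during window t0 who also did action_b during window t1.

    events: iterable of (user_id, action, timestamp) tuples
    t0, t1: (start, end) datetime pairs defining each window
    """
    def users_in_window(action, window):
        start, end = window
        return {user for user, act, ts in events
                if act == action and start <= ts < end}

    cohort = users_in_window(action_a, t0)             # did A during T0
    returned = cohort & users_in_window(action_b, t1)  # also did B during T1
    return len(returned) / len(cohort) if cohort else None

# The familiar special case: A and B are the same action over consecutive weeks.
events = [
    ("u1", "session_start", datetime(2024, 1, 1)),
    ("u1", "session_start", datetime(2024, 1, 9)),
    ("u2", "session_start", datetime(2024, 1, 2)),
]
week0 = (datetime(2024, 1, 1), datetime(2024, 1, 8))
week1 = (datetime(2024, 1, 8), datetime(2024, 1, 15))
print(retention(events, "session_start", "session_start", week0, week1))  # 0.5
```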
📖 Related reading: The clear definition of retention, and why yours might be ambiguous
“Feature retention” can describe any case where action A or B is the use of a particular feature rather than the use of the product as a whole.
This gives us more specificity: we can measure which parts of your product do the most to drive users to return to (or churn from) your platform, and which experiences in particular are habitual and durable.
Since retention is so flexible as a concept, creating a particular retention metric involves a lot of context-specific decisions to get a meaningful representation of user behavior.
Measuring retention with action A defined as an interaction with a specific part of the product is a powerful way to focus on exactly the set of users who engaged with that part of the product. This means you can understand with more granularity which parts of your product are driving users to return or churn.
However, when we choose rarer events for A, we are reducing our sample size in order to get a narrower and more specific set of users that we’re measuring this retention for. The smaller our sample size, the noisier a retention metric based on that set of users will be.
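Continuing the sketch above (with a made-up feature event name), narrowing action A to a single feature shrinks the denominator, so it’s worth checking the cohort size alongside the rate:

```python
# Hypothetical feature event: retention of users who used search in week 0.
search_cohort = {u for u, act, ts in events
                 if act == "used_search" and week0[0] <= ts < week0[1]}
print(f"cohort size: {len(search_cohort)}")  # a small cohort means a noisier estimate
print(retention(events, "used_search", "used_search", week0, week1))
```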
Using a specific feature’s usage as action B can be helpful in understanding the ongoing usage of a particular part of your product. This can highlight parts of the product that users find most useful and worth returning to repeatedly.
However, there are also cases where a feature is not designed to be a habitual surface that a user will return to repeatedly.
For introductory elements of your product, like a sign-up flow, new user experience (NUX), or tutorial, high feature-level retention may actually indicate that a user is confused and unable to use other parts of your product effectively.
Similarly, if a settings menu is designed effectively, users should only need to tweak their settings once, or rarely, to suit their preferences; it shouldn’t be a surface they frequent regularly.
In these cases, we may want to use a less specific return event B that captures returning to the product at all after these kinds of experiences, since repeated use of the feature itself is not indicative of value.
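One way to express that in the sketch above (again with hypothetical event names) is a variant that keeps a feature-specific start event A but counts any product activity during T1 as the return:

```python
def retention_any_return(events, action_a, t0, t1):
    """Like retention(), but any event during t1 counts as a return (B = any product activity)."""
    start0, end0 = t0
    start1, end1 = t1
    cohort = {u for u, act, ts in events if act == action_a and start0 <= ts < end0}
    returned = cohort & {u for u, act, ts in events if start1 <= ts < end1}
    return len(returned) / len(cohort) if cohort else None

# Did users who completed onboarding in week 0 come back to the product at all in week 1?
print(retention_any_return(events, "completed_onboarding", week0, week1))
```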
Implementing feature retention tracking requires a clear understanding of what user engagement looks like for a given feature, so you can define an appropriately granular view of retention patterns.
Intuitively, if I see a movie in a movie theater once a week, you’d probably say that I go to the movies a lot. If I check the time on my watch once a week, you’d probably say that I check the time really infrequently. I engage in these activities with the same regularity, which is why context is so important when you are defining active engagement.
For a given feature, is it reasonable to consider a user active if they use it daily? Weekly? Monthly? You might answer this question very differently for different features, as well as for your product in general.
In our retention definition, these considerations shape how we define T0 and T1. When sparser activity is expected, it may be reasonable for T0 and T1 to be longer durations.
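In terms of the sketch above, this just means picking longer windows when sparse usage is expected; the event names and durations here are illustrative.

```python
# A feature used roughly monthly gets month-long windows...
month0 = (datetime(2024, 1, 1), datetime(2024, 2, 1))
month1 = (datetime(2024, 2, 1), datetime(2024, 3, 1))
monthly = retention(events, "exported_report", "exported_report", month0, month1)

# ...while a daily-habit feature gets day-long windows.
day0 = (datetime(2024, 1, 1), datetime(2024, 1, 2))
day1 = (datetime(2024, 1, 2), datetime(2024, 1, 3))
daily = retention(events, "checked_dashboard", "checked_dashboard", day0, day1)
```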
Once you’ve decided what “active” means, you’ll also want to determine how frequently you want to measure this. These decisions frequently hinge on the tension between specificity and noisiness when confronted with seasonality.
Seasonality occurs when there is a change in behavior at certain time intervals. For example, since Statsig is a product that folks use for work, we typically see much more weekday usage than weekend usage. Holidays are a time when we also see less usage of Statsig.
You’re making a judgment call about whether this seasonality is meaningful to measure or should be abstracted away. For example, day-of-week seasonality can be aggregated away by setting T0 and T1 to be 7 days in duration and measuring retention on a weekly cadence.
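Continuing the sketch, that weekly aggregation might look like stepping a pair of 7-day windows forward one week at a time, so each data point averages over day-of-week effects:

```python
from datetime import timedelta

def weekly_retention_series(events, action, start, n_weeks):
    """Retention on a weekly cadence: each point compares one 7-day window to the next."""
    series = []
    for i in range(n_weeks):
        t0 = (start + timedelta(weeks=i), start + timedelta(weeks=i + 1))
        t1 = (start + timedelta(weeks=i + 1), start + timedelta(weeks=i + 2))
        series.append(retention(events, action, action, t0, t1))
    return series

print(weekly_retention_series(events, "session_start", datetime(2024, 1, 1), n_weeks=4))
```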
However, it won’t always be appropriate to abstract seasonality away with coarser time granularity, especially if a seasonal effect is distinct from other time frames in the context of a business’s strategy (like the holiday season for a consumer product), or if a user’s expected activity time scale is smaller than the time scale of the seasonality.
In these cases, it can be helpful to choose T0 and T1 based on an active user's expected usage patterns but compare the retention of users who would be experiencing the same seasonal effects.
For example, a fitness app may see a large cohort of users with New Year’s resolutions become active in early January, with distinct behavior from users active at other times. An appropriate retention metric might still be centered around an expectation of daily use, but comparing that retention metric to last year’s January-active users may be more apt than comparing to users active in December.
Instead of abstracting away seasonality, we’ve chosen a comparison point of a year ago with a similar seasonal effect.
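In terms of the sketch, that might mean computing the same daily-use retention metric for this January’s cohort and last January’s cohort (event name and dates are illustrative):

```python
# Compare this January's New Year cohort to last January's, not to December.
jan_2023 = retention(events, "logged_workout", "logged_workout",
                     (datetime(2023, 1, 2), datetime(2023, 1, 3)),
                     (datetime(2023, 1, 3), datetime(2023, 1, 4)))
jan_2024 = retention(events, "logged_workout", "logged_workout",
                     (datetime(2024, 1, 2), datetime(2024, 1, 3)),
                     (datetime(2024, 1, 3), datetime(2024, 1, 4)))
print(jan_2024, "vs. last January:", jan_2023)
```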
In Statsig, you can use Metrics Explorer to create feature retention dashboards for monitoring.
You’re able to select any event as your Start Event and Return Event (A and B, respectively, in our retention definition). You’re also able to select any unit ID, rather than only looking at the user level. While you don’t have full freedom to pick T0, T1, and data point frequency, you can choose whether to see retention on a daily or weekly scale.