Kaplan-Meier: Visualizing A/B test retention

Mon Jun 23 2025

You know that sinking feeling when you launch an A/B test, see great initial results, and then watch retention tank a month later? Yeah, been there. The problem is that most A/B testing focuses on immediate wins - conversion rates, click-throughs, that dopamine hit of early success.

But retention? That's where the real story unfolds. And measuring it properly in A/B tests is surprisingly tricky, which is why I'm excited to share how Kaplan-Meier curves changed the game for our team.

The challenge of measuring user retention in A/B testing

User retention might be the most important metric nobody measures correctly. Sure, we all track it, but traditional A/B testing approaches often miss the full picture. They'll tell you that 70% of users came back after 7 days, but what about the users who signed up yesterday? Or last week? You're comparing apples to oranges without even realizing it.

This is where survival analysis comes in - specifically something called Kaplan-Meier curves. Don't let the fancy name scare you off. Think of it as a smarter way to track how users stick around over time, even when you're dealing with messy, incomplete data.

The beauty of Kaplan-Meier curves is that they handle the biggest headache in retention analysis: censored data. That's just a technical term for users whose churn you haven't observed yet - usually because they signed up recently and are still active. Instead of ignoring these users or making bad assumptions about them, Kaplan-Meier curves count them for exactly as long as you've actually observed them.
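
To make "censored" concrete, here's roughly what the input data looks like - a tiny made-up table, with column names chosen just for illustration:

```python
import pandas as pd

# One row per user. "duration" is days from signup to churn - or days observed
# so far if the user hasn't churned. "churned" marks whether we actually saw
# the churn event (False means censored, i.e. still active as of today).
users = pd.DataFrame({
    "user_id":  [1, 2, 3, 4],
    "duration": [30, 12, 9, 2],
    "churned":  [True, True, False, False],   # users 3 and 4 are censored
})
```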

What does this mean practically? You can finally answer questions like:

  • At what point do users in variant A start dropping off compared to variant B?

  • Is that new onboarding flow actually helping long-term retention, or just delaying the inevitable?

  • Which user segments are sticking around longest, and why?

The visual nature of these curves makes it dead simple to spot patterns. When two lines start diverging on a graph, you know exactly when and how your variants are performing differently. No more squinting at retention tables trying to figure out if a 2% difference matters.

Introducing Kaplan-Meier curves for retention analysis

Let me break down how Kaplan-Meier curves actually work without getting too mathematical. Picture a graph where the y-axis shows the percentage of users still active, and the x-axis shows time. The curve starts at 100% (everyone's active on day zero) and gradually drops as users churn.

The magic happens in how the curve calculates each drop. Instead of naively dividing retained users by everyone who ever signed up, it looks at each time point, asks what fraction of the users still "at risk" survived that step, and multiplies those fractions together. If half your users joined yesterday, they shouldn't count against your 30-day retention rate - they haven't had the chance to stick around that long yet.
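
If you want to peek under the hood, that's the whole estimator: the fraction of at-risk users who survive each drop-off point, multiplied together over time. A toy sketch with made-up numbers:

```python
# Toy numbers: at each day where churn happened, how many users were still
# "at risk" (active and not yet censored), and how many churned that day.
churn_events = [
    # (day, at_risk, churned_that_day)
    (1, 100, 5),
    (7, 80, 8),    # some users were censored before day 7, so only 80 are at risk
    (30, 40, 6),   # lots of recent signups are censored before day 30
]

survival = 1.0
for day, at_risk, churned in churn_events:
    survival *= 1 - churned / at_risk   # chance of surviving this particular step
    print(f"day {day}: estimated retention = {survival:.3f}")
```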

Here's why this beats simple retention rates:

  • More accurate over time: You're not penalizing yourself for growing quickly

  • Statistical rigor: You can actually test whether differences between groups are significant using log-rank tests (fancy stats speak for "is this difference real or just noise?"); there's a quick sketch of one right after this list

  • Pinpoint problem areas: See exactly when users start dropping off, not just the final percentage
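
Here's what that log-rank test looks like in practice - a minimal sketch using the lifelines library on simulated data; swap in your real durations and churn flags:

```python
import numpy as np
from lifelines.statistics import logrank_test

rng = np.random.default_rng(0)

# Simulated data: days until churn (or until last observation, if censored)
durations_a = rng.exponential(scale=20, size=500)
durations_b = rng.exponential(scale=25, size=500)   # variant B hangs on a bit longer
# 1 = churn observed, 0 = censored (user was still active when we looked)
observed_a = rng.integers(0, 2, size=500)
observed_b = rng.integers(0, 2, size=500)

result = logrank_test(durations_a, durations_b,
                      event_observed_A=observed_a,
                      event_observed_B=observed_b)
print(result.p_value)   # small p-value = the curves genuinely differ
```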

The Reddit data science community has been all over this approach lately. One post about comparing retention between two groups sparked a great discussion about how traditional methods can mislead you, especially with fast-growing products.

Think about it: if you're comparing a new feature's retention after 30 days, but half the treatment group only joined 15 days ago, your "retention rate" is artificially low. Kaplan-Meier curves fix this by only counting users when they've actually had time to churn.

Applying Kaplan-Meier curves to visualize A/B test retention

Alright, let's get practical. To build a Kaplan-Meier curve for your A/B test, you need just two things: when each user joined, and when they last showed up (or churned). That's it.

Here's the basic process (there's a quick Python sketch right after the steps):

  1. Define what "churned" means for your product (no activity for 7 days? 14? 30?)

  2. Calculate survival probability at each time point for both test groups

  3. Plot the curves and watch the story unfold
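
Here's a rough sketch of those three steps using the lifelines library. The file name, column names, and 14-day churn window are all assumptions - swap in whatever matches your own event data:

```python
import pandas as pd
import matplotlib.pyplot as plt
from lifelines import KaplanMeierFitter

# Assumed input: one row per user with signup date, last-seen date, and variant.
users = pd.read_csv("ab_test_users.csv", parse_dates=["signup_date", "last_seen"])
analysis_date = users["last_seen"].max()

# Step 1: define churn - here, no activity for 14 days (pick what fits your product).
CHURN_WINDOW = 14
users["churned"] = (analysis_date - users["last_seen"]).dt.days > CHURN_WINDOW

# Still-active users are censored: we've only observed them up to the analysis date.
end = users["last_seen"].where(users["churned"], analysis_date)
users["duration"] = (end - users["signup_date"]).dt.days

# Steps 2 and 3: fit one curve per variant and plot them on the same axes.
ax = plt.subplot(111)
for variant, group in users.groupby("variant"):
    kmf = KaplanMeierFitter()
    kmf.fit(group["duration"], event_observed=group["churned"], label=variant)
    kmf.plot_survival_function(ax=ax)

ax.set_xlabel("Days since signup")
ax.set_ylabel("Fraction of users still active")
plt.show()
```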

The curves tell you so much more than a single retention number ever could. When Spotify tested different onboarding flows, they discovered that one variant had better 7-day retention but worse 30-day retention. Without visualizing the full curve, they would have picked the wrong winner.

Real teams are using this for all sorts of decisions:

  • Netflix famously uses survival analysis to understand content engagement patterns

  • Gaming companies compare retention curves for different difficulty settings

  • SaaS products test pricing strategies by looking at long-term customer survival

The visual comparison is what makes this so powerful. When you see two curves slowly separating over weeks, you know you're onto something meaningful. And if they cross? That's a red flag that short-term and long-term impacts are different.

One Reddit discussion highlighted how a mobile app discovered their "winning" variant was actually just delaying churn by a week. The curves looked great initially but converged by day 30. Without the full picture, they would have shipped a feature that didn't actually improve retention.

Enhancing A/B testing with survival analysis techniques

Survival analysis doesn't just give you prettier graphs - it fundamentally improves your ability to detect real effects. The team at Airbnb shared how switching to Kaplan-Meier analysis increased their test sensitivity by 20-30% for retention metrics.

But here's a pro tip most people miss: outliers can completely mess up your retention analysis. You know those power users who check your app 50 times a day? Or that one account that has signed up and churned 47 separate times? Both are skewing your results.

The fix is something called winsorization - basically capping extreme values at reasonable percentiles. Instead of throwing out outliers completely (and losing information), you just limit their impact. Set the 99th percentile as your max, and suddenly your tests become way more reliable.
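
The mechanics are about two lines of numpy: cap each user's value at a chosen percentile before it goes into the analysis. A quick sketch with made-up session counts:

```python
import numpy as np

# Made-up example: sessions per user during the test; one power user dwarfs the rest.
sessions = np.array([3, 5, 2, 4, 6, 1, 250, 4, 3, 7])

cap = np.percentile(sessions, 99)            # 99th percentile as the ceiling
winsorized = np.clip(sessions, None, cap)    # extreme values get capped, not dropped

print(cap, winsorized.max())
```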

At Statsig, we've seen teams combine these techniques to unlock insights they'd been missing for years:

  • A social app discovered their "failed" test actually improved retention for users who posted content

  • An e-commerce site found that free shipping improved retention, but only after adjusting for seasonal shoppers

  • A B2B tool realized their onboarding changes helped new users but hurt power users

The key is that survival analysis respects the time dimension of user behavior. Traditional A/B testing treats retention as a binary outcome, but Kaplan-Meier curves capture the full journey. When you're comparing groups, this nuance matters.

Closing thoughts

Measuring retention in A/B tests doesn't have to be a guessing game. Kaplan-Meier curves give you the full story - not just who stayed, but when and how users dropped off. Once you start visualizing retention this way, you'll never go back to simple percentages.

The best part? Getting started is easier than you think. Most modern analytics platforms (including Statsig) have built-in support for survival analysis. You don't need a statistics PhD - just the willingness to look beyond surface-level metrics.

Want to dive deeper? Check out:

  • Statsig's guide to measuring retention

  • The survival analysis threads on r/statistics for real-world examples

  • Your own retention data (seriously, try plotting it as a Kaplan-Meier curve and see what you discover)

Hope you find this useful! And next time someone shows you a "winning" test based on 7-day retention, ask them what the curves look like at day 30. You might be surprised by what you find.


