Simpson's paradox: When aggregates mislead

Mon Jun 23 2025

Ever looked at your company's overall metrics and thought everything was going great, only to discover that half your customer segments were actually struggling? That's the kind of surprise that keeps data analysts up at night.

This sneaky phenomenon has a name: Simpson's paradox. It's when your aggregated data tells one story, but dig into the subgroups and suddenly the plot twists harder than an M. Night Shyamalan movie. Understanding this paradox isn't just an academic exercise - it's the difference between making decisions based on reality versus a statistical mirage.

When data deceives: Understanding Simpson's paradox

Here's the deal: aggregated data can completely mask or even reverse the trends you see in individual groups. It's like looking at average salaries in Silicon Valley and thinking everyone's rich, when in reality you've got tech millionaires skewing the numbers while teachers and service workers struggle to make rent.

Simpson's paradox happens when a trend that's clear as day in different groups just... vanishes. Or worse, flips completely when you combine those groups. I've seen smart people make terrible decisions because they didn't know to look deeper.
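To make that concrete, here's a minimal sketch using illustrative counts (modeled on the classic kidney-stone treatment comparison): treatment A wins inside every subgroup, yet B looks better once you pool everything.

```python
# Illustrative counts: (successes, patients) per treatment, per subgroup.
# A beats B in each subgroup, but the case mix flips the aggregate.
groups = {
    "mild":   {"A": (81, 87),   "B": (234, 270)},
    "severe": {"A": (192, 263), "B": (55, 80)},
}

for name, arms in groups.items():
    for arm, (wins, n) in arms.items():
        print(f"{name:>6} {arm}: {wins}/{n} = {wins / n:.0%}")

# Pool the subgroups and the ranking reverses.
for arm in ("A", "B"):
    wins = sum(g[arm][0] for g in groups.values())
    n = sum(g[arm][1] for g in groups.values())
    print(f"overall {arm}: {wins}/{n} = {wins / n:.0%}")
```

A wins both subgroups (93% vs 87% mild, 73% vs 69% severe), yet B wins overall (83% vs 78%) - purely because A handled a larger share of the hard cases.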

The tricky part? Your brain wants to trust the big picture. We're wired to look for simple patterns, but data doesn't always play nice with our intuition. When you see overall numbers improving, it feels wrong to question success. But that's exactly when you need to be most careful.

So how do you catch this paradox in action? Start by asking yourself: what groups might be hiding in my data? Then actually look at them separately. Run subgroup analyses. Use stratification. Get comfortable with multivariate regression if you need to control for confounding factors. And honestly? Sometimes the best tool is a simple visualization that shows your data split different ways.
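A first pass at that subgroup check is easy to automate. Here's a rough sketch - the helper name and record shape are my own invention, not any particular library's API - that computes per-stratum rates you can then compare against the pooled number:

```python
from collections import defaultdict

def rates_by_stratum(records):
    """Group (stratum, variant, success) records and compute success rates."""
    counts = defaultdict(lambda: [0, 0])  # (stratum, variant) -> [successes, total]
    for stratum, variant, success in records:
        counts[(stratum, variant)][0] += int(success)
        counts[(stratum, variant)][1] += 1
    return {key: wins / n for key, (wins, n) in counts.items()}

# Tiny invented dataset: "new" wins inside both strata,
# but "old" wins the pooled comparison because of the traffic mix.
records = (
    [("mobile", "new", True)] * 9 + [("mobile", "new", False)] * 1 +
    [("mobile", "old", True)] * 40 + [("mobile", "old", False)] * 10 +
    [("desktop", "new", True)] * 20 + [("desktop", "new", False)] * 30 +
    [("desktop", "old", True)] * 3 + [("desktop", "old", False)] * 7
)
print(rates_by_stratum(records))
```

Run the same computation with and without the stratum key; if the ranking of variants changes, you've caught the paradox before it caught you.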

The Wikipedia article on Simpson's paradox points out something fascinating - this paradox challenges our basic reasoning skills because we have this innate causal logic that usually serves us well. But data aggregation can break that logic in ways that feel almost impossible until you see it happening.

Real-world scenarios where aggregates mislead

Let me share my favorite example of this paradox in action. Back in the 1970s, UC Berkeley got sued for gender discrimination in graduate admissions. The overall numbers looked damning - men were getting accepted at much higher rates than women. Case closed, right?

Wrong. When statisticians actually looked at individual departments, they found something wild: most departments showed little or no bias, and several were actually tilted in favor of women. The paradox? Women were disproportionately applying to more competitive departments with lower acceptance rates for everyone. The aggregate data told a completely backwards story.

This happens in medicine all the time too. You'll see a treatment that looks amazing in trials - until someone notices it only works for patients under 50. Or a drug that seems mediocre overall but turns out to be a game-changer for a specific genetic subset. An analysis published in JAMA showed how heart disease treatments suffered from exactly this problem.

In business, I've watched companies celebrate overall growth while missing that their core customer base was abandoning ship. The new customers were just masking the exodus. One e-commerce client I worked with was thrilled about their 15% revenue growth - until we segmented by cohort and realized customers from two years ago had almost entirely stopped buying. They were essentially running on a treadmill, constantly needing new customers just to stay in place.

The folks at Statsig have seen this play out in A/B testing too. You run an experiment, see positive results overall, and ship the feature. Then three months later you realize it actually hurt engagement for your power users - the very people who drive most of your revenue. The aggregate win was actually a strategic loss.

The causes and implications of Simpson's paradox

At its core, Simpson's paradox usually comes down to two culprits: confounding variables and wonky sample sizes.

Confounding variables are the hidden puppet masters of your data. They're pulling strings on both what you're measuring and what you think is causing it. In that Berkeley example, department choice was the confounder - it affected both gender (women chose harder departments) and acceptance rates (harder departments accept fewer people).

Unequal sample sizes make things worse. If 90% of your data comes from one subgroup, that group basically determines your overall trend. The other 10% could be doing something completely different and you'd barely notice in the aggregate.
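The mechanics here are just weighted averages. A quick sketch with invented numbers shows how a dominant segment drowns out a small one:

```python
# Invented segments: (conversion rate, number of users).
# The 900-user segment dictates the aggregate; the 100-user
# segment could crater and the overall number would barely move.
segments = {"power_users": (0.30, 100), "new_users": (0.55, 900)}

total_users = sum(n for _, n in segments.values())
overall = sum(rate * n for rate, n in segments.values()) / total_users
print(f"overall conversion: {overall:.1%}")
```

The pooled rate lands at 52.5%, a whisker below the big segment's 55% - and it tells you almost nothing about the power users sitting at 30%.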

Here's where it gets scary: making decisions based solely on aggregated data is like navigating with a broken compass. You think you're heading north but you're actually going in circles. I've seen product teams kill features that were loved by their most valuable users because the overall metrics looked bad. I've seen medical treatments get approved that helped most people a tiny bit but seriously harmed a vulnerable minority.

The Reddit data science community has some great discussions about detecting these issues. They recommend using algorithms that automatically identify clusters and patterns in your data - basically letting the computer find the subgroups you might miss. Smart approach, but it still requires human judgment to interpret what you find.

Context is everything here. You can't just run the numbers and call it a day. You need to understand the mechanisms at play, think about what might be influencing your results, and constantly question whether your aggregated view is hiding important stories.

Strategies to avoid being misled by aggregates

So how do you protect yourself from Simpson's paradox? Here's what's worked for me:

First, make subgroup analysis your default, not your exception. Every time you look at overall metrics, immediately ask: how does this break down by user type, geography, time period, or any other relevant dimension? I like to think of it as zooming in and out on a map - you need both views to navigate properly.

Visualization is your secret weapon here. A well-designed chart can reveal paradoxes that would take hours to find in spreadsheets. Try scatter plots with different colors for subgroups, or side-by-side bar charts showing both aggregated and segmented views. When patterns jump out visually, they're hard to ignore.
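Even without a plotting library, a throwaway ASCII view of aggregated-vs-segmented rates (invented counts here) makes a reversal hard to miss:

```python
# Invented counts: (successes, n). Variant A wins in each segment,
# variant B wins the aggregate - the bars make the flip obvious.
data = {
    "overall": {"A": (145, 500), "B": (355, 500)},
    "power":   {"A": (45, 50),   "B": (350, 450)},
    "casual":  {"A": (100, 450), "B": (5, 50)},
}

for segment, arms in data.items():
    bars = "  ".join(
        f"{variant}: {'#' * round(wins / n * 30):<30} {wins / n:.0%}"
        for variant, (wins, n) in arms.items()
    )
    print(f"{segment:>8} | {bars}")
```

One glance at the rows is enough: the "overall" bar and the segment bars point in opposite directions.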

For experimentation specifically, consistent design is non-negotiable. The team at Statsig emphasizes keeping your traffic allocation steady and controlling for confounders through proper randomization. It sounds basic, but I've seen too many tests ruined by mid-flight changes that created artificial subgroups.

Here's my practical framework:

  • Always define your key segments before analyzing

  • Look at both absolute numbers and rates (they can tell different stories)

  • When you find conflicting trends, dig into why - there's usually a logical explanation

  • Use blocking or stratification in your experimental design to ensure balanced subgroups
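That last point can be sketched in a few lines. This is a toy version of stratified assignment using a salted hash - the function and scheme are my own illustration, not any platform's actual implementation - so each stratum gets a roughly balanced split regardless of its share of total traffic:

```python
import hashlib
from collections import Counter

def assign(user_id: str, stratum: str, variants=("control", "treatment")) -> str:
    """Deterministically bucket a user within their stratum."""
    digest = hashlib.sha256(f"{stratum}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

# Each stratum splits ~50/50 on its own, even if one stratum
# is ten times larger than the other.
for stratum in ("mobile", "desktop"):
    counts = Counter(assign(f"user{i}", stratum) for i in range(1000))
    print(stratum, dict(counts))
```

Because the hash is deterministic, a user always lands in the same bucket, and the comparison inside each stratum stays apples-to-apples.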

One thing to remember: segment-level insights are incredibly valuable for understanding your business, but be careful about optimizing for them. If you start making decisions based on every subgroup trend, you'll end up with a Frankenstein product that works perfectly for no one. The Statsig team has a great perspective on this - use segments to identify opportunities, but let overall metrics guide your ship.

The hardest part? Training your team to think this way. Statistical paradoxes aren't intuitive, and it takes practice to spot them. Make it a habit to question aggregate metrics in team meetings. Share examples when you catch paradoxes in the wild. Build it into your analysis templates so people can't skip the subgroup checks.

Closing thoughts

Simpson's paradox is one of those concepts that changes how you see data forever. Once you know it exists, you'll start spotting it everywhere - in news headlines, quarterly reports, even in everyday conversations about averages and trends.

The key takeaway? Never trust aggregated data at face value. Always dig deeper, look at your subgroups, and understand the full story your data is trying to tell. It's more work, sure, but it's the difference between making decisions based on reality versus statistical illusions.

Want to dive deeper? Check out the Wikipedia article on Simpson's paradox for the mathematical foundations, or explore how companies like Statsig build experimentation platforms that help avoid these pitfalls. The UC Berkeley case study is also worth a read - it's a masterclass in how paradoxes play out in the real world.

Hope you find this useful! And next time someone shows you an impressive aggregate metric, you know what question to ask: "How does this look when we break it down?"


