You're running an A/B test. Your conversion rates look different between groups, but here's the million-dollar question: is that difference real or just random noise? That's where statistical tests come in, acting as your BS detector for experimental results.
The problem is, most people get tangled up choosing between z-tests and t-tests. It's like having two similar-looking tools in your toolkit and not being quite sure which one to grab. Let's clear that up once and for all.
At its core, A/B testing is about making a bet. Your null hypothesis is basically saying "nothing to see here folks" - there's no real difference between your control and treatment groups. The alternative hypothesis is where things get interesting: it claims you've actually found something worth acting on.
Here's the thing: statistical tests like t-tests and z-tests are just tools to help you figure out which bet to place. They estimate how likely you'd be to see a difference at least as big as yours if nothing real were going on. Think of it like this - if you flip a coin 10 times and get 7 heads, is the coin rigged or did you just get lucky? That's essentially what these tests answer for your experiments.
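To make that concrete, here's a quick back-of-the-envelope version of the coin question in Python. Nothing here is specific to A/B testing; it's just the same "how surprising is this?" logic:

```python
from scipy.stats import binom

# Chance of seeing 7 or more heads in 10 flips if the coin is actually fair
p_seven_plus = 1 - binom.cdf(6, 10, 0.5)

# Two-sided "is the coin rigged?" p-value: a fair coin is symmetric, so double it
p_value = 2 * p_seven_plus

print(f"P(7+ heads | fair coin): {p_seven_plus:.3f}")  # about 0.17
print(f"two-sided p-value:       {p_value:.3f}")       # about 0.34
```

A result that a fair coin produces about a third of the time isn't much evidence of anything - which is exactly the judgment your test statistics encode, just with conversion rates instead of coin flips.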
The tricky part is picking the right test. T-tests work great when you're dealing with smaller samples (under roughly 30 observations per group) or, more importantly, when you don't know the population standard deviation. Z-tests shine when you've got tons of data and actually know your population parameters. Mix these up, and you might end up making decisions based on statistical nonsense.
Even the big tech companies wrestle with this choice. The data science community on Reddit has noted that FAANG companies often default to t-tests for their A/B testing. But here's the kicker - the "right" choice isn't about following what Google does. It's about matching the test to your specific situation.
Z-tests are the statistical equivalent of a sledgehammer - powerful but only right for certain jobs. They really come into their own when you're working with large datasets (think thousands of users, not dozens) and you actually know something about your population variance.
Here's when z-tests make sense:
Your sample size is hefty (30+ observations per group)
You know the population standard deviation
Your data follows a normal distribution
Each observation is independent (no user appears twice)
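To see what those conditions look like in practice, here's a minimal sketch of a two-proportion z-test on a conversion metric. The counts are made up purely for illustration:

```python
import math
from scipy.stats import norm

# Hypothetical experiment: conversions out of 100,000 users per group
conv_a, n_a = 4_950, 100_000   # control
conv_b, n_b = 5_210, 100_000   # treatment

p_a, p_b = conv_a / n_a, conv_b / n_b

# Pooled conversion rate under the null hypothesis of "no difference"
p_pool = (conv_a + conv_b) / (n_a + n_b)

# Standard error of the difference between the two rates
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))

# z-statistic and two-sided p-value from the normal distribution
z = (p_b - p_a) / se
p_value = 2 * norm.sf(abs(z))

print(f"lift: {p_b - p_a:.4%}, z: {z:.2f}, p: {p_value:.4f}")
```

Notice that for a conversion rate, the variance falls straight out of p(1 - p); that's one of the few situations where the "known variance" requirement isn't much of a stretch.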
But let's be real - how often do you actually know your true population variance? That's like knowing exactly how all your users will behave before running the test. It happens, but it's rare. The folks discussing this on Reddit's statistics community point out that z-tests are more common in manufacturing quality control than in typical web experiments.
The independence assumption is another gotcha. If you're testing something where the same user might convert multiple times, or where users influence each other (hello, social features), your z-test results could be garbage. Always check your assumptions before trusting your p-values.
Let's cut through the confusion. The main difference between these tests boils down to what you know and how much data you have.
Use a z-test when:
You've got loads of data (sample size > 30)
You somehow know the population standard deviation
Your metric follows a normal distribution
Stick with a t-test when:
Working with smaller samples
Population variance is unknown (which is usually the case)
You want to play it safe
Here's a dirty little secret: for large samples, t-tests and z-tests give nearly identical results. The t-distribution approaches the normal distribution as sample size grows. So if you're testing with thousands of users, the practical difference is minimal. The statistics subreddit has a great discussion about why t-tests dominate in practice - basically, they're more flexible and you rarely go wrong using them.
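Don't take that on faith - it's easy to check. Here's a small simulation sketch (purely synthetic data) that runs scipy's Welch t-test and a hand-rolled z-test on the same large samples:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Two large, hypothetical samples of a roughly normal metric
a = rng.normal(loc=10.0, scale=2.0, size=5_000)
b = rng.normal(loc=10.1, scale=2.0, size=5_000)

# Welch's t-test (doesn't assume equal variances)
t_stat, t_p = stats.ttest_ind(a, b, equal_var=False)

# z-test, treating the sample standard deviations as if they were known
se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
z_stat = (a.mean() - b.mean()) / se
z_p = 2 * stats.norm.sf(abs(z_stat))

print(f"t-test p-value: {t_p:.5f}")
print(f"z-test p-value: {z_p:.5f}")  # with 5,000 users per group, these agree to several decimals
```

With thousands of observations, the t-distribution's heavier tails have all but disappeared, so both tests point you to the same decision.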
The real danger isn't picking the "wrong" test between these two. It's using either test when your data violates the fundamental assumptions. Non-normal distributions, dependent observations, or comparing medians instead of means - these are the mistakes that'll really mess up your results. Tools like Statsig handle a lot of this complexity for you, but understanding the basics keeps you from misinterpreting what the numbers mean.
Before you run any z-test, do a quick sanity check. Plot your data. Does it look roughly bell-shaped? If it's skewed like a hockey stick, stop right there. Z-tests assume normality, and violating this assumption is like building on quicksand.
Here's your pre-flight checklist:
Check sample size (need 30+ per group)
Verify data looks normally distributed
Confirm observations are independent
Know your population variance (or have a really good estimate)
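If you like automating your own nagging, here's a rough sketch of that checklist in code. The thresholds are judgment calls (and the function name is mine, not anyone's standard API) - eyeballing a histogram is still step one:

```python
import numpy as np
from scipy import stats

def preflight_check(sample, min_n=30, max_abs_skew=1.0):
    """Rough sanity checks before reaching for a z-test."""
    sample = np.asarray(sample, dtype=float)
    return {
        "enough_observations": len(sample) >= min_n,
        # Heavy skew is a red flag for the normality assumption
        "roughly_symmetric": abs(stats.skew(sample)) <= max_abs_skew,
        # Independence can't be checked from the numbers alone: confirm each
        # user contributes exactly one observation before trusting any p-value
    }

control = np.random.default_rng(7).normal(loc=100, scale=15, size=500)
print(preflight_check(control))  # {'enough_observations': True, 'roughly_symmetric': True}
```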
For the actual calculation, don't be a hero - use software. Python's scipy.stats, R's built-in functions, or even Excel can handle z-tests. Hand calculations are for homework problems, not real decisions.
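For instance, if you have statsmodels installed, it ships a two-proportion z-test that takes raw counts. Running the same made-up numbers from the earlier sketch through it gives the same answer:

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical conversion counts and sample sizes: [control, treatment]
conversions = [4_950, 5_210]
visitors = [100_000, 100_000]

z_stat, p_value = proportions_ztest(count=conversions, nobs=visitors)

# The sign of z depends on which group is listed first; the two-sided
# p-value should line up with the manual calculation above
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
```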
When reading your results, remember that statistical significance isn't the same as practical significance. A z-test might tell you that changing your button from blue to slightly darker blue produces a "significant" 0.1% lift. Sure, it's real, but is it worth the engineering effort? Always pair your p-values with effect sizes and business context.
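A habit that helps keep this honest: report the lift and a confidence interval right next to the p-value. Here's a rough sketch, continuing with the same made-up counts and a simple Wald-style interval:

```python
import math
from scipy.stats import norm

conv_a, n_a = 4_950, 100_000   # control
conv_b, n_b = 5_210, 100_000   # treatment
p_a, p_b = conv_a / n_a, conv_b / n_b

abs_lift = p_b - p_a
rel_lift = abs_lift / p_a

# 95% Wald confidence interval for the difference in conversion rates
z_crit = norm.ppf(0.975)
se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
lo, hi = abs_lift - z_crit * se, abs_lift + z_crit * se

print(f"absolute lift: {abs_lift:.2%} (95% CI {lo:.2%} to {hi:.2%})")
print(f"relative lift: {rel_lift:.1%}")
```

Whether that interval clears your cost of shipping the change is a business call, not a statistical one.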
Z-tests work particularly well for conversion rate comparisons - did more people click the button in version A or B? They're simple, fast, and usually appropriate for binary outcomes with large samples. But be careful with metrics like revenue per user. Money data is almost never normally distributed (thanks to those big spenders skewing everything), so your z-test assumptions crumble. The analytics community has been warning about this exact mistake for years.
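If you want to see how badly revenue data misbehaves, simulate something revenue-shaped. A lognormal distribution is a common stand-in for per-user spend (an assumption for illustration, not a claim about your users):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical per-user revenue: most users spend little, a few spend a lot
revenue = rng.lognormal(mean=1.0, sigma=1.5, size=10_000)

print(f"mean:   {revenue.mean():.2f}")
print(f"median: {np.median(revenue):.2f}")   # far below the mean
print(f"skew:   {stats.skew(revenue):.1f}")  # roughly 0 for a normal distribution
```

When the mean and median disagree that badly, you'll need a much larger sample, a transformed metric, or a different test before the normal approximation is trustworthy.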
Choosing between z-tests and t-tests doesn't have to be complicated. For most A/B testing scenarios, t-tests are your safe default - they're flexible, robust, and don't require you to know things you probably don't know. Save z-tests for those specific cases where you have massive samples and actually know your population parameters.
The bigger picture? Don't get so caught up in test selection that you forget what really matters: running clean experiments, avoiding p-hacking, and making decisions that actually move your metrics. Whether you use a z-test, t-test, or let a platform like Statsig handle the statistics for you, the key is understanding what your results actually mean for your users and your business.
Want to dive deeper? Check out any good statistics textbook for the mathematical foundations, or jump into practical experimentation guides from companies that run thousands of tests. Just remember - the best statistical test is the one that helps you make better decisions, not the one with the fanciest math.
Hope you find this useful!