Power is the probability that a test correctly rejects a false null hypothesis; in other words, it measures whether an A/B test is sensitive enough to detect a true effect when there is one. To calculate the sample size needed for a given power, we need several inputs: baseline conversion rate, minimum detectable effect, A/B split ratio, significance level, and power.
This is the current conversion rate of your control group. In an A/B test, the baseline conversion rate is the expected rate of conversion (or other desirable outcome) in the control group, that is, among users who are not exposed to the new experience.
This is the smallest difference that can be consistently detected in an experiment. In an A/B test, this is the minimum change in the desirable outcome that you’d want to be able to detect.
The sample size output by the calculator is the minimum needed to consistently achieve statistically significant results when the true effect is at least the minimum detectable effect, based on the power level that you choose. Choosing a higher power means a lower frequency of false negatives, but also requires a correspondingly larger sample.
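To make the relationship between these inputs concrete, here is a minimal sketch of a sample size calculation for a conversion-rate test. It is not necessarily the formula this calculator uses. It assumes a two-proportion z-test with the normal approximation and unpooled variances, and it treats the minimum detectable effect as an absolute difference in conversion rate. The function name `required_sample_size` and its parameters are hypothetical, chosen only for illustration.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(baseline, mde, alpha=0.05, power=0.8,
                         split=0.5, one_sided=True):
    """Approximate TOTAL sample size (test + control) for a two-proportion z-test.

    Assumptions (illustrative, not the calculator's confirmed method):
    - `mde` is an absolute change in conversion rate (0.02 = 2 points).
    - `split` is the fraction of users assigned to the test group.
    - Normal approximation with unpooled variances.
    """
    p_control = baseline
    p_test = baseline + mde
    std_normal = NormalDist()

    # Critical value for the significance level (one tail or two).
    tail = alpha if one_sided else alpha / 2
    z_alpha = std_normal.inv_cdf(1 - tail)
    # Quantile corresponding to the desired power.
    z_power = std_normal.inv_cdf(power)

    # Variance of the difference in rates, per unit of total sample size.
    variance = (p_control * (1 - p_control) / (1 - split)
                + p_test * (1 - p_test) / split)

    return ceil((z_alpha + z_power) ** 2 * variance / mde ** 2)


if __name__ == "__main__":
    print(required_sample_size(0.10, 0.02, one_sided=False))
    print(required_sample_size(0.10, 0.02, power=0.9, one_sided=False))
```

Under these assumptions, a 10% baseline with a 2-point minimum detectable effect, a 50%/50% split, Alpha of 0.05 (two-sided), and 80% power needs roughly 7,700 users in total. Raising the power to 90% increases that number.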
The calculator is automatically set to optimal defaults, but you can adjust the advanced settings to see how they impact your results.
If you are looking to determine whether a single test variation is better than a control, use a one-sided test (recommended). If you want to determine whether it is different from the control in either direction, use a two-sided test.
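The practical difference between the two options is the critical value that the test statistic must exceed. The sketch below assumes a z-test and uses a hypothetical helper, `critical_z`, to show that a one-sided test puts all of Alpha in one tail, while a two-sided test splits it across both.

```python
from statistics import NormalDist


def critical_z(alpha=0.05, one_sided=True):
    """Critical z-value for a given significance level (illustrative helper)."""
    # A two-sided test splits alpha evenly between the two tails.
    tail = alpha if one_sided else alpha / 2
    return NormalDist().inv_cdf(1 - tail)


if __name__ == "__main__":
    print(round(critical_z(0.05, one_sided=True), 3))   # about 1.645
    print(round(critical_z(0.05, one_sided=False), 3))  # about 1.960
```

Because the one-sided threshold is lower, a one-sided test needs fewer samples to reach the same power, at the cost of not being able to flag a change in the opposite direction.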
Most A/B tests are conducted with a 50%/50% split across test and control users (represented as an input of 0.5 in this calculator), but this can be tuned to your own experimental design.
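As a rough illustration of why 50%/50% is the usual choice, the sketch below estimates how much the total sample size grows when the split is uneven. It assumes both groups have similar outcome variance, in which case the total required sample scales with 1 / (split × (1 − split)). The function name `split_inflation` is hypothetical.

```python
def split_inflation(split):
    """Approximate factor by which total sample size grows relative to a 50/50 split.

    Assumes similar outcome variance in test and control (an approximation).
    `split` is the fraction of users in the test group, strictly between 0 and 1.
    """
    if not 0 < split < 1:
        raise ValueError("split must be strictly between 0 and 1")
    return 1 / (4 * split * (1 - split))


if __name__ == "__main__":
    for s in (0.5, 0.3, 0.2, 0.1):
        print(s, round(split_inflation(s), 2))
```

Under this approximation, a 20%/80% split needs about 1.56 times as many total users as an even split to reach the same power.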
Alpha is the probability that a statistically significant difference is detected when one does not exist. 0.05 is a common default for Alpha, but you can choose a higher or lower value. A lower Alpha reduces the probability that an observed difference is due to chance, but requires a larger sample size.
As shared above, statistical power is the probability that the minimum detectable effect will be detected, assuming it exists. If you’d like to calculate a minimum detectable effect or A/B test duration automatically based on your data each time you run a test, sign up for Statsig!
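The calculation can also be run in reverse: given a fixed number of users, what power would the test have? The sketch below is one way to estimate this, assuming a two-proportion z-test with the normal approximation and an absolute minimum detectable effect. The function name `achieved_power` and its parameters are hypothetical, and for two-sided tests it ignores the negligible chance of rejecting in the wrong direction.

```python
from math import sqrt
from statistics import NormalDist


def achieved_power(baseline, mde, total_n, alpha=0.05,
                   split=0.5, one_sided=True):
    """Approximate power of a two-proportion z-test for a given total sample size.

    Assumptions (illustrative): `mde` is an absolute difference in conversion
    rate, and `split` is the fraction of users assigned to the test group.
    """
    p_control = baseline
    p_test = baseline + mde
    n_control = total_n * (1 - split)
    n_test = total_n * split

    # Standard error of the difference between the two conversion rates.
    se = sqrt(p_control * (1 - p_control) / n_control
              + p_test * (1 - p_test) / n_test)

    std_normal = NormalDist()
    tail = alpha if one_sided else alpha / 2
    z_alpha = std_normal.inv_cdf(1 - tail)

    # Probability that the observed effect clears the critical value.
    return std_normal.cdf(abs(mde) / se - z_alpha)


if __name__ == "__main__":
    for n in (2000, 4000, 8000):
        print(n, round(achieved_power(0.10, 0.02, n, one_sided=False), 3))
```

Power rises with sample size, which is why a smaller minimum detectable effect or a stricter Alpha translates into a longer-running test.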