This ensures the test is adequately powered to detect the change without being swamped by statistical noise. In statistical terms, we want to be able to detect the minimum detectable effect (MDE) at a chosen level of confidence.
This is the preexisting or expected conversion rate of your control group. For example, if you are studying the effectiveness of a new UI, the baseline conversion rate would be the percentage of people in your audience who are expected to use the desired features without being exposed to the new UI.
This is the smallest change in behavior or outcome that you want your study to be able to detect consistently. For example, if you are studying the effectiveness of a new UI, the minimum detectable effect would be the smallest increase in usage rate that you want to be able to detect.
The calculator will use these values to determine the sample size your study needs. Its output is the minimum viable test size and minimum viable control size required to consistently achieve statistically significant results. How consistently depends on your statistical power, an advanced input we'll discuss below.
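To make this concrete, here is a rough sketch of the standard two-proportion sample-size calculation in Python. The function name, defaults, and example numbers are illustrative; the calculator's exact implementation may differ.

```python
from math import ceil, sqrt
from scipy.stats import norm

def sample_size_per_group(baseline, mde, alpha=0.05, power=0.80,
                          two_sided=False):
    """Approximate users needed per group (normal approximation).

    baseline: control conversion rate, e.g. 0.10 for 10%
    mde:      smallest absolute lift you want to detect, e.g. 0.02
    """
    p1, p2 = baseline, baseline + mde
    p_bar = (p1 + p2) / 2
    # Critical value for the chosen significance level (see below)
    z_alpha = norm.ppf(1 - alpha / 2) if two_sided else norm.ppf(1 - alpha)
    # Critical value for the chosen statistical power
    z_beta = norm.ppf(power)
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / mde ** 2)

# Example: 10% baseline, detect a 2-point absolute lift with the defaults
print(sample_size_per_group(0.10, 0.02))  # about 3,000 users per group
```

With a 50/50 split, that works out to roughly 3,000 users in the test group and another 3,000 in the control group.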
The calculator is automatically set to optimal defaults, but you can adjust the advanced settings to see how they impact your results.
A one-sided test (recommended) is used to determine whether the test variation is better than the control. A two-sided test, on the other hand, is used to determine whether the test variation is simply different from the control. Because a one-sided test only checks one direction, it reaches significance with a smaller sample.
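To see why a one-sided test is cheaper, compare the critical values each test must clear (a quick SciPy sketch):

```python
from scipy.stats import norm

alpha = 0.05
# One-sided: all of alpha sits in one tail, so the bar is lower
z_one_sided = norm.ppf(1 - alpha)      # about 1.645
# Two-sided: alpha is split across both tails, so the bar is higher
z_two_sided = norm.ppf(1 - alpha / 2)  # about 1.960
print(z_one_sided, z_two_sided)
```

The higher two-sided critical value feeds directly into the formula sketched above, so a two-sided test needs a larger sample for the same baseline and MDE.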
The default value of 0.5 splits users 50/50 between the test and control groups. An even split is standard A/B testing practice, and it also minimizes the total sample required, but feel free to tune the ratio to your circumstances.
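As an illustration of how the split affects the totals, here is a sketch using statsmodels with the same hypothetical 10%-baseline, 2-point-MDE example as above (again, not necessarily the calculator's internals):

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# Cohen's h effect size for lifting conversion from 10% to 12%
h = proportion_effectsize(0.12, 0.10)

analysis = NormalIndPower()
for ratio in (1.0, 2.0, 4.0):  # test-group size relative to control
    n_control = analysis.solve_power(effect_size=h, alpha=0.05, power=0.8,
                                     ratio=ratio, alternative='larger')
    n_test = n_control * ratio
    print(f"{ratio:.0f}:1 split -> about {round(n_control + n_test)} users total")
```

The 50/50 split needs the fewest users overall (about 6,000 here), while a 4:1 split needs over 50% more for the same power.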
The significance level is the probability of detecting a statistically significant difference when no actual difference exists. Such a detection is commonly referred to as a type 1 error, or a false positive.
The default value is 0.05, and it can be adjusted within a range of 0.01 to 0.1. Lowering it gives you higher confidence that an observed difference isn't due to chance, but requires a larger sample size.
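Using the illustrative sample_size_per_group sketch from earlier, the cost of tightening the significance level looks like this:

```python
# Same hypothetical inputs as before: 10% baseline, 2-point MDE, power 0.8
for alpha in (0.10, 0.05, 0.01):
    print(alpha, sample_size_per_group(0.10, 0.02, alpha=alpha))
# roughly 2,200, 3,000, and 4,900 users per group, respectively
```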
The statistical power is the probability that the minimum detectable effect will be detected, assuming it exists. The default value is set to 0.8, but can be adjusted between 0.65 and 0.95. Again, a higher value reduces the frequency of false negatives, but requires a larger sample size.
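The same sketch shows the price of higher power:

```python
# Again with the hypothetical 10% baseline and 2-point MDE, alpha 0.05
for power in (0.65, 0.80, 0.95):
    print(power, sample_size_per_group(0.10, 0.02, power=power))
# roughly 2,000, 3,000, and 5,300 users per group, respectively
```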