For B2B experiments with small sample sizes (or tests where a tail of power users drives a large portion of an overall metric value), randomization alone doesn't cut it. Your Control and Test groups may not be well balanced if your whales cluster in one group.
This new Statsig feature meaningfully reduces false positive rates and makes your results more consistent and trustworthy. It tries 100 different randomization salts, compares the resulting split between groups on a metric or classification you provide, and keeps the salt with the best balance. In our simulations, we see around a 50% decrease in the variance of reported results.
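To make the mechanics concrete, here is a minimal sketch of the salt-search idea, not Statsig's actual implementation. The helper names (`assign_group`, `imbalance`, `best_salt`) and the use of a pre-experiment metric like prior revenue are illustrative assumptions.

```python
import hashlib

def assign_group(user_id: str, salt: str) -> str:
    """Deterministically hash a unit into 'test' or 'control' for a given salt."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return "test" if int(digest, 16) % 2 == 0 else "control"

def imbalance(units: dict[str, float], salt: str) -> float:
    """Absolute gap in the pre-experiment metric totals between the two groups."""
    totals = {"test": 0.0, "control": 0.0}
    for user_id, metric_value in units.items():
        totals[assign_group(user_id, salt)] += metric_value
    return abs(totals["test"] - totals["control"])

def best_salt(units: dict[str, float], n_salts: int = 100) -> str:
    """Try n_salts candidate salts and return the one with the most balanced split."""
    candidates = [f"salt_{i}" for i in range(n_salts)]
    return min(candidates, key=lambda s: imbalance(units, s))

# Hypothetical example: B2B accounts keyed by ID with prior-period revenue,
# where a couple of whales dominate the metric.
accounts = {"acme": 50_000.0, "globex": 1_200.0, "initech": 800.0, "umbrella": 45_000.0}
print(best_salt(accounts))
```

The key point is that assignment stays deterministic and hash-based; the search only chooses which salt to use before the experiment starts, so the whales end up spread more evenly across groups.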
Read more about using the feature here, or learn more about how it works here. This is now rolling out on both Statsig Cloud and Warehouse Native on Pro and Enterprise tiers.