As a Data Scientist, my ideal data requirements include:

- the most accurate and reliable data for your feature gate rollouts and experiments,
- knowing how many users have seen your new home screen design and what effect it had on sales, and
- knowing whether a new code deployment is increasing crash rates and hurting your business.
Digital bots and web crawlers can make all of this more difficult. And yet we need them for the modern internet and AI to work. This led us to roll out bot filtering by default on Statsig.
Bots can skew all sorts of results in experiments and analytics by inflating exposure counts and introducing noise. While investigating this issue we found that, depending on the company, bots can be responsible for anywhere from 0% to 50% of raw exposures. In the worst cases, up to 80% of a company's unique units (e.g., Users, Sessions, Devices) in feature rollouts and experiments could be bots, although most companies see far fewer.
Any experiment with high bot participation should still report metrics' relative deltas (ΔX̄%) correctly, since bots are split across variants proportionally just like everyone else. If bots bias a metric, we expect that bias to be equally present in both treatment and control; however, the metric's absolute values and confidence intervals will change as a result.
Imagine: the +8% ± 10% improvement in revenue per visitor you measured was really +8% ± 5%. That could be the difference between a non-statsig and a statsig result! What changed is the number of users who contribute nothing to revenue while adding noise (i.e., variance) to your results.
Crawling bots tend to add users to your experiments that don’t contribute to your success metrics. They don’t buy goods, download software, or sign up for classes. This can dilute your metrics in all variant groups and degrade your experimental power.
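To make that intuition concrete, here's a simplified back-of-the-envelope model (an illustration, not a description of any specific dataset): suppose a fraction $b$ of the units in each group are bots that contribute exactly zero to the metric, while real users have metric mean $\mu$ and variance $\sigma^2$. The observed group mean and variance become

$$\bar{X}_{\text{obs}} = (1-b)\,\mu, \qquad \operatorname{Var}_{\text{obs}} = (1-b)\,\sigma^2 + b(1-b)\,\mu^2.$$

Both the treatment and control means are scaled by the same factor $(1-b)$, so the relative delta is unchanged:

$$\frac{(1-b)\mu_T - (1-b)\mu_C}{(1-b)\mu_C} = \frac{\mu_T - \mu_C}{\mu_C}.$$

The relative standard error of each group mean, however, inflates by a factor of roughly $\sqrt{1 + b\,\mu^2/\sigma^2}$, so the confidence interval on the relative delta widens even though the point estimate stays put.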
Let’s walk through a simple example in detail to understand how bots affect experiment results.
Your company runs a home screen experiment, and you want to measure average revenue per visitor. We'll assume we know the ground-truth outcomes, then see how our ability to detect improvements degrades when we run the simulation with and without bots.
Let’s say that real human visitors to our website make purchases 50% of the time while bots never purchase anything. When visitors do purchase something the price is normally distributed with a mean of $10 and a standard deviation of $5, with a minimum floor of $0.
We want to test a new homepage variant under which the average purchase price increases 5% to $10.50, while the standard deviation remains $5.
These example conditions make it easy to simulate what happens with and without bots. When visitors and purchase prices perfectly follow the above rules and distributions, our simulations produce the following results:
|  | With 50% of Users as Bots | Bots Removed |
|---|---|---|
| Control | Total Visits = 20k<br>Total Revenue = $50,211<br>Metric Mean = 2.51<br>Metric Variance = 24.91 | Total Visits = 10k<br>Total Revenue = $50,211<br>Metric Mean = 5.02<br>Metric Variance = 37.22 |
| Treatment | Total Visits = 20k<br>Total Revenue = $52,664<br>Metric Mean = 2.63<br>Metric Variance = 26.85 | Total Visits = 10k<br>Total Revenue = $52,664<br>Metric Mean = 5.26<br>Metric Variance = 39.84 |
| Conclusion | Metric: +4.89% ± 5.62%<br>p-value: 0.0883 | Metric: +4.89% ± 3.43%<br>p-value: 0.0052 |
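As a sanity check, the means and variances in the table line up with what the zero-inflated mixture math above predicts. Here's a quick Python sketch (approximate, since it ignores the small effect of the $0 price floor):

```python
# Analytic check of the control-group numbers in the table above.
purchase_rate, price_mean, price_sd = 0.5, 10.0, 5.0  # humans buy 50% of the time at ~N($10, $5)
bot_share = 0.5                                       # half of all visitors are bots that never buy

# Per-human-visitor revenue: a purchase happens with probability 0.5, else revenue is $0.
human_mean = purchase_rate * price_mean                                    # 5.0
human_var = purchase_rate * (price_sd**2 + price_mean**2) - human_mean**2  # 37.5

# Mixing in bots that contribute $0 halves the mean and changes the variance.
mixed_mean = (1 - bot_share) * human_mean                                              # 2.5
mixed_var = (1 - bot_share) * human_var + bot_share * (1 - bot_share) * human_mean**2  # 25.0

print(mixed_mean, mixed_var)  # ≈ the simulated 2.51 and 24.91 (with bots)
print(human_mean, human_var)  # ≈ the simulated 5.02 and 37.22 (bots removed)
```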
Removing bots from experiments didn’t meaningfully change the deltas. It did, however, make real improvements to the confidence intervals, helping our example experiment go from non-statsig to statsig.
The meaningful results in the previous example reflect what Statsig sees in general with bot filtering: the more bots and real users differ in behavior (e.g., bots never buy anything but your real customers do), the more bots can affect the results you see. You don't have to worry about bot filtering changing your core metric results: metric deltas will stay the same. However, removing these bots can bring real gains in experimental power and sensitivity, which should help you move faster and make decisions with less data.
Our prior example assumed perfect statistical behavior: the human users' purchase rate, mean, and standard deviation exactly matched the specified distributions. What happens if we run more realistic simulations where purchases are drawn randomly from those distributions? We ran 10,000 simulations to find out.
We computed the width of the confidence interval and the p-value across these simulations, with and without bots:
Here we see the real improvement between the bots and no-bots simulations. When bots behave very differently from real users, removing them yields substantial gains: the median confidence-interval width decreased by 13.7%, and the median p-value decreased by more than 66%. All told, the share of simulations that found a statsig result rose from 71% to 81%.
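For illustration, here is a minimal sketch of this kind of Monte Carlo comparison in Python. It is a simplified stand-in for Statsig's actual analysis (and runs 1,000 simulations rather than 10,000 to keep it quick), but it follows the same purchase distributions as the example above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def simulate_group(n_humans, n_bots, mean_price):
    """Revenue per visitor: humans buy 50% of the time at ~N(mean_price, $5),
    clipped at $0; bots never buy anything."""
    buys = rng.random(n_humans) < 0.5
    prices = np.clip(rng.normal(mean_price, 5.0, n_humans), 0.0, None)
    human_revenue = np.where(buys, prices, 0.0)
    return np.concatenate([human_revenue, np.zeros(n_bots)])

def run_experiment(n_humans=10_000, n_bots=10_000):
    """Welch's t-test on revenue per visitor; returns (p-value, relative 95% CI width)."""
    control = simulate_group(n_humans, n_bots, mean_price=10.0)
    treatment = simulate_group(n_humans, n_bots, mean_price=10.5)  # +5% purchase price
    _, p_value = stats.ttest_ind(treatment, control, equal_var=False)
    se = np.sqrt(treatment.var(ddof=1) / len(treatment) + control.var(ddof=1) / len(control))
    ci_width = 2 * 1.96 * se / control.mean()  # approximate width of the CI on the relative delta
    return p_value, ci_width

results = {
    "with 50% bots": [run_experiment(n_bots=10_000) for _ in range(1_000)],
    "bots removed":  [run_experiment(n_bots=0) for _ in range(1_000)],
}

for label, runs in results.items():
    p_values, ci_widths = (np.array(x) for x in zip(*runs))
    print(f"{label}: median p-value = {np.median(p_values):.4f}, "
          f"median CI width = {np.median(ci_widths):.2%}, "
          f"statsig rate = {(p_values < 0.05).mean():.0%}")
```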
The biggest predictor of bot traffic tends to be the environment that generates it. Bot traffic that reaches Statsig depends on the SDKs our customers use. Some SDKs tend to be used client-side on websites that bots are free to crawl, while others are used server-side.
One assumption we had when starting was that bots would mostly show up in client SDK traffic, given how accessible it is to them; however, as we saw in the data, a large amount of bot traffic gets processed through servers as well without being filtered out first.
When we dug further into bot traffic by SDK type, we saw that the source company made a large difference. When companies use server SDKs to process web sessions, we generally see more bot traffic, since bots can freely access web pages. When companies use server SDKs behind a login, we see far fewer bots, given the natural barrier that poses. Unsurprisingly, client SDKs see the highest bot traffic; even here, though, some companies have done more work than others to avoid logging these bots.
Given all the variables that determine whether your traffic is accessible to bots, our bot filtering will be applied to clean up analytics for all Statsig SDKs, regardless of source.
When we looked at the user-agent strings Statsig customers logged through our SDKs, we found that many of the biggest bots out there were already self-reporting in their browser_name: "Googlebot", "FacebookBot", and "TwitterBot" all named themselves, along with more than 300 others.
We also tested a major industry package that uses IP addresses to identify bots, but we found that it missed the vast majority of the bots that were already naming themselves. We decided that, for our initial launch of bot filtering, the more direct approach was better.
We decided to filter out bot traffic based on the browser_name of exposure events. This simple change involves comparing browser names against an indexed list of self-identifying bots. By excluding these bots from our data pipelines, we can provide cleaner and more accurate data for analysis. We will maintain our list of bot names on an ongoing basis, ensuring that new bots, or bots we missed, don't start polluting your results again.
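Conceptually, the filter itself is little more than a lookup against that list. A minimal sketch (hypothetical field and function names, not Statsig's actual pipeline code):

```python
# A tiny excerpt of self-identifying bot browser names; the real list is
# indexed and contains several hundred entries maintained over time.
KNOWN_BOT_BROWSERS = {
    "googlebot",
    "facebookbot",
    "twitterbot",
}

def is_bot_exposure(exposure: dict) -> bool:
    """Return True when the exposure's browser_name matches a known bot."""
    browser_name = (exposure.get("browser_name") or "").lower()
    return browser_name in KNOWN_BOT_BROWSERS

def filter_exposures(exposures: list[dict]) -> list[dict]:
    """Drop bot-generated exposures before they reach downstream analytics."""
    return [e for e in exposures if not is_bot_exposure(e)]

# Example with hypothetical exposure records:
exposures = [
    {"user_id": "u1", "browser_name": "Chrome"},
    {"user_id": "crawler", "browser_name": "Googlebot"},
]
print(filter_exposures(exposures))  # only the Chrome exposure remains
```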
We are implementing this filtering at multiple stages in our data pipelines, ensuring that bot traffic is removed as early as possible. This not only improves data accuracy but also reduces storage and compute costs, which we can pass along to our customers.
Customers can benefit from this feature without any additional effort. Bot filtering will be applied automatically to all exposures, improving the accuracy of rollouts and their metrics. For those who prefer not to use this feature, Statsig offers an opt-out option available in the console under your project settings.
The most common response we've gotten when sharing this feature with customers has been "Take my money!" The second most common has been a request for control over which features and experiment variants bots receive. For example, you might be rolling out a new look for your home page, but you don't want search engines to index it yet in case you roll it back.
Thankfully, this is totally doable on Statsig. Using Segments, you can define a rule that identifies bots according to their browser names (just like we do). This global Segment can then be used on your features and experiments to control exposures. You can find more detailed steps for how to do this in our docs pages here.
Statsig believes in being transparent and fair with our customers. As part of this change, any bot events dropped will not contribute to your billable events. This means that customers will not be charged for bot-generated exposures, leading to potential cost savings depending on their bot traffic.