Date of Slack thread: 7/24/24
Anonymous: Hi y’all. Can you point me to guidance on how we can ensure our Statsig experiment variations are not crawled, indexed/cached by googlebot and other search crawlers?
Anonymous: Robots.txt or meta tags wouldn’t be an option, since our variations won’t be on separate URLs.
Tarandeep Singh (Statsig): Hi! We have been looking closer at bots and how they affect Statsig data, so your feedback is helpful. I would recommend setting up a targeting gate for your experiments that filters out bots: create a gate called “No Bots” and add a rule that fails known bots by browser name (see screenshot). If you need a list, our internal data shows these are the top 20 self-identified bots by browser_name; the top 4 account for 75% of the bot traffic we see across Statsig customers: ‘Googlebot’, ‘AdsBot-Google’, ‘Applebot’, ‘FacebookBot’, ‘bingbot’, ‘PetalBot’, ‘AhrefsBot’, ‘YandexRenderResourcesBot’, ‘BitSightBot’, ‘YandexBot’, ‘Storebot’, ‘com/bot’, ‘net/bot’, ‘pingbot’, ‘adsbot’, ‘PingdomBot’, ‘SmarshBot’, ‘VirusTotalBot’, ‘UOrgTestingBot’, ‘Monsidobot’. Good luck.
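As a rough illustration of the gate rule above, here is a minimal sketch of client-side bot detection by user-agent substring. The helper name `isLikelyBot` and the matching strategy (case-insensitive substring match) are assumptions for illustration, not a Statsig API; in practice the "No Bots" gate would do this matching server-side on the `browser_name` field.

```typescript
// Subset-style check based on the self-identified bot names listed above.
// Entries like 'com/bot' and 'net/bot' match the crawler-info URLs that
// many bots embed in their user-agent strings.
const KNOWN_BOTS: string[] = [
  'Googlebot', 'AdsBot-Google', 'Applebot', 'FacebookBot', 'bingbot',
  'PetalBot', 'AhrefsBot', 'YandexRenderResourcesBot', 'BitSightBot',
  'YandexBot', 'Storebot', 'com/bot', 'net/bot', 'pingbot', 'adsbot',
  'PingdomBot', 'SmarshBot', 'VirusTotalBot', 'UOrgTestingBot', 'Monsidobot',
];

// Returns true when the user agent contains any known bot marker,
// compared case-insensitively.
function isLikelyBot(userAgent: string): boolean {
  const ua = userAgent.toLowerCase();
  return KNOWN_BOTS.some((bot) => ua.includes(bot.toLowerCase()));
}
```

You could use such a check to avoid logging exposures for bot traffic, or pass its result as a custom field that the targeting gate fails on; either way, bots never count toward experiment results.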
Anonymous: Thanks. I will look at this.