Deltoid, Facebook’s internal A/B testing and experimentation tool, is arguably its most important tool, allowing teams to build, test, and ship products at breakneck speed. Its value proposition is simple: know how your features affect your core metrics BEFORE launching.
Hundreds of teams are building thousands of features at any given time, and even at that scale Deltoid diligently tracks statistical movements between test and control groups against the company’s core metrics (like DAU, MAU, user retention, engagement, time spent, transactions, and revenue), a team’s most relevant metrics (page views, shares, interactions, time spent), and key hypothesis and guardrail metrics.
Engineers, data scientists, product managers, designers, researchers, and even business folks have all at some point stared at a Deltoid chart and made important product decisions.
Many other large companies, including Spotify and Uber, have built in-house experimentation platforms of their own.
Today with Pulse, we’re bringing the same power of data-driven decision-making to everyone. For every feature you’re building behind a Feature Gate, you can now see how it is performing along with how it is affecting your company’s critical metrics.
For instance, in the picture above, a new feature was opened to 10% of user traffic. The horizontal bars in the graph indicate confidence intervals. They start out gray, but once a metric reaches statistical significance, its bar turns red or green depending on whether it falls on the negative or positive side of the axis.
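The coloring rule described here can be sketched in a few lines of Python. This is only an illustration and not how Pulse is implemented: the function name `compare_metric`, the 95% normal approximation, and the unpooled standard error are all assumptions made for the example.

```python
import math
from dataclasses import dataclass

Z_95 = 1.96  # two-sided 95% critical value (assumed confidence level)


@dataclass
class MetricDelta:
    delta: float    # test mean minus control mean
    ci_low: float   # lower bound of the confidence interval
    ci_high: float  # upper bound of the confidence interval
    color: str      # "gray", "green", or "red"


def compare_metric(mean_control, var_control, n_control,
                   mean_test, var_test, n_test, z=Z_95):
    """Hypothetical helper: confidence interval for the difference in means."""
    delta = mean_test - mean_control
    # Standard error of the difference between two independent sample means.
    std_err = math.sqrt(var_control / n_control + var_test / n_test)
    ci_low, ci_high = delta - z * std_err, delta + z * std_err

    if ci_low > 0:
        color = "green"   # whole interval sits on the positive side
    elif ci_high < 0:
        color = "red"     # whole interval sits on the negative side
    else:
        color = "gray"    # interval still crosses zero: not significant
    return MetricDelta(delta, ci_low, ci_high, color)
```

A bar stays gray for as long as its interval crosses zero, which is why a small, noisy movement does not change color on its own.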
At a glance, you can tell whether a feature is ship-worthy. In this example, the new feature is negatively affecting the product_view and product_details metrics, while no other metric shows a statistically significant lift, so it isn’t wise to ship it as is. As you improve the feature, your experimental metrics will begin moving to the right, and as you increase exposure, the confidence intervals will narrow. When a metric shows a lift and its confidence interval is tight enough to sit entirely to the right of zero, it has reached statistical significance, which provides assurance that the feature is ship-worthy.
If you want to see it in action on your own application, grab a demo!
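To see why more exposure narrows the intervals, here is a rough Python sketch. It assumes a fixed pool of users split between test and control, the same standard deviation in both groups, and a 95% normal approximation. The function name `ci_half_width` and its parameters are hypothetical and chosen only for this example.

```python
import math


def ci_half_width(std_dev, total_users, exposure, z=1.96):
    """Hypothetical helper: half-width of the interval on the test-vs-control delta.

    `exposure` is the fraction of users who see the new feature;
    everyone else is treated as control.
    """
    if not 0.0 < exposure < 1.0:
        raise ValueError("exposure must be strictly between 0 and 1")
    n_test = total_users * exposure
    n_control = total_users * (1.0 - exposure)
    # Standard error of the difference between two independent sample means.
    std_err = std_dev * math.sqrt(1.0 / n_test + 1.0 / n_control)
    return z * std_err


if __name__ == "__main__":
    # The interval tightens as exposure grows from 1% toward an even split.
    for exposure in (0.01, 0.10, 0.50):
        width = ci_half_width(std_dev=2.0, total_users=100_000, exposure=exposure)
        print(f"exposure={exposure:.0%}  half-width={width:.4f}")
```

Under these assumptions the half-width shrinks with the square root of the sample size, so quadrupling the number of users roughly halves the interval.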