Deltoid, Facebook’s internal A/B testing and experimentation tool, is arguably its most important internal tool, allowing teams to build, test, and ship products at breakneck speed. Its value proposition is simple: know how your features affect your core metrics BEFORE launching.
Hundreds of teams are constantly building thousands of features at any given time, and even at that scale, Deltoid diligently tracks statistical movements between test and control groups against the company’s core metrics (like DAU, MAU, user retention, engagement, time spent, transactions, revenue), a team’s most relevant metrics (page views, shares, interactions, time spent), and key hypothesis and guardrail metrics.
Engineers, data scientists, product managers, designers, researchers, and even business folks have at some point stared at a Deltoid chart and made important product decisions.
Many large companies have also built their own in-house experimentation platforms, like this one from Spotify and this one from Uber.
Today with Pulse, we’re bringing the same power of data-driven decision-making to everyone. For every feature you’re building behind a Feature Gate, you can now see how it is performing and how it is affecting your company’s critical metrics.
For instance, in the picture above, a new feature was opened to 10% of user traffic. The horizontal bars in the graph indicate confidence intervals. Each bar starts out gray, and once a particular metric reaches statistical significance, the bar turns red or green depending on whether it sits on the negative or positive side of the axis.
At a single glance you can tell whether a feature is ship-worthy. In this example, the new feature is negatively affecting the product_view and product_details metrics, while no other metric shows a statistically significant lift; it isn’t wise to ship it as is. As you improve the feature, your experimental metrics will begin moving to the right, and as you increase exposure, the confidence intervals will narrow. When the metrics show a lift with tight enough confidence intervals, you’ll reach statistical significance, giving you assurance that the feature is ship-worthy.
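To make the mechanics concrete, here is a minimal sketch of how a confidence interval for a metric lift can be computed and classified into gray, green, or red. This is an illustrative two-proportion z-interval at a 95% level, not Pulse’s actual implementation; the function names and example numbers are assumptions for the example.

```python
import math

# Illustrative sketch: compute a 95% confidence interval for the difference in a
# conversion-style metric between control and test, then classify it the way the
# Pulse chart colors its bars. Not the actual Pulse implementation.

Z_95 = 1.96  # two-sided 95% confidence


def lift_confidence_interval(control_conversions, control_users,
                             test_conversions, test_users):
    """Return (diff, lower, upper) for the absolute difference in rates."""
    p_c = control_conversions / control_users
    p_t = test_conversions / test_users
    # Standard error of the difference between two independent proportions.
    se = math.sqrt(p_c * (1 - p_c) / control_users +
                   p_t * (1 - p_t) / test_users)
    diff = p_t - p_c
    return diff, diff - Z_95 * se, diff + Z_95 * se


def classify(lower, upper):
    """Gray until the interval excludes zero, then green or red by direction."""
    if lower > 0:
        return "green"  # statistically significant positive lift
    if upper < 0:
        return "red"    # statistically significant negative lift
    return "gray"       # not yet significant; keep collecting exposure


# Hypothetical example: 10% of traffic exposed, product_view rate dips in test.
diff, lo, hi = lift_confidence_interval(4_800, 50_000, 4_350, 50_000)
print(f"lift={diff:+.4f}, 95% CI=({lo:+.4f}, {hi:+.4f}), color={classify(lo, hi)}")
```

The standard error shrinks as the user counts in the denominators grow, which is why the bars tighten as you roll a feature out to more traffic.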
If you want to see it in action on your own application, grab a demo!