Deltoid, Facebook’s internal A/B testing and experimentation tool, is arguably its most important tool, allowing teams to build, test, and ship products at breakneck speed. Its value proposition is simple: know how your features affect your core metrics BEFORE launching.
Hundreds of teams are building thousands of features at any given time, and even at that scale Deltoid diligently tracks statistical movements between test and control groups against three sets of metrics: the company’s core metrics (like DAU, MAU, user retention, engagement, time-spent, transactions, revenue), a team’s most relevant metrics (page views, shares, interactions, time-spent), and key hypothesis and guardrail metrics.
Engineers, data scientists, product managers, designers, researchers, and even business folks have all, at some point, stared at a Deltoid chart and made important product decisions.
Many other large companies, including Spotify and Uber, have also built in-house experimentation platforms.
Today with Pulse, we’re bringing the same power of data-driven decision-making to everyone. For every feature you’re building behind a Feature Gate, you can now see how it is performing along with how it is affecting your company’s critical metrics.
For instance, in the picture above, a new feature was opened to 10% of user traffic. The horizontal bars in the graph indicate confidence intervals. Each bar starts out gray, but once its metric reaches statistical significance, the bar turns red or green depending on whether it falls on the negative or positive side of the axis.
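To make the coloring rule concrete, here is a minimal sketch of how a single bar could be computed. It is not Pulse's actual implementation: it assumes a plain two-sample normal-approximation interval on the difference in means, and the function name `metric_bar` and its parameters are hypothetical.

```python
import math
from statistics import NormalDist


def metric_bar(control_mean, control_var, n_control,
               test_mean, test_var, n_test, confidence=0.95):
    """Return (delta, lower, upper, color) for one metric.

    Assumption: a two-sample z-interval on the difference in means.
    A real experimentation platform may use a different test.
    """
    delta = test_mean - control_mean
    # Standard error of the difference between two independent means.
    std_err = math.sqrt(control_var / n_control + test_var / n_test)
    # Two-sided critical value, roughly 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    lower, upper = delta - z * std_err, delta + z * std_err

    if lower > 0:
        color = "green"   # whole interval on the positive side of the axis
    elif upper < 0:
        color = "red"     # whole interval on the negative side of the axis
    else:
        color = "gray"    # interval still crosses zero: not yet significant
    return delta, lower, upper, color


# Hypothetical numbers: the same observed drop, with small and large samples.
print(metric_bar(10.0, 4.0, 10, 9.5, 4.0, 10)[3])
print(metric_bar(10.0, 4.0, 1000, 9.5, 4.0, 1000)[3])
```

With only ten users per group the interval still straddles zero, so the bar stays gray; with a thousand users per group the same observed drop is significant and the bar turns red.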
At a glance, you can tell whether or not a feature is ship-worthy. In this example, the new feature is negatively affecting the product_view and product_details metrics, while no other metric shows a statistically significant lift, so it isn’t wise to ship it as is. As you improve the feature, your experimental metrics will begin moving to the right, and as you increase exposure, the confidence intervals will narrow. When the metrics show a lift with tight enough confidence intervals, you’ll reach statistical significance, which gives you assurance that the feature is ship-worthy.
If you want to see it in action on your own application, grab a demo!
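The narrowing of the intervals follows from sample size alone. The sketch below illustrates this under stated assumptions: a normal-approximation interval, equal variance in both groups, and a hypothetical pool of 100,000 users in which everyone outside the exposed group serves as control. The helper `ci_half_width` is illustrative, not part of any Statsig API.

```python
import math
from statistics import NormalDist


def ci_half_width(std_dev, n_control, n_test, confidence=0.95):
    """Half-width of the interval on the difference in means.

    Assumption: both groups share the same standard deviation.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * std_dev * math.sqrt(1 / n_control + 1 / n_test)


total_users = 100_000   # hypothetical user pool
std_dev = 2.0           # hypothetical per-user standard deviation of the metric

for exposure in (0.01, 0.10, 0.50):
    n_test = int(total_users * exposure)
    n_control = total_users - n_test
    width = ci_half_width(std_dev, n_control, n_test)
    print(f"exposure {exposure:>4.0%}: half-width = {width:.4f}")
```

Under these assumptions the half-width shrinks as the test group grows, which is why a bar that is gray at low exposure can become red or green once more traffic is let in.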