Picking your battles is all about picking your metrics. What do you REALLY want to fight for?
At AWS, every Monday each team prepares a 2x2 Weekly Business Report (WBR) for leadership review. The purpose of the WBR is not to “read the news” by reciting every metric. The purpose is to explain the “so what”. The “so what” narrative doesn’t just show progress towards known goals. It uncovers blockers to business growth. Often, it redefines the goals the team should work towards, the battles the team should fight.
Metrics are the essential vocabulary of this narrative because without metrics the narrative degrades into the highest paid person’s opinion. However, picking the right metrics to craft a constructive business narrative is tricky. Such metrics tend to have three qualities:
1. Metrics capture key drivers.
2. Metrics drive action.
3. Metrics frame long-term objectives.
Let’s zoom in…
Creating a new industry segment requires defining new business metrics (aka KPIs, or key performance indicators). When a new company sees traction with customers, they know that their product is valuable, but they may not yet know the key drivers of customer engagement or consumption.
Choosing metrics that capture underlying drivers is critical to break down big hairy problems into smaller pieces that are easier to solve. For example, if delivery time is a key driver of strategic intent for DoorDash, it makes sense for them to track the number of hours required to make deliveries, optimizing for shorter durations and higher Dasher productivity, as they beautifully explain here:
For our primary supply and demand measurement metric, we looked at the number of hours required to make deliveries while keeping delivery durations low and Dasher busyness high. By focusing on hours, we can account for regional variation driven by traffic conditions, batching rates, and food preparation times.
DoorDash also explains why the granularity of this metric must be at the level of a unique region and time-window:
We generally compute this metric at the level of the region where Dashers sign up to Dash, and at time units that can span from hourly durations to day-part units like lunch and dinner. It is very important to not select an aggregation level that can lead to artificial demand and supply smoothing. For example, within a day we might be oversupplied at breakfast and undersupplied at dinner. Optimizing for a full day would lead to smoothing any imbalance and generate incorrect mobilization actions.
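The smoothing pitfall described above is easy to see with numbers. In this minimal sketch (all figures are made up for illustration), a region is oversupplied at breakfast and undersupplied at dinner; a full-day aggregation reports a gap of zero, while daypart-level aggregation surfaces the imbalance:

```python
# Hypothetical Dasher-hour supply and demand for one region on one day.
supply = {"breakfast": 500, "lunch": 900, "dinner": 800}
demand = {"breakfast": 300, "lunch": 900, "dinner": 1000}

# Aggregating over the full day smooths the imbalance away entirely.
daily_gap = sum(supply.values()) - sum(demand.values())
print("daily gap:", daily_gap)  # 0 -> looks perfectly balanced

# Daypart-level aggregation reveals where mobilization is actually needed.
for daypart in supply:
    gap = supply[daypart] - demand[daypart]
    print(daypart, gap)  # breakfast +200, lunch 0, dinner -200
```

The daily view would trigger no mobilization action at all, even though dinner is 200 Dasher hours short.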
This hours-required-for-delivery metric lets DoorDash trigger mobilization actions when they estimate that the supply of Dasher hours is likely to fall short of demand:
To understand how this metric would work in practice let’s consider an example. Let’s imagine that it is Sunday at dinner time in New York City, and we estimate that 1,000 Dasher hours are needed to fulfill the expected demand. We might also estimate that unless we provide extra incentives, only 800 hours will likely be provided by Dashers organically. Without mobilization actions we would be undersupplied by about 200 hours.
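The worked example above reduces to a simple gap calculation. This sketch uses the numbers from the post; the variable names and the mobilization message are my own, hypothetical illustration:

```python
# Estimated Dasher hours needed to fulfill expected dinner demand in NYC.
hours_needed = 1000
# Hours Dashers would likely provide organically, without extra incentives.
hours_organic = 800

# Undersupply is the shortfall (floored at zero when supply exceeds demand).
undersupply = max(hours_needed - hours_organic, 0)
print(undersupply)  # 200

if undersupply > 0:
    print(f"Mobilize incentives to close a {undersupply}-hour gap")
```

Because the metric is computed per region and per daypart, the same calculation can size incentives for exactly the windows that need them.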
The part that I love most about the DoorDash post is their use of speed of innovation as a metric for software quality:
One of the best ways to test if our system is maintainable is to simply check on the iteration speed with which we can push new changes and launch experiments without creating bugs or introducing regressions. At DoorDash, we perform many experiments to determine whether a feature is working as intended. This generally means that we put a lot more emphasis on measuring the software quality by how quickly we can extend and deliver on new functionality. Unsurprisingly, if experiments are challenging to launch and new features are difficult to test, we have failed in our goal.
To increase speed of innovation, your service infrastructure should be an enabler, not a constraint. If it can’t run experiments quickly, it will hold innovation back rather than accelerate it.
The best metrics capture the long-term, strategic intent of the business, the fundamentals that don’t change.
For capital-intensive businesses, the DuPont model optimizes for ROE (return on equity). From its early days, Facebook cited its daily or weekly active users while its early competitors tracked registered users. Byrne Hobart, writer of The Diff, describes the problem Facebook was solving that defined its strategic intent:
Choosing these metrics doesn’t just give managers a way to see whether or not they’re doing a good job this quarter; it’s a way to talk about what the business is for. If the long-term goal is to maximize the metrics, it says something specific about the end state the business is aiming for. MySpace and Friendster were implicitly targeting a world where everyone has a profile, but Facebook wanted a world where everyone’s life was mediated through Facebook. Picking the right metrics is a way to claim that, once the company’s position is unassailable, the problem they’re working on will be a solved one; the world needed a social network, and now it has one.
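The DuPont model mentioned above is itself a good example of a metric built from underlying drivers: it decomposes ROE into profitability, efficiency, and leverage. A minimal sketch with made-up numbers (none of these figures come from the post):

```python
# Hypothetical financials, in millions.
net_income = 50.0
revenue = 500.0
assets = 1000.0
equity = 400.0

# The DuPont identity: ROE = profit margin * asset turnover * leverage.
profit_margin = net_income / revenue   # how much of each sale is kept
asset_turnover = revenue / assets      # how hard the assets work
leverage = assets / equity             # how much the equity is amplified

roe = profit_margin * asset_turnover * leverage
print(roe)  # 0.125, identical to net_income / equity
```

The decomposition matters because each factor points at a different lever the business can pull, which is exactly the "capture key drivers" quality discussed earlier.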
Are you redefining the laws of your business? Tell us about the metrics you care about. Or just write to firstname.lastname@example.org to nerd out on metrics!