Pick your metrics, pick your battles

Anu Sharma
Wed Aug 11 2021


Build the world that matters to you

Picking your battles is all about picking your metrics. What do you REALLY want to fight for?

At AWS, every Monday each team prepares a 2x2 Weekly Business Report (WBR) for leadership review. The purpose of the WBR is not to “read the news” with all the metrics. The purpose is to explain the “so what”. The “so what” narrative doesn’t just show progress towards known goals. It uncovers blockers to business growth. Often, it redefines the goals that the team should work towards, the battles that the team should fight.

Metrics are the essential vocabulary of this narrative because without metrics the narrative degrades into the highest-paid person’s opinion. However, picking the right metrics to craft a constructive business narrative is tricky. Such metrics tend to have three qualities:

  1. Metrics capture key drivers.
  2. Metrics drive action.
  3. Metrics frame long-term objectives.

Let’s zoom in…

Metrics capture key drivers

Creating a new industry segment requires defining new business metrics (aka KPIs, or key performance indicators). When a new company sees traction with customers, it knows that its product is valuable, but it may not yet know the key drivers of customer engagement or consumption.

Choosing metrics that capture underlying drivers is critical to breaking down big hairy problems into smaller pieces that are easier to solve. For example, if delivery time is a key driver of strategic intent for DoorDash, it makes sense for them to track the number of hours required to make deliveries, optimizing for lower delivery durations and higher Dasher productivity, as they beautifully explain here:

For our primary supply and demand measurement metric, we looked at the number of hours required to make deliveries while keeping delivery durations low and Dasher busyness high. By focusing on hours, we can account for regional variation driven by traffic conditions, batching rates, and food preparation times.

DoorDash also explains why the granularity of this metric must be at the level of a unique region and time-window:

We generally compute this metric where Dashers sign up to Dash and to time units that can span from hourly durations to day part units like lunch and dinner. It is very important to not select an aggregation level that can lead to artificial demand and supply smoothing. For example, within a day we might be oversupplied at breakfast and undersupplied at dinner. Optimizing for a full day would lead to smoothing any imbalance and generate incorrect mobilization actions.
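DoorDash’s point about aggregation level can be made concrete with a toy calculation. In this sketch (the daypart names and numbers are invented for illustration, not DoorDash’s data), a full-day aggregate reports perfect balance while the per-daypart view exposes a breakfast surplus and a dinner shortfall:

```python
# Hypothetical per-daypart supply and demand, in Dasher hours, for one region.
# All values here are illustrative assumptions, not DoorDash's actual data.
supply = {"breakfast": 500, "lunch": 900, "dinner": 700}
demand = {"breakfast": 300, "lunch": 900, "dinner": 900}

# Aggregating over the whole day smooths the imbalance away:
day_gap = sum(supply.values()) - sum(demand.values())
print(day_gap)  # 0 -> the day looks perfectly balanced

# Computing the gap per daypart surfaces it:
gaps = {part: supply[part] - demand[part] for part in supply}
print(gaps)  # breakfast oversupplied by 200, dinner undersupplied by 200
```

Acting on the day-level number would trigger no mobilization at all, which is exactly the “incorrect mobilization actions” failure mode the quote describes.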

Metrics drive action

This hours-required-for-delivery metric enables DoorDash to trigger mobilization actions when they estimate that organically supplied Dasher hours are likely to fall short of demand:

To understand how this metric would work in practice let’s consider an example. Let’s imagine that it is Sunday at dinner time in New York City, and we estimate that 1,000 Dasher hours are needed to fulfill the expected demand. We might also estimate that unless we provide extra incentives, only 800 hours will likely be provided by Dashers organically. Without mobilization actions we would be undersupplied by about 200 hours.
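The arithmetic in that example can be sketched in a few lines. The function name and the incentive trigger below are assumptions for illustration, not DoorDash’s code; only the numbers come from the quote:

```python
# Minimal sketch of the undersupply check described in the DoorDash example.
# mobilization_gap is a hypothetical name, not DoorDash's actual API.
def mobilization_gap(hours_needed: float, hours_organic: float) -> float:
    """Dasher hours that must be mobilized via incentives (0 if oversupplied)."""
    return max(0.0, hours_needed - hours_organic)

# Sunday dinner in NYC: 1,000 Dasher hours needed, ~800 expected organically.
gap = mobilization_gap(hours_needed=1000, hours_organic=800)
print(gap)  # 200.0 -> mobilize roughly 200 extra hours with incentives
```

Clamping at zero matters: an oversupplied region should trigger no incentives rather than a negative one.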

The part that I love most about the DoorDash post is using speed of innovation as a metric for their software quality:

One of the best ways to test if our system is maintainable is to simply check on the iteration speed with which we can push new changes and launch experiments without creating bugs or introducing regressions. At DoorDash, we perform many experiments to determine whether a feature is working as intended. This generally means that we put a lot more emphasis on measuring the software quality by how quickly we can extend and deliver on new functionality. Unsurprisingly, if experiments are challenging to launch and new features are difficult to test, we have failed in our goal.

To increase speed of innovation, your service infrastructure should be an enabler, not a constraint. If it doesn’t let you run experiments quickly, it will hold back your pace of innovation.

Metrics frame long-term objectives

The best metrics capture the long-term, strategic intent of the business: the fundamentals that don’t change.

For capital-intensive businesses, the DuPont model optimizes for ROE (return on equity). From its early days, Facebook cited its daily or weekly active users while its early competitors tracked registered users. Byrne Hobart, writer of The Diff, describes how the problem Facebook chose to solve defined its strategic intent:

Choosing these metrics doesn’t just give managers a way to see whether or not they’re doing a good job this quarter; it’s a way to talk about what the business is for. If the long-term goal is to maximize the metrics, it says something specific about the end state the business is aiming for. MySpace and Friendster were implicitly targeting a world where everyone has a profile, but Facebook wanted a world where everyone’s life was mediated through Facebook. Picking the right metrics is a way to claim that, once the company’s position is unassailable, the problem they’re working on will be a solved one; the world needed a social network, and now it has one.

Are you redefining the laws of your business? Tell us about the metrics you care about. Or just write to hello@statsig.com to nerd out on metrics!

