You’re tasked with deciding where to go for dinner with your friends. You whip out your phone, fire up Yelp, and scroll through nearby restaurants. Piling on some filters, you settle on an option that looks good. Congratulations, you’ve just made a good decision! Much better than our xkcd friend who continues to be paralyzed…
Contrary to classical economics, we humans don’t seek to maximize our benefit from each choice or action. Even with a lot more data in our hands now than we had 20 years ago, we still make decisions that are ‘good enough’. Herbert Simon called this satisficing.
But how do we make decisions when the stakes are higher? I wanted to see if there were any common patterns across different domains for good decision making in general, and how we could apply these patterns to make smarter business decisions. It turns out that decision making on the military front lines is very different from decision making in courtrooms and boardrooms. Frontline decision makers satisfice because they work under tremendous time pressure, while judges and business leaders spend more time striving for consistent narratives that incorporate the bigger picture.
Gary Klein, a research psychologist who studies decision making in natural environments, reports on how experienced officers make decisions on the front line in Decision Making in Complex Naval Command and Control Environments. In the moment, these officers focus on assessing the situation, not on comparing options. This situational awareness drives about 95% of the actions they take; they compare multiple options in only 4% of cases.
Decision makers look for the first workable option they can find, not the best option.
Klein’s subjects include military personnel, firefighters, and paramedics, who make decisions under pressure with little to no time to spare, where the best is literally the enemy of the good. These trained personnel don’t calculate probabilities or maximize expected utility. Instead, they draw on heuristics that have proven successful over time, choosing among them based on context. More complex models crumble under pressure, but humans learn to cope and find acceptable options in even the most complex environments.
Court proceedings are the most systematic process today for decision making under ambiguity. A good lawyer will employ the IRAC model with their client: Issue, Rule, Analysis, and Conclusion. For example, she’ll structure the facts to identify and diagnose the issue, establish the relevant legal principles, analyze how the legal principles apply to the facts of the case, and finally advise what the client should do and/or how the court might conclude. Her next steps will be to communicate a credible narrative to the judge and anticipate challenges to the narrative.
Narratives are how we order our thoughts and make sense of the evidence. The court’s job is to search for the best explanation by inviting challenge to the prosecution’s narrative. Statistical or factual evidence can aid the construction of the narrative, but should never replace it.
In the State of California vs. O.J. Simpson case, Simpson’s DNA was found at the crime scene and identified with a probability of error of 1 in 9.7 billion. Given that Simpson was an ex-partner of the victim, a narrative connecting that error probability with known base rates of violence against women by former partners might have helped establish the prosecution’s case beyond reasonable doubt. Instead, the defense proposed a counter-narrative that the ‘real killer’s’ DNA had somehow ‘vanished’ from the samples due to contamination… and won!
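To make the “priors plus error rate” idea concrete, here’s a minimal Bayesian sketch in Python. The 1-in-9.7-billion error probability comes from the case as described above; the prior of 0.30 is purely illustrative (not a figure from the trial), standing in for whatever base rate one might assign given the victim–ex-partner relationship.

```python
# A minimal Bayesian update, with an illustrative (not case-derived) prior.
prior_guilty = 0.30  # hypothetical prior, e.g. informed by partner-violence base rates

# Likelihoods of observing a DNA match under each hypothesis.
p_match_given_guilty = 1.0            # assume a guilty suspect's DNA would match
p_match_given_innocent = 1 / 9.7e9    # the stated probability of error

# Bayes' rule: P(guilty | match) = P(match | guilty) * P(guilty) / P(match)
numerator = p_match_given_guilty * prior_guilty
evidence = numerator + p_match_given_innocent * (1 - prior_guilty)
posterior = numerator / evidence

print(f"Posterior probability of guilt: {posterior:.12f}")
```

Even with a far more skeptical prior, the posterior lands vanishingly close to 1. The point isn’t the exact numbers; it’s that a narrative tying the error rate to a prior makes the statistic persuasive, while the statistic alone invites a counter-narrative.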
The value of challenging the narrative is not simply to find the best explanation for what happened in the past. It can also test the weaknesses in the plan of action for the future. While a business plan may be aided by numbers, it is best communicated and internalized as a narrative (hint: the numbers will always be wrong). The exercise of preparing the plan forces the author to communicate this narrative to customers, employees, and investors — and to continuously test the narrative for weaknesses.
Challenging narratives is non-optional in high-risk environments. When communication is filtered or when challenges are discouraged, the reference narrative begins to diverge from reality.
And I believe that’s what happened with Zillow Offers.
In Q1 2021, Zillow Offers made twice as much money as anticipated but was on track to significantly miss its annual target for the number of homes it wanted to buy, with just 1,856 homes purchased that quarter.
In Q2 2021, they fixed that problem but created the opposite one: they tweaked their algorithm to acquire more homes at higher prices just as home price appreciation in their markets began to reverse. Their average home buying price rose from $310K in Q1 to $322K in Q2 and $354K in Q3, even as home prices were starting to fall.
Sadly, they bought 1,856 homes in Q1, 3,800 homes in Q2, and 9,680 homes in Q3 2021, while selling 1,965, 2,086, and 3,032 homes respectively, accumulating more and more homes at higher prices. Their unit economics deteriorated materially with each quarter.
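Working from the figures above, a quick calculation makes the accumulation visible: Zillow went from drawing down inventory in Q1 to piling it up in Q3, at steadily higher purchase prices.

```python
# Quarterly buy/sell figures for Zillow Offers in 2021, as reported above.
bought = {"Q1": 1_856, "Q2": 3_800, "Q3": 9_680}
sold = {"Q1": 1_965, "Q2": 2_086, "Q3": 3_032}
avg_buy_price = {"Q1": 310_000, "Q2": 322_000, "Q3": 354_000}

# Net inventory change per quarter: homes bought minus homes sold.
for quarter in bought:
    net = bought[quarter] - sold[quarter]
    print(f"{quarter}: net {net:+,} homes at an average buy price of ${avg_buy_price[quarter]:,}")
```

The net change swings from roughly −100 homes in Q1 to over +6,600 in Q3, exactly when average purchase prices peaked.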
Opinions abound on what went wrong. Many in the market are certain it came down to pricing accuracy and a lack of data science expertise. Others, like @rabois, argue that Zillow was simply not willing to lose money.
Opendoor, a key competitor, also points to their pricing platform as a key driver of their home coverage in their 2021 shareholder letter:
As our pricing platform continues to ingest new data to learn and improve our price accuracy, we are able to expand the breadth of price points and home types that we can address.
Rich Barton, CEO of Zillow, has reinforced this, calling out their [pricing] error rate in their Q3 results:
“Our observed error rate has been far more volatile than we ever expected possible. And makes us look far more like a leveraged housing trader than the market maker we set out to be.”
But the most obvious problem to me was that internal challenges to the prevailing business narrative went unheard while the narrative increasingly veered off from the reality on the ground.
Analysts whose job it was to confirm the prices of homes found that they were routinely overruled, those people said, because the company had retooled the system to raise the analysts’ suggested prices. Automatic price add-ons coded into the company system, including one called the “gross pricing overlay” that could add as much as 7%, would boost offering prices to get more home sellers to say yes. Some Zillow employees complained about the pricing in company Slack channels and meetings, but their concerns went largely unaddressed or they were told that the model was working as intended, several current and former employees said.
There’s no control group for history, and it’s hard to say what would’ve happened if home price appreciation hadn’t started to reverse in Q3. But Opendoor’s results indicate that the problem at Zillow wasn’t its business model, its price prediction models, or the rapidly changing environment. Zillow’s problem was that its mechanisms for identifying risk and addressing challenges were broken.
There’s one more point that jumps out to me in comparing Zillow Offers vs. Opendoor. Forming the bigger business picture is crucial to building a robust narrative. In its Q3 results, Opendoor makes the point that its narratives aren’t built in isolation.
We have centralized the review of all our virtual home assessments, vendor sourcing, and vendor management, driving increased operational flexibility and consistency across markets. (emphasis mine)
On the other hand, Barton described how Zillow had operationally shot itself in the foot by apparently ignoring an increasingly constrained labor environment.
This home price forecasting volatility has also contributed to significant capacity and demand planning challenges, exacerbated by a difficult labor and supply chain environment, leading to our announcement two weeks ago to suspend signing new contracts through the end of this year. (emphasis mine)
Good judgment in business may be controversial, but confidence in one’s judgment does nothing to improve its accuracy. On the contrary, confidence seems to wall off challenges to the narrative and leads to a growing divergence from reality.
For example, Philip Tetlock’s research compares folks who approach uncertainty and risk with strong priors (hedgehogs) with folks who assemble often contradictory evidence to form an evolving view (foxes). Hedgehogs approach situations with an entrenched belief in themselves while foxes approach each situation with vigilance. As you might guess, foxes predict short and long term outcomes more accurately than hedgehogs.
Perhaps the mark of good business decision making is to organize action around a reference narrative while still being open to the possibility that the narrative is false and that alternative narratives may be relevant. Willingness to challenge the narrative is a key element of good decision making and being right a lot.
In practice, we can incorporate challenges to the narrative in several ways. For example, including a set of alternative options and how each might play out in a decision document can help decision makers quickly consider and eliminate options. Framing the decision around a set of tenets defined up front can help establish the desired outcomes before evaluating the options. Including data to support the narrative is a classic way to cut short debates led by strong voices.
At Statsig, we help companies of all sizes continuously challenge and refine their business narratives through experimentation. Enabling low cost experimentation helps every employee in the company make smarter, faster decisions. More importantly, each experiment contributes to the org-wide narrative that’s continuously evolving and correcting to capture the current reality of their customers and end-users.
This post is a long way of asking… how do you make high stakes decisions for your business? Leave a comment or join us on the Statsig Slack channel to share your decision making processes.