The Causal Roundup is a biweekly review of industry-leading work in causality. From experimentation to causal inference, we share work from teams who are building the future of product decision-making. In this week’s edition, we focus on people, the most impressive force in the world.
There was some chatter about inflation, ahem… hyperinflation, this past week.
As Byrne Hobart points out, Jack isn’t some random person. He runs Square, a $120B-market-cap company that processed $75B in transactions over H1 2021. From its quarterly SEC filings, it’s clear that Square has seen extraordinary growth in GPV (Gross Payment Volume) this year, especially in Q2. Perhaps Q3 is trending even further in that direction?
If sales are growing at 88% YoY, compared to ~25% in 2019, that’s newsworthy. But if sales are up because of discounts to clear historical inventory, that may be a different story. Square likely has data on item prices by product category, though some data analysts would still argue that the more interesting the call-out, the more likely it is to be wrong. In this case, certain investors and economists certainly think so…
We humans are folk Bayesians. While Jack may be privy to unique pricing data, Cathie is clearly looking at different datasets. Data tells us different stories depending on what we already know. Our prior understanding of historical data, and the mental models we build from it, rewards us with unique competitive advantages. The more we triangulate across multiple datasets, the more multi-dimensional our mental models become, especially when it comes to business decisions. If business is an arms race, data is ammunition that compounds.
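The folk-Bayesian point can be made concrete with a toy Beta-Binomial update: two observers see the same data but start from different priors, and end up with different conclusions. All of the numbers below are invented purely for illustration.

```python
# Beta-Binomial updating: a Beta(a, b) prior, after observing k "inflation
# signals" out of n indicators, becomes a Beta(a + k, b + n - k) posterior.
# Priors and data here are hypothetical, not real market data.

def posterior_mean(a: float, b: float, k: int, n: int) -> float:
    """Posterior mean of a Beta(a, b) prior after k successes in n trials."""
    return (a + k) / (a + b + n)

data = (6, 10)         # say 6 of 10 indicators point to rising prices

bull_prior = (8, 2)    # strong prior belief that inflation is here
bear_prior = (2, 8)    # strong prior belief that it is not

print(posterior_mean(*bull_prior, *data))  # 0.70
print(posterior_mean(*bear_prior, *data))  # 0.40
```

Same data, two defensible posteriors: the priors do a lot of the work, which is exactly why triangulating across more datasets matters.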
The biggest challenge I’ve faced as a PM was finding and using data. At Amazon, every Monday I worked on the WBR (Weekly Business Review) for a series of EC2 products. I’d stare at Tableau dashboards for hours, drill into each customer’s usage, and try to squeeze out an ounce of insight to explain the week’s gives-and-takes. My understanding of our day-to-day business was only as good as the data we had and the customers we spoke with. Sometimes I was right, but mostly I was wrong and things didn’t work out the way I expected.

For example, we once launched a new instance type with the hypothesis that it would cannibalize demand from a previous-generation, higher-priced instance type. While the new instance type lowered the entry-level price and did create new demand, it didn’t quite cannibalize demand from the previous generation. We had changed the underlying architecture (single-threaded to multi-threaded CPU), which improved price/performance for the new instance type in general, but for one instance size the price/performance remained the same between the old and new instance types¹. This limited the economic incentive for workloads to shift from the old instance type to the new one, which led to lower adoption than we’d expected. I hadn’t imagined that one instance size could make that much of a difference.
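The economics of that migration decision come down to simple arithmetic: dollars per unit of performance. Here’s a sketch with invented prices (not actual EC2 pricing) showing how one size can offer no incentive to move while another does.

```python
# Hypothetical hourly prices and relative performance for an old vs new
# instance family (all numbers invented for illustration).
old = {"large": (0.10, 1.0), "xlarge": (0.20, 2.0)}  # size: ($/hr, perf units)
new = {"large": (0.08, 1.0), "xlarge": (0.20, 2.0)}  # xlarge: price unchanged

savings = {}
for size in old:
    old_cost = old[size][0] / old[size][1]   # $ per unit of performance
    new_cost = new[size][0] / new[size][1]
    savings[size] = 1 - new_cost / old_cost  # fraction saved per unit of work

print(savings)  # large is ~20% cheaper per unit of work; xlarge saves 0%,
                # so xlarge workloads have no economic reason to migrate
```

A workload running on the size with zero savings sees no price/performance gain from migrating, which is enough to depress adoption of the whole new family.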
The absolute worst was when I once argued with passion because I hadn’t brought data to the table. I was proposing a new pricing model for EC2 Spot Instances to the AWS leadership team, including then-AWS CEO Andy Jassy. Toward the end of the meeting, seeing that we hadn’t made much progress, I spoke passionately about how customers were in pain. After about thirty seconds, the room fell silent. I simply wanted to slip under the table and cry, because I knew I’d only weakened our case instead of strengthening it.
We consume data in different ways and at different points to ultimately make smarter decisions. In business, we use data to build mental models of the business drivers, generate hypotheses, validate hypotheses, and update our mental models over time. We also build applications that make decisions when serving end users with product recommendations, travel directions and traffic updates, and matches for what to watch. Here we use data to extract predictive features, train machine learning (ML) models, serve results, and improve performance.
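The extract-features → train → serve → improve loop can be sketched with a toy example: a “most-clicked item” recommender whose ranking updates as new feedback arrives. The events and item names below are made up.

```python
# A minimal sketch of the data -> model -> serving -> feedback loop,
# using a toy popularity recommender (all events invented).
from collections import Counter

clicks = ["docs", "pricing", "docs", "blog"]  # raw event log
features = Counter(clicks)                    # 1) extract features (counts)
model = features.most_common                  # 2) "train": rank items by count

def serve(k: int = 1) -> list[str]:           # 3) serve top-k results
    return [item for item, _ in model(k)]

print(serve())          # ['docs']
features["blog"] += 2                         # 4) new feedback arrives
print(serve())          # ['blog'] -- the ranking improves with more data
```

Real systems replace the counter with learned models and batch or streaming pipelines, but the shape of the loop is the same: the serving layer only gets smarter because feedback flows back into the features.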
While we explicitly try to measure model performance, quite ironically we haven’t made similar strides in helping humans make smarter decisions. Hitting the heart of the issue, this week Benn Stancil, Chief Analytics Officer at Mode, asks about a Yelp for the enterprise:
Most data is accessed through dashboards and BI tools, which are dedicated sites for exploring data that are divorced from decisions themselves… What if we embedded answers to their [product designers’] questions directly in Figma? Can we expose numbers — say, the number of times a button was clicked, or the percent of paying customers who use a feature — on mocks themselves? Could designers look at the current product and see an overlay of simple interaction metrics about its elements? Beyond just helping with decisions they know they have to make, this could also help uncover user behaviors that designers had never considered. The paths people take are not always visibly worn.
In an earlier post about the modern data experience, Benn spoke about enabling everyone to do their job rather than asking them to turn into an analyst (or a statistician, or a scientist). We don’t hand them data and ask them to analyze it; we incorporate it into their operating systems. At Statsig, we help engineers do their job, so this spoke directly to us. To borrow a line from Netflix’s culture deck on Context, not Control: high-performance people will do better work if they understand the context. I’d even add that high-performance people will leave when process and control take over decision-making in Day 2 organizations.
I absolutely love the data scientists and analysts I’ve worked with, though I want to submit that ‘analysis’ and ‘data science’ are skills, not job titles. Skills unite teams in a common objective, whereas job titles tend to make people defensive and protective of their jobs. Everyone on the team should be able to analyze data and derive the same scientific conclusions without requiring a college degree in statistics, without knowing about ‘analytics engineering’, and definitely without having to operate the infrastructure. If they’re not able to do that today, my submission is that their current tools don’t allow them to.
Tools that enable people to do more in their existing jobs (DevOps? Observability?) add context to help them make smarter decisions. These tools empower high-performance people to deliver (and stretch) their goals. They don’t draw boundaries or create ‘expertise’ that only a few ‘professionals’ can hope to acquire. If he’s listening, I want to submit a small note of disagreement with Tristan Handy’s position that data is irreducibly hard. If ‘flow state’ is what we crave, we should want to be less like lawyers and doctors, and more like builders and writers. What do you think?
Join us on the Statsig Slack channel to continue the conversation.
¹ This might sound odd, but it was a design constraint that we couldn’t get around as we reconciled the product portfolio between different instance types.