I wanted to show my data scientist audience how powerful Deltoid is, yet was prohibited from doing so as it’s an internal tool.
When I heard that Statsig made a public version of Deltoid, I had to showcase it.
I was so eager to demonstrate Deltoid because I believe it is fundamental to Facebook’s product and culture, but is not well understood elsewhere. For example, in an earlier video, I discussed how A/B testing shapes the meritocracy and competitive culture at Facebook. In a later webinar, my topic was: "There are only two types of data analysts: those who do AB testing and those who will become irrelevant." That’s how strong my conviction is.
There are three events that shaped my conviction.
In 2020, Facebook decided to focus on Shops and e-commerce. Marketplace seemed like a natural candidate for e-commerce too, as users were already shopping there. The product leader was from eBay, and we hired teams from companies like Walmart.
The teams went into war rooms to build an e-commerce-first experience on Marketplace. I came from Amazon and was excited about the initiative, voluntarily working many overnights for it.
The first major launch on the main tab was well thought out: a modern, sleek design, a seamless checkout flow, and dedicated teams curating attractive and relevant selections. We also matched competitors' prices with subsidies and offered generous return policies.
Luckily, we ran an A/B test for the launch, because it turned out to be a disaster. The result shocked everyone, including me.
The test showed drops in just about every metric: conversion from the product detail page to checkout, click-through rate from browsing to the product detail page, the number of browsing impressions, and daily active users.
It was clear: Whatever we did, Marketplace users hated it.
When a failure happens, there are usually two possible reasons: The idea doesn’t work, or the idea could work but the execution was bad. Naturally, leaders would question the execution—and to be fair, they should, as I was questioning our execution too.
Then, another experiment saved the team from chasing a hopeless idea and getting punished for not making it work.
The experiment was cleverly designed to provide strong evidence that the idea of e-commerce doesn’t work on Marketplace. The hypothesis was that Marketplace users specifically are looking for a deal; they are just not generally interested in e-commerce products.
In retrospect, it makes sense: Why would they buy an e-commerce product from Facebook when they could do it reliably on Amazon? But how could we prove it to people who were determined to make e-commerce work?
The lever we found was that most “Marketplace native” products have in-context backgrounds with their product images—like putting an item on a table—whereas most e-commerce products have plain, professional white backgrounds with their product images.
So, for the same set of products, we randomized whether users would see a white background image or an in-context image on the browsing feed. We found that the white background caused a decrease in click-through rate and conversion rate: It made users feel like they were in a traditional e-commerce experience, rather than browsing for deals.
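For readers who want to see the mechanics, here is a minimal sketch of how a test like this could be analyzed. It is not Deltoid's or Statsig's implementation: the function names, the salt, and the hash-based assignment are all assumptions made for illustration, and any counts you feed in are your own. It deterministically assigns each user to one image variant, then compares click-through rates with a standard two-sided, two-proportion z-test.

```python
import hashlib
import math


def assign_variant(user_id: str, salt: str = "white_bg_test") -> str:
    """Deterministically bucket a user into one of two image variants.

    Hypothetical helper: hashing the salted user ID gives a stable,
    roughly 50/50 split without storing assignments anywhere.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return "white_background" if int(digest, 16) % 2 == 0 else "in_context"


def two_proportion_z_test(clicks_a: int, views_a: int,
                          clicks_b: int, views_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in click-through rate.

    Uses the pooled standard error, which is the textbook choice under the
    null hypothesis that both variants share the same underlying rate.
    """
    rate_a = clicks_a / views_a
    rate_b = clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (rate_a - rate_b) / std_err
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Made-up counts purely to exercise the function; not real Marketplace data.
    z, p = two_proportion_z_test(clicks_a=450, views_a=10_000,
                                 clicks_b=520, views_b=10_000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

A negative z here would mean the first variant (say, the white background) has the lower click-through rate. In practice you would also want to randomize at the user level, as described above, so that the same person always sees the same treatment.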
Combined with what else we knew about the product, we confidently concluded that Marketplace users are just not interested in e-commerce products and may find them off-putting because they look similar to ads on Marketplace.
The power of A/B testing stuck with me.
Before the white background experiment, I was tortured in countless meetings that debated what ideas would work, who was to blame for bad executions, and how to execute better.
After the white background experiment, the debate was quickly settled. We refocused on enabling shipping for Marketplace local listings and saw a 26x increase in online transactions on Marketplace over the next half—10x Facebook Shops and Instagram Shops combined.
Without these two experiments, many lives would have been much worse.
Sometimes it feels like the leaders are the villains in these stories: Bad strategy, bad decisions, not understanding the business, blaming direct reports, and so on and so on.
But if I do an honest postmortem, I definitely had a change of heart halfway through the journey. It’s so hard to see reality when you’re heads-down trying to make the idea work.
To me, that’s the true power of causal evidence and A/B testing, because I don’t think Mark Zuckerberg himself could have convinced me more effectively.