How Meta made me a big-time A/B testing advocate

Tue Sep 10 2024

Yuzheng Sun, PhD

Data Scientist, Statsig

I recorded Statsig’s first public demo over three years ago.

I wanted to show my data scientist audience how powerful Deltoid is, yet was prohibited from doing so as it’s an internal tool.

When I heard that Statsig made a public version of Deltoid, I had to showcase it.

yuzheng sun on youtube reviewing statsig

The reason I was so eager to demonstrate Deltoid is because I believe it is fundamental to Facebook’s product and culture, but is not well understood elsewhere. For example, in an earlier video, I discussed how AB testing shapes the meritocracy and competitive culture at Facebook. In a later webinar, my topic was: "There are only two types of data analysts: those who do AB testing and those who will become irrelevant." That’s how strong my conviction is.

There are three events that shaped my conviction.

Measuring our failure

In 2020, Facebook decided to focus on Shops and e-commerce. Marketplace seemed like a natural candidate for ecommerce too, as users are shopping there. The product leader was from eBay, and we hired teams from companies like Walmart.

The teams went into war rooms to build an e-commerce-first experience on Marketplace. I came from Amazon and was excited about the initiative, voluntarily working many overnights for it.

The first major launch on the main tab was well thought out and had a modern and sleek design, seamless checkout flow, and dedicated teams curating attractive and relevant selections. We also matched competitors' prices with subsidies and offered generous return policies.

Luckily, we did an A/B test for the launch as it turned out to be a disaster. The result shocked everyone, including me.

The test showed drops in just about every metric, from conversion from the product detail page to checkout, click-through rate from browsing to the product detail page, and the number of browsing impressions and daily active users.

It was clear: Whatever we did, Marketplace users hated it.

When a failure happens, there are usually two possible reasons: The idea doesn’t work, or the idea could work but the execution was bad. Naturally, leaders would question the execution—and to be fair, they should, as I was questioning our execution too.

Understanding our failure

Then, another experiment saved the team from chasing a hopeless idea and getting punished for not making it work.

The experiment was cleverly designed to provide strong evidence that the idea of e-commerce doesn’t work on Marketplace. The hypothesis was that Marketplace users specifically are looking for a deal; they are just not generally interested in e-commerce products.

In retrospect, it makes sense: Why would they buy an e-commerce product from Facebook when they could do it reliably on Amazon?—But how could we prove it to people who were determined to make e-commerce work?

The difference a white background can make

The lever we found was that most “Marketplace native” products have in-context backgrounds with their product images—like putting an item on a table—whereas most e-commerce products have plain, professional white backgrounds with their product images.

So, for the same set of products, we randomized whether users would see a white background image or an in-context image on the browsing feed. We found that the white background caused a decrease in click-through rate and conversion rate: It made users feel like they were in a traditional e-commerce experience, rather than browsing for deals.

Combined with other understanding of the product, we confidently concluded that Marketplace users are just not interested in e-commerce products and may find them off-putting because they look similar to ads on Marketplace.

two masks

The counterfactual of no A/B testing

The power of AB testing stuck with me.

Before the white background experiment, I was tortured in countless meetings that debated what ideas would work, who was to blame for bad executions, and how to execute better.

After the white background experiment, the debate was quickly settled. We refocused on enabling shipping for Marketplace local listings and saw a 26x increase in online transactions on Marketplace over the next half—10x Facebook Shops and Instagram Shops combined.

Without these two experiments, many lives would have been much worse.

Sometimes it feels like the leaders are the villains in these stories: Bad strategy, bad decisions, not understanding the business, blaming direct reports, and so on and so on.

But if I do an honest postmortem, I definitely had a change of heart halfway through the journey. It’s so hard to see the reality when I was heads-down making to make the idea work.

To me, that’s the true power of causal evidence and AB testing, because I don’t think Mark Zuckerberg himself would have convinced me more effectively.

Create a free account

You're invited to create a free Statsig account! Get started today with 2M free events. No credit card required, of course.
an enter key that says "free account"

Build fast?

Subscribe to Scaling Down: Our newsletter on building at startup-speed.

Try Statsig Today

Get started for free. Add your whole team!
We use cookies to ensure you get the best experience on our website.
Privacy Policy