A/B Testing Playbooks for E-commerce

Anu Sharma
Wed Sep 15 2021
Conversion Optimization · Experimentation · A/B Testing · E-commerce

How to generate ideas for growth and experimentation

E-commerce shows the way to smarter, data-driven business decisions

A/B testing helps you create the best version of a product tailored for your customers. E-commerce applications are inherently primed for A/B tests because the teams running them are already heavily metrics-driven and track conversion at every point. Yet, more e-commerce customers ask us every day, “What do we test?”

Let’s set the basic context with the most common metrics in e-commerce and then get into some playbooks on what to test.

Common Business Metrics

The most common metrics for e-commerce businesses are conversions, average order value, frequency of purchase, customer lifetime value, and customer acquisition cost.

  • Conversions span the entire customer journey, from search to cart, cart to checkout, checkout to purchase, and first purchase to repeat purchase
  • Average order value (AOV) is a function of converting customer interest to intent, say through personalized recommendations, featured deals and promotions, shipping incentives, and so on
  • Purchase frequency is a function of a positive customer experience, reinforced by historical familiarity, trust, and loyalty incentives
  • Average order value and frequency of purchase determine the customer lifetime value (CLV), which sets the upper bound for customer acquisition cost (CAC)
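To make these relationships concrete, here is a rough illustration. The simple multiplicative CLV model and all of the numbers are assumptions for the example, not a prescription:

```python
def customer_lifetime_value(aov: float, purchases_per_year: float,
                            years_retained: float, gross_margin: float) -> float:
    """A simple multiplicative CLV model:
    CLV ≈ AOV × purchase frequency × retention period × margin."""
    return aov * purchases_per_year * years_retained * gross_margin

# Hypothetical store: $60 AOV, 4 orders/year, 3-year retention, 25% margin
clv = customer_lifetime_value(aov=60.0, purchases_per_year=4,
                              years_retained=3, gross_margin=0.25)
print(f"CLV: ${clv:.2f}")  # the ceiling on what you can profitably spend to acquire a customer
```

Under this model, lifting AOV or purchase frequency directly raises the CAC you can afford, which is why both show up so often as experiment metrics.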

Primary Metrics

For experiments in e-commerce, conversion rates are often the primary metrics that determine the success or failure of an experiment. A statistically significant improvement in conversion marks the experiment as a good candidate to roll out to all users. This is because (a) conversion is actionable and sensitive to actions that a small team can test, and (b) improving conversion directionally improves output business metrics such as total gross merchandise value (GMV) that aren’t as actionable at the team level.
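To make “statistically significant improvement” concrete, here is a minimal sketch of a two-proportion z-test on conversion rates; the traffic and conversion counts are hypothetical:

```python
from math import sqrt
from statistics import NormalDist

def conversion_lift_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided two-proportion z-test for a difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)          # pooled rate under the null
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical test: 5.0% vs 5.6% conversion with 20,000 users per arm
p = conversion_lift_p_value(1000, 20000, 1120, 20000)
print(f"p-value: {p:.4f}")
```

A p-value below your pre-chosen significance level (commonly 0.05) marks the lift as statistically significant; experimentation platforms run a variant of this calculation for you automatically.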

Guardrail Metrics

AOV and purchase frequency often serve as guardrail metrics to ensure that the team doesn’t over-index on short-term conversions at the expense of long-term customer sentiment and purchase behavior. Application performance also provides common guardrail metrics, such as page load time or error and crash rates.

What to test?

Playbook 1: Test Every Update

Borrowing from Booking.com, the first approach is to validate that every update to the application has the expected impact. This method of ‘testing every atomic change’ is so effective that Booking.com enjoys conversion rates 2–3x higher than the industry average. Stuart Frisby, Director of Design at Booking.com, explains their approach:

Almost every product change is wrapped in a controlled experiment. From entire redesigns and infrastructure changes to the smallest bug fixes, these experiments allow us to develop and iterate on ideas safer and faster by helping us validate that our changes to the product have the expected impact on the user experience.
If it can be a test, test it. If we can’t test it, we probably don’t do it.

Booking.com also runs “non-inferiority tests” to identify any regressions in guardrail metrics such as error rates and customer support inquiries. For example, when they introduced the “Print Receipt” feature, they ran an A/B test to measure the impact of the new feature on Customer Support calls. The experiment suggested a 0.78% increase, less than the pre-defined threshold of 2%, marking this experiment a success.
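A minimal sketch of the statistics behind such a check, using a normal approximation and hypothetical numbers (this is an illustration, not Booking.com’s actual methodology): the experiment passes when the upper confidence bound on the increase in the guardrail rate stays below the pre-defined margin.

```python
from math import sqrt
from statistics import NormalDist

def guardrail_upper_bound(events_ctrl: int, n_ctrl: int,
                          events_test: int, n_test: int,
                          alpha: float = 0.05) -> float:
    """Upper (1 - alpha) one-sided confidence bound on the absolute increase
    in a guardrail event rate (e.g. support calls per booking)."""
    p_c, p_t = events_ctrl / n_ctrl, events_test / n_test
    se = sqrt(p_c * (1 - p_c) / n_ctrl + p_t * (1 - p_t) / n_test)
    return (p_t - p_c) + NormalDist().inv_cdf(1 - alpha) * se

# Hypothetical: support-call rate moves from 2.0% to 2.1% with 20k users per arm
bound = guardrail_upper_bound(400, 20000, 420, 20000)
print(f"Upper bound on rate increase: {bound:.4%}")
```

If that bound is below your non-inferiority margin, you can conclude the new feature doesn’t meaningfully degrade the guardrail metric.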

A non-inferiority test at Booking.com

Playbook 2: Think Globally, Act Locally

The second approach is to set a top-down direction based on an essential, unchanging customer need. As Jeff Bezos said about Amazon.com, “We don’t make money when we sell things. We make money when we help customers to make purchase decisions.”

“Working backwards” from an aspirational vision but staying relentless about course-correcting is a playbook that Amazon has perfected. Perhaps what makes Amazon especially unique is its ability to embrace failure as organizational learning, making the company’s unique cultural traits heavily path dependent. Bezos has explained this in some detail:

You really can’t accomplish anything important if you aren’t stubborn on vision. But you need to be flexible about the details because you gotta be experimental to accomplish anything important, and that means you’re gonna be wrong a lot. You’re gonna try something on your way to that vision, and that’s going to be the wrong thing, you’re gonna have to back up, take a course correction, and try again.
Most large organizations embrace the idea of invention, but are not willing to suffer the string of failed experiments necessary to get there.

A key aspect of this playbook is to ask what’s the smallest big step you can take to test the riskiest assumption of your vision. Ideally, this experimental step will generate measurable results that either meaningfully validate your assumption or pointedly surprise you with an insight that changes your assumption. For example, if you’re testing product pricing and assume that customers always prefer lower prices, an experiment may reveal that below a certain price range your customers begin to lose trust in your product. Not surprisingly, there is a lot of room to experiment with pricing in e-commerce!

The second level of this playbook is to recognize behavioral characteristics that help users achieve their objectives. In the example below, adding a customer testimonial improved credibility with the users and increased conversion rate by 35%.

Use social proof to improve credibility and conversion

The third level of this playbook includes tactical steps to remove unwanted friction. Any action that requires the user to slow down adds a point of friction. If it doesn’t add value to the user at some point, it’s unwanted friction. In the example below, reducing input fields to only what’s necessary (and adding security certification with improved button copy) increased the revenue per order by 56%.

Remove unwanted friction

Poor presentation of information can also add unwanted friction. Here’s an example where structuring product information and highlighting a single CTA increased conversion rate by 58%.

Poor presentation results in unwanted friction

Removing unwanted friction is an ongoing, iterative effort. One of the best books that have helped me identify and address unwanted friction in e-commerce applications is Don’t Make Me Think by Steve Krug. It’s a short and delightful read!

Playbook 3: Focus on Growth

The third approach focuses on growth. For example, Pinterest’s dedicated growth team focuses on conversion, turning prospective users into active users. To improve conversion, they come up with ideas for improvements, use experiments to measure the change, and analyze results before rolling out the change to all users.

Pinterest initially took a bottom-up approach in which individual team members were tasked with proposing new ideas, but found that they didn’t know how to generate high-quality ideas. Their Experiment Idea Review (EIR) process now requires team members to actively build the skills for generating high-quality ideas and measures their performance based on these ideas.

For example, the EIR process requires team members to clearly outline the problem, hypothesis, opportunity size, and expected impact of their proposed experiment in a document. Team leads review these documents ahead of a team review to spot gaps and further flesh out the ideas. After the review, the team green-lights promising proposals and adds them to a backlog. With each experiment, the growth team builds more resources and improves its skills, raising the bar for the next set of ideas.

While this is admittedly the least concrete approach, think of it as a meta-approach to build the clock that tells the time rather than simply telling the time when someone asks. Leading by example and hiring thoughtful growth leaders may be the most meaningful takeaways here.

Which approach is best for you?

What’s best for you depends on your leaders, your organizational culture, and how deeply your organization cares about incorporating data in decision making. At Statsig, we help e-commerce organizations of all sizes bootstrap their experimentation, whether it is in service of their culture, vision, or growth.

But every approach begins with running the first experiment.

Get Started Now

If you’re already using feature flags to ship software, you can run an A/B test with no additional effort: see Statsig’s smart feature gates to kick one off within minutes.
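Under the hood, a feature gate deterministically buckets each user so everyone sees a consistent variant across sessions. Here is a minimal sketch of that mechanic using hash-based assignment; the hashing scheme and 50/50 split are illustrative assumptions, not Statsig’s actual implementation:

```python
import hashlib

def assign_variant(user_id: str, experiment: str, treatment_pct: float = 0.5) -> str:
    """Deterministically assign a user to 'control' or 'treatment'.

    Hashing (experiment name + user_id) yields a stable, roughly uniform
    bucket in [0, 1], so the same user always sees the same variant."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF
    return "treatment" if bucket < treatment_pct else "control"

print(assign_variant("user-42", "new-checkout-flow"))
```

A feature-flagging SDK handles this assignment (plus exposure logging and targeting) for you, which is why flag-gated rollouts convert to A/B tests so easily.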

The good news about getting started is that it automatically generates data that fuels more new ideas for growth and experimentation.

Want to chat more about your e-commerce application and find ideas to experiment in your business? Join the conversation on the Statsig Slack channel.


