Marketplace challenges in A/B testing and how to address them

Wed Mar 26 2025

A/B tests in marketplaces can be tricky because different groups—buyers, sellers, and the platform itself—constantly affect one another.

When you test a new feature, you can’t ignore how these groups overlap or how supply and demand might shift in unexpected ways. Traditional per-user randomization often creates “cross-group contamination” (sometimes called interference or spillover), where one side of the marketplace in treatment interacts with another side in control.

Network effects make things more complicated, since a small change on one side can ripple through the entire system. Even a simple fee change might push demand or supply in a way that “steals” traffic from the control group, inflating or deflating your metrics.

Despite these hurdles, randomized tests remain the gold standard for measuring causal impact. The key is to keep the principles of randomization intact while dealing with the realities of a multi-sided marketplace. Below are several practical approaches that tackle typical marketplace pitfalls, along with notes on identity resolution and choosing the right testing method.

Methods

1. Cluster-based randomization

Rather than randomizing individual users, you can randomize whole clusters — cities, categories, or geographic regions. That way, everyone in a single region or category sees the same variant. This setup limits the chance that a buyer on treatment bumps into a seller on control (or vice versa).

Why it works

By grouping people who are naturally connected, you cut down on cross-group contamination and better isolate the impact of your test. For example, if you’re trying out a new shipping policy, you can apply it in one state while leaving a similar region as control.

Trade-offs

Clusters can differ in important ways (like economic trends or population size), so you need enough clusters to balance out those differences. You also typically need a larger sample to get clear, reliable results: because users within a cluster behave similarly, each cluster contributes less statistical information than the same number of independently randomized users.
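Cluster assignment can be as simple as hashing the cluster ID instead of the user ID. Here’s a minimal sketch; the experiment name, salt, and city list are illustrative, not from any particular library:

```python
import hashlib

def assign_cluster(cluster_id: str, experiment: str, salt: str = "v1") -> str:
    """Deterministically assign a whole cluster (city, region, or category)
    to treatment or control by hashing its ID with an experiment salt."""
    digest = hashlib.sha256(f"{experiment}:{salt}:{cluster_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in [0, 100)
    return "treatment" if bucket < 50 else "control"

# Every user in the same cluster sees the same variant.
for city in ["austin", "boston", "chicago", "denver"]:
    print(city, assign_cluster(city, "shipping_policy_test"))
```

Because the assignment is a pure function of the cluster ID, a buyer and a seller in the same city always land in the same arm, which is exactly the contamination-limiting property described above.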

2. Switchback testing

Switchback testing assigns a whole unit—like a city or an entire platform—to treatment or control in alternating time windows. You might run the treatment on one day, switch back to control the next day, and keep alternating.

How it helps

When a marketplace runs only one variant at a time, there’s no risk that users see both versions at once. This cuts down on cross-group contamination. It also helps you capture cyclical patterns (for instance, weekend vs. weekday behavior) because each version experiences a mix of days and times.

Example

A food-delivery app might test a new surge pricing algorithm on Monday, then switch back to the old algorithm on Tuesday, and so on. Over a couple of weeks, each version gets exposed to multiple weekdays and weekends. This helps confirm whether the new pricing actually cuts wait times without hurting order volume or driver availability.

Practical tips

Randomize which variant runs during which block of time, and run the test long enough to see a range of conditions — peak hours, slow hours, weekends, and holidays. Pay attention to any “carryover” effects that might spill into the next time block.
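A switchback schedule is just a randomized mapping from time blocks to variants. The sketch below shows one way to generate it; the seed, block length, and start date are arbitrary choices for illustration:

```python
import random
from datetime import datetime, timedelta

def switchback_schedule(start: datetime, blocks: int, block_hours: int = 24, seed: int = 7):
    """Build a randomized switchback schedule: each time block is
    independently assigned to treatment or control."""
    rng = random.Random(seed)  # fixed seed so the schedule is reproducible
    schedule = []
    for i in range(blocks):
        block_start = start + timedelta(hours=i * block_hours)
        variant = rng.choice(["treatment", "control"])
        schedule.append((block_start, variant))
    return schedule

for block_start, variant in switchback_schedule(datetime(2025, 3, 3), blocks=14):
    print(block_start.date(), variant)
```

In practice you may want to stratify the randomization (for example, pairing weekdays with weekdays and weekends with weekends) so each variant sees a balanced mix of conditions, and to leave a short washout gap between blocks if carryover effects are a concern.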

3. Phased rollouts

A phased rollout gradually increases the share of users or clusters that see a new feature (e.g., 1% to 10% to 50%), always using random assignment at each step. Phased rollouts are common outside of marketplaces, too — pretty much any large-scale product can do them. But when you’re dealing with a multi-sided marketplace, there are a few extra things to watch.

Marketplace twist

A phased approach is especially helpful if your feature might disrupt the balance between buyers and sellers. You can start small, see if there’s a ripple effect — like sellers dropping out or buyers flooding in — then push forward if things stay under control. This stepwise approach helps you spot negative signals early before you expose the whole market.

Key tip

Give each phase enough time to stabilize, since supply and demand can take a while to adjust. If you keep reassigning users too often, you won’t get a clean read on which changes are due to the feature versus normal market fluctuations.
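One property worth enforcing in a phased rollout is monotonicity: a user exposed at 1% should remain exposed at 10% and 50%, so you aren’t churning assignments between phases. A stable hash bucket gives you that for free. This is a sketch with made-up experiment and user names:

```python
import hashlib

def rollout_bucket(user_id: str, experiment: str) -> int:
    """Map a user to a stable bucket in [0, 100). The bucket never changes
    across phases, so ramping 1% -> 10% -> 50% only ever adds users."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100

def in_rollout(user_id: str, experiment: str, percent: int) -> bool:
    """A user is in the rollout whenever their bucket falls below the
    current rollout percentage."""
    return rollout_bucket(user_id, experiment) < percent

# Monotonic ramp: anyone exposed at 1% remains exposed at 10% and 50%.
user = "user_42"
print([in_rollout(user, "new_fee_model", p) for p in (1, 10, 50)])
```

Keeping assignments sticky like this is what lets each phase stabilize: the only thing that changes between phases is the cutoff, not who is already in treatment.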

4. Weighted randomization (multi-armed bandits)

Multi-armed bandit algorithms start by splitting traffic fairly evenly across multiple variants, then send more traffic to the “winners” over time. This reduces the negative impact of a potentially bad variant by ramping up the promising ones quickly.

Warning for marketplaces

In a marketplace with strong network effects, bandits can cause problems if one variant starts pulling in users at the expense of the other. The real-time reallocation might not give you a stable picture of what happens once supply and demand settle.

If bandits aren’t used carefully, you can end up with skewed results or inadvertently harm one side of the market by over-optimizing for short-term metrics. For experiments where network effects matter a lot, consider sticking to more traditional A/B testing methods (clusters or switchbacks) that keep the traffic split stable.
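To make the reallocation dynamic concrete, here is a minimal Thompson sampling bandit over Bernoulli (convert / don’t convert) rewards. The variant names and conversion rates are invented for the simulation; this is a sketch of the general technique, not any production system:

```python
import random

class ThompsonBandit:
    """Minimal Thompson sampling: each variant keeps a Beta(1, 1) prior
    updated from observed successes and failures."""
    def __init__(self, variants):
        self.stats = {v: {"successes": 0, "failures": 0} for v in variants}

    def choose(self) -> str:
        # Sample a plausible conversion rate for each variant from its Beta
        # posterior and route the next unit of traffic to the highest draw.
        draws = {v: random.betavariate(s["successes"] + 1, s["failures"] + 1)
                 for v, s in self.stats.items()}
        return max(draws, key=draws.get)

    def update(self, variant: str, reward: bool):
        key = "successes" if reward else "failures"
        self.stats[variant][key] += 1

bandit = ThompsonBandit(["control", "new_pricing"])
# Simulated marketplace: new_pricing converts at 12%, control at 10%.
for _ in range(1000):
    v = bandit.choose()
    rate = 0.12 if v == "new_pricing" else 0.10
    bandit.update(v, random.random() < rate)
```

Notice that traffic drifts toward the better-looking arm as evidence accumulates. That is precisely the behavior that can destabilize a marketplace with network effects: the shrinking arm’s supply-demand balance degrades for reasons unrelated to the feature itself.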

Managing identity resolution and entity properties

In many marketplaces, a single person can act as both a buyer and a seller (or a driver and a rider, etc.). You might have a user who’s a buyer in one context but also lists items to sell. This dual role complicates randomization because you want to ensure that individuals see consistent versions when they’re operating under either identity—or, if you need to track them differently, that you have a clear strategy for doing so.

Practical suggestions

  • Define clear identities: Decide up front how you’ll identify each user in both roles. You might use a primary ID with an “entity property” that distinguishes buyer actions from seller actions.

  • Ensure consistent assignment: If you want a single user to see the same variant as both a buyer and a seller, factor that into your randomization logic.

  • Separate roles when needed: In some tests, it might make sense to treat the same person’s buyer behavior and seller behavior differently — but do it in a way that avoids cross-contamination if those two roles might interact.
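The “consistent assignment” bullet above amounts to bucketing on the resolved primary identity rather than the role-specific ID. Here is a sketch; the identity map and IDs are hypothetical:

```python
import hashlib

def variant_for(primary_id: str, experiment: str) -> str:
    """Assign by the resolved primary identity, not the role-specific ID,
    so one person sees the same variant as a buyer and as a seller."""
    digest = hashlib.sha256(f"{experiment}:{primary_id}".encode()).hexdigest()
    return "treatment" if int(digest, 16) % 2 == 0 else "control"

# Hypothetical identity map: both role IDs resolve to one primary ID.
identity_map = {"buyer_778": "person_123", "seller_991": "person_123"}

def variant_for_role(role_id: str, experiment: str) -> str:
    primary = identity_map.get(role_id, role_id)
    return variant_for(primary, experiment)

print(variant_for_role("buyer_778", "checkout_test") ==
      variant_for_role("seller_991", "checkout_test"))  # same variant
```

If a test instead needs to treat buyer and seller behavior separately, you would hash on `f"{primary_id}:{role}"` rather than the primary ID alone, accepting that the two roles may then see different variants.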

Why these methods work

All these techniques stick to the principle of randomization but tweak how you assign users or when you run variants. By doing this, you keep different test conditions separate enough to avoid contamination, and you factor in the unique cycles or geographic differences that shape marketplace behavior.

Whether it’s clustering everyone in a region, switching back and forth over different days, or rolling out a new feature in stages, these methods help you see how supply and demand settle into a new equilibrium without mixing in too many confounding factors.


