Last month, we hosted a virtual meetup featuring our Data Scientist, Craig Sexauer, and special guest Jared Bauman, Engineering Manager, Machine Learning at Whatnot, to discuss warehouse-native experimentation. You can watch the recording of the full webinar here:
Over the last few years, we've observed that as more companies have begun to centralize their data into a cloud data warehouse like Snowflake or Databricks, there has been an increasing demand for warehouse-native experimentation.
This approach allows you to run the stats engine atop your data warehouse, enabling you to maintain a single source of truth dataset for your business metrics. Typically, you'll use the vendor's SDK for assignment and execute statistical calculations for experiments in your warehouse, utilizing your existing metrics and compute power.
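To make the mechanics concrete, here's a minimal sketch of that pattern. The table, column, and experiment names are hypothetical, and the query is illustrative rather than the SQL Statsig actually generates: assignments logged by the SDK land in the warehouse as exposures, the stats engine aggregates your existing metrics per group, and a standard test runs on the aggregates.

```python
import math
from scipy import stats

# Illustrative query a warehouse-native stats engine might run: join
# experiment exposures to an existing metrics table and aggregate per
# group. All names here are hypothetical.
ANALYSIS_QUERY = """
SELECT
    e.group_id,
    COUNT(DISTINCT e.user_id) AS n,
    AVG(m.metric_value)       AS mean_value,
    VARIANCE(m.metric_value)  AS var_value
FROM experiment_exposures e
JOIN business_metrics m
  ON m.user_id = e.user_id
 AND m.event_date >= e.first_exposure_date
WHERE e.experiment_id = 'checkout_redesign'
GROUP BY e.group_id
"""

def welch_t_test(mean_a, var_a, n_a, mean_b, var_b, n_b):
    """Welch's t-test computed from per-group aggregates,
    so raw rows never need to leave the warehouse."""
    se = math.sqrt(var_a / n_a + var_b / n_b)
    t = (mean_b - mean_a) / se
    # Welch-Satterthwaite degrees of freedom
    df = (var_a / n_a + var_b / n_b) ** 2 / (
        (var_a / n_a) ** 2 / (n_a - 1) + (var_b / n_b) ** 2 / (n_b - 1)
    )
    p_value = 2 * stats.t.sf(abs(t), df)
    return t, p_value
```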
Fully hosted cloud platforms, on the other hand, are event-driven, meaning metrics are derived from logged events. This may lead to inconsistencies between the experimentation platform and the warehouse's source of truth data. However, this approach might be preferred for engineering-led experimentation, where the analysis requirements are real-time and focused on feature rollouts.
At Statsig, we offer two solutions: a fully hosted (Cloud) solution and a run-in-your-warehouse version, Statsig Warehouse Native. Read this blog on warehouse-native versus cloud-hosted experimentation platforms to learn how to decide which one is right for your needs.
The Warehouse Native approach ends up being the preferred choice for organizations with established data ecosystems that deal with complex business use cases and need:
A single source of truth for data
Flexibility around cost/complexity tradeoffs for measurement
The ability to flexibly re-analyze experiment results
Easy access to results for experiment meta-analysis
However, we've observed that warehouse-native experimentation tends to be less effective when an organization lacks centralized KPIs in its warehouse; in that case, additional effort is needed to derive value from Warehouse Native. In such scenarios, a solution that can log events of interest helps teams track metrics and build a metrics library.
Warehouse Native experimentation offers flexibility, agility, and trust, helping organizations gain deeper insights while reducing experiment cycle times. Furthermore, Statsig has full-time teams dedicated to solving statistical problems, and the platform itself comes with a range of out-of-the-box features for assignment, sample ratio mismatch (SRM) checks, and more.
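To illustrate one of those checks: an SRM check compares observed group sizes against the configured traffic split with a chi-squared goodness-of-fit test. This is a minimal sketch of the general technique, not Statsig's implementation, and the alpha threshold is a common convention rather than a documented default:

```python
from scipy import stats

def has_srm(observed_counts, expected_ratios, alpha=0.001):
    """Flag a sample ratio mismatch: are the observed group sizes
    implausible under the configured traffic split?"""
    total = sum(observed_counts)
    expected = [total * r for r in expected_ratios]
    _, p_value = stats.chisquare(observed_counts, f_exp=expected)
    return p_value < alpha  # True => investigate before trusting results

# Example: a 50/50 split that drifted; an imbalance this large usually
# points to a logging or assignment bug rather than chance.
print(has_srm([50_900, 49_100], [0.5, 0.5]))
```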
Whatnot has been scaling experimentation with Statsig over the past couple of years. Jared Bauman, an Engineering Manager working on machine learning and experimentation who previously spearheaded experimentation and LLM initiatives at DoorDash, summarized his team's reasoning for pivoting to the warehouse-native approach:
Trust & transparency: Whatnot was able to get more buy-in from teams to trust the results. Jared explained that experiment results weren't a black box: anyone concerned about correctness could inspect the underlying SQL query.
Agility: Jared shared that his team no longer has to get everything perfect before an experiment launches, because there are few consequences to an imperfect setup. With Statsig Warehouse Native, they can easily create new metrics or even re-analyze past experiments.
Flexibility: Whatnot gained more flexibility using Statsig Warehouse Native compared to Cloud for their business needs. Jared noted how they achieved meaningful variance reduction on their most important metrics by changing the default CUPED lookback window.
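For context, CUPED reduces variance by using the same metric computed over a pre-experiment lookback window as a covariate; the window length changes how correlated that covariate is with the in-experiment metric, which is the knob Whatnot tuned. A minimal sketch of the adjustment on synthetic data:

```python
import numpy as np

def cuped_adjust(y, x):
    """CUPED: y_adj = y - theta * (x - mean(x)), where x is the same
    metric over the pre-experiment lookback window and
    theta = cov(x, y) / var(x)."""
    theta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    return y - theta * (x - x.mean())

rng = np.random.default_rng(0)
x = rng.normal(100, 20, 10_000)           # pre-period metric (lookback window)
y = 0.8 * x + rng.normal(5, 10, 10_000)   # in-experiment metric, correlated with x
print(np.var(y), np.var(cuped_adjust(y, x)))  # adjusted variance is far smaller
```

The stronger the correlation between the pre-period covariate and the in-experiment metric, the larger the variance reduction, which is why the choice of lookback window matters.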
Jared noted that Whatnot's experimentation costs and efficiency improved because they only accessed data when needed for an experiment — which can now be done on the fly.
Because the stats engine operates on top of your warehouse, you can leverage all of its data to conduct deeper analysis. Data-oriented teams can take advantage of:
Anonymous ID resolution: A key unlock we've seen is tying together different identities. Customers can run experiments on logged-out users while measuring logged-in metrics, connecting the journey from a user's first visit to becoming a paid user. User-level metrics can now flow easily into 'visit ID' experiments (see the sketch after this list).
SQL-based customization and semantic layer sync: Customers can apply SQL-based filters, allowing for complex operations on JSON fields or the parsing of arrays. If you have an existing semantic layer or metric definition layer, you can sync it with Statsig. Since Statsig reads directly from your source of truth, your existing definitions will still work.
Global user dimensions: Tools like Statsig can automatically manage joins for you, allowing you to use a global table, such as user-country, and then apply those dimensions to any experiment. This enables you to add deeper context to experiments and derive more detailed conclusions from the data.
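Here's a hypothetical sketch of what these capabilities can look like in warehouse SQL. Every table and column name is illustrative, and JSON syntax varies by warehouse; the point is that exposures keyed on an anonymous visit ID can be joined through an identity map to user-level metrics and global dimensions:

```python
# Illustrative warehouse query (all names hypothetical): resolve anonymous
# visit IDs to user IDs, attach a user-level metric and a global dimension,
# and apply a SQL-based filter on a JSON field.
IDENTITY_RESOLUTION_QUERY = """
SELECT
    e.visit_id,
    map.user_id,
    u.country,                                -- global user dimension
    m.revenue                                 -- logged-in, user-level metric
FROM visit_exposures e
JOIN identity_map map
  ON map.visit_id = e.visit_id                -- ties visits to known users
LEFT JOIN user_dimensions u
  ON u.user_id = map.user_id
LEFT JOIN user_revenue m
  ON m.user_id = map.user_id
WHERE e.experiment_id = 'logged_out_homepage_test'
  AND JSON_VALUE(u.properties, '$.plan') = 'pro'  -- JSON filter; syntax varies
"""
```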
Furthermore, the warehouse native approach provides more cost controls and transparency, although the trade-off is that you maintain your own storage and compute.
Having previously worked at DoorDash, a renowned experimentation-led organization, and having seen the space evolve over time, Jared shared his perspective on the build vs. buy debate. He explained why it makes sense for nearly all organizations to partner with a vendor to get state-of-the-art capabilities while saving on costs:
Further reading: The build vs buy discussion.
While warehouse-native vendors traditionally have limited real-time capabilities because warehouse data arrives with a delay, vendors like Statsig help you combine real-time events with warehouse metrics. You can integrate Statsig's SDK to accelerate engineering velocity with feature rollouts (online) and conduct comprehensive hypothesis testing later (offline), as in the sketch below.
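A minimal sketch of the online half using Statsig's Python server SDK. The gate name and application stubs are hypothetical; check Statsig's docs for current SDK signatures:

```python
from statsig import statsig, StatsigUser

def serve_new_checkout(): ...   # hypothetical application code
def serve_old_checkout(): ...

statsig.initialize("server-secret-key")  # your server SDK key

# Online: the SDK assigns the user and logs the exposure in real time.
user = StatsigUser(user_id="user_123")
if statsig.check_gate(user, "new_checkout_flow"):
    serve_new_checkout()
else:
    serve_old_checkout()

# Offline: the logged exposures land in your warehouse, where the full
# hypothesis test runs later against your source-of-truth metrics.
```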
Read our docs to learn how you can get started with Statsig Warehouse Native to run your first analysis right away!