Customer and application data has become an essential commodity, and with proper storage and analysis it yields valuable insights.
As large amounts of customer and application data are ingested, organizations look to data warehouse systems to store this data securely and efficiently. Over time, these sophisticated data systems become a single source of truth for an organization, the proverbial “oil field” from which product analysts/data scientists consume and produce refined insights on customer retention, engagement, product usage, etc.
Companies are increasingly reliant on the expertise of these product analysts and data scientists to understand how their customers are using their platforms, how they are not, and where overarching product development efforts are either succeeding or falling short. As a consequence, experimentation—or, more broadly, the ability to drive a larger user base to different variants of your product for impact measurement—has become commonplace in a data-driven product function.
Integrating a sophisticated experimentation tool atop an existing data warehouse where an organization’s business logic exists enables teams to quickly garner product insights and begin enacting a broader culture of experimentation.
If your organization does not yet have a data warehouse in place, no need to worry! With Statsig you have the flexibility to utilize our cloud-hosted product. We’ll jump into the differences below.
As data warehouse ingestion and analytics workflows evolve, two dominant modes of conducting experimentation at scale have emerged:
Cloud-hosted experimentation refers to the use of cloud-based platforms and tools to conduct tests and experiments. This not only includes experiment analysis, but also leveraging assignment SDKs to serve different product variants to your users and ensuring consistent experiences are delivered as new features are rolled out and tested. Typically metric log collection is done either with logging SDKs or through third-party data connectors, such as a CDP or other data import mechanisms.
The advantage of cloud-hosted experimentation is that the infrastructure is managed by the software provider, relieving companies of the burden of maintaining physical servers and other hardware. This allows companies to scale quickly without having to manage security or high availability themselves.
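To make the idea of "consistent experiences" from an assignment SDK more concrete, here is a minimal sketch of deterministic, hash-based bucketing. It is a generic illustration, not Statsig's actual assignment logic: the function name `assign_variant`, the experiment name, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
from typing import Sequence


def assign_variant(user_id: str, experiment: str, variants: Sequence[str]) -> str:
    """Map a user to a variant deterministically (hypothetical helper).

    The same (experiment, user_id) pair always hashes to the same bucket,
    so a user sees the same variant on every request without any stored state.
    """
    if not variants:
        raise ValueError("at least one variant is required")
    # Salting with the experiment name keeps assignments independent
    # across experiments for the same user.
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Interpret the first 8 bytes as an integer and reduce it to a bucket index.
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


if __name__ == "__main__":
    variants = ["control", "test"]
    first = assign_variant("user-123", "new_checkout", variants)
    second = assign_variant("user-123", "new_checkout", variants)
    print(first, first == second)  # repeated calls agree for the same user
```

Because the assignment is a pure function of the user and experiment identifiers, it can be recomputed anywhere (client, server, or later in analysis) and still agree with what the user actually saw.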
Warehouse-native experimentation deploys a provider’s solution directly on top of a data warehouse. All statistical analysis across metrics occurs on the warehouse resources and results are written locally to a designated table. Unlike cloud-hosted solutions, warehouse-native experimentation doesn’t require data to be transferred to cloud-hosted systems. This not only guarantees that user-level data does not leave the warehouse but also enables quick insights and post-hoc analysis on exported data.
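The flow described above (analysis runs on warehouse compute, and results land in a table in the same warehouse) can be sketched as follows. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the warehouse, the table names `exposures`, `metric_events`, and `experiment_results` are hypothetical, and the statistic is a plain Welch-style z-score rather than anything specific to the Statsig Stats Engine.

```python
import math
import sqlite3


def build_demo_warehouse() -> sqlite3.Connection:
    """Create a tiny in-memory stand-in for a warehouse (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE exposures (user_id TEXT PRIMARY KEY, variant TEXT);
        CREATE TABLE metric_events (user_id TEXT, value REAL);
        CREATE TABLE experiment_results (
            metric TEXT, delta REAL, std_error REAL, z_score REAL);
    """)
    conn.executemany("INSERT INTO exposures VALUES (?, ?)", [
        ("c1", "control"), ("c2", "control"), ("c3", "control"),
        ("t1", "test"), ("t2", "test"), ("t3", "test"),
    ])
    conn.executemany("INSERT INTO metric_events VALUES (?, ?)", [
        ("c1", 4.0), ("c1", 6.0), ("c2", 12.0), ("c3", 14.0),
        ("t1", 15.0), ("t2", 17.0), ("t3", 19.0),
    ])
    return conn


# Aggregate events per user, then compute count, mean, and sample variance
# per variant. All row-level work happens inside the database.
PER_VARIANT_SQL = """
    SELECT variant,
           COUNT(*) AS n,
           AVG(total) AS mean,
           (SUM(total * total) - COUNT(*) * AVG(total) * AVG(total))
               / (COUNT(*) - 1) AS variance
    FROM (
        SELECT e.user_id, e.variant, COALESCE(SUM(m.value), 0.0) AS total
        FROM exposures AS e
        LEFT JOIN metric_events AS m ON m.user_id = e.user_id
        GROUP BY e.user_id, e.variant
    )
    GROUP BY variant
"""


def run_analysis(conn: sqlite3.Connection, metric: str = "revenue") -> dict:
    """Compare test vs. control and write the result back to the warehouse."""
    stats = {row[0]: row[1:] for row in conn.execute(PER_VARIANT_SQL)}
    n_c, mean_c, var_c = stats["control"]
    n_t, mean_t, var_t = stats["test"]
    delta = mean_t - mean_c
    std_error = math.sqrt(var_t / n_t + var_c / n_c)
    z_score = delta / std_error
    # Only aggregates are written out; user-level rows never leave the database.
    conn.execute("INSERT INTO experiment_results VALUES (?, ?, ?, ?)",
                 (metric, delta, std_error, z_score))
    conn.commit()
    return {"delta": delta, "std_error": std_error, "z_score": z_score}


if __name__ == "__main__":
    warehouse = build_demo_warehouse()
    print(run_analysis(warehouse))
    print(warehouse.execute("SELECT * FROM experiment_results").fetchall())
```

The design point is that the Python side only ever sees per-variant aggregates; the join between exposures and metric events, which touches user-level data, runs entirely inside the database.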
Statsig Warehouse Native allows users to run statistical analysis on the metric definitions and other business logic that already exist in their warehouse. This eliminates redundant work to build out a metrics catalog and enables product stakeholders to define more derivative metrics downstream, such as funnels, ratios, and more, with an intuitive UI.
We also see scenarios where an organization wants to migrate their existing/internal experimentation systems and have past user assignment data alongside metric definitions in their warehouse.
With Warehouse Native, these users can quickly analyze and validate pre-existing experiments with the Statsig Stats Engine. This has the added benefit of a very fast time to value; we typically see users able to analyze an existing experiment in less than an hour. Contrast this with orchestrating an internal or cloud deployment, which can take anywhere from two weeks to several months before you’re able to analyze any results.
Statsig Warehouse Native can enhance existing warehouse data with the flexibility of a “hybrid” deployment, where you can use various SDKs to collect additional metrics and assignment data alongside any data you already have. Log collection (metric and user assignment data) and experiment results are forwarded to internal data warehouse tables, where they are easily accessible for more granular ad-hoc analysis and validation.
Statsig Cloud primarily serves organizations looking for a cloud-hosted solution that simplifies the setup and management of experiments. This is the default path for organizations that do not currently use a data warehouse as the core of their data stack but want to set up an experimentation program. It is characterized by:
Ease of setup and use: With an integrated end-to-end platform covering SDKs for feature rollout, experiment allocation, analysis, and readouts, Statsig Cloud is designed for user-friendliness and efficiency. This makes it ideal for companies starting their data-driven journey or those seeking a reliable method for creating metrics without heavy data engineering work.
Scalability and maintenance: Being cloud-hosted, it offers scalability and high availability without the need for businesses to manage physical servers or other hardware. This is particularly beneficial for fast-growing startups and enterprises that prefer outsourcing infrastructure management.
Data source and integration: In Statsig Cloud, the primary source of metrics comes from Statsig SDKs or Customer Data Platforms like Segment, making it suitable for organizations that either do not have a large existing data warehouse, are already leveraging other data collection tools, or prefer to keep their experimentation data separate from other business data.
Potential drawbacks: For organizations with stringent data security and privacy policies, there might be a hard requirement that all operations run within the organization's own VPC, in which case Warehouse Native becomes the correct option. For those with an extensive in-house data infrastructure, relying entirely on a cloud-hosted platform may pose challenges. If internal data or product analysts have already built metric definitions, importing those existing metrics (with Warehouse Native) removes the need to recreate them in a cloud environment.
Statsig Warehouse Native, on the other hand, is tailored for businesses that already have a robust data warehouse infrastructure and are looking to leverage their existing data systems for experimentation. Its features include:
Data hosting and integration: Warehouse Native is designed to integrate directly with a company's data warehouse. This allows for seamless utilization of existing data pipeline resources and computation systems, making it ideal for organizations that view their data warehouse as the central hub for all data-related activities.
Flexibility and customization: With Warehouse Native, businesses have the flexibility to bring existing internal metrics and run complex experiments using their existing data. Warehouse Native, in this case, acts as a toolbox that enhances the data analysis practices an organization already has in place.
Data security and sanctity: Because Warehouse Native operates on data that already exists within the system and writes results to the same warehouse where it operates, the risks associated with data exports are minimized.
Potential drawbacks: An internal data team must set up and manage the warehouse resources where the stats engine will operate. Internal teams must also monitor the costs incurred for computation and storage. To mitigate these costs, Statsig Warehouse Native surfaces the computation costs associated with the analysis performed and can provide guidance on reducing the costs of querying and computing the data.
Feature/Aspect | Statsig Warehouse Native | Statsig Cloud |
---|---|---|
Data Hosting | Data is hosted in your own data warehouse. Results/exposures are written to the same warehouse. | Statsig hosts your data in our systems. Results/exposures can be exported. |
Primary Source of Metrics | Metric definitions in the warehouse are the primary source, ideal for using existing data pipelines and computation. | Primary source of metrics comes from Statsig SDKs or CDPs like Segment. |
Analysis Needs | Flexible analysis on top of existing source of truth metric data. | Automated experimentation for every experiment and product launch, especially with metrics from event logging. |
Data Team Involvement | Necessary for setting up warehouse connection and configuring core metrics. | Involvement is optional but recommended for experiment design and readouts. |
Costs | Includes Statsig license plus costs for computation and storage in your warehouse. | Total cost of ownership (TCO) is slightly lower, since no warehouse costs are involved. |
Modularity | Modular: Opt for integrated end-to-end platform or select subsets of capabilities. | Integrated end-to-end platform covering SDKs for feature rollout, experiment execution, analysis, and readouts. |
Integration with Existing Tools | Seamless integration with tools like Snowflake, BigQuery, Segment, Redshift, Databricks, etc. | Synergy with other SaaS tools for customer data and metrics, like Segment. |
Privacy and Security | Preferable for organizations with stringent data egress policies. | Suitable for organizations with less restrictive data egress policies. |
Setup Complexity | Requires more initial setup, particularly around data warehouse resource setup and integration with metric data. | Easier setup for developers, less initial configuration required. |
Experimentation Flexibility | Supports advanced experimentation features using pre-existing data sets. | Provides a reliable method for creating metrics with less engineering effort. |
SDKs Usage | Use your own or third-party SDKs for feature assignment; also supports Statsig SDKs. | Primarily relies on Statsig SDKs for feature assignment and metrics logging. |
Statsig Warehouse Native is designed to integrate seamlessly with an organization's existing data warehouse infrastructure, allowing businesses to leverage their pre-existing data systems and compute resources for experimentation and analysis. In general, this leads to a faster time to value, especially when an organization wants to leverage existing data models that already encode its business logic.
This model also enables product stakeholders to create derivative metrics without having to write SQL queries or involve data analysts and engineers. This not only frees up data resources but also creates autonomy and fosters a culture of experimentation across the org. Under the hood, advanced options save your data team from tedious work by allowing metrics to be configured with bake windows, winsorization, filters, and more.
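For readers unfamiliar with winsorization, the sketch below shows the general technique: values beyond chosen percentile thresholds are clamped to those thresholds so that a few extreme users do not dominate a metric's mean and variance. This is a generic illustration and not a description of how Statsig implements the option; the function names, the nearest-rank percentile rule, and the default thresholds are all assumptions.

```python
import math
from typing import List, Sequence


def nearest_rank(sorted_values: Sequence[float], fraction: float) -> float:
    """Return the nearest-rank percentile of an already sorted sequence."""
    rank = max(1, math.ceil(fraction * len(sorted_values)))
    return sorted_values[rank - 1]


def winsorize(values: Sequence[float],
              lower: float = 0.0,
              upper: float = 0.99) -> List[float]:
    """Clamp values to the [lower, upper] percentile range (hypothetical helper).

    `lower` and `upper` are fractions in [0, 1]; the input order is preserved.
    """
    if not 0.0 <= lower <= upper <= 1.0:
        raise ValueError("expected 0 <= lower <= upper <= 1")
    if not values:
        return []
    ordered = sorted(values)
    low_cap = nearest_rank(ordered, lower)
    high_cap = nearest_rank(ordered, upper)
    return [min(max(v, low_cap), high_cap) for v in values]


if __name__ == "__main__":
    spend = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 1000.0]
    capped = winsorize(spend, upper=0.9)
    print(capped)                     # the 1000.0 outlier is capped to 9.0
    print(sum(spend) / len(spend))    # raw mean: 104.5
    print(sum(capped) / len(capped))  # winsorized mean: 5.4
```

In this toy data, a single heavy spender moves the raw mean from about 5 to over 100; capping at the 90th percentile keeps that user in the analysis while limiting their influence.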
We support the major data warehouse technologies so teams can keep their data where it most likely already lives, using query engines they are already familiar with. In both the Cloud and Warehouse Native products, we provide a simple Pulse UI that makes it easy for users across the org to understand the impact of various product decisions.
For product, this means extremely quick access to insights backed by best-in-class statistical methods. For data stakeholders, this means more time spent interpreting results and less time spent building and maintaining tedious queries.