During my time as a data engineer, I’d spend hours helping teams translate data models and build data pipelines between software tools so they could communicate effectively.
The frustration of getting data into those tools often spoiled the intended initial experience and added complexity to a production implementation. To add to this, as our models evolved, we’d have to spend meaningful development time keeping everything running smoothly, often because of rigid downstream schemas.
Now as a solutions engineer, where I’m tasked with helping prospects understand the lift of data ingestion, I can rest assured that they can avoid a lot of these frustrating experiences with Statsig.
It doesn’t matter whether your team has an existing data pipeline feeding into a warehouse as a single source of truth, is using a third-party tool for data collection, or is jumping into data logging for the first time; Statsig offers several mechanisms to address these different data flows with sophisticated SDKs, third-party integrations, and a Data Warehouse Native solution. This guide provides a step-by-step walkthrough of how to implement these solutions, with clear instructions and best practices to ensure a smooth setup.
Before you begin, ensure you have:
An account with Statsig
Access to your data source (e.g., data warehouse, 3rd party data tool/CDP, or application code)
Necessary permissions to read from and write to your data source (if applicable)
Familiarity with SQL (if you're using a data warehouse)
Choose the one that best fits your infrastructure or current needs. In most cases you can also combine multiple ingestion methods if necessary. Typically, if you’re just starting out, you’ll want to integrate the SDK into your code to get oriented with the platform. On the other hand, if you have existing metric models in a warehouse, our warehouse native solution will generally work better for your teams. Let’s dive in!
Ideal for real-time event tracking directly from your application code. Load the Statsig SDK into your app or server and use built-in methods to log any events with relevant metadata for thorough analysis. Perform experiments across multiple surfaces easily with the SDKs.
Use the logEvent method in various languages to quickly populate Statsig with data for experimentation and analysis
Log additional metrics alongside 3rd party or internal logging systems to enhance measurement capabilities
Suitable for server-side event logging when the SDKs don’t cover your stack or if you prefer direct API calls. Note: this typically requires a bit more work to get into a reliable state compared to the SDKs.
Use in situations where the SDKs are not supported or preferred
Can be used to support custom metric ingestion workflows
Use pre-built integrations with CDPs like Segment or mParticle for seamless data flow. Maintain your existing logging infrastructure to power experimentation quickly. Event filters can be utilized to reduce downstream event volume to Statsig, so only your relevant metrics are analyzed.
Quickly populate Statsig with metric data and analyze with the metrics explorer
Reduce time to analysis by simplifying experiment setup
Control event billing volume with event filtering
Connect your data warehouse (e.g., Snowflake, BigQuery) to Statsig for bulk data import. Map your data to Statsig’s expected schema and schedule regular (daily) imports for metric analysis. A copy of your metric data is stored in Statsig servers for processing.
Send custom events or precomputed metrics to cover more complex use cases, internal computations, attribution windows, etc.
Use SQL queries to pull in data, joining tables when you need to pull in more metrics.
If you have an existing warehouse with metric data, perhaps downstream of a reverse ETL tool or internal logging system, this ingestion method differs from the others: Statsig operates on top of your data warehouse and utilizes warehouse resources to run experimentation and analysis in real time. No user-level data is replicated on Statsig servers, so this pathway is preferred for privacy-conscious industries.
Easy SQL interfaces for connecting metrics, as well as assignment data for experiment analysis (offline experiments, 3rd party systems, internal systems)
Create multiple metric sources and build additional aggregate metrics on top of these sources
If you have existing experiment allocation data, you can perform experiment analysis in ~30 minutes
Reload experiment analysis on demand
Depending on your chosen method, you'll need to prepare your data connection and/or initialize our SDKs within your app:
Once you’ve chosen your SDK, you’ll need to integrate Statsig into your application. Follow the official SDK documentation for specific instructions. The high-level steps are:
Initialize the SDK in your chosen language.
Start calling the logEvent method. See a more in-depth walkthrough here.
import type { StatsigEvent } from '@statsig/client-core';

// log a simple event
myStatsigClient.logEvent('simple_event');

// or, include more information by using a StatsigEvent object
const robustEvent: StatsigEvent = {
  eventName: 'add_to_cart',
  value: 'SKU_12345',
  metadata: {
    price: '9.99',
    item_name: 'diet_coke_48_pack',
  },
};

myStatsigClient.logEvent(robustEvent);
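If you’re logging from a server rather than a client, a minimal sketch assuming the statsig-node server SDK might look like the following; the secret key, event name, and metadata are placeholders you would swap for your own:

import Statsig from 'statsig-node';

// Initialize once at server startup with your server secret key (placeholder).
await Statsig.initialize('<SERVER-SECRET-KEY>');

// Attribute the event to a user, with an optional value and string metadata.
Statsig.logEvent(
  { userID: '42' },
  'add_to_cart',
  9.99,
  { item_name: 'diet_coke_48_pack' },
);

// Flush any pending events before the process exits.
Statsig.shutdown();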
Authenticate with your Statsig API key and use the log_event endpoint to send data. Check the HTTP API documentation for details. Generally:
Fetch a client SDK key (or generate a new one) from the console
Send a POST request to the /log_event endpoint
curl \
  --header "statsig-api-key: <CLIENT-SDK-KEY>" \
  --header "Content-Type: application/json" \
  --request POST \
  --data '{"events": [{"user": { "userID": "42" }, "time": 1616826986211, "eventName": "test_api_event"}]}' \
  "https://api.statsig.com/v1/log_event"
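If you’d rather issue the same request from application code, here is a minimal sketch of the equivalent call using fetch (available in modern browsers and Node 18+); the client SDK key is a placeholder:

// The same log_event request as the curl example above, sent with fetch.
const response = await fetch('https://api.statsig.com/v1/log_event', {
  method: 'POST',
  headers: {
    'statsig-api-key': '<CLIENT-SDK-KEY>',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    events: [
      { user: { userID: '42' }, time: Date.now(), eventName: 'test_api_event' },
    ],
  }),
});

if (!response.ok) {
  console.error(`log_event request failed with status ${response.status}`);
}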
Set up the integration within your CDP's interface and link it to Statsig. Refer to the Integrations documentation for guidance on the specific tool you’re using. Generally:
Navigate to the integration section (settings → project → integrations) and select the applicable tile
Follow the specific instructions for connecting the data
Optional: Apply event filters to reduce downstream event volume
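To illustrate with one common CDP, here is a minimal sketch assuming Segment and its @segment/analytics-next browser library; the write key and event details are placeholders. The point is that your existing track calls don’t change:

import { AnalyticsBrowser } from '@segment/analytics-next';

// Existing Segment instrumentation stays exactly as it is (write key is a placeholder).
const analytics = AnalyticsBrowser.load({ writeKey: '<SEGMENT-WRITE-KEY>' });

// Once the Statsig destination is enabled in Segment, this same track call
// also flows into Statsig, subject to any event filters you configure.
analytics.track('add_to_cart', {
  price: 9.99,
  item_name: 'diet_coke_48_pack',
});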
Configure the connection in the Statsig console, map your data fields, and schedule ingestion. The Data Warehouse Ingestion guide provides comprehensive instructions. The high level steps are:
Establish a connection with your data warehouse. You’ll need to create a service role that can read metric and assignment data from your warehouse, write to a staging dataset used for caching and experiment results, and run queries on top of the warehouse.
Establish metric and assignment sources via SQL queries or simply by table name. Map your data to Statsig’s expected schemas to establish baseline data.
Once a data connection has been established, you’ll want to verify that metrics are correctly flowing into the system and in the correct format. Depending on your implementation, Statsig provides a few ways of doing so:
Events Logstream - A live view of events that are logged and/or ingested. Dig into individual events to verify the name, value, date, and metadata, and that the correct ID is provided (a common mistake during implementation is unmatched or incorrect IDs)
Metrics Explorer - Provides mechanisms to dive into metric views via charts, funnels, and more. You can apply filters to event and user metadata and validate that your data is being ingested correctly. Read more here.
SQL Debugging (for Warehouse Native/Ingestion) - With Warehouse Native, each metric source (and each metric definition built on top of it) is produced as the result of a SQL query, so you’ll be able to quickly verify that the data exists in your warehouse.
For SDKs: Ensure the SDK is initialized correctly and that you're using the latest version. Check out the client SDK debugging guide.
For HTTP API: Check for errors in your API requests and ensure you're using the correct endpoints and authentication.
For Integrations: Verify that the integration settings match between your upstream tool and Statsig. Many of these systems have debugging tools to help diagnose improper data flows.
For Warehouse Native: Refer to this guide for assistance with debugging.
Reach out on our community Slack channel for support if you still need assistance.
Once you’ve correctly orchestrated data logging and ingestion, you’ll be able to start creating a metrics catalogue that can be leveraged for analysis and experimentation. This metrics catalogue will ultimately become the bedrock of your product measurement, so it’s important to spend some time getting oriented.
For cloud, follow the creating Metrics guide to get started
For warehouse native, use this guide instead
If you’ve made it this far, I hope you’ve found this guide useful and you now have a clear understanding of how to ingest data into Statsig so that you can begin measuring impact!
As more teams get involved, revisit this guide to help orient new members. Should you have any questions or feedback on how we can improve our existing ingestion methods, please visit our community Slack and drop us a line!