How to debug your experiments and feature rollouts

Wed May 22 2024

Ryan Musser

Head of Solutions Engineering, Statsig

When it comes to digital experimentation and feature rollouts, the devil is often in the details.

Whether you're a data scientist, a product manager, or a software engineer, you know that even the most meticulously planned test rollout can encounter unexpected hiccups.

In this blog, we'll explore common pitfalls in digital experimentation, strategies to avoid and debug them, and best practices to ensure your experiments yield valuable insights.

Plus, we'll highlight how Statsig specifically aids in smoothing out the process.

The common pitfalls of digital experimentation

Digital experimenters face a myriad of challenges that can skew results or derail projects entirely. Here are a few common pitfalls based on our own experience helping customers:

  1. Inadequate sampling: A sample that's too small, or that isn't representative of the population you care about, can lead to inconclusive or misleading results.

  2. Lack of confidence in metrics: Without trustworthy success metrics, it's hard to know which decision the results actually support.

  3. User exposure issues: If users aren't exposed to the experiment as intended, your data won't reflect their true behavior.

  4. Biased experiments: Running multiple experiments that affect the same variables can contaminate your results.

  5. Technical errors: Bugs in the code can introduce unexpected variables that impact the experiment's outcome.

Avoiding and debugging pitfalls: Strategies and best practices

To navigate these challenges, here are some strategies and best practices:

Ensure proper sampling

  • Randomization: To mitigate selection bias and ensure a fair comparison between control and test groups, assign users with a hashing algorithm (Statsig uses SHA-256). A hash deterministically maps each user's unique identifier to a group, so the same user always lands in the same group and the overall split stays balanced. A feature flagging framework can handle this allocation for you.
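As a rough illustration of hash-based assignment, here is a minimal Python sketch. It is not Statsig's implementation: the function name `assign_group`, the salt format, and the choice to use the first 8 bytes of the digest are all assumptions made for the example.

```python
import hashlib


def assign_group(user_id: str, experiment_salt: str,
                 groups=("control", "test")) -> str:
    """Deterministically map a user to a group by hashing salt + user ID."""
    key = f"{experiment_salt}.{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Read the first 8 bytes as an unsigned integer, then reduce it to a
    # group index. Salting per experiment keeps assignments in different
    # experiments independent of each other.
    bucket = int.from_bytes(digest[:8], "big") % len(groups)
    return groups[bucket]


if __name__ == "__main__":
    counts = {"control": 0, "test": 0}
    for i in range(10_000):
        counts[assign_group(f"user-{i}", "checkout_button_v2")] += 1
    print(counts)  # the two counts should be close to an even split
```

Because the assignment depends only on the salt and the user ID, no assignment table has to be stored, and any service that knows the salt can reproduce the same split.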

Define metrics clearly

  • Success criteria: Before launching an experiment, it's crucial to define how you’re measuring success. Make these metrics readily available for analysis.

  • Consistency: Consistent metrics are vital for comparability across experiments and business dashboards. They let stakeholders compare results from different experiments or business surfaces on a like-for-like basis. Avoid keeping slightly different definitions of the same metric in different tools; the numbers will disagree, and trust in all of them will suffer.

Monitor user exposure

  • Real-time tracking: Implement real-time tracking to monitor user exposure to experiments. Open-source pipeline tools like Snowplow or Apache Kafka can capture and process exposure events as they happen, providing immediate feedback on the reach, split, and engagement of your experiment.

  • Exposure events: Consider using an existing event tracking framework, such as Segment, mParticle, Google Tag Manager, RudderStack, or Mixpanel, to log when users encounter the experimental feature.
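To make the idea of an exposure event concrete, here is a small Python sketch. The `ExposureLogger` class, the event field names, and the `sink` callable are hypothetical; `sink` stands in for whichever tracking client you already use and is not the API of any of the tools named above.

```python
import time
from typing import Callable, Optional


class ExposureLogger:
    """Logs one exposure event per (user, experiment) pair.

    `sink` is a hypothetical stand-in for an existing event pipeline;
    it only needs to accept a dict.
    """

    def __init__(self, sink: Callable[[dict], None]) -> None:
        self._sink = sink
        self._seen = set()  # (user_id, experiment) pairs already logged

    def log_exposure(self, user_id: str, experiment: str, group: str,
                     timestamp: Optional[float] = None) -> bool:
        """Send an exposure event; return False if it was a duplicate."""
        key = (user_id, experiment)
        if key in self._seen:
            return False
        self._seen.add(key)
        self._sink({
            "event": "experiment_exposure",
            "user_id": user_id,
            "experiment": experiment,
            "group": group,
            "timestamp": time.time() if timestamp is None else timestamp,
        })
        return True


events = []
logger = ExposureLogger(events.append)
logger.log_exposure("user-1", "new_checkout", "test")
logger.log_exposure("user-1", "new_checkout", "test")  # duplicate, skipped
```

The in-memory set only deduplicates within a single process. In a real pipeline you would more likely deduplicate downstream, but the shape of the event (who, which experiment, which group, when) stays the same.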

Manage overlapping experiments

  • Running experiments mutually exclusively: To prevent experiments from influencing each other, consider a feature flagging framework that can allocate each user to at most one of the conflicting experiments.

  • Interaction effects: When experiments cannot be run mutually exclusively, it's important to measure and account for interaction effects. Statistical software like R or Python's SciPy library can help you analyze the data for potential interactions between experiments, allowing you to adjust your analysis accordingly.
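As a first look at whether two overlapping experiments interact, you can compare the effect of experiment A among users who are in B's test group with its effect among users who are not. The Python sketch below computes that difference-in-differences point estimate from raw rows. The function name and row layout are assumptions for illustration, and the sketch does not include a significance test, which you would still want before drawing conclusions.

```python
from statistics import mean


def interaction_effect(rows):
    """Estimate the A x B interaction from per-user rows.

    rows: iterable of (in_test_a: bool, in_test_b: bool, metric: float).
    Returns (effect of A when B is on) - (effect of A when B is off).
    A value near zero suggests the two experiments do not interact.
    """
    cells = {}
    for in_a, in_b, value in rows:
        cells.setdefault((in_a, in_b), []).append(value)

    required = [(False, False), (True, False), (False, True), (True, True)]
    missing = [cell for cell in required if cell not in cells]
    if missing:
        raise ValueError(f"no users in cells: {missing}")

    m = {cell: mean(values) for cell, values in cells.items()}
    effect_a_without_b = m[(True, False)] - m[(False, False)]
    effect_a_with_b = m[(True, True)] - m[(False, True)]
    return effect_a_with_b - effect_a_without_b


# Example: A adds 2 on its own, B adds 1 on its own, together they add 6.
sample = [
    (False, False, 10.0), (False, False, 10.0),
    (True, False, 12.0), (True, False, 12.0),
    (False, True, 11.0), (False, True, 11.0),
    (True, True, 16.0), (True, True, 16.0),
]
print(interaction_effect(sample))
```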

Debug technical issues

  • Pre-launch testing: Before going live, rigorously test your experiment in a staging environment that mirrors production. Tools like Jenkins or CircleCI can automate the deployment of your experiment to a staging environment, where you can perform integration and user acceptance testing.

  • Monitoring tools: Once your experiment is live, use monitoring tools like Datadog, New Relic, or Prometheus to track its performance. These tools can alert you to bugs, performance issues, or unexpected behavior in real-time, enabling you to address problems quickly before they affect the validity of your experiment.

How Statsig elevates the experimentation process

Statsig offers a suite of tools that specifically address these common pitfalls, making it easier to run successful experiments.

Sampling and segmentation with Statsig

When initializing Statsig’s SDKs, you can share custom traits that allow you to segment users at a granular level.

Statsig’s SDKs manage allocation deterministically, so you don’t have to worry about managing the randomization. Statsig also offers stratified sampling. When you experiment on a user base where a tail end of power users drives a large portion of an overall metric value, stratified sampling meaningfully reduces false positive rates and makes your results more consistent and trustworthy.
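To show why stratification helps with power users, here is a generic textbook-style sketch in Python: rank users by a pre-experiment metric, cut the ranking into strata, and randomize within each stratum so both groups receive a similar share of heavy users. This is an assumption-laden illustration, not a description of how Statsig implements stratified sampling; the function name, the number of strata, and the seed are hypothetical.

```python
import math
import random


def stratified_assign(users_with_metric, n_strata=4, seed=0):
    """Split users into control/test, balanced within metric strata.

    users_with_metric: list of (user_id, pre_experiment_metric) pairs.
    Returns a dict mapping user_id -> "control" or "test".
    """
    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    ranked = sorted(users_with_metric, key=lambda pair: pair[1])
    stratum_size = math.ceil(len(ranked) / n_strata)

    assignment = {}
    for start in range(0, len(ranked), stratum_size):
        stratum = [user for user, _ in ranked[start:start + stratum_size]]
        rng.shuffle(stratum)
        half = len(stratum) // 2
        # Within each stratum the two groups differ in size by at most one.
        for index, user in enumerate(stratum):
            assignment[user] = "control" if index < half else "test"
    return assignment


users = [(f"u{i}", float(i)) for i in range(100)]
groups = stratified_assign(users)
```

With plain randomization, a handful of extreme users can land in the same group by chance and dominate the metric. Randomizing within strata rules that out by construction.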

Statsig also offers experiment templates to help standardize the blueprint for gates or experiments, ensuring consistency across projects by including predefined metrics and settings, which helps prevent setup errors.

Defining metrics with Statsig

Statsig's Metrics Catalog allows you to create and organize a collection of metrics, which can be tagged for easy retrieval and association with specific product areas, business functions, or objectives. You can manage change control and maintain quality through versioning and reviews. Additionally, the catalog supports collaboration and standardization across teams, as metrics can be shared and accessed by all team members, ensuring consistency in metric definitions and analyses.

Within the Shared Metrics Catalog, Statsig offers tagging, which makes it easy to organize and retrieve relevant metrics during analysis and helps ensure the right metrics are consistently applied to experiments.

Real-time user exposure insights

Statsig’s SDKs automatically manage exposure logging anytime you call the checkGate or getExperiment methods.

Statsig then provides a diagnostics view, which contains a log stream for real-time exposure data, allowing for detailed insights into the evaluation process for each assignment log.

Statsig also runs automated health checks that monitor the health of an experiment, alerting users to potential issues such as imbalances in user allocation or mismatches in metric data.
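One common way to detect an imbalance in user allocation is a sample ratio mismatch (SRM) check, which compares the observed group sizes with the intended split using a chi-square test. The Python sketch below illustrates that general technique for two groups; it is not a description of Statsig's health checks, and the function name and default split are assumptions.

```python
import math


def srm_p_value(control_count: int, test_count: int,
                expected_test_share: float = 0.5) -> float:
    """Chi-square goodness-of-fit p-value for a two-group split."""
    total = control_count + test_count
    expected_test = total * expected_test_share
    expected_control = total - expected_test
    chi2 = ((control_count - expected_control) ** 2 / expected_control
            + (test_count - expected_test) ** 2 / expected_test)
    # Two groups give 1 degree of freedom, where the chi-square survival
    # function reduces to erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


print(srm_p_value(5000, 5050))  # small imbalance, plausible by chance
print(srm_p_value(5000, 5500))  # large imbalance, very unlikely by chance
```

A very small p-value (0.001 is a commonly used threshold) suggests the split is not what you configured, which usually points to an exposure logging or assignment bug rather than a real user behavior difference.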

Managing overlapping experiments

In some cases, running multiple overlapping experiments can lead to interaction effects, where the influence of one experiment impacts the outcomes of another, potentially confounding results. This can make it difficult to isolate the effect of individual changes and understand their true impact on user behavior.

Statsig addresses this challenge with Layers, which allow for the creation of mutually exclusive experiments, ensuring that a user is only part of one experiment within a layer at any given time. This feature helps maintain the integrity of experiment results by preventing overlap and the associated interaction effects between concurrent experiments.
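The general idea behind mutually exclusive allocation can be sketched in a few lines of Python: hash each user once per layer, then give each experiment a non-overlapping range of buckets. This is a simplified illustration, not Statsig's Layers implementation; the function names, the 1,000-bucket size, and the allocation format are assumptions.

```python
import hashlib


def layer_bucket(user_id: str, layer_name: str, buckets: int = 1000) -> int:
    """Hash a user into one of `buckets` slots for a given layer."""
    key = f"{layer_name}.{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def experiment_for_user(user_id: str, layer_name: str, allocations):
    """Return the single experiment a user belongs to, or None.

    allocations: list of (experiment_name, start, end) tuples with
    non-overlapping, end-exclusive bucket ranges.
    """
    bucket = layer_bucket(user_id, layer_name)
    for name, start, end in allocations:
        if start <= bucket < end:
            return name
    return None  # user falls in an unallocated part of the layer


allocations = [("new_header", 0, 300), ("new_footer", 300, 600)]
print(experiment_for_user("user-42", "homepage_layer", allocations))
```

Because every user has exactly one bucket per layer and the ranges do not overlap, no user can be in two experiments from the same layer at once.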

Also, within Statsig, Teams can be configured at the project level to control pinned metrics and template access and to enforce template usage, providing an additional layer of oversight and reducing the likelihood of mistakes in experiment setup.

Technical debugging with Statsig

Statsig allows you to enable the experiment in lower environments without affecting production traffic. Statsig also allows you to override specific users, for example, if you wanted to test your feature with employees first in production.

Statsig's Metric Alerts notify you when a metric deviates beyond a set threshold, which can be crucial for identifying issues in an experiment. These alerts can be configured to monitor for specific changes, such as a drop in daily active users (DAU) or checkout events, and can be set to check for hourly or daily variances. When an alert is triggered, subscribers receive notifications via email, the Statsig Console, and Slack, allowing for quick investigation and debugging of the experiment.
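The underlying check for a threshold alert is simple, and a small Python sketch makes it concrete. This is a generic illustration of comparing a metric with a baseline, not Statsig's alerting logic; the function name and the 20% default threshold are assumptions.

```python
def metric_dropped(current: float, baseline: float,
                   max_relative_drop: float = 0.2) -> bool:
    """Return True if `current` fell below `baseline` by at least the threshold.

    max_relative_drop=0.2 means "alert on a drop of 20% or more".
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    relative_change = (current - baseline) / baseline
    return relative_change <= -max_relative_drop


# Hourly checkout events compared with the same hour a day earlier.
print(metric_dropped(current=700, baseline=1000))  # 30% drop
print(metric_dropped(current=950, baseline=1000))  # 5% drop
```

Choosing the baseline (previous hour, same hour yesterday, trailing average) matters as much as the threshold, since metrics with strong daily cycles will otherwise trigger false alarms.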

Looking forward

Digital experimentation is a powerful way to drive product innovation and growth, but it comes with its own set of challenges. By understanding common pitfalls and implementing best practices, you can set your experiments up for success. Statsig, with its comprehensive suite of tools, can be a valuable ally in this process, providing the infrastructure and insights needed to run effective experiments.

Whether you're just getting started or looking to refine your approach, Statsig's platform is designed to support you every step of the way. Happy experimenting!

Get a free account

Get a free Statsig account today, and ping us if you have questions. No credit card required, of course.