An introduction to testing in production

Thu Feb 15 2024

"Testing in production" may sound like a risky proposition, but it's an essential practice in modern software development. By testing new code changes with real user data, you can gain valuable insights and catch issues that might slip through in staging environments.

While the idea of testing in production might raise some eyebrows, it's important to understand what it really entails. Let's dive into the details and clear up some common misconceptions.

Understanding testing in production

Testing in production (TIP) refers to the practice of testing new code changes on live user traffic, rather than solely relying on staging environments. It's a crucial component of continuous delivery, allowing teams to validate their changes in real-world scenarios.

The importance of testing in production lies in its ability to provide accurate, real-time feedback. By using actual user data and behavior, you can uncover issues that might not be apparent in controlled staging environments. This helps ensure a better user experience and reduces the risk of deploying faulty code.

However, it's essential to address some common myths surrounding testing in production:

  • Myth: Testing in production replaces the need for QA or other testing methods.

  • Reality: TIP is an extension of existing testing practices, not a replacement. It complements traditional QA processes by providing an additional layer of validation.

  • Myth: Testing in production is reckless and puts users at risk.

  • Reality: When done correctly, with proper safeguards like feature flags and gradual rollouts, TIP can be a safe and effective way to test new features.

Remember, testing in production is not about blindly releasing untested code to users. It's a strategic approach that involves careful planning, risk mitigation, and continuous monitoring. By embracing TIP as part of your software development lifecycle, you can deliver higher-quality software faster and with greater confidence.

Types of tests in production

Testing in production is not a one-size-fits-all approach. Different types of tests serve different purposes, allowing you to validate specific aspects of your software. Let's explore two common types of tests conducted in production environments.

Hypothesis-driven testing

Hypothesis-driven tests are designed to validate specific expectations or hypotheses about your software's behavior. These tests aim to answer questions like:

  • Will this new feature increase user engagement?

  • Does this change have any negative impact on the user experience?

By conducting hypothesis-driven tests in production, you can gather real data to support or refute your assumptions. This data-driven approach helps you make informed decisions about your software's direction and prioritize improvements based on actual user behavior.
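As a rough illustration of how such a hypothesis might be checked, here is a minimal sketch that compares conversion rates between a control group and a treatment group using a two-proportion z-test. The `Variant` class, the function name, and the numbers are all hypothetical, and a real experimentation platform would typically handle much more (sample-size planning, multiple metrics, sequential testing).

```python
import math
from dataclasses import dataclass


@dataclass
class Variant:
    users: int        # users exposed to this variant
    conversions: int  # users who performed the target action

    @property
    def rate(self) -> float:
        return self.conversions / self.users


def two_proportion_z_test(control: Variant, treatment: Variant) -> tuple[float, float]:
    """Return (z statistic, two-sided p-value) for treatment vs. control."""
    pooled = (control.conversions + treatment.conversions) / (control.users + treatment.users)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / control.users + 1 / treatment.users))
    z = (treatment.rate - control.rate) / std_err
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value


# Hypothetical numbers, for illustration only.
control = Variant(users=10_000, conversions=1_000)
treatment = Variant(users=10_000, conversions=1_100)
z, p = two_proportion_z_test(control, treatment)
print(f"lift={treatment.rate - control.rate:.3f} z={z:.2f} p={p:.4f}")
```

A small p-value suggests the observed difference is unlikely to be noise, but the significance threshold is a decision for you to make before the test starts.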

Safety and performance tests

Ensuring the stability and performance of your production environment is crucial. Safety and performance tests help you identify potential issues before they impact your users. These tests may include:

  • Load testing: Simulating high traffic to ensure your system can handle the expected load without crashing or slowing down.

  • Stress testing: Pushing your system beyond its normal capacity to identify breaking points and potential bottlenecks.

By conducting these tests in production, you can validate your software's resilience in real-world conditions. This helps you proactively identify and address performance issues, ensuring a smooth user experience even under heavy load. For related reading on running trustworthy experiments, see Towards Data Science's article on the experimentation gap and Microsoft's guide to trustworthy experimentation.
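To make the idea of a load test concrete, here is a small sketch that fires concurrent requests and reports error rate and latency percentiles. `handle_request` is a hypothetical stand-in for a call to your system; in practice you would likely use a dedicated load-testing tool and real endpoints. Raising `concurrency` step by step, as the loop at the end does, is one simple way to approximate a stress test.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def handle_request(request_id: int) -> int:
    """Hypothetical stand-in for one request to the system under test."""
    time.sleep(0.001)  # simulate a small amount of work
    return 200


def timed_call(request_id: int) -> tuple[int, float]:
    """Run one request and return (status code, duration in seconds)."""
    start = time.perf_counter()
    status = handle_request(request_id)
    return status, time.perf_counter() - start


def run_load(total_requests: int, concurrency: int) -> dict:
    """Send total_requests using `concurrency` worker threads and summarize."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(total_requests)))
    latencies = sorted(duration for _, duration in results)
    errors = sum(1 for status, _ in results if status >= 500)
    return {
        "requests": total_requests,
        "error_rate": errors / total_requests,
        "p50_s": statistics.median(latencies),
        # Approximate 95th percentile by index into the sorted latencies.
        "p95_s": latencies[int(0.95 * (len(latencies) - 1))],
    }


# Ramp up concurrency to see how latency changes under heavier load.
for concurrency in (5, 20, 50):
    print(concurrency, run_load(total_requests=100, concurrency=concurrency))
```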

Strategies for testing in production

Blue-green deployments

Blue-green deployments involve maintaining two identical production environments. One environment (blue) serves live traffic, while the other (green) is used for testing updates. Once the updates are verified in the green environment, traffic is switched from blue to green, making the updated version live. The previous environment stays available as a fallback, so rolling back is a matter of switching traffic back.
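The switch itself can be sketched in a few lines. The `BlueGreenRouter` class below is a hypothetical, in-memory illustration that assumes exactly two environments; in a real setup the switch usually happens at a load balancer or DNS layer rather than in application code.

```python
from typing import Callable, Dict, Iterable

Handler = Callable[[str], str]


class BlueGreenRouter:
    """Routes all traffic to one of two environments (hypothetical sketch)."""

    def __init__(self, environments: Dict[str, Handler], live: str) -> None:
        self._environments = environments
        self._live = live

    @property
    def live(self) -> str:
        return self._live

    @property
    def idle(self) -> str:
        # Assumes exactly two environments: the idle one is "the other one".
        return next(name for name in self._environments if name != self._live)

    def handle(self, request: str) -> str:
        """Serve a live request from whichever environment is currently live."""
        return self._environments[self._live](request)

    def verify_idle(self, smoke_requests: Iterable[str], check: Callable[[str], bool]) -> bool:
        """Run smoke requests against the idle environment before switching."""
        handler = self._environments[self.idle]
        return all(check(handler(request)) for request in smoke_requests)

    def switch(self) -> None:
        """Make the idle environment live; calling it again switches back."""
        self._live = self.idle


router = BlueGreenRouter(
    {"blue": lambda r: f"v1:{r}", "green": lambda r: f"v2:{r}"},
    live="blue",
)
if router.verify_idle(["/health"], lambda response: response.startswith("v2")):
    router.switch()
print(router.live, router.handle("/home"))
```

Because the old environment is left untouched, calling `switch()` a second time acts as the rollback.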

Canary releases

Canary releases gradually roll out new features to a small subset of users. This approach allows you to monitor the impact of changes and catch issues early. If the new feature performs well, it can be incrementally released to more users; if issues arise, the rollout can be halted, minimizing the impact on the overall user base.

Feature flags are a powerful tool for implementing canary releases. They allow you to control the visibility of features for specific user segments. By combining feature flags with real-time monitoring, you can quickly detect and respond to any issues that arise during a canary release.
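One common way to pick the canary group is to hash each user ID into a stable bucket and compare it to the rollout percentage. The sketch below is a generic illustration of that idea, not a description of how Statsig or any other specific tool implements it; the function names and the flag name are made up.

```python
import hashlib


def bucket(user_id: str, flag_name: str) -> float:
    """Map a user deterministically to a value in [0, 100)."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    # Use the first 32 bits of the hash, reduced to two decimal places.
    return int(digest[:8], 16) % 10_000 / 100


def is_enabled(user_id: str, flag_name: str, rollout_percent: float) -> bool:
    """True if this user falls inside the current rollout percentage."""
    return bucket(user_id, flag_name) < rollout_percent


users = [f"user-{i}" for i in range(10_000)]
canary = [u for u in users if is_enabled(u, "new_checkout", 5)]
print(f"{len(canary)} of {len(users)} users see the canary")
```

Because the bucket depends only on the user and the flag name, a user who is in the 5% group stays in the group as the percentage grows, which keeps their experience consistent during the rollout.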

Progressive delivery takes canary releases a step further. It involves automatically increasing the rollout of a feature based on predefined metrics and thresholds. For example, you can configure your system to automatically increase the rollout percentage if key performance indicators (KPIs) remain within acceptable ranges. This approach enables a more data-driven and automated release process.
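The decision logic behind such an automated rollout can be quite small. The following sketch assumes a fixed list of rollout steps and a set of guardrail metrics with upper limits; the step values, metric names, and thresholds are hypothetical, and it simply holds the rollout (rather than rolling back) when a guardrail is breached.

```python
from dataclasses import dataclass

# Hypothetical rollout schedule, in percent of users.
ROLLOUT_STEPS = [1, 5, 25, 50, 100]


@dataclass
class Guardrail:
    metric: str       # name of the KPI to watch
    max_value: float  # rollout only advances while the KPI is at or below this


def next_rollout_percent(
    current: int, metrics: dict[str, float], guardrails: list[Guardrail]
) -> int:
    """Advance to the next step if every guardrail holds; otherwise stay put."""
    for guardrail in guardrails:
        # A missing metric is treated as unhealthy, so the rollout pauses.
        if metrics.get(guardrail.metric, float("inf")) > guardrail.max_value:
            return current
    later_steps = [step for step in ROLLOUT_STEPS if step > current]
    return later_steps[0] if later_steps else current


guardrails = [Guardrail("error_rate", 0.01), Guardrail("p95_latency_ms", 400)]
healthy = {"error_rate": 0.002, "p95_latency_ms": 310}
degraded = {"error_rate": 0.05, "p95_latency_ms": 310}

print(next_rollout_percent(5, healthy, guardrails))
print(next_rollout_percent(5, degraded, guardrails))
```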

Advantages of testing in production

Real-world accuracy

Testing in production reflects real conditions more faithfully than a staging environment can, because it runs against real user data and traffic. This approach ensures that test results reflect true user behavior, including edge cases that are hard to anticipate in advance. Real-world data is essential for making informed decisions about feature releases and improvements.

Limiting the blast radius

By testing with a subset of users, you minimize the risk of widespread issues. If a problem arises, you can quickly roll back the change, limiting the impact. This approach allows for faster iteration and more frequent releases, without compromising stability.

Immediate feedback

Real-time monitoring and feedback from production systems enable you to quickly identify and address issues. This immediate feedback loop allows you to respond to problems before they affect a significant portion of your user base. Combining real-time data with feature flags and progressive delivery techniques further enhances your ability to manage risks.

Tools and techniques for safe production testing

Feature flags

Feature flags enable you to safely deploy new features by allowing you to toggle them on and off without redeploying code. This approach provides fine-grained control over feature visibility, making it easier to test in production and roll back if necessary. Implementing feature flags also enables progressive delivery and experimentation, allowing you to gradually release features and measure their impact.

Progressive delivery

Progressive delivery methods use observability data to gradually roll out features, ensuring stability and performance before full deployment. By automatically increasing the rollout percentage based on predefined metrics, progressive delivery minimizes the risk of introducing bugs or performance issues. This data-driven approach to testing in production helps you make informed decisions about when to fully release a feature.

Observability and monitoring

To effectively test in production, you need a robust observability and monitoring infrastructure. This includes collecting and analyzing metrics, logs, and traces. By setting up alerts and dashboards to track key performance indicators (KPIs), you can quickly detect and respond to issues. Integrating your observability tools with feature flag management systems enables you to correlate changes with their impact, providing valuable insights for future releases.

Correlating flag changes with their observed impact also helps you refine your testing strategy for future releases. Monitoring production systems in real time is crucial for identifying and addressing issues before they affect a significant portion of your user base.
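As a simple picture of what correlating changes with their impact can look like, the sketch below groups request events by whether a flag was on and compares error rates. The event shape (a `flags` dictionary and a `status` code) and the flag name are assumptions made for this example; real observability tools expose this kind of breakdown through their own query languages.

```python
from collections import defaultdict


def error_rate_by_variant(events: list[dict], flag_name: str) -> dict[bool, float]:
    """Return the share of 5xx responses, split by whether the flag was on."""
    totals: dict[bool, int] = defaultdict(int)
    errors: dict[bool, int] = defaultdict(int)
    for event in events:
        exposed = bool(event["flags"].get(flag_name, False))
        totals[exposed] += 1
        if event["status"] >= 500:
            errors[exposed] += 1
    return {variant: errors[variant] / totals[variant] for variant in totals}


# Hypothetical request log, for illustration only.
events = (
    [{"flags": {"new_checkout": True}, "status": 200}] * 90
    + [{"flags": {"new_checkout": True}, "status": 500}] * 10
    + [{"flags": {"new_checkout": False}, "status": 200}] * 99
    + [{"flags": {"new_checkout": False}, "status": 500}] * 1
)
rates = error_rate_by_variant(events, "new_checkout")
print(rates)
```

A gap between the two rates is a signal worth investigating, not proof that the flag caused the errors.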

Automated rollbacks

Automated rollbacks are another essential tool for safe production testing. By defining thresholds for key metrics, you can automatically roll back changes that negatively impact performance or user experience. This reduces the time it takes to resolve issues and minimizes the impact on your users.

Combining automated rollbacks with feature flags and progressive delivery creates a powerful safety net for testing in production. It allows you to confidently release new features while minimizing the risk of introducing bugs or performance issues.
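A threshold-based rollback can be sketched as a check that turns a flag off when any watched metric exceeds its limit. The `Flag` class, metric names, and limits below are hypothetical; a production system would run a check like this on a schedule or in response to alerts, and would persist the flag state somewhere shared.

```python
from dataclasses import dataclass, field


@dataclass
class Flag:
    name: str
    enabled: bool = True
    audit_log: list[str] = field(default_factory=list)


def find_breaches(metrics: dict[str, float], max_allowed: dict[str, float]) -> list[str]:
    """List every metric that is missing or above its allowed maximum."""
    breaches = []
    for name, limit in max_allowed.items():
        value = metrics.get(name)
        # Assumption: a missing metric counts as a breach, to fail safe.
        if value is None or value > limit:
            breaches.append(f"{name}={value} (limit {limit})")
    return breaches


def rollback_if_unhealthy(
    flag: Flag, metrics: dict[str, float], max_allowed: dict[str, float]
) -> bool:
    """Disable the flag if any threshold is breached. Returns True on rollback."""
    breaches = find_breaches(metrics, max_allowed)
    if breaches and flag.enabled:
        flag.enabled = False
        flag.audit_log.append("auto-rollback: " + "; ".join(breaches))
        return True
    return False


limits = {"error_rate": 0.01, "p95_latency_ms": 400}
flag = Flag("new_checkout")
rolled_back = rollback_if_unhealthy(flag, {"error_rate": 0.04, "p95_latency_ms": 350}, limits)
print(rolled_back, flag.enabled, flag.audit_log)
```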

Chaos engineering

Chaos engineering is a technique that involves intentionally introducing failures into your production system to test its resilience. By simulating real-world scenarios, such as server outages or network failures, you can identify weaknesses in your system and improve its ability to recover from failures.

Incorporating chaos engineering into your testing strategy helps you build more resilient systems that can withstand unexpected events. It also provides valuable insights into how your system behaves under stress, enabling you to optimize performance and reliability.
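To give a feel for the technique at the smallest possible scale, here is a sketch that wraps a function so it fails at random, then measures how often a simple retry policy still succeeds. Everything here (`with_chaos`, `fetch_profile`, the failure rate) is hypothetical and runs in-process; real chaos experiments typically target infrastructure such as hosts, networks, or dependencies, and start with a very small blast radius.

```python
import random
from typing import Any, Callable


class InjectedFault(Exception):
    """Raised by the chaos wrapper to simulate a failing dependency."""


def with_chaos(func: Callable[..., Any], failure_rate: float, rng: random.Random) -> Callable[..., Any]:
    """Wrap func so that a fraction of calls fail with InjectedFault."""
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        if rng.random() < failure_rate:
            raise InjectedFault("injected failure")
        return func(*args, **kwargs)
    return wrapper


def survives(func: Callable[..., Any], attempts: int, *args: Any) -> bool:
    """Call func up to `attempts` times; True if any attempt succeeds."""
    for _ in range(attempts):
        try:
            func(*args)
            return True
        except InjectedFault:
            continue
    return False


def fetch_profile(user_id: int) -> dict:
    """Hypothetical dependency call."""
    return {"id": user_id}


rng = random.Random(42)  # seeded so the experiment is repeatable
flaky_fetch = with_chaos(fetch_profile, failure_rate=0.3, rng=rng)
successes = sum(survives(flaky_fetch, 3, user_id) for user_id in range(1000))
print(f"{successes} of 1000 calls succeeded with up to 3 attempts")
```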
