Fault tolerance

Fault tolerance is the ability of a system to keep operating correctly even when one or more of its components fail, usually thanks to redundancy, failover, or graceful degradation. It's a critical property for any system that needs to be reliable, because let's face it, shit happens (especially in software).
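To make the definition concrete, here is a minimal sketch of one common approach: try a primary component and fall back to a backup when it fails. The names `call_with_failover`, `flaky_primary`, and `healthy_backup` are hypothetical and exist only for this illustration; a real system would likely add timeouts, health checks, and logging on top.

```python
from typing import Callable, List, Sequence


def call_with_failover(replicas: Sequence[Callable[[], str]]) -> str:
    """Try each replica in order and return the first successful result."""
    errors: List[Exception] = []
    for replica in replicas:
        try:
            return replica()
        except Exception as exc:
            # One failed component should not take the caller down with it;
            # remember the error and move on to the next replica.
            errors.append(exc)
    # Every replica failed, so there is nothing left to fall back on.
    raise RuntimeError(f"all {len(replicas)} replicas failed: {errors}")


def flaky_primary() -> str:
    # Hypothetical primary that is currently unavailable.
    raise ConnectionError("primary is down")


def healthy_backup() -> str:
    # Hypothetical backup that still responds.
    return "response from backup"


result = call_with_failover([flaky_primary, healthy_backup])
print(result)
```

The caller never finds out the primary fell over, which is the whole point. If every replica fails, the sketch raises instead of pretending things are fine.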

How to use it in a sentence

  • I told the PM that we couldn't just deploy the app on a single EC2 instance and call it a day - we needed to build in some fault tolerance so it wouldn't fall over like a drunken frat boy every time there was a hiccup in the cloud.

  • Management keeps harping on about five nines (99.999%) of uptime, but without investing in some serious fault tolerance measures, they'll be lucky to get five minutes of uptime before the whole thing goes tits up.
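For a sense of what "five nines" actually demands, the arithmetic can be sketched as follows. The helper names are hypothetical, a year is taken as 365 days, and the redundancy formula assumes replicas fail independently, which real infrastructure rarely guarantees.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability: float) -> float:
    """Downtime budget per year for a given availability (0.99999 = five nines)."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def combined_availability(single: float, copies: int) -> float:
    """Availability of `copies` redundant replicas, assuming independent failures.

    The system is down only when every replica is down at the same time.
    """
    return 1.0 - (1.0 - single) ** copies


five_nines_budget = allowed_downtime_minutes(0.99999)
two_replicas = combined_availability(0.99, 2)

print(f"Five nines allows about {five_nines_budget:.2f} minutes of downtime per year")
print(f"Two independent 99% replicas give about {two_replicas:.4%} availability")
```

Five nines leaves a budget of roughly five minutes of downtime per year, and stacking redundant replicas is one way to get closer to it, provided their failures really are independent.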

If you actually want to learn more...

  • Catastrophic Failover: A cautionary tale about how fault tolerance measures like failover can sometimes bite you in the ass if not implemented carefully.

  • Eradicating Non-Determinism in Tests: Tips for eliminating sources of non-determinism (like relying on the system clock) that can cause intermittent test failures and make fault tolerance harder to achieve.

  • Developing Patterns in Enterprise Software: An overview of various pattern catalogs for enterprise software development, which often touch on fault tolerance concepts.

Note: the Developer Dictionary is in Beta. Please direct feedback to skye@statsig.com.
