Failover is a high-availability and disaster-recovery strategy that automatically switches critical systems to a redundant or standby server when the primary system fails. It's like having a spare tire in your trunk, except instead of getting you to the nearest gas station, it keeps your fancy web app from crashing and burning in front of your users.
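To make the "automatic switch" concrete, here is a minimal Python sketch of client-side failover, assuming a caller-supplied health check. The `FailoverClient` class, the `is_healthy` callback, and the backend names are hypothetical and exist only for illustration; real systems usually handle this in a load balancer, DNS, or the database cluster itself.

```python
from typing import Callable, Sequence


class FailoverClient:
    """Route calls to the active backend, switching when it becomes unhealthy."""

    def __init__(self, backends: Sequence[str], is_healthy: Callable[[str], bool]):
        if not backends:
            raise ValueError("at least one backend is required")
        self.backends = list(backends)   # first entry is treated as the primary
        self.is_healthy = is_healthy     # hypothetical health-check callback
        self.active = self.backends[0]

    def pick(self) -> str:
        """Return the active backend, failing over if it is unhealthy."""
        if self.is_healthy(self.active):
            return self.active
        # The active backend is down: promote the first healthy alternative.
        for candidate in self.backends:
            if candidate != self.active and self.is_healthy(candidate):
                self.active = candidate
                return candidate
        raise RuntimeError("no healthy backend available")


if __name__ == "__main__":
    down = set()  # simulated outages
    client = FailoverClient(["primary", "standby"], lambda b: b not in down)
    print(client.pick())   # primary
    down.add("primary")    # simulate the primary failing
    print(client.pick())   # standby
```

Note that this sketch is sticky: once traffic moves to the standby, it stays there even after the primary recovers. Whether to fail back automatically is a design choice, and many teams prefer to do it manually to avoid flapping between servers.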
When the DevOps team at TikTok realized their primary database server was starting to fail, they quickly initiated the failover process to keep the app running and the dance videos flowing.
After suffering through one too many 3am pages due to system outages, the engineering manager at Twitter finally convinced the higher-ups to invest in a proper failover solution.
Catastrophic Failover by Martin Fowler dives deeper into the perils of failover in clustered systems and the importance of implementing safeguards to prevent cascading failures.
Synthetic Monitoring by Flávia Falé and Serge Gebhardt explains how regularly running automated tests against production systems can help detect issues before they lead to the need for failover.
The Software Delivery Guide by Martin Fowler and associates covers various techniques and practices, like canary releases and dark launching, that can be used in conjunction with failover to ensure reliable and stable software deployments.