Move fast and break things

Fri Jan 27 2023

Pierre Estephan

Software Engineer, Statsig

Company culture can make or break a company.

Culture is more than just the brightly colored posters with preachy quotes you'll see in every tech office. Culture is a set of shared values that guide how you approach your work. Your culture will shape your infrastructure, tooling, and processes, and that in turn will shape how fast you can move as a team. Culture is how you get shit done.

In the same vein, superficial cultural values like “be smart” or “optimization” don’t actually show your employees how to act—or more importantly, how not to act. Real values help you make tradeoffs when faced with a decision. For instance, “Find a way or make one” expressly discourages halting for red tape.

I spent the first 6 years of my career as an engineer at Facebook, and the culture—especially early on—was amazing. It had a profound effect on how I thought about building and shipping products.

The culture empowered me to push for and drive the changes I believed in. It made me feel strong ownership over everything we built and shipped. Most importantly, I was able to move fast, lightning fast.

sonic the hedgehog with the caption gotta go fast

Why is moving fast important?

Moving fast isn’t just shipping a lot of code; it’s also how quickly you can learn and iterate on a feature or product.

As an industry, we used to move slowly. As recently as 10-15 years ago, most companies had a waterfall approach to building products, and would work on a piece of software for years before launching.

The problem with this approach is that you don’t get signal about how people are using your product while you’re building it. You aren't able to use data to refine your product or guide your roadmap. Over time, the best companies realized this and began to move away from the waterfall model in favor of a more iterative approach, building entire systems to speed up the iteration loop.

In the early 2010s, Facebook had a weekly push where new features and bug fixes would be released. Today, leading tech companies like Facebook deploy code continuously to production, 24 hours a day, 7 days a week. By the time I left, I could start working on a feature one evening, safely push out a change the next morning, and have billions of users interacting with my feature by the afternoon.

This speed of iteration inherently changes how you build software. The shorter your iteration cycle, the more signal you get on what you should be building. It is infinitely easier to build a great product by rapidly iterating with data and feedback, rather than trying to create a perfect product in a vacuum.

Facebook encapsulated this in a value as “Move Fast and Break Things”.

Move fast and break things

Move Fast and Break Things was my favorite value at Facebook. It was a useful guide when thinking about tradeoffs in your day-to-day. Answering questions like the following became easy:

  • Should I launch a scrappy MVP of a new feature early or wait for it to be perfect?

  • Is it okay to take time to prioritize working on infra or tooling to help my team move faster over feature work?

  • Is this process valuable or is a more lightweight option appropriate?

It was extremely empowering to work in this kind of environment—it pushed you to be bold and take bigger risks, and to push the limits of how fast you could move.

One criticism this value received was around the “Break Things” portion. In 2014, Facebook dropped “break things” and relaunched the slogan as “Move Fast with Stable Infra,” which never really had the same ring to it.

mark zuckerberg on stage with move fast and stable infra poster behind him

In my opinion, this iteration came from trying to clarify a common misunderstanding of the original value. “Break Things” was never supposed to mean not caring about whether things broke in production, moving recklessly, or ignoring product quality.

Side note, imagine telling your manager “but I thought I was supposed to break things!”

Things will sometimes break; that is an inevitability of building software. What matters more is what happens when things do go wrong. How quickly can you catch and revert issues? How can you make sure they won’t happen again?

You can move fast, safely, but you can only move as fast as your infrastructure will allow. At Facebook, our culture of moving fast shaped our tooling, infrastructure and processes—and that in turn set the pace for how quickly we were able to move.

So how did we move fast without breaking things?

Several pieces come together to make it easy to move fast safely, from infrastructure to process to values.

a meme about two wolves that break things

Controlling feature rollouts

Controlling how you roll out features and code is critical to moving fast, and we had powerful tooling to control every aspect of a rollout. Almost every single code change we pushed out was controlled by a gate or experiment:

With Gatekeeper, your feature could be rolled out slowly to any specific subset of users (by country, age, device, etc.), and rolling back changes took minutes. Control over who saw which feature was fine-grained: for example, finding an issue in Android v222 meant rolling back only for that OS and version, not everywhere.

With QuickExperiment, you could get a detailed readout of how your new change affected every single company metric.
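
To make the idea of a readout concrete, here is a minimal sketch in Python of the simplest possible version: the relative change in each metric's mean between a control group and a test group. The function name and data layout are made up for illustration and are not QuickExperiment's actual interface. A real readout would also report confidence intervals and statistical significance.

```python
from statistics import mean


def readout(control: dict, test: dict) -> dict:
    """Relative change in the mean of each metric, test vs. control.

    `control` and `test` map a metric name to a list of per-user values.
    This layout is a hypothetical one chosen for the example.
    """
    report = {}
    for metric, control_values in control.items():
        c = mean(control_values)
        t = mean(test[metric])
        # Relative lift is undefined when the control mean is zero.
        report[metric] = (t - c) / c if c != 0 else None
    return report


# Example data (invented): sessions go up, crashes go down.
control = {"sessions": [10, 10], "crashes": [2, 2]}
test = {"sessions": [11, 11], "crashes": [1, 1]}
report = readout(control, test)
```

Even this toy version shows why a readout across every metric matters: a change that lifts one number can quietly hurt another, and you only see that if both are in the same report.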

Gating code also allowed us to decouple shipping code from releasing features. This meant we could safely push unfinished features to production and only launch them internally, parallelizing dogfooding and testing new products while working on final changes and bug fixes for a launch.
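
Here is a rough sketch of how a gate like this might work, written in Python. It is not Gatekeeper's actual implementation or API: the `Rule` and `Gate` classes, the attribute names, and the `new_composer` gate are all hypothetical. The first rule that matches a user decides the outcome, and a stable hash keeps each user in the same rollout bucket between checks.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Rule:
    """One targeting rule: attribute conditions plus a rollout percentage."""
    conditions: dict        # e.g. {"os": "android", "app_version": 222}
    rollout_percent: float  # 0 to 100

    def matches(self, user: dict) -> bool:
        # An empty conditions dict matches every user.
        return all(user.get(k) == v for k, v in self.conditions.items())


@dataclass
class Gate:
    name: str
    rules: list = field(default_factory=list)

    def _bucket(self, user_id: str) -> float:
        # Stable hash so a given user always lands in the same bucket.
        digest = hashlib.sha256(f"{self.name}:{user_id}".encode()).hexdigest()
        return (int(digest[:8], 16) % 10000) / 100.0  # 0.00 to 99.99

    def check(self, user: dict) -> bool:
        # The first matching rule decides; no match means the feature is off.
        for rule in self.rules:
            if rule.matches(user):
                return self._bucket(str(user["id"])) < rule.rollout_percent
        return False


# An internal-only launch, with one broken OS/version rolled back.
gate = Gate("new_composer", rules=[
    Rule({"os": "android", "app_version": 222}, 0),  # rolled back only here
    Rule({"is_employee": True}, 100),                # dogfooding for employees
    Rule({}, 0),                                     # not yet released publicly
])

employee_ios = {"id": 1, "os": "ios", "app_version": 222, "is_employee": True}
employee_android = {"id": 2, "os": "android", "app_version": 222, "is_employee": True}
external_user = {"id": 3, "os": "ios", "app_version": 222, "is_employee": False}

gate.check(employee_ios)      # True: employees see the unfinished feature
gate.check(employee_android)  # False: rolled back for this OS and version
gate.check(external_user)     # False: code is shipped but not released
```

Releasing to the public is then a config change (raising the last rule's percentage) rather than a new code push, which is what decoupling shipping from releasing means in practice.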

Deploying code

It was ultra-fast and safe to get code into production—we continuously pushed code 24/7. Your code would slowly roll out to different tiers—first to employees, then to 1% of the population, then gradually to all regions and users. There were tests and metric guardrail checks running at each step to roll back a push if anything went wrong.
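
The tiered push described above can be sketched as a simple loop. This is an illustration in Python under my own assumptions, not Facebook's deployment system: the tier names and the `deploy`, `healthy`, and `rollback` callbacks are hypothetical stand-ins for real deploy tooling and metric guardrails.

```python
from typing import Callable, Sequence

# Hypothetical tiers, ordered from smallest to largest exposure.
TIERS = ("employees", "1_percent", "all_users")


def staged_rollout(
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
    tiers: Sequence[str] = TIERS,
) -> bool:
    """Push tier by tier; undo every tier if a guardrail check fails."""
    deployed = []
    for tier in tiers:
        deploy(tier)
        deployed.append(tier)
        if not healthy(tier):
            # Roll back in reverse order, most recent tier first.
            for done in reversed(deployed):
                rollback(done)
            return False
    return True
```

The point of the structure is that a bad change is stopped at the smallest tier where the guardrails can detect it, so most users never see it.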

Handling issues in production

How you handle issues when they do inevitably come up can completely change the severity of a problem, and how likely it is to happen again.

We were able to catch problems quickly with powerful automated systems—automated alerts for metrics, testing frameworks for each stack and tests on rollout, clean alerting channels per team, and well-thought-out oncall rotations with automated issue triaging.
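
As a small illustration of what an automated metric alert does, here is a sketch in Python that flags a metric when its latest value drops well below its recent baseline. The function name and the three-standard-deviation threshold are assumptions I picked for the example; production alerting systems typically also account for seasonality and noise.

```python
from statistics import mean, stdev


def is_regression(history, latest, threshold=3.0):
    """Flag `latest` if it sits more than `threshold` standard
    deviations below the mean of `history`."""
    if len(history) < 2:
        return False  # not enough data to estimate the spread
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat baseline: any drop counts as a regression.
        return latest < baseline
    return (baseline - latest) / spread > threshold


# Invented hourly values for a metric that normally hovers around 100.
history = [100, 101, 99, 100, 102, 98]
is_regression(history, 90)  # True: far below the usual range
is_regression(history, 99)  # False: within normal variation
```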

We had amazing tooling for finding and debugging metric regressions. Scuba allowed you to query sampled data in seconds with no SQL, which made understanding the impact of an issue (e.g. which platforms and users were affected, and at what time) insanely fast. Unidash made it easy and fast to create dashboards for your team, which resulted in metrics showing up on TVs all over the office; seeing a topline regression in metrics was as simple as glancing across the room.

Finally, we had a blameless culture: when things did break, rather than blaming an individual, we would hold a "SEV review" to (a) identify a root cause and (b) create follow-ups: what new infrastructure, tooling, or processes would make sure similar issues didn’t happen again.

Other values at Facebook

Other values worked together with “Move Fast and Break Things” to incentivize moving fast safely.

During performance reviews, we rewarded engineers who worked on “Better Engineering”: fixing bugs, cleaning up tech debt, running scheduled quality weeks, writing tests, and working on new infrastructure to prevent or catch issues early.

“Nothing at FB is someone else’s problem” was another company value that encouraged engineers to be conscientious stewards of all the products we shipped. There would always be many hands on deck whenever something went wrong.

Facebook also had a strong culture of dogfooding its products. Everyone in the company used the latest test builds of each app to catch and flag issues before they hit production.

Moving fast at Statsig

When I left Facebook in 2021 and was looking for new opportunities, my highest priority was finding a company with great culture and hard problems to solve. I feel incredibly lucky to have found a company that moves insanely fast, even faster than we could at Facebook, and has values that resonate with me.

One of our core values, "For Builders By Builders," challenges me daily to think about which tools our team needs to move faster and more safely, and to build those into the product: using Statsig to build Statsig. Every day I get to work on adding better tools to our toolbelt that help with rolling out changes, launching features, experimentation, and data analysis, enabling other companies (and myself) to move fast.

Today, even small startups can move as fast as big tech companies like Facebook. We're working to make the sophisticated tooling that makes this possible accessible to everyone. We want to democratize experimentation with the hope that this helps more companies become successful, and helps them provide better products and experiences to their users.

So I'll leave you with a question: what is slowing you down today, and how can you help your team move faster?
