You need a checklist
When asked for software development book recommendations, I often surprise people by suggesting a non-technical book: The Checklist Manifesto: How to Get Things Right. (It’s a quick read!)
While engineers generally excel at technical details, they often overlook the human and organizational aspects of their work. This is where checklists are crucial: they’re a powerful tool for managing complexity. In high-stakes fields like aviation and surgery, where systems are too complex for any one person to fully manage, checklists have evolved to keep that complexity in check and enable fast decisions under pressure.
You need a checklist
Software engineering is one of those fields filled with complex systems that exceed single-person comprehension. Deployments, in particular, are high-stakes operations where both speed and accuracy are critical. This combination of complexity and time pressure makes checklists an essential tool, especially during the deployment process.
Checklists are meant to be used at a clear “pause point” in a process. In software engineering, two pause points are a natural fit for a checklist:
- When a Pull Request is opened
- When a Deployment is triggered
These pause points are a natural place to stop and check that what you’re about to ship will not break something. A checklist should have 5-9 quick steps that are:
- Critical safety steps in danger of being missed (e.g. running your unit tests).
- Not adequately checked by other mechanisms (e.g. a QA engineer can’t manually check 100 API routes).
- Actionable and specific.
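To make that concrete, here is a minimal Python sketch of a checklist run at a pause point. Everything in it is my own illustration rather than a real tool: the `ChecklistItem` and `run_checklist` names are hypothetical, and the three example checks are stand-ins that you would replace with real ones.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class ChecklistItem:
    """One checklist step (hypothetical type, for illustration only)."""
    name: str                   # short, specific, actionable description
    check: Callable[[], bool]   # returns True when the step passes


def run_checklist(items: List[ChecklistItem]) -> List[str]:
    """Run every item and return the names of the steps that failed."""
    failed = []
    for item in items:
        try:
            passed = item.check()
        except Exception:
            # A check that crashes counts as a failed step.
            passed = False
        if not passed:
            failed.append(item.name)
    return failed


# Illustrative stand-ins; real checks would run your tests, smoke tests, etc.
EXAMPLE_ITEMS = [
    ChecklistItem("Unit tests pass", lambda: True),
    ChecklistItem("API route smoke test passes", lambda: False),
    ChecklistItem("Rollback steps are documented", lambda: True),
]

if __name__ == "__main__":
    failures = run_checklist(EXAMPLE_ITEMS)
    if failures:
        print("Do not ship yet. Failed steps:", ", ".join(failures))
    else:
        print("All checks passed.")
```

The point is the shape, not the tool: a short list of named, yes-or-no steps, and a pause point that refuses to continue while any step fails.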
GitHub Actions are checklists
The great news for engineers is that most of us keep our code on GitHub, which already has a built-in checklist system: GitHub Actions!
Adding a checklist to your software engineering workflow is as simple as adding your unit tests to GitHub Actions, and requiring them to pass before PRs merge.
If you don’t already have unit tests, now is a great time to start! Unit tests are a checklist for your code!
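If you are starting from zero, a first test can be very small. Below is a sketch using Python's built-in `unittest` module. The `convert` function and the `FakeRateSource` class are hypothetical examples I made up for this post, not part of any real library. The fake stands in for a network service, so the tests need nothing but Python and finish almost instantly.

```python
import unittest


class FakeRateSource:
    """In-memory stand-in for a network-backed rate service (hypothetical)."""

    def __init__(self, rates):
        self._rates = rates

    def rate(self, currency: str) -> float:
        return self._rates[currency]


def convert(amount: float, currency: str, source) -> float:
    """Convert an amount using the rate that `source` reports for `currency`."""
    if amount < 0:
        raise ValueError("amount must be non-negative")
    return round(amount * source.rate(currency), 2)


class ConvertTests(unittest.TestCase):
    def setUp(self):
        # No network call: the rate is fixed in memory for the test.
        self.source = FakeRateSource({"EUR": 0.5})

    def test_converts_using_rate(self):
        self.assertEqual(convert(10, "EUR", self.source), 5.0)

    def test_rejects_negative_amount(self):
        with self.assertRaises(ValueError):
            convert(-1, "EUR", self.source)
```

Saved as, say, `test_convert.py` (a file name I picked for illustration), this runs locally with `python -m unittest test_convert.py`, and a CI job can run the same command. Because that command exits with a non-zero status when a test fails, the CI step fails too, which is what lets you block the merge.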
Zach’s checklist for engineering success
All of these steps can be easily configured under “Settings” for your GitHub repo. If you’re confused, drop me a line and I can walk you through it in about 15 minutes. You can also roll each of these steps out slowly, over time:
- Use GitHub
- Write unit tests that are fast and robust. Good unit tests:
  - Run in less than 1 minute
  - Run without external dependencies
  - Are easy to run locally
- Require pull requests to merge to main. Optionally require a review.
- Require the unit tests to pass before merging. Block PRs if the tests fail.
The last item on this list is where I most often get pushback. People want to be able to merge hot fixes quickly, and they worry that a pause point in their workflow will slow them down.
Usually, a good time to reintroduce the idea of a checklist is after a failed deployment, when a change has to be rolled back because it broke something else (and now the CEO is even madder!). Pause points actually help you move faster in high-stress situations, because they give you the confidence to take decisive action.
At this point, I ask the team if they would like to stop being the team that’s always shipping broken code. A lot of eng teams have this reputation internally, and feel helpless to change it. Reading The Checklist Manifesto is step one toward resolving this anti-pattern.