Failure fascinates me, because it often comes down to not just doing the wrong thing, but thinking about things the wrong way. Systems thinking is hard even when you've studied it, and for those who haven't... problems with software can feel more like a chaotic force of nature than a mechanical failure.
I plan to write more about that topic soon enough. Before I do, I'd like to share a few favorites from my own reading list that I think you'll enjoy:
Queues don't fix overload -- This article covers the many reasons why work queues aren't a proper fix for performance issues, and what can go wrong when you misuse queues for that purpose. It's worth a read for the "kitchen sink" diagrams alone, which are as informative as they are hilarious.
What do you mean, 'we need more time'??? -- Here you'll find a detailed thought experiment on why we are often very bad at providing accurate estimates, along with a helpful reminder that scheduling is fundamentally a separate problem from estimation and should be communicated separately.
There is no happy path in programming -- A short and very practical case study of how the 'Fallacies of Distributed Computing' can cloud your judgement and lead to bad system behavior, even in the context of a relatively simple web application.
How complex systems fail (PDF) -- This amazingly concise overview of the nature of failure covers 18 useful maxims for thinking about what happens when things go wrong in systems that can't easily fit in one person's head. It originates from a medical context, but applies just as well to software development.
Hope you enjoy these links. If you'd like to discuss them, just reply to this email. :-)
PS: I discovered most of these links via the Practicing Developer's Workshop. We're accepting a few new members soon, so if you're interested in joining, let me know!