What Could Possibly Go Wrong

At work we often throw around the phrase WCPGW (What Could Possibly Go Wrong) in response to ill-advised or just plain crazy ideas. It's fun, and lets off some steam, but it occurred to me recently that there's a useful kernel of truth in it. Indeed, a good sysadmin is always asking this question: when designing systems, when preparing to make a change, in the heat of an emergency, and in security design and response.

When designing a system, we think about how things hang together, and try to make it at least a little bit robust against single points of failure (for systems that need it, of course). Can we have some trivial redundancy in the system so that if a single server has a little moment, service is retained? Or, what could go wrong at a single geographical location, and how do we protect against that (geographical redundancy, business continuity planning, or disaster recovery)?

When preparing to make a routine change, we think about the ways it could surprisingly go wrong, and how we will recover if it does. Having at least a vague idea of how we'd back out from any given point in the process is much better than having to make it up completely as we go along. Of course we can't think of everything in advance, but there are typically higher-risk points that can be identified and prepared for.
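To make the back-out idea a bit more concrete, here is a minimal sketch in Python of a change written as a list of steps, each paired with its own undo. Everything in it (the run_change name, the step tuples) is made up for illustration and not taken from any real tool. It also assumes that a step which fails leaves nothing behind to clean up, which is often not true in real life.

```python
from typing import Callable, List, Tuple

# Hypothetical shape for a change step: (name, apply, undo).
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_change(steps: List[Step]) -> bool:
    """Apply steps in order; if one fails, undo the completed ones in reverse.

    Returns True if every step applied, False if we had to back out.
    Assumption: the failing step itself needs no undo.
    """
    done: List[Tuple[str, Callable[[], None]]] = []
    for name, apply, undo in steps:
        try:
            apply()
        except Exception as exc:
            print(f"step {name!r} failed ({exc}); backing out")
            # Back out from wherever we got to, most recent step first.
            for done_name, done_undo in reversed(done):
                print(f"undoing {done_name!r}")
                done_undo()
            return False
        done.append((name, undo))
    return True
```

The point is less the code than the habit: if you can't write down the undo for a step, that step is probably one of your higher-risk points.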

In the middle of an emergency or outage, when we're about to take some remedial action, we think about how our action could make things worse, and decide if the risk is worth it. Sometimes we cannot possibly make it worse, which is kinda liberating, but it's not unusual to be in a make-or-break state where a wrong choice will make things infinitely worse (e.g. going from just having a service unavailable to having lost all data and being unrecoverable short of restoring from backup).

And finally security, which is the canonical situation where WCPGW is the question to ask. Here, what could go wrong is that an attacker manipulates the system and gains unwarranted access.

So, this Christmas, keep on using WCPGW as a stress relief and source of humour, but remember: this is the job of a sysadmin, to be forever considering WCPGW.

 

PS: This also probably partly explains why sysadmins are such a cynical bunch.