What Could Possibly Go Wrong

At my work we often throw around the phrase WCPGW (What Could Possibly Go Wrong) in response to ill-advised or just plain crazy ideas. It's fun, and it lets off some steam, but it occurred to me recently that there's a useful kernel of truth in it. Indeed, a good sysadmin is always asking this question: when designing systems, when preparing to make a change, in the heat of an emergency, and in security design and response.

Fun times with random IPSec corruption

Let me tell you a story of woe, intermittent/random corruption, and confusion.

Background

We had reason to stand up a new VPN, from a new data center, to a VPC in AWS. For some semi-philosophical, semi-technical reasons, this was not a "VPC VPN", but rather a GRE-over-IPSec tunnel to an EC2 instance inside the VPC, and it was the first of this kind we'd deployed (i.e. GRE over IPSec, to/from AWS).

Pre-problem 1: Racoon sucks.

Things I have learned - Part 5

Short and sweet today: there is always a point of failure between your redundant, non-single-point-of-failure components. You know, the single cable or switch that connects your VRRP firewalls, which on failure results in two machines that both think they're master. Or the RAID controller that connects to both disks in your RAID-1 mirror, which on failure takes out both disks (or worse, corrupts data on them).
