AWS Security Groups: A glimpse behind the curtain

Or: a little clue as to how the sausage is made.

At work recently I was creating an internal NLB (Network Load Balancer) in AWS.  An NLB is a magic bit of engineering that behaves a lot like LVS (Linux Virtual Server); it does healthchecks like a normal load balancer, but when forwarding requests it sends the packets on, having only modified their destination IP to that of the chosen backend.  Compare this to the original ELB (Elastic Load Balancer) or ALB (Application Load Balancer), both of which create a brand new TCP/IP connection from the ELB to the backend, and will have gotten all up in the HTTP request, potentially doing all manner of things to/with the HTTP payload (adding headers at least, perhaps much much more).
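
To make the distinction concrete, here's a toy model of the two forwarding behaviours in Python.  This is purely an illustration (the addresses are invented), not a claim about how AWS actually implements either device:

    from dataclasses import dataclass, replace

    @dataclass(frozen=True)
    class Packet:
        src: str  # source IP
        dst: str  # destination IP

    def nlb_forward(pkt, backend_ip):
        # NLB-style: rewrite only the destination; the original source survives.
        return replace(pkt, dst=backend_ip)

    def proxy_forward(pkt, proxy_ip, backend_ip):
        # ELB/ALB-style: a brand new connection originating from the proxy itself.
        return Packet(src=proxy_ip, dst=backend_ip)

    client = Packet(src="203.0.113.7", dst="10.0.1.10")       # made-up addresses
    print(nlb_forward(client, "10.0.2.20"))                   # src is still 203.0.113.7
    print(proxy_forward(client, "10.0.1.10", "10.0.2.20"))    # src is now the LB's own IP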

This makes NLBs really useful, because the IP packet that arrives at your backend instance has the original source IP address, not the internal IP of the ELB/ALB.  Sure, for HTTP(S) load balancing you've got the X-Forwarded-For header, but for something like SSH behind a load balancer that's just rewriting and forwarding packets, this is quite handy.
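
For a plain TCP service like SSH there's no header to smuggle the client address in, so the only place it can come from is the packet itself.  As a minimal illustration (hypothetical port, nothing more), a backend that just logs the peer address of each connection would see the real client IP behind an NLB, but an internal load balancer IP behind an ELB/ALB:

    import socket

    # Tiny TCP server that logs the peer address of each connection it accepts.
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("0.0.0.0", 2222))   # hypothetical port
    srv.listen()
    while True:
        conn, (peer_ip, peer_port) = srv.accept()
        print(f"connection from {peer_ip}:{peer_port}")
        conn.close()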

As an aside on security groups (if you're not familiar), they have two quite distinct roles:

  1. Contain security group rules that specify allowed traffic to (inbound) or from (outbound) the entities (e.g. EC2 instances, RDS instances, or ELBs/ALBs) that have the SGs attached
  2. By being attached to entities, act as an identifier that can be used as the source/target of the rules just described.

It's worth remembering that it is entirely valid and reasonable to create an SG with no rules in it, assign it to entities as a tag marking them as members of a certain class, and then use that group as the source/target in the actual rules included in other SGs.
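
A quick sketch of that pattern using boto3; the group name and IDs are placeholders, not anything from the setup described below:

    import boto3

    ec2 = boto3.client("ec2")

    # A rule-less "marker" security group: attaching it to instances allows
    # nothing by itself, it just labels them as members of a class.
    marker = ec2.create_security_group(
        GroupName="reverse-proxies",      # hypothetical name
        Description="Marker SG identifying the reverse proxy fleet",
        VpcId="vpc-0123456789abcdef0",    # placeholder VPC ID
    )

    # Rules in other security groups can then name this group as their source,
    # e.g. UserIdGroupPairs=[{"GroupId": marker["GroupId"]}] in an ingress rule.
    print(marker["GroupId"])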

Unlike ELBs and ALBs, NLBs do not have security groups (SGs) attached to them, and this is where things get interesting.  I was setting up a path from our internet-facing reverse proxy (P) to an internal backend target service (B) that lives on 2 identical servers, with an NLB in between to handle healthchecks and load balancing across the 2 backends.  Yes, I do know I could have done this many other ways, including with HAProxy or something similar hosted on the reverse proxy, or with an ELB etc; don't judge me for my poor life choices.  Because the traffic arriving at the backends would come from the internal interfaces of the reverse proxy, it felt a little weird adding a rule allowing traffic from 0.0.0.0/0 to the SG attached to the backends (B), so I instead added a rule for each of:

  1. The /24 CIDR of the subnet the NLB lives in, to allow the healthchecks to succeed
  2. A security group uniquely associated with the reverse proxy instances, for the traffic that has come through the NLB (both rules are sketched below).
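
In boto3 terms the two rules looked roughly like this; the group IDs, subnet CIDR and port are placeholders standing in for the real values:

    import boto3

    ec2 = boto3.client("ec2")

    BACKEND_SG = "sg-0backend00000000"   # placeholder: the SG attached to B
    PROXY_SG = "sg-0proxy0000000000"     # placeholder: SG uniquely on the proxies (P)

    ec2.authorize_security_group_ingress(
        GroupId=BACKEND_SG,
        IpPermissions=[
            {   # 1. the NLB's subnet, so its healthchecks can reach the backends
                "IpProtocol": "tcp",
                "FromPort": 8080,        # hypothetical service port
                "ToPort": 8080,
                "IpRanges": [{"CidrIp": "10.0.3.0/24", "Description": "NLB subnet"}],
            },
            {   # 2. the reverse proxy SG, for traffic arriving via the NLB
                "IpProtocol": "tcp",
                "FromPort": 8080,
                "ToPort": 8080,
                "UserIdGroupPairs": [{"GroupId": PROXY_SG}],
            },
        ],
    )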

I was expecting the latter to allow traffic because a packet arriving at a backend would have the source IP of the reverse proxy instance the packet originated from.  I was rather surprised when it didn't work; cracking out tcpdump showed that the packets from P -> NLB -> B were simply not being seen by the backends.  I checked, and the proxy instances could connect directly to the backend instances, as would be expected.  So for debugging I added 0.0.0.0/0 to the SG on the backends, and the traffic started flowing; tcpdump showed that the source IP addresses were exactly what they should be (the internal IPs of the proxy instances).  So I removed the 0.0.0.0/0 rule and added one for the /24 CIDR of the subnet the reverse proxy instances were in.  Everything continued to work, so it wasn't some magic caused by a rule for 0.0.0.0/0.  Curiouser and curiouser.

Up until this moment, I had believed that Security Groups worked much like a traditional firewall.  In my mind, a rule that allowed packets to port 80 from Security Group 'A' was implemented as a bunch of IP-address-based rules, with an entry for the IP address of each entity that had Security Group 'A' attached to it.  If you added a Security Group to an instance/entity, something would trawl around and update all those firewall entries.  Clearly, based on what I had just observed, this was not true.  When the rule had a Security Group as the allowed source, the actual source IP address of the packet had no bearing on whether it was allowed, at least once the packet had passed through the NLB.  The most obvious conclusion is that under normal circumstances, rules with Security Group references are implemented with some sort of tagging; a packet leaving an entity is tagged (encapsulated, I guess) with identifiers of the SGs attached to that entity.  When the packet arrives at the target, those tags are compared against the SGs in the rules of the target, not the literal source IP in the packet.

It also appears that when passing through an internal NLB those SG tags are stripped, so that on arrival at the backend, the only thing left to check against is the IP address, giving the behaviour I saw.
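
None of this is documented anywhere I've seen, so the following is strictly a toy model of the conjecture, not AWS's implementation: rules with an SG source are checked against tags carried with the packet, rules with a CIDR source against the IP header, and a hop through the NLB drops the tags:

    import ipaddress
    from dataclasses import dataclass

    @dataclass
    class Packet:
        src_ip: str
        sg_tags: frozenset = frozenset()   # conjectured: SGs attached to the sender

    def allowed(pkt, rules):
        # Conjectured evaluation: SG-sourced rules look at the tags,
        # CIDR-sourced rules look at the literal source IP.
        for rule in rules:
            if "source_sg" in rule and rule["source_sg"] in pkt.sg_tags:
                return True
            if "cidr" in rule and ipaddress.ip_address(pkt.src_ip) in ipaddress.ip_network(rule["cidr"]):
                return True
        return False

    def through_nlb(pkt):
        # Conjectured: the NLB forwards the packet but the SG tags don't survive.
        return Packet(src_ip=pkt.src_ip)

    backend_rules = [
        {"source_sg": "sg-proxy"},   # allow traffic from the proxy SG
        {"cidr": "10.0.3.0/24"},     # allow the NLB's own subnet (healthchecks)
    ]

    direct = Packet(src_ip="10.0.1.10", sg_tags=frozenset({"sg-proxy"}))
    print(allowed(direct, backend_rules))               # True: the tag matches
    print(allowed(through_nlb(direct), backend_rules))  # False: tags gone, IP not in 10.0.3.0/24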

I find this absolutely fascinating, even delightful.  It's fairly obvious in hindsight, and is quite a reasonable way to work (for reasons I'll go into shortly), but in nearly 6 years working with AWS I had not once seen even the slightest clue that this was the case.

So why is this reasonable?  Mainly, I posit, for performance.  It is likely much quicker to check a set of tags against the rules than it is to do all the usual IP address matching (even /32s).  In thinking about this some more, it seems likely that the tags won't be the security group names/identifiers that are used in the UI/API, because they're quite long and not quick to check.  Rather, I'm guessing that they are small numeric identifiers, probably only unique per VPC.  This is slightly backed up by the default limits in EC2:

  1. 5 security groups per interface, which can be increased if you ask, but only to a maximum of 16, with a corresponding constraint on the limit of rules per security group.  Most importantly though, 16 looks to me like a limit on the number of tags the encapsulation can contain.
  2. 500 Security Groups per VPC, which has no stated direct maximum, but the documentation contains the statement that "The multiple of the number of VPCs in the region and the number of security groups per VPC cannot exceed 10000."

This suggests that maybe it's a customer or account id in the tag rather than a VPC id, and that each security group tag is perhaps 14 bits long (room for up to 16K groups), although it seems odd that there's such a gap between the documented limit (10K) and the theoretical capacity of a field that size, so I may be missing something interesting here.
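
The bit-width guess is nothing more than arithmetic over the limits quoted above:

    # Back-of-the-envelope check of the "small numeric tag" guess.
    region_sg_cap = 10_000             # "VPCs x SGs per VPC cannot exceed 10000"

    bits = region_sg_cap.bit_length()  # smallest field that can number 10,000 groups
    print(bits)                        # 14
    print(2 ** 13, 2 ** bits)          # 8192 (one bit too few) vs 16384 (fits, with headroom)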

The other reason it's a good way to work is that there's no need to go updating firewall rules when an entity gets a Security Group attached to it; packets leaving the entity after that will carry the new SG tag, and the rules being applied at the target end don't even need to be identified, let alone touched.  Clearly this is much more efficient.
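
Attaching a group is then just a single API call against the instance itself, and nothing that references the group from the rule side has to be found or rewritten.  A boto3 sketch, with placeholder instance and group IDs:

    import boto3

    ec2 = boto3.client("ec2")

    # Replace the set of security groups on an instance.  Under the tagging model,
    # only packets the instance sends from now on are affected; no rule that
    # references these groups elsewhere has to be located or updated.
    ec2.modify_instance_attribute(
        InstanceId="i-0123456789abcdef0",                       # placeholder
        Groups=["sg-0proxy0000000000", "sg-0marker000000000"],  # placeholder SG IDs
    )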

It's always fun finding little clues like this, and using them to extrapolate how things are working under the hood.  It's one of the most delightful aspects of working in IT, in my humble opinion.  No doubt in another 5 years I'll find some other hint that will blow my mind further.  Yay!