Last week, Amazon Web Services unveiled a new addition to its Elastic Load Balancing offering.
The introduction of the ELB Network Load Balancer, a high-performance TCP load balancer, felt like a good opportunity to explore how the Peak Scalability metric provided by the Stacktical Scalability Report can help you make the most of an auto scaling AWS infrastructure.
After all, approximately 32% of your AWS bill is wasted on unused compute capacity because of poorly tuned auto scaling policies…
Choosing the right Elastic Load Balancer on AWS
There are now three load balancers available on AWS:
- The Network Load Balancer (NLB)
- The Application Load Balancer (ALB)
- The Classic Load Balancer (CLB)
As stated by the Amazon team itself:
“You can select the appropriate load balancer based on your application needs. If you need flexible application management, then we recommend you to use Application Load Balancer. If extreme performance and static IP is needed for your application, then we recommend you to use Network Load Balancer. If you have an existing application that was built within the EC2-Classic network, then you should use Classic Load Balancer.”
For starters, we recommend ruling out CLB and migrating away from it, since this type of load balancer won’t route traffic to more than one port on an instance.
This makes it rather hard to optimize resource usage, e.g. when hosting multiple Docker containers on the same EC2 instance using Kubernetes.
More importantly, CLB doesn’t currently seem to support custom metrics in auto scaling policies, which is what this article is about.
AWS offers a straightforward tool to migrate from CLB while keeping its configuration. It’ll allow you to easily register your existing instances with your shiny new ALB or NLB.
Among other differences, NLB works at the TCP level (Layer 4) whereas ALB provides more advanced, Layer 7 load balancing for HTTP and HTTPS traffic. NLB also offers a significant decrease in latency (from approximately 400ms down to 100ms) compared to the rest of the ELB offering, and can theoretically handle millions of requests per second.
Overall, ALB seems better suited to a Docker-based microservices architecture, while NLB makes a lot of sense if you aim to handle huge traffic spikes with the lowest possible latency.
Auto Scaling with Target Tracking Scaling Policies
Now that you’ve chosen the right load balancer for your needs, the next step is to set up your Target Tracking Scaling Policies.
“With target tracking, you select a load metric for your application, such as “Average CPU Utilization” or the new “Request Count Per Target” metric from Application Load Balancer, set the target value, and Auto Scaling adjusts the number of EC2 instances in your Auto Scaling group as needed to maintain that target.”
In short, it’s never been easier to set up auto scaling for your infrastructure. It comes down to defining the target metric that will trigger auto scaling, and its target value.
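To make this concrete, here is a minimal sketch of a target tracking scaling policy built for the EC2 Auto Scaling `PutScalingPolicy` API, using the predefined average CPU metric. The group name and 50% target are hypothetical placeholders; the function only builds the request so the actual boto3 call is shown commented out.

```python
# Sketch: building a target tracking scaling policy for an Auto Scaling group.
# Group name and target value are hypothetical examples.

def cpu_target_tracking_policy(group_name: str, target_pct: float) -> dict:
    """Build the kwargs for autoscaling.put_scaling_policy()."""
    return {
        "AutoScalingGroupName": group_name,
        "PolicyName": "cpu-target-tracking",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            # A predefined metric shipped by AWS: average CPU across the group.
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization"
            },
            # Auto Scaling adds/removes instances to keep the metric near this value.
            "TargetValue": target_pct,
        },
    }

policy = cpu_target_tracking_policy("demo-asg", 50.0)
# import boto3
# boto3.client("autoscaling").put_scaling_policy(**policy)
```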
But what target value should you set, exactly?
CPU-based metrics VS Concurrency-based metrics
It’s common practice for Ops to auto scale their group when an instance hits 75% to 80% CPU usage. That is largely because, before Stacktical, there was no easy way to calculate the concurrency of your services.
There are two main reasons CPU-based auto scaling practices are wasteful.
It’s an approximate metric
If CPU usage is unreliable, it’s because there is no definitive way to measure it in the first place. On the exact same hardware, results will vary from one monitoring implementation to another. That also means they will most likely vary between AWS, Azure and GCP.
For example, monitoring agents and Application Performance Management software use their own custom CPU usage formulas, which will never exactly match your AWS metrics.
Furthermore, as a virtualized environment, AWS relies on hypervisors to manage resources. When needed, a hypervisor can steal CPU cycles for use by other EC2 instances, introducing a measurement bias in the process.
Last but not least, on burstable instances, a lack of AWS CPU credits can affect the measured CPU usage because of burst and throttling mechanisms.
It knows nothing about business
A high CPU usage level doesn’t necessarily mean that you have reached the maximum number of simultaneous users (and operations per second) your application can handle.
After all, some applications are vastly more CPU-intensive than others, right?
Because server CPU usage lacks business transaction context, it cannot be deemed a relevant capacity metric for your application.
Concurrency-based metrics, on the other hand, provide much more fine-grained control over how your groups will auto scale.
Eliminating waste using Stacktical concurrency metrics as target values
Unlike their CLB counterpart (as mentioned earlier), the NLB and ALB load balancers give you access to specific custom metrics that you can use to define your auto scaling policies.
The Network Load Balancer provides an ActiveFlowCount custom metric: the total number of concurrent TCP flows (or connections) from clients to targets.
The Application Load Balancer provides an ActiveConnectionCount custom metric: the total number of concurrent TCP connections active from clients to the load balancer and from the load balancer to targets.
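Before using such a metric as a scaling target, you may want to inspect it. Below is a minimal sketch of a CloudWatch `GetMetricStatistics` request for ActiveConnectionCount; the load balancer dimension value is a hypothetical placeholder, and the actual boto3 call is shown commented out since the function only builds the request.

```python
# Sketch: querying the ALB ActiveConnectionCount metric from CloudWatch.
# The "app/demo-alb/..." dimension value is a hypothetical example.
from datetime import datetime, timedelta, timezone

def active_connection_query(lb_dimension: str) -> dict:
    """Build the kwargs for cloudwatch.get_metric_statistics()."""
    now = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/ApplicationELB",
        "MetricName": "ActiveConnectionCount",
        "Dimensions": [{"Name": "LoadBalancer", "Value": lb_dimension}],
        "StartTime": now - timedelta(hours=1),  # last hour of data
        "EndTime": now,
        "Period": 60,           # one data point per minute
        "Statistics": ["Sum"],  # this metric is reported as a Sum
    }

query = active_connection_query("app/demo-alb/0123456789abcdef")
# import boto3
# datapoints = boto3.client("cloudwatch").get_metric_statistics(**query)["Datapoints"]
```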
Based on our Stacktical Scalability Report demo (Fig. A), we could set our ActiveConnectionCount target to a value of 201 peak concurrent users to create an efficient auto scaling policy for our demo application.
We call this process “right-sized auto scaling”.
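Such a right-sized policy could be sketched as follows, swapping the predefined CPU metric for a customized metric specification pointed at ActiveConnectionCount. The group name, policy name and dimension value are hypothetical; the 201 target comes from the report demo above, and the boto3 call is commented out since the function only builds the request.

```python
# Sketch: a target tracking policy driven by the ALB ActiveConnectionCount
# metric, using the 201 peak concurrent users value from the report demo.
# Group, policy and dimension names are hypothetical examples.

def concurrency_policy(group_name: str, lb_dimension: str, target: float) -> dict:
    """Build the kwargs for autoscaling.put_scaling_policy()."""
    return {
        "AutoScalingGroupName": group_name,
        "PolicyName": "right-sized-auto-scaling",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            # No predefined type exists for ActiveConnectionCount,
            # so we describe the CloudWatch metric explicitly.
            "CustomizedMetricSpecification": {
                "Namespace": "AWS/ApplicationELB",
                "MetricName": "ActiveConnectionCount",
                "Dimensions": [{"Name": "LoadBalancer", "Value": lb_dimension}],
                "Statistic": "Sum",
            },
            "TargetValue": target,  # peak concurrency from the Scalability Report
        },
    }

policy = concurrency_policy("demo-asg", "app/demo-alb/0123456789abcdef", 201.0)
# import boto3
# boto3.client("autoscaling").put_scaling_policy(**policy)
```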
The AWS Elastic Load Balancing offering is great, as long as you pay attention to the relevance of your Target Tracking Scaling Policies.
The ALB and NLB load balancers let you leverage custom target metrics in your auto scaling policies, while Stacktical lets you set the right target value and save 32% on your AWS bill.
To learn more about Stacktical, go to stacktical.com.
For more information about Target Scaling Policies, follow this link and don’t hesitate to explore the AWS ELB offering on the AWS ELB product page.