How we helped one of our clients save over 30% in AWS billing.
In this brief review we explain how we helped one of our clients cut their total AWS bill for 2020 by 35%, or by 25% if we measure against the total they would have spent without making their engineering more efficient.
First stage: Darkness
After our client launched its digital transformation back in 2017, every technology-related area began an intense journey to support that objective. Migrating to the cloud was without doubt one of the most significant changes, and as the challenge grew we took two of the five pillars of the AWS Well-Architected Framework as our guide: Cost Optimization and Performance Efficiency. Of the latter, AWS says:
The Performance Efficiency pillar includes the ability to use computing resources efficiently to meet system requirements and maintain that efficiency as demand changes and technologies evolve.
Looking at things from this angle, we were forced to ask what they were spending in order to improve efficiency. Following the roadmap we had initially proposed, a micro-team was formed to deal exclusively with this task, combining a technical cloud perspective with a financial one.
At the end of 2019, we started the task of giving our client visibility of all the costs derived from the cloud, which implied a step beyond the simple control of monthly billing. As a first requirement we had to solve the budget planning for the coming year taking into account an imminent growth in cloud consumption based on the growth plans our client had.
We immediately put a key question on the table: what would the annual spending projection be? At that time we had to answer how much they were going to spend in 2020. We did what every good IT team does when facing a POC: we aimed high so as not to fall short.
From here we helped our client through the following steps:
We put together cost models (identifying owners, accounts and teams), all with the aim of breaking down each AWS invoice.
We established a structure to rank AWS accounts.
We identified idle resources left over from old architectures, other resources that were oversized, and teams working in isolation, each generating its own formulas or recipes to implement architecture.
We defined objectives to attack.
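To make the cost-model step concrete, here is a minimal sketch of breaking an invoice down by account and by team. The line items, account names and team names are all made up for illustration; in practice the rows would come from a billing export, and the function name `breakdown` is our own.

```python
from collections import defaultdict

# Hypothetical invoice line items: (account, team, service, cost in USD).
# These rows are invented for the example, not real billing data.
LINE_ITEMS = [
    ("prod-account", "payments", "EC2", 1200.0),
    ("prod-account", "payments", "EBS", 300.0),
    ("dev-account", "payments", "EC2", 450.0),
    ("dev-account", "search", "EC2", 250.0),
    ("dev-account", None, "EC2", 100.0),  # no owner identified yet
]

def breakdown(items, key_index):
    """Sum cost per value of one column (0=account, 1=team, 2=service)."""
    totals = defaultdict(float)
    for row in items:
        key = row[key_index] if row[key_index] is not None else "UNASSIGNED"
        totals[key] += row[3]
    # Largest spenders first, which doubles as a simple ranking.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

by_account = breakdown(LINE_ITEMS, 0)
by_team = breakdown(LINE_ITEMS, 1)

for name, cost in by_account:
    print(f"{name}: {cost:.2f} USD")
```

Keeping an explicit UNASSIGNED bucket is a deliberate choice: spend nobody owns is usually the first place to look for idle resources.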
The set of actions applied gave us the spearhead we needed to attack costs: Visibility.
Second stage: Our client understands their costs. Now what to do?
After sweeping all of our client's equipment and validating what was unnecessary, we eliminated the unneeded resources and achieved immediate savings. Encouraged by these first results, we obtained a commitment from our client to work with full efficiency moving forward.
After achieving the first savings on unused resources, we began to look into the cross-account services that stood out on our client's monthly bill.
The first service that stood out was EC2, which accounts for the cost of elastic computing capacity on AWS. We therefore focused on EC2 instances, which are billed per hour at a rate that varies with computing capacity. We carried out inventories per account, recording the instance types and the size of the volumes attached to them (EBS). The result was to be expected: when provisioning compute, everyone had chosen very expensive instances that only in very rare cases used their full capacity. On top of that, they were using previous-generation families, which tend to be more expensive because of their energy and maintenance disadvantages.
Studying the workloads, our DevOps teams began to find usage patterns that fit specific instance families. Usage recommendations for production and non-production environments emerged from this analysis.
At the same time, we identified a general oversizing of the EBS volumes attached to the instances. Just by applying smaller volumes we achieved significant savings for our client.
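The inventory review described above can be sketched as a simple rule check over each instance. Everything here is an assumption made for illustration: the thresholds, the list of old families, the instance IDs and the usage figures are not the values used in the engagement.

```python
from dataclasses import dataclass

# Assumed thresholds and family list; tune them to the real workloads.
CPU_PEAK_THRESHOLD = 40.0    # percent; below this the instance looks oversized
EBS_USED_THRESHOLD = 0.5     # fraction of provisioned storage actually used
OLD_FAMILIES = {"m3", "c3"}  # example previous-generation families

@dataclass
class Instance:
    instance_id: str
    instance_type: str  # e.g. "m3.xlarge"
    peak_cpu_pct: float
    ebs_gib: int
    ebs_used_gib: int

def findings(inst):
    """Return the savings opportunities detected for one instance."""
    result = []
    family = inst.instance_type.split(".")[0]
    if family in OLD_FAMILIES:
        result.append("previous-generation family")
    if inst.peak_cpu_pct < CPU_PEAK_THRESHOLD:
        result.append("oversized instance")
    if inst.ebs_gib > 0 and inst.ebs_used_gib / inst.ebs_gib < EBS_USED_THRESHOLD:
        result.append("oversized EBS volume")
    return result

# Made-up inventory rows, one clearly wasteful and one healthy.
inventory = [
    Instance("i-example-1", "m3.xlarge", 12.0, 500, 60),
    Instance("i-example-2", "m5.large", 85.0, 100, 80),
]
report = {i.instance_id: findings(i) for i in inventory}
```

A report like this only flags candidates; the resizing decision still depends on the workload patterns the DevOps teams studied.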
Identifying savings opportunities was good, but applying them required coordination and downtime during which teams could not work (during updates). At that time, much of our client's infrastructure ran on Kubernetes clusters, which had to be updated periodically, a task that fell to the Cloud Team. While an update was in progress, the whole environment was unavailable. So we proposed not only updating the cluster but also using that window to apply the improvement opportunities discovered in the DevOps analysis, carrying out the first resizing of the entire infrastructure.
At the end of the first month, results were beginning to show. The environments continued to operate normally, but their workloads now ran on cheaper and often more efficient families (the recommendation is always to use the most current instance family).
At this point we had eliminated what was unnecessary and normalized all environments.
Resuming the analysis of compute usage patterns, we identified an unnecessary cost whose elimination would dramatically reduce the cost of non-production environments.
We noticed that RAM and CPU usage dropped sharply at the end of our client's working hours. This was to be expected, as the development teams weren't working. We set up automation to shut down non-production environments outside working hours and on weekends (off from 7pm to 7am on weekdays, and all weekend). This reduced non-production compute from a full month to roughly 10 days! The savings were immediate and the teams' daily work was not affected.
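The "about 10 days" figure can be checked with a little arithmetic. The sketch below assumes the schedule quoted above (on from 7am to 7pm, Monday to Friday) and ignores public holidays; the function names are our own.

```python
import calendar
from datetime import date

ON_HOUR, OFF_HOUR = 7, 19  # environments run 7am-7pm on weekdays only

def running_hours(year, month):
    """Hours a scheduled environment is up during the given month."""
    days_in_month = calendar.monthrange(year, month)[1]
    weekdays = sum(
        1
        for d in range(1, days_in_month + 1)
        if date(year, month, d).weekday() < 5  # Monday=0 ... Friday=4
    )
    return weekdays * (OFF_HOUR - ON_HOUR)

def always_on_hours(year, month):
    """Hours in the month for an environment that is never shut down."""
    return calendar.monthrange(year, month)[1] * 24

scheduled = running_hours(2020, 6)
full = always_on_hours(2020, 6)
print(f"{scheduled} h of {full} h = {scheduled / 24:.1f} days of compute")
```

For June 2020, which has 22 weekdays, this gives 264 of 720 hours, or 11 days of compute, in line with the rough figure above. Months with holidays or fewer weekdays come out lower.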
We called this stage: Standardization.
Third stage: Discovery.
By this time our client's monthly bill had deflated. We were helping them correct deviations and reinforce good practices.
In late 2019, around the "AWS re:Invent" event, where new features, services and all the news from AWS are revealed, one announcement particularly caught our attention: the new "Savings Plans" compute purchase option. It came to complement the already existing Reserved Instances. The idea is to commit, for a term of no less than one year, to a certain amount of compute spend per hour. We put the idea in the backlog because at that time our client still had a bit of chaos and a lack of definition as to where to go next. But after our standardization process, there were candidate families that no longer needed to be paid for on demand.
We carried out the first POCs in production and non-production accounts that had been standardized and whose instance family was not expected to change during the year (production instances are generally the most expensive). The savings began to show on the first day the savings plan was applied.
Although the first Savings Plan was applied to the families that had been standardized as optimal for our client's environments (an EC2 Instance Savings Plan), after a sprint we saw no reason not to cover the remaining workload. It was decided to purchase a Compute Savings Plan from our client's cost-centralizing account so that the discounts would apply to all accounts. This type of savings plan applies discounts to all instance families and includes services such as Lambda and Fargate. Once it was applied, our client began to see a drastic decrease in costs day by day.
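To show why a well-sized hourly commitment saves money and an oversized one does not, here is a simplified model of one hour of billing under a savings plan. It assumes a single flat discount rate, which is hypothetical: real plans apply different rates per instance family, region and term. The function name and the numbers are ours.

```python
def hourly_cost(on_demand_usage, commitment, discount):
    """Cost of one hour under a savings plan (simplified, flat discount).

    on_demand_usage: what the hour's usage would cost at on-demand rates (USD)
    commitment:      committed spend per hour at discounted rates (USD)
    discount:        assumed discount versus on-demand, e.g. 0.25 for 25%
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    discounted_usage = on_demand_usage * (1 - discount)
    if discounted_usage <= commitment:
        # All usage is covered; the commitment is paid even if partly unused.
        return commitment
    # The commitment covers part of the usage; the rest is billed on demand.
    covered_on_demand = commitment / (1 - discount)
    return commitment + (on_demand_usage - covered_on_demand)

# An hour that would cost 8 USD on demand, with an assumed 25% discount.
for commitment in (0.0, 3.0, 6.0, 9.0):
    print(commitment, hourly_cost(8.0, commitment, 0.25))
```

In this model the cost falls from 8 USD with no commitment to 6 USD when the commitment exactly matches usage, then rises again to 9 USD when the commitment exceeds it. That is why standardizing the environments first, so that usage is predictable, matters before committing.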
The last big milestone that helped to reduce costs was having our client join the Enterprise Discount Program (EDP), an Amazon Web Services business discount program that grants a percentage discount based on the consumption a company commits to AWS.
We call this last stage "Maturity": without settling for the savings already obtained, we understood that our client's level of maturity was sufficient to move from the on-demand purchase model to others based on pre-agreed commitments, and that doing so allowed them to achieve significant savings.
Everything we implemented led to a twofold and far from minor change in the teams and in their ways of working: moving from a reactive mindset (controlling spending on the bill) to a proactive one, applying a strategic way of working to maximize efficiency not only in billing but also in the use of resources.