Stop bleeding money in the cloud: Getting started with cloud cost optimisation

Douglas Hull

In the days BC (Before Cloud), companies had to plan their infrastructure requirements up front, well in advance of when they actually needed a service or system. They’d forecast capacity up to five years ahead and purchase expensive items such as storage, servers, networking equipment and firewalls, not to mention internet connectivity contracts and upstream DDoS protection services if they were an online business. These items were expensive and highly configurable, which meant infrastructure teams had to spend a lot of time carefully designing future-state architectures and then costing them out.

Procurement teams and management invariably questioned the costs (often exorbitant) before finally approving them and placing the orders. Then came the wait for hardware delivery before teams could get stuck into the racking, cabling, network and firewall configuration, OS installation, storage allocation and testing, which took a few more weeks. Eventually, up to six months later, you could hand the servers over to the application teams to do their thing.

Fast forward to today: widespread cloud adoption means businesses can spin up resources on demand, enabling them to move significantly faster than was previously possible. Cloud adoption has also meant that the upfront financial planning process has largely fallen away. However, it would be foolish to think that financial strategy is no longer needed. If anything, it’s needed now more than ever; it’s just applied retrospectively. This is where FinOps and Cloud Cost Optimisation practices come in - they are effectively the mirror image of the old-school planning processes, applied after the fact with the goal of ensuring the best value is delivered to the business.

Busting the ‘Cloud is more expensive’ myth

Where previously teams would predict usage needs and allocate chunks of compute and storage to various teams and products during the planning phase, now that happens at the point when resources are created.

Companies that are not on top of FinOps and cost optimisation end up running their systems in the cloud without any cost controls, which carries massive risk and can result in chaotic cloud sprawl. This can lead managers to perceive that running in the cloud is more expensive than on-prem.

More often than not, a well-run cloud environment is cheaper than its on-prem equivalent, as long as the effort is put in to optimise workloads.

Plus you can switch it off and stop paying for it if your needs change.

Three steps to help you get started optimising your cloud costs

How do you ensure that by choosing the cloud, you don’t end up with unnecessary wastage?

It all starts from the top: if there is a strong culture of cost awareness and frugality in an organisation, then cloud cost optimisation and FinOps will naturally fit in and be supported. After all, cloud costs are just regular cost centres. Generally, the higher-ups don’t need the tech detail; they want to know who is spending what, and why. To create this sort of visibility, there needs to be accurate identification and allocation of cloud resources. That’s why you first need to look at tagging.

Step 1. Tagging resources

Tagging refers to the application of easy-to-understand labels to individual resources in the cloud. All cloud providers offer the facility to tag resources, and it is the single most important activity to perform when building systems in the cloud. Without tags it is impossible to understand what everything is and, therefore, to know who to speak to about the costs of the resources themselves. A clear tagging strategy means clear ownership and less time spent trying to identify who owns what!

Tagging is linked to ownership - in fact, an ‘Owner’ tag should be mandatory for all resources in all organisations. The owner is responsible for the creation, management and destruction of cloud resources, and they best understand what a thing is and what it does. Certain mandatory tags (like an ‘Owner’ tag) can be enforced for all newly created resources, and this should definitely be applied in any cost-conscious environment.

Note: not all cloud resources are taggable. For example, some data transfer and shared service costs are not. However, usually around 90% or more of a cloud architecture can be clearly tagged.
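
The mandatory-tag idea can be sketched as a simple compliance check. The tag keys and resource inventory below are hypothetical, standing in for whatever a provider’s tagging API would return:

```python
# Sketch: auditing resources for mandatory tags.
# MANDATORY_TAGS and the inventory below are illustrative assumptions,
# not a real cloud provider's schema.
MANDATORY_TAGS = {"Owner", "Environment", "CostCentre"}

def missing_tags(resource: dict) -> set:
    """Return the mandatory tag keys absent from a resource's tags."""
    return MANDATORY_TAGS - set(resource.get("tags", {}))

resources = [
    {"id": "i-0abc", "tags": {"Owner": "data-team", "Environment": "prod", "CostCentre": "1001"}},
    {"id": "vol-9xyz", "tags": {"Environment": "dev"}},  # missing Owner and CostCentre
]

# Resources that fail the policy, with the tags they are missing.
non_compliant = {r["id"]: missing_tags(r) for r in resources if missing_tags(r)}
print(non_compliant)
```

A report like this is the starting point for chasing down owners; in practice you would run it against the provider’s tagging API rather than a hard-coded list.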

Once tagging is in place you can work on creating visibility for stakeholders. Basic reporting and visibility is available in all cloud portals, and third-party tools exist for more granular or specific reporting requirements.
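
The kind of visibility this enables is essentially a rollup of spend by tag. A minimal sketch, using made-up billing line items (real billing exports carry many more columns):

```python
from collections import defaultdict

# Sketch: rolling up line-item costs by the 'Owner' tag.
# The line items are illustrative, not a real billing export format.
line_items = [
    {"resource": "i-0abc", "owner": "data-team", "cost": 120.50},
    {"resource": "db-1",   "owner": "platform",  "cost": 310.00},
    {"resource": "i-0def", "owner": "data-team", "cost": 79.50},
]

spend_by_owner = defaultdict(float)
for item in line_items:
    spend_by_owner[item["owner"]] += item["cost"]

print(dict(spend_by_owner))  # {'data-team': 200.0, 'platform': 310.0}
```

This is exactly the ‘who is spending what’ view the higher-ups want, and it only exists if the Owner tag is reliably populated.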

Step 2. Detecting anomalies and managing budgets

If you are looking for quick wins, start with the readily available anomaly detection and budget management services. AWS Cost Anomaly Detection is a machine-learning-based tool that is very easy to configure and generates alerts when spend deviates from the baseline. This offers some peace of mind for paranoid bill-payers, although it doesn’t address the root cause of the anomaly itself.
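
To illustrate the idea (not AWS’s actual algorithm, which is proprietary), a crude anomaly detector compares each day’s spend to a trailing baseline and flags large deviations:

```python
from statistics import mean, stdev

def flag_anomalies(daily_spend, window=7, threshold=3.0):
    """Flag indices of days whose spend deviates from the trailing-window
    mean by more than `threshold` standard deviations. A toy stand-in for
    a managed anomaly detection service."""
    anomalies = []
    for i in range(window, len(daily_spend)):
        baseline = daily_spend[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma and abs(daily_spend[i] - mu) > threshold * sigma:
            anomalies.append(i)
    return anomalies

spend = [100, 102, 98, 101, 99, 103, 100, 250]  # day 7 spikes
print(flag_anomalies(spend))  # [7]
```

The managed services do this (and much more) for you; the point is simply that a baseline plus a deviation threshold is what turns raw spend into an alert.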

AWS Budgets can be configured to provide guardrails, with alerts set up at a service, account or tag level. Budgets help to set a ‘normal’ baseline from which costs can be tracked. Owners should always understand what ‘good’ looks like for their responsibility areas. Of course, budgets are not fixed and can be adjusted upwards or downwards; however, having a budget ensures that discussions about the ‘normal baseline’ can take place in the right way.
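
The guardrail mechanism itself is simple: alerts fire as actual spend crosses configured fractions of the budget. A sketch of that logic, with the thresholds and amounts chosen purely for illustration:

```python
def budget_alerts(actual, budget, thresholds=(0.8, 1.0)):
    """Return the alert thresholds (as fractions of budget) that actual
    spend has crossed - mimicking percentage-based budget notifications."""
    used = actual / budget
    return [t for t in thresholds if used >= t]

print(budget_alerts(actual=850, budget=1000))   # [0.8] -> the 80% alert fires
print(budget_alerts(actual=1200, budget=1000))  # [0.8, 1.0] -> over budget
```

Wiring alerts at 80% as well as 100% gives owners time to investigate before a budget is actually blown.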

Step 3. Exploring time-based commitments

Next, take a look at time-based commitments, which offer discounts in the form of Reserved Instances or Compute Savings Plans in the AWS world. Reserved Instances are specific to a resource, e.g. a particular instance type in a particular region, whilst Compute Savings Plans can be applied across a range of compute types including EC2 instance families, Fargate and Lambda. Commitments can be for a period of one or three years, with discounts of up to 40%. Larger, longer commitments give the cloud provider the confidence to invest in the requisite infrastructure at their sites, so it really is a win for both customer and cloud provider. Of course, it is tempting to max out the savings opportunities using these plans, but care should be taken not to over-commit, as unused commitments can erode expected savings if workloads change.
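
The over-commitment risk comes straight out of the arithmetic: committed hours are paid for whether or not they are used. A sketch with hypothetical rates (the $0.10 and $0.06 per hour are made-up figures, not AWS pricing):

```python
def effective_savings(on_demand_rate, committed_rate, committed_hours, used_hours):
    """Compare a time-based commitment against pure on-demand pricing.
    Unused committed hours are still paid for, eroding the saving."""
    on_demand_cost = on_demand_rate * used_hours
    commitment_cost = (committed_rate * committed_hours
                       + on_demand_rate * max(0, used_hours - committed_hours))
    return on_demand_cost - commitment_cost

# Hypothetical: $0.10/h on demand vs $0.06/h committed, 8,760 hours in a year.
print(effective_savings(0.10, 0.06, committed_hours=8760, used_hours=8760))  # fully used: positive saving
print(effective_savings(0.10, 0.06, committed_hours=8760, used_hours=4000))  # under-used: negative saving
```

At full utilisation the discount is pure saving; halve the usage and the same commitment costs more than on-demand would have. This is why commitments should track stable baseline load, not peak.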

Procurement teams may also be able to negotiate a further discount with the cloud provider in some cases, based on an overall commitment to that provider for a period of time (typically 3-5 years) and a minimum annual spend.

How can Mechanical Rock help you optimise your cloud costs?

Mechanical Rock has helped many companies optimise their AWS costs, beginning with a Cloud Cost Optimisation assessment.

If you want to make sure you’re getting the most out of your cloud provider, let’s chat today.