Matching Supply and Demand
When moving to the cloud, pay only for what is needed. When the supply of IT services matches the demand for those services at the time they are needed, costly and wasteful overprovisioning is eliminated. However, the economic benefits of just-in-time supply must be balanced against the need to provision for resource failures, high availability, and provisioning time. Depending on whether demand is fixed or variable, plan to create metrics and automation that keep management of the cloud environment minimal, even as it scales.
Use a number of different approaches to match supply with demand:
Leveraging the elasticity of the cloud to meet demand as it changes provides significant cost savings. Elasticity refers to the virtually unlimited capacity of the cloud, where the vendor is responsible for capacity management and provisioning of resources. By taking advantage of APIs or service features, the number of cloud resources in an architecture can be varied dynamically and programmatically. This allows components of the architecture to scale automatically: the number of resources increases during demand spikes to maintain performance, and capacity decreases when demand subsides to reduce costs.
This is typically accomplished using Auto Scaling, which scales compute instance capacity up or down automatically according to defined conditions. Auto Scaling is generally used with a Load Balancer, which distributes incoming application traffic across the compute instances in an Auto Scaling group. Auto Scaling is triggered using scaling plans that include policies defining how to scale (manually, on a schedule, or on demand) and the metrics and alarms to monitor. Metrics trigger the scaling event: CPU utilization, network throughput, request/response latency observed by the Load Balancer, and even custom metrics originating from application code on the compute instances.
Use custom code or metrics to trigger Auto Scaling in response to a business event.
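As a minimal sketch of the demand-based idea, a scaling policy can be modeled as a function from an observed metric to a capacity change. The thresholds and sizes here are illustrative assumptions, not part of any real scaling API; in practice the cloud provider evaluates these conditions for you.

```python
# Minimal sketch of a demand-based scaling decision.
# Thresholds, min/max sizes, and the CPU metric are illustrative assumptions.

def desired_capacity(current: int, cpu_percent: float,
                     min_size: int = 2, max_size: int = 10) -> int:
    """Scale out when average CPU is high, scale in when it is low."""
    if cpu_percent > 70:              # demand spike: add an instance
        return min(current + 1, max_size)
    if cpu_percent < 30:              # demand drop: remove an instance
        return max(current - 1, min_size)
    return current                    # within the target band: no change
```

A custom business metric (for example, orders per minute) could be substituted for `cpu_percent` to trigger scaling on a business event rather than on infrastructure load.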
When architecting with a demand-based approach, keep in mind two key considerations. First, how quickly new resources need to be provisioned. Second, that the size of the margin between supply and demand will shift. Be ready to cope with the rate of change in demand, and be prepared for resource failures.
Optimize the speed of provisioning by reducing the startup and configuration tasks that compute instances run at boot, often by using prebaked instance images. Note that this optimization comes at the expense of instance configurability; balance speed against configurability based on the particular needs of the architecture. The margin between supply and demand is wide at the start of development and narrows over time through experimentation, load testing, and confidence-building as the architecture moves into test and production.
A buffer-based approach to matching supply and demand uses a queue to accept messages (units of work) from producers. For resiliency, the queue uses durable storage (disks or a database). A buffer is a mechanism that lets applications continue to communicate with each other even when they run at different rates over time. Consumers read messages from the queue and process them at a rate that meets their business requirements. A buffer-based approach decouples the throughput rate of producers from that of consumers, relieving producers of concerns about data durability and backpressure (producers slowing down because their consumers cannot keep up).
When a workload generates significant write load that doesn’t need to be processed immediately, a buffer is used to smooth out demands on consumers.
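A minimal in-process sketch of the buffer pattern, using Python's standard `queue` module as a stand-in for a durable message queue (a real system would use a persistent queue service so messages survive restarts):

```python
import queue
import threading

buffer = queue.Queue()  # stands in for a durable message queue

def producer(n: int) -> None:
    # Producers enqueue units of work at their own rate.
    for i in range(n):
        buffer.put({"id": i, "payload": f"work-{i}"})

def consumer(results: list) -> None:
    # Consumers drain the queue at a rate that suits them.
    while True:
        msg = buffer.get()
        if msg is None:              # sentinel: no more work
            break
        results.append(msg["payload"])

results = []
worker = threading.Thread(target=consumer, args=(results,))
worker.start()
producer(5)
buffer.put(None)                     # signal the consumer to stop
worker.join()
```

Because the queue sits between them, the producer finishes enqueueing regardless of how quickly the consumer drains the work.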
Users architecting with a buffer-based approach need to keep in mind two key considerations. First, what is the acceptable delay between producing the work and consuming the work? Second, how are duplicate requests for work handled?
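One common answer to the duplicate-request question is an idempotent consumer that remembers which message IDs it has already processed. A hedged sketch, assuming messages carry a unique `id` field; in production the seen-set would itself need durable, shared storage:

```python
def make_idempotent(handler):
    """Wrap a handler so redelivered messages are processed only once."""
    seen = set()                      # in production: a durable store
    def wrapped(message):
        msg_id = message["id"]
        if msg_id in seen:            # duplicate delivery: skip it
            return False
        seen.add(msg_id)
        handler(message)
        return True
    return wrapped

processed = []
handle = make_idempotent(lambda m: processed.append(m["body"]))
handle({"id": "a1", "body": "charge card"})
handle({"id": "a1", "body": "charge card"})   # redelivered duplicate, ignored
```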
A time-based approach aligns resource capacity to demand that is predictable or well-defined by time. This approach is typically not dependent upon utilization levels of the resources. A time-based approach ensures that resources are available at the specific time they are required, and provided without any delays due to startup procedures and system or consistency checks. Using a time-based approach provides additional resources or increased capacity during busy periods.
Auto Scaling configured with scheduled scaling is an example of how to put a time-based approach in place. Systems are scheduled to scale out or in at defined times - for example, at the start of business hours - thus ensuring that resources are available when users arrive.
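The scheduled-scaling idea reduces to a lookup from time of day to desired capacity. The times and capacities below are illustrative assumptions; the actual schedule would be defined in the Auto Scaling service, not in application code:

```python
from datetime import time

# Illustrative schedule: scale out for business hours, in otherwise.
SCHEDULE = [
    (time(8, 0), time(18, 0), 10),   # business hours: 10 instances
]
DEFAULT_CAPACITY = 2                 # nights and weekends: 2 instances

def scheduled_capacity(now: time) -> int:
    """Return the desired capacity for the given time of day."""
    for start, end, capacity in SCHEDULE:
        if start <= now < end:
            return capacity
    return DEFAULT_CAPACITY
```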
Leverage cloud APIs and Devek to automatically provision and decommission entire environments as they are needed. This approach is well suited to development or test environments that run only during defined business hours or periods of time.
Use APIs to scale the size of resources within an environment (vertical scaling). For example, scale up a production system by changing the instance size or class; this is achieved by stopping the instance, selecting a different instance size or type, and starting it again. The technique applies to other resources as well: block storage volumes can be modified to increase capacity, adjust performance (IOPS), or change the volume type while in use.
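The stop, modify, start sequence can be sketched as a function over a generic cloud client. The client interface here is a hypothetical stand-in for a real SDK (method names are invented for illustration), shown against an in-memory fake:

```python
def resize_instance(client, instance_id: str, new_type: str) -> None:
    """Vertically scale an instance: stop it, change its type, start it again."""
    client.stop_instance(instance_id)            # type changes require a stop
    client.set_instance_type(instance_id, new_type)
    client.start_instance(instance_id)

class FakeClient:
    """In-memory stand-in for a cloud SDK, for illustration only."""
    def __init__(self):
        self.instances = {"i-123": {"type": "small", "state": "running"}}
        self.calls = []
    def stop_instance(self, iid):
        self.instances[iid]["state"] = "stopped"
        self.calls.append("stop")
    def set_instance_type(self, iid, new_type):
        assert self.instances[iid]["state"] == "stopped"
        self.instances[iid]["type"] = new_type
        self.calls.append("modify")
    def start_instance(self, iid):
        self.instances[iid]["state"] = "running"
        self.calls.append("start")

client = FakeClient()
resize_instance(client, "i-123", "large")
```

Separating the sequence from the client keeps the ordering testable without touching a live environment.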
When architecting with a time-based approach, keep in mind two key considerations: First, how consistent is the usage pattern? Second, what is the impact when the pattern changes? Increase the accuracy of predictions by monitoring systems and by using business intelligence. When any significant changes in the usage pattern are seen, adjust the times to ensure that coverage is provided.