Trade-offs in Performance Efficiency

When architecting solutions, think about trade-offs to ensure an optimal approach. Depending on the situation, trade consistency, durability, and space against time or latency to deliver higher performance.

Using the cloud, it is possible to go global in minutes and deploy resources in multiple locations across the globe to be closer to end users. Dynamically add read-only replicas to information stores, like database systems, to reduce the load on the primary database. Use caching solutions to provide an in-memory data store or cache, and content delivery networks (CDNs), which cache copies of static content closer to end users.

The following describes some of the trade-offs to make and how to implement them:


Most workloads rely on a dependent component, a service or database that offers a source of truth or a consolidated view of data. Generally, these architecture components are harder to scale and represent a significant proportion of the cost of the workload. Improve performance efficiency by using caching to trade off against freshness or memory used. These techniques generally update asynchronously or periodically. The trade-off is that the data isn’t always fresh and, therefore, not always consistent with the source of truth.

Application Level Caching

It is possible to make this trade-off at a code level by using application-level caches or memoization. When requests are served from a cache, execution time is reduced. This provides a way to scale horizontally through the caching layer and reduces the load on the most heavily used components.
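As an illustration of the freshness trade-off, here is a minimal sketch of an in-process cache with a time-to-live. The class name `TTLCache`, the `get_or_load` method, and the injectable `clock` parameter are assumptions made for this example, not part of any particular caching library.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Tiny in-process cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_load(self, key: Hashable, loader: Callable[[Hashable], Any]) -> Any:
        """Return the cached value, or call loader(key) if missing or expired."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]  # cache hit: possibly stale, but cheap
        value = loader(key)  # cache miss: go to the source of truth
        self._entries[key] = (now, value)
        return value
```

A longer TTL takes more load off the source of truth but widens the window in which callers can see stale data, so the value is a business decision rather than a purely technical one.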

In-memory and distributed caches are used in two main cases: coordinating transient data and state between distinct servers, such as user sessions in an application, or protecting databases from read-heavy workloads by serving the most requested elements directly from memory.

Platforms like Redis, Memcached, or Varnish deployed on compute instances provide robust caching engines for applications. The design and scale of these platforms usually depend on the memory of the instance and, when running in a cluster model, on appropriate key management in the application (for example, consistent hashing to distribute keys across nodes).
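The consistent hashing idea mentioned above can be sketched as a hash ring with virtual nodes. This is a client-side illustration only and does not describe how any particular caching engine distributes keys internally; `HashRing`, `node_for`, and the `vnodes` parameter are hypothetical names.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


def _hash(value: str) -> int:
    """Map a string to a 64-bit integer position on the ring."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent hash ring: each node owns many points on the ring."""

    def __init__(self, nodes: Iterable[str], vnodes: int = 100):
        self._ring: List[Tuple[int, str]] = []
        for node in nodes:
            for i in range(vnodes):
                self._ring.append((_hash(f"{node}#{i}"), node))
        self._ring.sort()
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        """Return the node owning the first ring point after the key's hash."""
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

The property that matters for a cache cluster is that adding a node remaps only the keys that land on the new node, so most cached entries stay valid instead of the whole cache going cold.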

Cloud providers offer managed services for these platforms, making it easy to deploy, operate, and scale an in-memory data store or cache in the cloud. Examples include Memcached with sharding support, which scales the in-memory cache across multiple nodes, and Redis with clustering, where multiple shards form a single in-memory key-value store that is terabytes in size, with read replicas per shard for increased data access performance.

Database Level Caching

Database replicas enhance database performance by replicating all changes from the primary database to read replicas. This replication makes it possible to scale out beyond the capacity constraints of a single database for read-heavy database workloads.

Managed database services provide read replicas as a fully managed service. This enables the creation of one or more replicas of a given source database and serves high-volume application read traffic from multiple copies of the data, thereby increasing aggregate read throughput. Where the database engine supports it, it is possible to add indexes to the read replica that do not exist on the source, for example on a MySQL read replica. For latency-sensitive workloads, use the Multi-AZ feature to specify which Availability Zones the read replica needs to be in to reduce cross-Availability Zone traffic.
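A minimal sketch of how an application might spread read traffic over replicas while keeping writes on the primary is shown below. The `ReplicaRouter` class and the endpoint strings are hypothetical, and treating every statement that starts with `SELECT` as a read is a deliberate simplification.

```python
import itertools
from typing import List


class ReplicaRouter:
    """Send reads to replicas in round-robin order and everything else to the primary."""

    def __init__(self, primary: str, replicas: List[str]):
        self.primary = primary
        # cycle() yields replicas endlessly; None means no replicas are configured
        self._replicas = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, sql: str) -> str:
        """Pick an endpoint name for the given SQL statement."""
        is_read = sql.lstrip().lower().startswith("select")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.primary
```

Because replication is typically asynchronous, a read sent to a replica can lag behind the primary. Reads that must observe a write that was just made are usually safer on the primary.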

Geographic Level Caching

Another example of caching is using a CDN, which is a good way to reduce latency for clients. CDNs are used to store static content and to accelerate dynamic content. Consider using a CDN for APIs as well; even dynamic content can benefit from network optimization methods.

Managed global cloud CDN services are used to deliver entire websites, including dynamic, static, streaming, and interactive content, using a global network of edge locations. Requests for content are automatically routed to the nearest edge location, so content is delivered with the best possible performance.

Partitioning or Sharding

Technologies like relational databases often require a single instance due to consistency constraints and scale only vertically (by using higher-specification instances and storage features). Hitting the limits of vertical scaling calls for a different approach: data partitioning, or sharding. With this model, data is split across multiple database schemas, each running in its own autonomous primary DB instance.

Managed relational database engines in the cloud remove the operational overhead of running multiple instances, but sharding will still introduce complexity to the application. The application’s data access layer will need to be modified to gain awareness of how data is split so that it directs queries to the right instance. In addition, any schema changes will have to be performed across multiple database schemas, so it is worth investing some effort to automate this process.
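The kind of awareness the data access layer needs can be sketched as a small shard router. This is a minimal sketch that assumes hash-based sharding on a customer identifier; `ShardRouter`, `shard_for`, and the shard names are hypothetical, and other schemes (range-based or directory-based) are equally valid.

```python
import hashlib
from typing import List


class ShardRouter:
    """Map a sharding key to one of a fixed list of database shards."""

    def __init__(self, shards: List[str]):
        if not shards:
            raise ValueError("at least one shard is required")
        self._shards = list(shards)

    def shard_for(self, customer_id: str) -> str:
        """Return the shard that holds all rows for this customer."""
        # A stable hash (not Python's per-process hash()) keeps routing
        # consistent across application restarts and hosts.
        digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._shards)
        return self._shards[index]

    def all_shards(self) -> List[str]:
        """Shards to visit for cross-shard work such as schema migrations."""
        return list(self._shards)
```

With plain modulo routing, changing the number of shards remaps most keys, so resharding requires a data migration. That cost is part of the complexity sharding introduces.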

NoSQL database engines typically perform data partitioning and replication to scale both reads and writes horizontally. They do this transparently, without the need to implement the data partitioning logic in the data access layer of the application. NoSQL databases manage table partitioning automatically, adding new partitions as the table grows in size or as provisioned read and write capacity changes.

Partitioning or sharding provides a way to scale write-heavy workloads, but requires that data is evenly distributed and evenly accessed across all partitions or shards. It does introduce complexity in relational database solutions, while NoSQL solutions generally trade consistency to deliver this.


Compressing data trades computing time against space and greatly reduces storage and networking requirements. Compression applies to file systems, data files, and web resources like stylesheets and images, but also to dynamic responses such as API responses.

Managed CDNs support compression at the edge. Even when the source system serves resources uncompressed, the CDN automatically compresses them if, and only if, the web client supports it.
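The "only if the client supports it" rule can be sketched with the standard `gzip` module and the HTTP `Accept-Encoding` request header. The function name `encode_response` is hypothetical, and the header parsing is deliberately simplified: quality values such as `gzip;q=0` are ignored.

```python
import gzip
import json
from typing import Dict, Tuple


def encode_response(body: bytes, accept_encoding: str) -> Tuple[bytes, Dict[str, str]]:
    """Gzip the body only when the client lists gzip in Accept-Encoding."""
    # Simplified parsing: quality values such as "gzip;q=0" are ignored here.
    offered = {part.split(";")[0].strip().lower()
               for part in accept_encoding.split(",")}
    if "gzip" in offered:
        return gzip.compress(body), {"Content-Encoding": "gzip"}
    return body, {}


# Repetitive JSON, typical of API responses, compresses well.
sample = json.dumps([{"id": i, "status": "active"} for i in range(1000)]).encode("utf-8")
compressed, headers = encode_response(sample, "gzip, deflate, br")
print(len(sample), "->", len(compressed), headers)
```

The space saved is paid for in CPU time on both ends, which is the trade-off described above. Content that is already compressed, such as most image formats, gains little from a second pass.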


Buffering uses a queue to accept messages (units of work) from producers. For resiliency, the queue uses durable storage. A buffer is a mechanism to ensure that applications can communicate with each other when they are running at different rates over time. Messages are then read by consumers, which allows the messages to be processed at the rate that meets the consumers’ business requirements. By using a buffer, it is possible to decouple the throughput rate of producers from that of consumers. Producers then no longer have to deal with data durability and backpressure (where producers slow down because their consumer is running slowly).

When a workload generates significant write load that doesn’t need to be processed immediately, use a buffer to smooth out demand on consumers.

When architecting with a buffer, keep in mind two key considerations. First, what is the acceptable delay between producing the work and consuming the work? Second, how are duplicate requests for work handled?
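One common answer to the second question is to make consumers idempotent by remembering which message IDs they have already handled. The sketch below uses Python's in-memory `queue.Queue` as a stand-in for a durable queue service. The `Message` type, the `drain` function, and the in-memory `seen_ids` set are assumptions for illustration; a real deployment would need durable storage for both the queue and the set of seen IDs.

```python
import queue
from dataclasses import dataclass
from typing import Callable, Set


@dataclass(frozen=True)
class Message:
    message_id: str  # assigned by the producer, reused on retries
    body: str


def drain(buffer: "queue.Queue[Message]",
          handle: Callable[[Message], None],
          seen_ids: Set[str]) -> int:
    """Process everything currently buffered, skipping duplicate message IDs.

    Returns the number of messages actually handled.
    """
    processed = 0
    while True:
        try:
            msg = buffer.get_nowait()
        except queue.Empty:
            return processed
        if msg.message_id not in seen_ids:
            handle(msg)
            seen_ids.add(msg.message_id)  # record only after successful handling
            processed += 1
        buffer.task_done()
```

The consumer calls `drain` at whatever pace suits it, independent of how fast producers call `put`. The time a message waits in the buffer is the delay that the first question asks about.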
