System design basics – II


System design basics – Cache

In the last post, we discussed design fundamentals for databases; in this post, let's talk about caching.

What is caching? Caching is a mechanism that lets applications store and fetch data faster without going back to the original source of the data. Applications keep data they expect to use again, and instead of doing a round trip to the original source, they serve requests from the stored copy. These are mostly software caches implemented in main memory.


A cache is storage that holds a subset of the entire data, chosen based on current usage and the probability of future usage, so that requests execute faster. This storage is typically main memory, which has much faster read times than disk.

In typical microprocessors, a hardware cache sits between the CPU and main memory. These are very high-speed miniature memories that hold the instructions and data the CPU is processing, and are referred to as L1 and L2 caches.

In distributed web systems, caching can be done at various places and levels, which we will discuss in a while.

Cache eviction strategies

As mentioned above, a cache stores a subset of the original data. The question is: which subset, and what happens when data that is not in the cache is requested? There are different approaches to deciding which data stays in the cache; these are called cache eviction strategies.

Before going further into eviction strategies: a cache miss is an event where the application did not find the data it requested in the cache, and a cache hit is when it did.

  1. First in, first out (FIFO): when a cache miss brings new data into a full cache, the entity that came into the cache first is evicted first. Every entity has a timestamp associated with it, and the entity with the oldest timestamp is removed first.
  2. Least recently used (LRU): on a cache miss, the entry that was used least recently gets evicted from the cache (a minimal code sketch follows after this list).
  3. Least frequently used (LFU): on a cache miss, the entity that is least frequently used among all the entities in the cache is thrown out.

There are other eviction strategies like minimum used eviction or heuristic-based eviction.
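To make the least recently used policy concrete, here is a minimal sketch in Python built on `collections.OrderedDict`. The class, the capacity, and the sample keys are purely illustrative and not tied to any particular library:

```python
from collections import OrderedDict

class LRUCache:
    """A toy LRU cache: reads mark an entry as recently used, and
    inserting beyond capacity evicts the least recently used entry."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.store = OrderedDict()

    def get(self, key):
        if key not in self.store:
            return None                      # cache miss
        self.store.move_to_end(key)          # mark as most recently used
        return self.store[key]

    def put(self, key, value):
        if key in self.store:
            self.store.move_to_end(key)
        self.store[key] = value
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)   # evict least recently used

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")          # "a" becomes the most recently used entry
cache.put("c", 3)       # cache is full, so "b" gets evicted
print(cache.get("b"))   # None: "b" was the least recently used entry
```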

Cache hit ratio depends on several parameters. First, the size of the cache key space: the more unique cache keys your application generates, the lower the chance of reusing any one of them, so always look for ways to reduce the number of possible cache keys. Second, the number of items you can store in the cache: the more objects you can physically fit into your cache, the better your cache hit ratio. Third, longevity: how long each object can be stored in the cache before expiring or being invalidated.
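As a back-of-the-envelope illustration (the numbers are invented, not from any real workload), the hit ratio is simply hits divided by total lookups:

```python
def hit_ratio(hits: int, misses: int) -> float:
    """Fraction of lookups that were served from the cache."""
    total = hits + misses
    return hits / total if total else 0.0

# 800 hits and 200 misses in some window -> 800 / 1000 = 0.8, i.e. 80%.
print(hit_ratio(800, 200))  # 0.8
```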

Caching levels

Caches can be applied and leveraged throughout various layers of technology including Operating Systems, Networking layers including Content Delivery Networks (CDN) and DNS, web applications, and Databases.
In a web application, user requests often need the same data over and over. If every request hits your database, your servers get overloaded and response times slow down. To avoid unnecessary load on the servers and to decrease response times, we place caches between the database and the application. These caches can live on the same servers as the database, on completely separate servers, or on the application servers themselves. Based on the metric and function you want to optimize, use the appropriate caching level.

– Client caching
Caching done on the client side, in the user's operating system and browser. Typical examples are the Address Resolution Protocol cache and static assets like HTML and CSS. Remember, you did nothing in this case; everything is done by the browser and the OS, not by your system.
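As a rough sketch of what a browser does under the hood, the client keeps a response around until the `max-age` advertised in its `Cache-Control` header expires. The in-memory dict and the header parsing below are heavily simplified and purely illustrative:

```python
import time
import urllib.request

_client_cache = {}  # url -> (expires_at, body)

def fetch(url: str) -> bytes:
    cached = _client_cache.get(url)
    if cached and cached[0] > time.time():
        return cached[1]                       # served from the client-side cache
    with urllib.request.urlopen(url) as resp:  # otherwise go over the network
        body = resp.read()
        cache_control = resp.headers.get("Cache-Control", "")
        if "max-age=" in cache_control:
            max_age = int(cache_control.split("max-age=")[1].split(",")[0])
            _client_cache[url] = (time.time() + max_age, body)
        return body
```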

– CDN caching
Imagine that you want to serve static content like JavaScript files, HTML templates, and CSS without going to your web servers. Web servers do nothing with static content other than passing it along anyway. This is where Content Delivery Networks come into the picture. One can place CDN nodes near the geographic locations of users and serve static content from the nearest node, which makes the website or app load faster.

– Server caching
We could have a cache directly on the servers. Each time a request is made to the service, the server quickly returns locally cached data if it exists. If it is not in the cache, the requesting node queries the data from network storage such as a database.
How will this solution scale as we grow to many nodes? If we decide to expand to multiple nodes, it is still quite possible to have each node host its own cache. However, if your load balancer randomly distributes requests across the nodes, the same request can go to different nodes, which increases cache misses.

Another design pattern that handles the cache-miss problem is a common cache for the entire system, which all servers write to and read from. This pattern scales better, and even if requests in the same session go to multiple servers, the user gets a consistent experience without extra latency. The trouble is that your cache layer can now become a bottleneck.
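A toy simulation of the point above (all numbers invented): with per-node caches and random routing, the same key keeps landing on nodes that have never seen it, so misses are much higher than with a single shared cache:

```python
import random

NODES = 4
requests = [f"user:{random.randint(1, 100)}" for _ in range(10_000)]

# Per-node caches behind a load balancer that routes randomly.
node_caches = [set() for _ in range(NODES)]
per_node_misses = 0
for key in requests:
    cache = node_caches[random.randrange(NODES)]
    if key not in cache:
        per_node_misses += 1
        cache.add(key)

# One shared cache that every server reads from and writes to.
shared_cache, shared_misses = set(), 0
for key in requests:
    if key not in shared_cache:
        shared_misses += 1
        shared_cache.add(key)

print(per_node_misses, shared_misses)  # per-node misses are noticeably higher
```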

Caching approaches

– Cache-aside
In this approach, the application writes directly to the database. On a cache miss, the data is read from the database and stored in the cache until the eviction policy moves it out. This approach can lead to higher read latency for applications that write and then quickly re-read the same information.
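A minimal cache-aside sketch, with plain dicts standing in for the cache and the database (in practice the cache would be something like Redis or Memcached, and the data below is hypothetical):

```python
cache = {}
database = {"user:1": {"name": "Alice"}}  # hypothetical data

def read(key):
    if key in cache:
        return cache[key]            # cache hit
    value = database.get(key)        # cache miss: go to the database
    if value is not None:
        cache[key] = value           # populate the cache for next time
    return value

def write(key, value):
    database[key] = value            # cache-aside writes go straight to the DB
    cache.pop(key, None)             # drop any stale cached copy
```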

– Write-through
Writes go through the cache, and a write is confirmed as successful only if both the write to the database and the write to the cache succeed. This keeps the cache and the database consistent, and if the cache crashes due to a power failure or other disruption and restarts, nothing is lost. However, write latency is higher in this case because every write goes to two separate systems.
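A minimal write-through sketch with the same dict-based stand-ins; the write is acknowledged only after both the database and the cache have been updated:

```python
cache = {}
database = {}

def write_through(key, value):
    database[key] = value   # write to the source of truth
    cache[key] = value      # keep the cache consistent with it
    return True             # acknowledge only after both writes succeed

def read(key):
    return cache.get(key, database.get(key))
```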

– Write-behind (write-back)
In this approach, we write to the cache, and the write is acknowledged as soon as it lands in the cache, without writing to the database; the write is synced to the database asynchronously. This gives low write latency and high write throughput for write-intensive applications. However, there is a risk of data loss if the caching layer dies, because the only copy of the written data is in the cache. We can improve on this by having more than one replica acknowledge the write in the cache.
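A minimal write-behind sketch: the write is acknowledged as soon as the cache has it, and a background thread drains pending writes to the database later. The queue, the flush thread, and the dicts are all illustrative stand-ins:

```python
import queue
import threading

cache, database = {}, {}
pending = queue.Queue()

def write_behind(key, value):
    cache[key] = value         # acknowledge once the cache has the value
    pending.put((key, value))  # the database write happens asynchronously

def flush_worker():
    while True:
        key, value = pending.get()
        database[key] = value  # the durable copy arrives here, later
        pending.task_done()

threading.Thread(target=flush_worker, daemon=True).start()

write_behind("user:1", {"name": "Alice"})
pending.join()                 # in this demo, wait for the async flush
print(database["user:1"])
```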

Advantages of using cache in your system design

1. Improve Application Performance
Because memory is orders of magnitude faster than disk (magnetic or SSD), reading data from the in-memory cache is extremely fast (sub-millisecond). This significantly faster data access improves the overall performance of the application.

2. Reduce Database Cost
A single cache instance can provide hundreds of thousands of IOPS (Input/output operations per second), potentially replacing a number of database instances, thus driving the total cost down. This is especially significant if the primary database charges per throughput.

3. Reduce the Load on the Backend
By redirecting significant parts of the read load from the backend database to the in-memory layer, caching can reduce the load on your database, and protect it from slower performance under load, or even from crashing at times of spikes.

4. Eliminate Database Hotspots
In many applications, it is likely that a small subset of data, such as a celebrity profile or popular product, will be accessed more frequently than the rest. This can result in hot spots in your database and may require overprovisioning of database resources based on the throughput requirements for the most frequently used data. Storing common keys in an in-memory cache mitigates the need to overprovision while providing fast and predictable performance for the most commonly accessed data.

5. Increase Read Throughput (IOPS)
In addition to lower latency, in-memory systems also offer much higher request rates (IOPS) relative to a comparable disk-based database. A single instance used as a distributed side-cache can serve hundreds of thousands of requests per second.