System design basics – II

System design basics – Cache

In the last post, we discussed design fundamentals for databases; in this post, let's talk about caching.

What is caching? Caching is a mechanism for applications to store and fetch data faster without going back to the original source of the data. Usually, applications store data they expect to be used again; instead of doing a round trip to the original source, they serve requests from the stored copy. These are mostly software caches implemented in main memory.


A cache is a storage layer which stores a subset of the entire data, chosen based on current usage and the probability of future usage, for faster execution of requests. This storage is typically in main memory, which has much faster read times than disk.
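At its simplest, this idea can be sketched as a dictionary sitting in front of a slow data source. This is only an illustrative sketch; `load_from_db` is a hypothetical stand-in for a real disk or network read:

```python
cache = {}  # in-memory store: holds a subset of the original data

def load_from_db(key):
    # Hypothetical stand-in for a slow read from the original data source.
    return f"value-for-{key}"

def get(key):
    if key in cache:               # cache hit: serve from memory
        return cache[key]
    value = load_from_db(key)      # cache miss: round trip to the source
    cache[key] = value             # remember it for future requests
    return value
```

Every repeated `get` for the same key is now served from memory instead of going back to the original source.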

In typical microprocessors, a hardware cache sits between the CPU and main memory. These are very high-speed miniature memories which store the instructions and data being processed by the CPU, referred to as L1 and L2 caches.

In distributed web systems, caching can be done at various places and levels which we will discuss in a while.

Cache eviction strategies

As mentioned above, a cache stores a subset of the original data. The question is: which subset, and what happens when data which is not in the cache is requested? The approaches for selecting which data stays in the cache are called cache eviction strategies.

Before understanding more about eviction strategies: a cache miss is an event where the application did not find the data it requested in the cache; a cache hit is when the application did find it.

  1. First in, first out: when the cache is full and a new entry must be stored, the entry which came into the cache first is evicted first. Every entry has a timestamp associated with it, and the entry with the oldest timestamp is removed first.
  2. Least recently used: when eviction is needed, the entry which was used least recently is removed from the cache.
  3. Least frequently used: when eviction is needed, the entry which is used least frequently among all the entries in the cache is thrown out.

There are other eviction strategies like minimum used eviction or heuristic-based eviction.
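As an illustrative sketch, least recently used eviction maps naturally onto Python's `OrderedDict`, which remembers insertion order and can move entries to the end on access:

```python
from collections import OrderedDict

class LRUCache:
    """A small LRU cache: evicts the least recently used entry when full."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None                       # cache miss
        self.entries.move_to_end(key)         # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
```

A FIFO cache would be the same structure without the `move_to_end` calls: entries are then evicted purely in insertion order.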

The cache hit ratio depends on several parameters. First, the size of the cache key space: the more unique cache keys your application generates, the less chance you have to reuse any one of them, so always consider ways to reduce the number of possible cache keys. Second, the number of items you can store in the cache: the more objects you can physically fit into your cache, the better your cache hit ratio. Third, longevity: how long each object can be stored in the cache before expiring or being invalidated.

Caching levels

Caches can be applied and leveraged throughout various layers of technology including Operating Systems, Networking layers including Content Delivery Networks (CDN) and DNS, web applications, and Databases.
In a web application, most user requests need the same data. If each request hits your database, your servers will be overloaded and response times will suffer. To avoid unnecessary load on the servers and to decrease response time, we place caches between the application and the database. These caches can be on the same servers as the database, on completely separate servers, or on the application servers themselves. Based on the metric and function you want to optimize, choose the appropriate caching level.

– Client caching
Caching done on the client side, such as in the user's operating system and browser. Typical examples are the Address Resolution Protocol cache and static assets like HTML and CSS. Remember, you did nothing in this case; everything is done by the browser and the client OS, not your system.

– CDN caching
Imagine that you want to serve your static content like JavaScript files, HTML templates, and CSS without going to your web servers; after all, web servers do nothing with static content other than pass it along. This is where Content Delivery Networks come into the picture. One can place CDN servers near the geographic locations of users and serve static content from the nearest one, which makes the website or app load faster.

– Server caching
We could have a cache directly on the servers. Each time a request is made to the service, the server quickly returns local, cached data if it exists. If it is not in the cache, the server queries the data from network storage such as a database.
How will this solution scale as we grow to many nodes? If we expand to multiple nodes, it is still quite possible to have each node host its own cache. However, if your load balancer distributes requests randomly across the nodes, the same request will go to different nodes, increasing cache misses.

Another design pattern that handles the cache miss problem is a common cache for the entire system, which all servers write to and read from. This pattern scales better: even if requests in the same session go to multiple servers, the user gets a consistent experience without extra latency. The trouble is that your cache layer has now become a bottleneck.

Caching approach

– Cache-aside
In this caching approach, the application writes directly to the DB. On a read miss, the data is fetched from the DB and stored in the cache until the eviction policy moves it out. This approach can lead to higher read latency for applications which write and then quickly re-read the same information.
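A minimal cache-aside sketch, with plain dictionaries standing in as hypothetical placeholders for the real cache and database:

```python
cache = {}
database = {}  # stand-in for the real database

def read(key):
    if key in cache:
        return cache[key]          # cache hit
    value = database.get(key)      # cache miss: go to the DB
    cache[key] = value             # keep it until eviction removes it
    return value

def write(key, value):
    database[key] = value          # cache-aside: write directly to the DB
    cache.pop(key, None)           # drop any stale cached copy
```

Because a write goes straight to the DB and discards the cached copy, the next read of that key is a guaranteed miss, which is exactly the re-read latency cost mentioned above.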

– Write-through
Here, writes go through the cache, and a write is confirmed as successful only if the writes to both the database and the cache succeed. This keeps the cache and the database consistent. If your cache crashes due to a power failure or other disruption and restarts, nothing will be lost. However, write latency will be higher in this case, as every write goes to two separate systems.

– Write-behind (write-back)
In this approach, we write to the cache, and the write is confirmed as soon as it is done on the cache, without writing to the database; the write is then asynchronously synced to the database. This results in low write latency and high write throughput for write-intensive applications. However, you risk losing data if the caching layer dies, because the only copy of the written data is in the cache. We can reduce this risk by having more than one replica in the cache acknowledge the write.
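The difference between the two write policies can be sketched like this; dictionaries stand in for the cache and the database, and the hypothetical `flush` function plays the role of the asynchronous sync (in a real system it would run in a background worker):

```python
from collections import deque

cache = {}
database = {}     # stand-in for the real database
pending = deque() # writes waiting to be flushed to the DB

def write_through(key, value):
    # Write-through: confirmed only after both the database and cache succeed.
    database[key] = value
    cache[key] = value

def write_behind(key, value):
    # Write-behind: confirmed as soon as the cache is updated; the
    # database write happens asynchronously later. Until it is flushed,
    # the only copy of this data lives in the cache.
    cache[key] = value
    pending.append((key, value))

def flush():
    # Stand-in for the background process syncing pending writes to the DB.
    while pending:
        key, value = pending.popleft()
        database[key] = value
```

If the cache dies before `flush` runs, everything still in `pending` is lost, which is precisely the data-loss risk of write-behind.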

Advantages of using cache in your system design

1. Improve Application Performance
Because memory is orders of magnitude faster than disk (magnetic or SSD), reading data from the in-memory cache is extremely fast (sub-millisecond). This significantly faster data access improves the overall performance of the application.

2. Reduce Database Cost
A single cache instance can provide hundreds of thousands of IOPS (Input/output operations per second), potentially replacing a number of database instances, thus driving the total cost down. This is especially significant if the primary database charges per throughput.

3. Reduce the Load on the Backend
By redirecting significant parts of the read load from the backend database to the in-memory layer, caching can reduce the load on your database, and protect it from slower performance under load, or even from crashing at times of spikes.

4. Eliminate Database Hotspots
In many applications, it is likely that a small subset of data, such as a celebrity profile or popular product, will be accessed more frequently than the rest. This can result in hot spots in your database and may require overprovisioning of database resources based on the throughput requirements for the most frequently used data. Storing common keys in an in-memory cache mitigates the need to overprovision while providing fast and predictable performance for the most commonly accessed data.

5. Increase Read Throughput (IOPS)
In addition to lower latency, in-memory systems also offer much higher request rates (IOPS) relative to a comparable disk-based database. A single instance used as a distributed side-cache can serve hundreds of thousands of requests per second.

System design interview basics – I

System design interview basics

The system design round is one of the rounds in any technical interview. The idea of this round is to assess your design, analytical, and trade-off skills: to see if you can take a fuzzy problem, break it down into small executable chunks, and actually solve it. The beauty is that there are no correct or incorrect answers; it is more of a discussion.

To have a fruitful discussion, one must know the basics and should be able to back up his/her claims with facts. This article discusses the basic technical concepts you should be aware of when you go into a system design round.

System Design concepts: Databases

Databases are required in essentially any application/system, unless it serves only non-authenticated, static, read-only, and infrequently changing content. It is therefore important to know a few concepts around databases.
First of all, understand that these concepts are independent of the actual DB used; it can be open source like MySQL or proprietary like MS SQL Server.

Imagine a simple application set up like the figure below. What are the problems with this setup?

[Figure: a simple application with a single database server]

There are three problems to start with, among many others. First, what if the only DB server goes down, or we have to take it down for maintenance? That impacts the system’s availability. Second, what if the server crashes and becomes unrecoverable? All application data on it is gone too, which affects the application’s data availability and persistence. Third, what if many read and write requests come to the server? The single server will degrade in performance; simple read requests will spend a lot of time waiting for complex writes to finish, hurting the performance of the overall system.

What can we do? Let’s set up a new server which is nothing but a replica of the original and stores everything that is on the original server. This process is called replication; the original server is called the master and the copy server is called the slave.

[Figure: master-slave DB replication]

How does it solve the problems? First, having a replicated server gives us the power to immediately switch to the new server if the master goes down. The system keeps running as if nothing happened, and nobody except your database administrators needs to know anything.
Second, you have an exact copy of the data, so even if you lose data on one server, you can always recover it from the other.
Third, since the servers contain exactly the same data, you can read from one server and write to another. This improves the performance of your read requests (which usually dominate in any system), as they are no longer blocked by writes.

So in essence, replicating your DB servers gives you system availability, protection against data loss, and performance gains.
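A read/write split over replicated servers can be sketched as follows; this is only an illustration in which plain dictionaries stand in for real servers, and the hypothetical `replicate` function simulates the replication process:

```python
import random

class ReplicatedDB:
    """Writes go to the master; reads are spread across the slaves."""
    def __init__(self, master, slaves):
        self.master = master
        self.slaves = slaves

    def write(self, key, value):
        self.master[key] = value  # all writes hit the master

    def read(self, key):
        # Reads go to a randomly chosen slave. Note the consistency risk:
        # a slave lagging behind replication may return stale (or no) data.
        return random.choice(self.slaves).get(key)

def replicate(db):
    # Stand-in for the replication process copying master data to slaves.
    for slave in db.slaves:
        slave.update(db.master)
```

Reading a key before `replicate` has run demonstrates the replication-lag problem discussed below: the slave simply does not have the data yet.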

Usually, in production systems, there are two masters (one on standby) and many slaves attached to those masters. You also want to keep one server whose replication is delayed by a few hours (ideally 24 hours). Just in case data on the master gets corrupted (a developer runs a wrong query, file system issues) and the corruption is replicated immediately to all the servers, this delayed replica can give you sane data, although you lose up to a day of changes.

[Figure: replication topology with a delayed replica]

Read more about replication in MySQL

Where is the catch, where is the trade-off here? We improved availability and performance; however, we have put consistency at risk. Imagine a case where data is written on the master, but due to some lag, replication to the slave has not happened yet. If another service reads the same data from the slave, it either does not get the data or gets an obsolete version of it.

Replication improves availability and performance, however, risks consistency.

A database is essentially a collection of tables, which are nothing but rows and columns. The number of rows and columns varies from table to table. Say you have a very popular application and half of the earth’s population is registered to it; in this case, your user table has too many rows. On the other hand, say you have 10K employees and you want to store everything about those employees in one table called employee; now you have a table with too many columns.

Before understanding partitioning, let’s see why too many rows or columns in a table are a problem for a system.
With too many rows, the index size will be large and search performance will degrade, which impacts the overall performance of the service for every user. Say a few thousand users are accessing your service, which queries this table; one user’s query starts degrading another’s, and your system goes into a spiral of degrading performance. If this big table gets corrupted, it impacts the availability of the service for all users.

With too many columns, you will pull a lot of data in every query even when you need only a few of those columns. This indirectly hurts performance and puts a lot of unnecessary data on the network.

What is the solution to too many rows or columns? The answer is partitioning. There are two types of partitioning:
Horizontal partitioning
In horizontal partitioning, a table is divided into multiple partitions, each storing a subset of the rows of the table. These partitions can be stored in multiple database clusters. For example, all the users from Germany, the Netherlands, and France go into a table called EuropeUser, whereas all the users in the USA and Canada are stored in a table called NorthAmericaUser, and so on for Asia, Africa, etc. If you want all the users of your service, all you need to do is take the union of these tables.
The gain is that you now have to search far fewer rows to find a European user than before, which improves query performance and hence the response time of the service. If the EuropeUser table is corrupted or deleted, your service is still available in North America and Asia.
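The routing logic for such region-based horizontal partitioning can be sketched as follows; the table names match the example above, while the country-to-table map and in-memory "tables" are illustrative stand-ins:

```python
# Map each country to the table (partition) holding its rows.
REGION_TABLE = {
    "Germany": "EuropeUser", "Netherlands": "EuropeUser", "France": "EuropeUser",
    "USA": "NorthAmericaUser", "Canada": "NorthAmericaUser",
}

tables = {"EuropeUser": [], "NorthAmericaUser": []}

def insert_user(user):
    # Route the row to the partition for the user's country.
    table = REGION_TABLE[user["country"]]
    tables[table].append(user)

def all_users():
    # The union of all partitions gives back the full user set.
    return [u for rows in tables.values() for u in rows]
```

A query for a European user now scans only the EuropeUser partition instead of the whole user set.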

[Figure: horizontal partitioning]

Vertical partitioning
In vertical partitioning, we divide the table into multiple partitions, each containing certain columns. For example, an employee table with id, name, email, picture, salary, and organization can be split into three partitions: the first with columns id, name, and email; the second with id and picture; and the third with id, org, and salary.

[Figure: vertical partitioning]

There are multiple ways you can partition a table:

Range partitioning – selects a partition by determining whether the partitioning key falls inside a certain range. It distributes tuples based on value intervals (ranges) of some attribute.

List partitioning – a partition is assigned a list of values. If the partitioning key has one of these values, the partition is chosen. For example, all rows where the column Country is either Iceland, Norway, Sweden, Finland, or Denmark could build a partition for the Nordic countries.

Composite partitioning – allows for certain combinations of the above partitioning schemes, for example first applying a range partitioning and then a hash partitioning.
Consistent hashing could be considered a composite of hash and list partitioning where the hash reduces the keyspace to a size that can be listed.

Round-robin partitioning – the simplest strategy, it ensures uniform data distribution. With n partitions, the ith tuple in insertion order is assigned to partition (i mod n). This strategy enables sequential access to a relation to be done in parallel.

Hash partitioning – applies a hash function to some attribute that yields the partition number. This strategy allows exact-match queries on the selection attribute to be processed by exactly one node and all other queries to be processed by all the nodes in parallel.
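A few of these strategies can be expressed as small partition-selection functions. The ranges, country lists, and partition counts below are illustrative, and a real system would use a stable hash rather than Python's built-in `hash`:

```python
NORDIC = {"Iceland", "Norway", "Sweden", "Finland", "Denmark"}

def range_partition(user_id):
    # Range partitioning: pick a partition by the interval the key falls in.
    if user_id < 1_000_000:
        return 0
    elif user_id < 2_000_000:
        return 1
    return 2

def list_partition(country):
    # List partitioning: a partition is assigned an explicit list of values.
    return "nordic" if country in NORDIC else "other"

def hash_partition(key, num_partitions):
    # Hash partitioning: the hash of an attribute yields the partition number.
    # (Python's hash varies between runs; a real system needs a stable hash.)
    return hash(key) % num_partitions

def round_robin_partition(i, num_partitions):
    # Round-robin: the i-th tuple in insertion order goes to partition i mod n.
    return i % num_partitions
```

Composite partitioning simply chains two of these functions, e.g. range first and then hash within each range.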

Partitioning improves your database availability and performance.

We already know that horizontal partitioning splits a big table into multiple partitions; however, those partitions are stored in the same schema and on the same server. Sharding takes this one step further: partitions can be stored on multiple servers and in different schemas. This division of data and its storage across multiple servers gives the flexibility to store more data than fits on a single server, while keeping searches on the data fast.

Algorithmic sharding
In algorithmic sharding, the client can determine a given partition’s database without any external help. Usually, there is a partition key from which we can derive the cluster id. A simple hash function Hash(key) can be used as the sharding function. An example of such a sharding implementation is Memcached.

Dynamic sharding
In dynamic sharding, an external locator service determines the location of data. To read and write data, clients need to consult the locator service first.

Sharding a database table before it has been optimized locally causes premature complexity. Sharding should be used only when all other options for optimization are inadequate.


This is true because adding sharding may actually reduce performance if your data is not uniformly distributed, by creating hotspots. It also adds a lot of operational overhead: adding a new node to the system may require re-sharding and moving data between nodes.

System design basics – III Consistent hashing

Consistent Hashing

To understand consistent hashing, we first have to understand traditional hashing and its limitations in large-scale distributed systems.

Hashing in plain terms is nothing but a key-value store where, given a key, the associated value can be found very efficiently. Example: let’s say we want to find the name of a street in a city given its zip code. The idea is to store this information as a hash from zip code to street name.

The problem becomes more interesting when the data is too big to store on one node or machine, and multiple such nodes or machines are required in the system to store it, for example, a system which uses a number of web caches. First question: how do we decide which key goes on which node? The simplest solution is to use a modulo function. Given a key, take the hash of the key, take it modulo the number of nodes in the system, and put the key on the resulting node. Similarly, when fetching a key, hash the key, take the modulo with the number of nodes, and then go to that node to fetch the value. The picture depicts conventional hashing in a multi-node system.


Failures are common in commodity-hardware-driven distributed multi-node systems. Any node can die without any prior notice, and the expectation is that the system runs unaffected, with a slight cost in performance. What happens when a node goes down in the system described above? In traditional hashing, the total number of nodes has decreased, so the function determining which node to put a key on, or fetch it from, changes. New keys will work fine. What happens to the existing keys? They are all on the wrong nodes as per the new function.

[Figure: traditional hashing after a node failure]

To fix this problem, we have to redistribute all existing keys across the remaining nodes, which may be a very costly operation and can have detrimental effects on the running system. Again, what happens when the node comes back? Well, repeat what was done when it went down. This may result in a thrashing effect: if a node misbehaves regularly, the system does no useful work except redistributing keys.
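The scale of the problem can be seen with a quick count. The sketch below places 1000 keys with `hash(key) % n` and measures how many move when one of four nodes dies; MD5 is used only to get a hash that is stable across runs:

```python
import hashlib

def node_for(key, num_nodes):
    # Stable hash of the key, then modulo the number of nodes.
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % num_nodes

keys = [f"key-{i}" for i in range(1000)]
before = {k: node_for(k, 4) for k in keys}   # 4 nodes in the system
after = {k: node_for(k, 3) for k in keys}    # one node goes down

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved} of {len(keys)} keys moved")  # roughly 3/4 of all keys
```

With uniform hashing, a key stays put only when its hash gives the same remainder modulo 4 and modulo 3, which happens for about a quarter of the keys; the rest must all be relocated.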

How to solve this challenge? This is where consistent hashing comes into the picture.

Consistent hashing in distributed multi-node systems

Consistent hashing comes up with a very simple idea: have nodes and keys in the same id space, unlike traditional hashing, where node ids and keys are in two different id spaces. A node id can be the hash of its IP address, and the same hash function is then applied to keys to determine which node a key goes on or is fetched from.

A critical requirement for a consistent hashing implementation is a hash function which is consistent irrespective of the system view and maps keys roughly uniformly across all machines.

Choose any base hash function that maps the keyspace to integers in the range [0..M]. Once we divide by M, it gives us a unit circle. Now, each key, once hashed, represents a point on this unit circle.

[Figure: keys hashed onto the unit circle]

How does a key map to a node exactly? The key is hashed, and then the key is put on the first node found while moving clockwise. Simple enough, huh? To find a key, take its hash and go to the first node while moving clockwise on the unit circle.

[Figure: keys and nodes on the consistent hashing ring]

How does it solve the problem of scale? Let’s say my system is receiving 5x the load; what happens to the nodes, and how can I balance or reduce the load? The simple answer is to add more nodes, uniformly distributed on the unit circle, and the problem is solved. Consistent hashing is built for scale.

What happens when a node goes down? All the keys which were on this node are reallocated to the next successor node on the circle; all other keys remain unchanged. This is far more optimal than the traditional case, where we had to redistribute all keys on the failure of one node.

As mentioned earlier, we assume that the hash function distributes keys uniformly across nodes, which is not realistic. To reduce this non-uniformity, virtual nodes are introduced: each node is hashed with K different hash functions, which map the node to K different points on the circle. Each node still gets roughly 1/N of the keys; however, virtual nodes reduce the variance of key load significantly.

One challenge still remains: how to efficiently find the successor node for a given key, i.e., the node s with the smallest h(s) > h(k) for a key k. The intuitive solution is to use a hash table, but hashes do not maintain any ordering information. The next bet is a binary search tree, which maintains order, but the successor lookup is proportional to the depth of the tree, which can be O(N); we can reduce that to O(log N) by using a balanced binary search tree such as a red-black tree.
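Putting the pieces together, a consistent hash ring with virtual nodes and an ordered successor lookup can be sketched as follows. This is an illustrative sketch: MD5 is used as the base hash, the virtual node count is arbitrary, and a sorted list with binary search (`bisect`) stands in for the balanced tree:

```python
import bisect
import hashlib

def _hash(value):
    # Base hash function mapping any string to an integer point on the ring.
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, nodes, vnodes=100):
        self.ring = []  # sorted list of (point, node) pairs
        for node in nodes:
            self.add_node(node, vnodes)

    def add_node(self, node, vnodes=100):
        # Virtual nodes: each node is hashed at several points on the circle
        # to smooth out the key distribution.
        for i in range(vnodes):
            bisect.insort(self.ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self.ring = [(p, n) for p, n in self.ring if n != node]

    def get_node(self, key):
        # Successor search: the first node point clockwise from the key's hash.
        point = _hash(key)
        idx = bisect.bisect(self.ring, (point, ""))
        if idx == len(self.ring):
            idx = 0  # wrap around the circle
        return self.ring[idx][1]
```

Removing a node only remaps the keys whose successor was that node; every other key keeps its assignment, which is exactly the property that makes consistent hashing attractive.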

Where is consistent hashing used?

Consistent hashing is used in Memcached client libraries, Cassandra, Amazon Dynamo, etc.

If you find this article useful, please share it. If there is something missing or wrong in the article, please share it with us and we will correct it.