Master theorem for complexity analysis

What is Master Theorem?

The Master theorem provides us solution for complexity analysis of recursive function by substituting values for variable. The Master Theorem applies to recurrences of the following form:

T (n) = aT (n/b) + f(n)

where a ≥ 1 and b > 1 are constants and f(n) is an asymptotically positive function.
These kinds of recurrence relations occur in the analysis of many “divide and conquer” algorithms like binary search or merge sort.

But what do these variables a, b and n mean?
n is the size of the input or the problem.
a is the number of subproblems in the recursion, means are we dividing the problem into two halves or 3 or 5? For example, for binary search algorithm a=1, and for merge sort algorithm it is 2.
n/b is the relative subproblem size. What rate is the input reduced? E.g., Binary search and Merge sort cut input in half.
f(n) is the cost of the work done outside the recursive calls, which includes the cost of dividing the problem and
the cost of merging the solutions to the subproblems.

Once, we have a, b and f(n) it is easy to find complexity of the algorithm by substitutin values in this expression


However, that’s not all, we have f(n) in the equation and final runtime of your algorithm depends on the relationship between nlogba and f(n)

There are three possible scenarios:
Scenario 1: f(n) = O(nc) and c < logba then the complexity is nlogba This mean recursion is taking more time than the combine and merge.

T(n) = 8T(n/2) + 1000n2
Here, logba = log28 = 3 > c (2); hence complexity is O(nlogba) = O(n3)

Scenario 2: f(n) = O(nclogkn) for k >= 0 and c = logba then the complexity is nlogbalogk+1n. It essentially means same runtime inside and outside the recursion.

T(n) = 2T(n/2) + 10n
10n can be written as O(nlog0n) with k = 0.
Here, logba = log22 = 1 = c (1); hence complexity is O(nlogbalogk+1n) = O(nlogn)

Scenario 3: f(n) = O(nc) and c > logba then the complexity is O(nc). It essentially means same runtime outside the recursion is more than split and recurse.

T(n) = 2T(n/2) + n2
Here, logba = log22 = 1 < c (2); hence complexity is O(nc) = O(n2)

Exceptions to Master Theorem

1. T(n) = 2n T(n/2) + nn.

This is not admissible because a is not constant here. For Master theorem to be application a and b must be constant.

2. T(n) = 0.5 T(n/2) + n2.

This is not admissible because a < 1 which is to say we are reducing the problem in less than one subproblems. For Master theorem to be application a must be greater than 1.

3. T(n) = 64T(n/2) – n2logn.

This is not admissible because f(n) is negative.

4. T(n) = 64T(n/2) – 2n

This is not admissible because f(n) is not polynomial.

Master Theorem Examples

Let’s apply the Master theorem on some of the known algorithms and see if it works?
Binary Search Algorithm
In the binary search algorithm, depending on the relationship between the middle element and the key, we discard one part of the array and look into the other. Also, from above, we know the Master theorem:

T(n) = aT(n/b) + f(n)

In this case, a is 1 as we are reducing to only one problem. b is 2 as we divide the input by half and no outside of recursion no work is done, hence the f(n) is O(1)

logba is log21 = 0. So logba is actually equal to c which is 0. In that case, the complexity of the algorithm is defined by O(nlogbalogk+1n), where k = 0. Substituting values in this, we get the complexity of binary search alogrithm as O(logn)

Merge Sort Algorithm
In the merge sort algorithm, we split the array into two equal parts and sort them individually. Apart from split we do a merge of elements, which take O(n) time. Let’s find a, b in the Master Theorem equation.

T(n) = aT(n/b) + f(n)

In this case, a is 2 as we are reducing to only two subproblems. b is 2 as we divide the input by half and outside of recursion no work is done, hence the f(n) is O(n)

logba is log22 = 1. So logba is actually equal to c which is 1. In that case, the complexity of the algorithm is defined by O(nlogbalogk+1n), where k = 0. Substituting values in this, we get the complexity of merge sort alogrithm as O(nlogn)

Please book a free session if you are looking for coaching to prepare for your next technical interview. At Algorithms and Me, we provide personalized coaching and mock interviews to prepare you for Amazon, Google, Facebook, etc. interviews.


  • Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms,
    Second Edition. MIT Press and McGraw-Hill, 2001. ISBN 0-262-03293-7. Sections 4.3 (The master method) and 4.4 (Proof of the master theorem), pp. 73–90.
  • Michael T. Goodrich and Roberto Tamassia. Algorithm Design: Foundation, Analysis, and Internet Examples.
    Wiley, 2002. ISBN 0-471-38365-1. The master theorem (including the version of Case 2 included here, which is stronger than the one from CLRS) is on pp. 268–270
  • Master theorem teaching at CSE
  • Master theorem teaching at CSD
  • Arrays With Elements In The Range 1 to N

    When an array has all its elements in the range of 1 to N ( where N is the length of the array ) we can use the indices to store the ordered state of the elements in the array. This ordered-state can in-turn be used to solve a variety of problems which we’ll explore soon. First, a very simple demonstration of this property. 

    Here is an array which has unique elements in the range of 1 to N.
    Given array (A) : 5,3,1,4,2
    Indices:                0,1,2,3,4

    Sort in Linear Time

    The first use-case of this unique property is being able to sort in O(N) time i.e. a special-case(all unique elements) of the Counting Sort. The crux of this sort is to check whether an element is at its corresponding index and swap it to its correct index if it’s not. Following is a demonstration of this logic:

    Given array (A) : 5,3,1,4,2
    Indices:                0,1,2,3,4

    For each A[i] check if A[A[i] – 1] equals A[i] or not. If they are not equal then swap element at A[A[i] – 1] with A[i]. Basically the correct value for any index i is when A[i] contains i+1. 

    In the above case, let’s start with i = 0.

    A[A[0] – 1] or A[5-1] orA[4] which is 2 and A[0] = 5. This means that A[A[i] – 1] is not equal to A[i] and hence not in its correct position. So we need to swap in order to put A[0] -> 5 to its correct position which is index 4 and A[0] will hold 4 after the swap. Similarly, we need to repeat this check & swap for all the elements.

    What if we cancel-out the common terms and modify the check from  A[i] != A[A[i] - 1] to i != A[i]-1 ?

    Find The Missing Integer

    A similar approach can help us find the smallest missing positive-integer in a given array. By smallest missing positive-integer, we just mean the smallest positive integer that does not exist in the given list of numbers. For example: 

    Given Array: -2, 3, 0, 1, 3
    In the above case, the smallest missing positive integer is 2.

    If we were to apply the usual sorting techniques and then scan the array for the smallest positive integer absent it would imply a time-complexity of O(NLog(N)) + O(N). We can definitely do better than this! At first glance, it seems that this problem does not lie within the unique property of elements being in the range of 1 to N since the numbers in the given array are well outside the range, but to solve this problem we still only need to figure out whether we have elements from 1 to N present in the given array or not.

    How do we know whether the given array has elements from 1 to N? We can use the counting-sort discussed earlier to put each element in its “correct position”, i.e index 0 should hold 1, index 1 should hold 2 and so on. The smallest index that does not hold its correct element is the missing integer.

    If we sort the given array using counting sort described above, we will get: 1, 0, 3, -2, 3. And the smallest index i to not hold its correct value i.e. i+1 will give us the answer to the smallest missing positive integer. In this case, that index is 1 since it does not hold 2, thus the smallest positive missing integer is 2.

    Find The Duplicate Element

    The third use-case of this property is to figure out the duplicate elements without using any extra space. We can iterate over the array A and mark the corresponding index of the encountered element as negative – unless it has already been marked negative! For example: if A[1] = 3 (or -3 ) then mark A[ Abs[3] - 1] as negative, this way whenever we encounter 3 (or -3) again in any of the A[i] we will know that the value 3 has been visited before since A[3-1] will be negative.

    Given array (A) : 5,3,1,4,3
    Indices:                0,1,2,3,4

    When we encounter A[0] i.e. 5, we make A[5-1] i.e. A[4] negative, so the array becomes: 
    Next, we encounter A[1] i.e. 3, we make A[3-1] i.e. A[2] negative, so the array becomes: 
    Next, we encounter A[2] i.e. -1, we make A[1-1] i.e. A[0] negative, so the array becomes: 
    Next, we encounter A[3] i.e. 4, we make A[4-1] i.e. A[3] negative, so the array becomes: 
    Next, we encounter A[4] i.e. -3, we want to make A[3-1] i.e. A[2] negative, but in this case, A[2] is already negative thus we know that A[2] has been visited before! Which means Abs(A[4]) i.e 3 is the duplicate element.

    Here is a snippet to demonstrate the code for sorting an array in linear time as per the above approach. The exact same approach can be used to solve the other two applications i.e. Finding the Duplicate and Finding The Missing Integer.

            int swap=0;
            for(int i = 0; i < nums.length;){
                if(nums[i] > 0 && nums[i] < nums.length) {
                    if(nums[nums[i]-1] != nums[i]){                     
                        swap = nums[i];
                        nums[i] = nums[nums[i] - 1];
                        nums[swap - 1] = swap;


    If you are preparing for a technical interview in companies like Amazon, Facebook, etc and want help with preparation, please register for a coaching session with us.

    Range sum query- Immutable array

    Range sum query- Immutable array

    Write a service which given an integer array, returns the sum of the elements between indices i and j (i ≤ j), inclusive. Example: nums = [-2, 0, 3, -5, 2, -1]
    sumRange(0, 2) -> 1
    sumRange(2, 5) -> -1
    sumRange(0, 5) -> -3

    Also, the input set does not change during the calls to the sumRange(i,j).

    The brute force solution is to calculate the sum of all the elements A[i] to A[j] whenever a sumRange(i,j) is called. This method has time complexity of O(n). It is OK to have this solution for small scale but as the number of queries goes up, processing of all the numbers from i to j would be inefficient. Also, imagine a case where the array itself is very large, then O(n) complexity for each query will lead to choking of your service.

    Range sum query- Immutable array : thoughts

    There are two hints for optimization is in the question, first, the array is immutable, it does not change. Second, we have to build a service, that means we have a server with us. These two things allow us to pre-compute and store results even before the query is made.

    Now, the question is what do we pre-compute and how do we store it? We can precompute the sum of all the elements between each i and j and store them in a two-dimensional array. range[i][j] stores the sum of all the elements between index i and j. It will use O(n2) additional memory, however, the response time for each sumRange query will be constant. Also, the preprocessing step is O(n2)

    Can we optimize for space as well? If I know the sum of all the elements from index 0 to index i and sum of all the elements from 0 to j, can I find the sum of all the elements between i and j? Of course, we can do it.

     Sum(i,j) = Sum(0,j) - Sum(0,i) + A[i]. 

    Below diagram explains it.
    range sum query array

    However, integer array is not passed in the query request, we cannot use it while calculating the sum. Instead, we will use formula like: Sum(i,j) = Sum(0,j) – Sum(0,i-1), which is equivalent to the above.

    We will pre-calculate the sum of all the elements between index 0 and j for all j>=0 and jImplementation

    class NumArray {
        int[] rangeSum;
        public NumArray(int[] nums) {
            rangeSum = new int[nums.length];
                rangeSum[0] = nums[0]; 
                for(int i=1; i<nums.length; i++){
                    rangeSum[i] = rangeSum[i-1] + nums[i];
        public int sumRange(int i, int j) {
            if(i==0) return rangeSum[j];
            return rangeSum[j] - rangeSum[i-1];

    Now, the preprocessing step is O(n). N additional space is used. At the same time query response time is O(1).

    Please share if there is anything wrong or missing in the post. If you are preparing for an interview and needs coaching to prepare for it, please book a free demo session with us.

    Find combinations which add up to a number

    Combination sum problem

    Given an array of integers (candidates) (without duplicates) and a target number (target), find all unique combinations in candidates where the candidate numbers sums to target.

    Also, same candidate can occur in the combination as multiple times.

    For example, Input: candidates = [4,3,5,9], target = 9, a solution set is:[ [9], [3,3,3], [4,5]]

    How can do we go about it? What happens if I take the coin 4 in the current example? Then all need to find in the candidates array if there is a combination adds up to 9-4 = 5. Seems like a recursion. For recursion, we need a termination condition. In this case, if I have on candidates to add and target is greater than zero, then whatever combination I have till now has no value, so I terminate the recursion in this case.

    Second what if I have already found a combination which adds up to target? Then I will put that combination in the list of combinations and return.

    What happens in recursive implementation? Well, we go through each coin, add that to current combination and see if leads to the target? If it does, it will be added to the result list along with the list of other candidates. If not, we just remove the current coin (backtrack) from the current combination and try the next coin.

    This approach is called exhaustive search and backtracking paradigm of problem-solving where you search the entire input set to see to find the answer. However, in this case, we can prune the search path as soon as we know that the current set of candidates add up more than the target.

    Combination sum : implementation

    class Solution {
        public List<List<Integer>> combinationSum(int[] candidates,
                                                  int target) {
            /* The result list contains all the combination 
               which add up to target.
            List<List<Integer>> result = new ArrayList<List<Integer>> ();
            //We start with the first coin and search exhaustively.
                               new ArrayList<Integer>(),
            return result;
        public void combinationSumUtil(int[] candidates, 
                                      int target,
                                      List<List<Integer>> result,
                                      List<Integer> current, 
                                      int index){
               First termination condition: if there are no coins left
               and required target is more than zero.
            if(target > 0 && index == candidates.length){
               Second termination condition: if target is zero,
               we can add the current combination to the result
            if(target == 0 && index < candidates.length){
                result.add(new ArrayList<>(current));
               Start from the current index, and go through
               all the coins.
            for(int i=index; i<candidates.length; i++){
                   This is where we prune the branches 
                   of our exhaustive search
                if(target - candidates[i] >=0){
                    current.add(candidates[i]); // add to the list
                                       result, current, i);
                    /* Remove the candidate from the list and 
                       check other combinations.
                    if(current.size() > 0)

    The time complexity is C(n,1) + C(n,2) + … + C(n,n) = 2^n – C(n,0) = O(2n).

    The beauty of this solution is that it works with negative candidates as well, where the Dynamic solution for it may not work.

    Maximum area rectangle in a histogram

    A histogram is a diagram consisting of rectangles whose area is proportional to the frequency of a variable and whose width is equal to the class interval. Below is an example of a histogram.

    maximum area rectangle in histogram

    Given a histogram, whose class interval is 1, find maximum area rectangle in it. Let me explain the problem in more details.

    In the histogram above, there are at least 6 rectangles with areas 2, 1,5,6,2, and 3. Are there more rectangles? Yes, we can make more rectangles by combining some of these rectangles. A few are shown below.

    Apparently, the largest area rectangle in the histogram in the example is 2 x 5 = 10 rectangle. The task is to find a rectangle with maximum area in a given histogram. The histogram will be given as an array of the height of each block, in the example, input will be [2,1,5,6,2,3].

    Maximum area rectangle: thoughts

    First insight after looking at the rectangles above is: block can be part of a rectangle with a height less than or equal to its height. For each block of height h[i], check what all blocks on the left can be part of a rectangle with this block. All the blocks on the left side with a height greater than the current block height can be part of such a rectangle.
    Similarly, all the blocks on the right side with a height greater than the current block height can be part of such a rectangle.
    Idea is to calculate leftLimit and rightLimit and find the area (rightLimit - leftLimit) * h[i].
    Check if this area is greater than previously known area, then update the maximum area else, continue to the next block.

    class Solution {
        public int largestRectangleArea(int[] heights) {
            if(heights.length == 0) return 0;
            int maxArea = Integer.MIN_VALUE;
            for(int i=0; i<heights.length; i++){
                //Find the left limit for current block
                int leftLimit = findLeftLimit(heights, i);
                //Find the right limit for current block
                int rightLimit = findRightLimit(heights, i);
                int currentArea = (rightLimit - leftLimit-1) * heights[i];
                maxArea = Integer.max(maxArea, currentArea);
            return maxArea;
        private int findLeftLimit(int [] heights, int index){
            int j = index-1;
            while (j >= 0 && heights[j] >= heights[index]) j--;
            return j;
        private int findRightLimit(int [] heights, int index){
            int j = index+1;
            while (j < heights.length && heights[j] >= heights[index])
            return j;

    The time complexity of the implementation is O(n2); we will left and right of each block which will take n operations, we do it for n blocks and hence the complexity is quadratic. Can we optimize the time complexity?

    If heights[j] >= heights[i] and leftLimit of index j is already known, can we safely say that it will also be the leftLimit of index i as well?
    Can we say the same thing for rightLimit well? Answers to all the questions are yes. If we store the left and right limit for all indices already seen, we can avoid re-calculating them.

    class Solution {
        public int largestRectangleArea(int[] heights) {
            if(heights.length == 0) return 0;
            int maxArea = Integer.MIN_VALUE;
            //Finds left limit for each index, complexity O(n)
            int [] leftLimit = getLeftLimits(heights);
            //Find right limit for each index, complexity O(n)
            int [] rightLimit = getRightLimits(heights);
            for(int i=0; i<heights.length; i++){
                int currentArea = 
                    (rightLimit[i] - leftLimit[i] -1) * heights[i];
                maxArea = Integer.max(maxArea, currentArea);
            return maxArea;
        private int[] getLeftLimits(int [] heights){
            int [] leftLimit = new int[heights.length];
            leftLimit[heights.length-1] = -1;
            for(int i=0; i<heights.length; i++) {
                int j = i - 1;
                while (j >= 0 && heights[j] >= heights[i]) {
                    j = leftLimit[j];
                leftLimit[i] = j;
            return leftLimit;
        private int[] getRightLimits (int [] heights){
            int [] rightLimit = new int[heights.length];
            rightLimit[heights.length-1] = heights.length;
            for(int i=heights.length-2; i>=0; i--){
                int j = i+1;
                          && heights[j] > heights[i]){
                    j = rightLimit[j];
                rightLimit[i] = j;
            return rightLimit;

    The array leftLimitcontains at index i the closest index j to the left of i such that height[j] < height[i]. You can think about each value of the array as a pointer (or an arrow) pointing to such j for every i. How to calculate leftLimit[i]? Just point the arrow one to the left and if necessary just follow the arrows from there, until you get to proper j. The key idea here to see why this algorithm runs in O(n) is to observe that each arrow is followed at most once.

    Largest area rectangle: stack-based solution

    There is a classic method to solve this problem using the stack as well. Let’s see if we can build a stack-based solution using the information we already have. Let’s we do not calculate the area of the rectangle which includes the bar when we are processing it. When should we process it? Where should this bar be put on? If we want to create a rectangle with a height of this bar, we should find the left and right boundaries of such a rectangle. We should put this bar on a stack.
    Now when you are processing bar j if height[j] is less than the bar on the top of the stack, we pop out the bar at the top. Why? Because this is the first bar on the right which has a height less than the height of the bar at top of the stack. This means if we want to make a rectangle with a height of the bar at the top of the stack, this index means the right boundary. This also gives away that all the blocks on the stack are in increasing order, as we never put a block which has a height less than the height of block at the top on to the stack. It means the next bar on the stack is the first bar which has a height lower than the bar at the top. To calculate the area of the rectangle with height as h[top], we need to take width as current index j - stack.peek() - 1

    So the idea is that:

    1. For each bar, take its height as the rectangle’s height. Then find the left and right boundaries of this rectangle.
    2. The second top bar in the stack is always the first bar lower than the top bar on the stack on the left.
    3. The bar that j points to is always the first bar lower than the top bar in the stack on the right.
    4. After step 2 and 3, we know the left and right boundaries, then know the width, then know the area.
    private int maxAreaUsingStack(int[] heights){
            Stack<Integer> s = new Stack<>();
            int maxArea = 0;
            for(int i=0; i<=heights.length; i++){
                //Handling the last case
                int h = i == heights.length ? 0 : heights[i];
                while(!s.empty() && h < heights[s.peek()]){
                    int top = s.pop();
                    int leftLimit = s.isEmpty() ? -1 : s.peek();
                    int width = i-leftLimit-1;
                    int area = width * heights[top];
                    maxArea = Integer.max(area, maxArea);
            return maxArea;
    The time complexity of the code is O(n) with an additional space complexity of O(n) If you are preparing for a technical interview in companies like Amazon, Facebook, etc and want help with preparation, please register for a coaching session with us.

    System design basics – II

    System design basics – Cache

    In last post, we discussed the design fundamentals for database, in this post let’s talk about caching.

    What is caching? Caching is a mechanism to store and fetch data faster by applications without referring to the original source of data. Usually, applications store data which they think will be used again, instead of doing a round trip to the original source, applications use the stored data to serve requests. These are mostly software caches implemented using main memory.


    A cache is a storage which stores a subset of the entire data based on current usage and probability of its future usage for faster execution of requests. This storage is typically in main memory which has must faster read time compared to disk.

    In typical microprocessors, hardware cache exists which between CPU and main memory, these are very high-speed miniature memories which store instructions and data, which are processed by CPU, referred to as L1, L2 caches.

    In distributed web systems, caching can be done at various places and levels which we will discuss in a while.

    Caching eviction strategy

    As mentioned above, cache stores a subset of the original data. Question is which subset and how does it work when data which is not in the cache is requested? There are different approaches to select which data stays in the cache, these are called cache eviction strategies.

    Before understanding more on eviction strategies, a cache miss is an event where the application did not find data it requested in the cache. A cache hit is when the application did find what requested in the cache.

    1. First in first out : In the event of a cache miss, the entity which came first into the cache is evicted first. Every entity has a timestamp associated with it and the oldest timestamp entity is removed first.
    2. Least recently used: In the case of a cache miss, a page which was used least recently gets evicted from the cache.
    3. Least frequently used: In case of a cache miss, the entity which is least frequently used among all the entities in the cache is thrown out of the cache.

    There are other eviction strategies like minimum used eviction or heuristic-based eviction.

    Cache hit ration is dependent on many parameters, first, the size of cache key-space, the more unique cache keys your application generates, the less chance you have to reuse any one of them. Always consider ways to reduce the number of possible cache keys. second, the number of items you can store in the cache, the more objects you can physically fit into your cache, the better your cache hit ratio. Third, longevity, how long each object can be stored in cache before expiring or being invalidated.

    Caching levels

    Caches can be applied and leveraged throughout various layers of technology including Operating Systems, Networking layers including Content Delivery Networks (CDN) and DNS, web applications, and Databases.
    In a web application, most of the time, user requests need the same data to fulfill those request. If each request starts hitting your database, your servers will be overloaded and response time will slow. To avoid unnecessary load on servers and to decrease the response time, we place caches in between our databases and application. These caches can be on the same servers as the database, completely separate servers or at the application servers. Based on the metric and function you want to optimize, use appropriate caching level.

    – Client caching
    Caching which is done at the client-side like operating system and browser of the user. Typical examples are Address Resolution Protocol and static assets like HTML and CSS. Remember, you did nothing in this case, everything is done by browser and not your system.

    – CDN caching
    Imagine that you want to serve your static content like Javascript files, HTML templates, CSS without going to web servers. Anyways, web servers do not anything with static content than just passing it along. This is where Content Distribution Networks come in the picture. One can create CDNs near geographic locations of users and server static content to users from the nearest CDN, which makes website or app load faster.

    – Server caching
    We could have a cache directly on the servers. Each time a request is made to the service, the server will quickly return local, cached data if it exists. If it is not in the cache, the requesting node will query the data by going to network storage such as a database.
    How this solution will scale as we may grow to many nodes? If we decide to expand to multiple nodes, it’s still quite possible to have each node host its own cache. However, if your load balancer randomly distributes requests across the nodes, the same request will go to different nodes, thus increasing cache misses.

    Another design pattern to handle the cache miss problems is to have a common cache for the entire system which all server write and read from. This pattern scales better and even if the requests in the same session go to multiple servers, the user gets the consistent experience without latency. Trouble is that now you cache layer has become a bottleneck.

    Caching approach

    – Cache-aside
    In this caching approach, we write directly on to the DB. The cache reads the info from DB in case of a miss and then stores it till the eviction policy moves it out of cache. This approach can lead to higher read latency in case of applications which write and re-read the information quickly.

    – Write-through
    Where writes go through the cache and write is confirmed as success only if writes to database and the cache succeed. There is data consistency between cache and database. If your cache crashes due to power failures or other disruptions and restarts, nothing will be lost. However, write latency will be higher in this case as there are writes to two separate systems.

    – Write-behind (write-back)
    In this approach we write on cache and it is confirmed as soon as it is done on cache without writing it on to database. This write is asynchronously synced to database. It results in quick write latency and high write throughput for the write-intensive applications. However, you have a risk of loss of data incase the caching layer dies because the only single copy of the written data is in the cache. We can improve this by having more than one replica acknowledging the write in the cache.

    Advantages of using cache in your system design

    1. Improve Application Performance
    Because memory is orders of magnitude faster than disk (magnetic or SSD), reading data from the in-memory cache is extremely fast (sub-millisecond). This significantly faster data access improves the overall performance of the application.

    2.Reduce Database Cost
    A single cache instance can provide hundreds of thousands of IOPS (Input/output operations per second), potentially replacing a number of database instances, thus driving the total cost down. This is especially significant if the primary database charges per throughput.

    3. Reduce the Load on the Backend
    By redirecting significant parts of the read load from the backend database to the in-memory layer, caching can reduce the load on your database, and protect it from slower performance under load, or even from crashing at times of spikes.

    4. Eliminate Database Hotspots
    In many applications, it is likely that a small subset of data, such as a celebrity profile or popular product, will be accessed more frequently than the rest. This can result in hot spots in your database and may require overprovisioning of database resources based on the throughput requirements for the most frequently used data. Storing common keys in an in-memory cache mitigates the need to overprovision while providing fast and predictable performance for the most commonly accessed data.

    5.Increase Read Throughput (IOPS)
    In addition to lower latency, in-memory systems also offer much higher request rates (IOPS) relative to a comparable disk-based database. A single instance used as a distributed side-cache can serve hundreds of thousands of requests per second.

    System design basics – I

    System design basics

    System design round is one of the rounds in any technical interview. The idea of this round is to know your design skills, analytical and trade-off skills, to see if you can take a fuzzy problem, break it down it in small executable chunks and actually solve the problem. The beauty is that there are no correct or incorrect answers, it is more of a discussion.

    To have a fruitful discussion, one must know the basics and should be able to prove his/her claims with facts. This article discusses what are all those basic technical concepts you should be aware of when you go into a system design round.

    System Design concepts: Databases

    Databases are essentially required in any application/system unless it is a non-authenticated, static, read-only and not frequently changing content only application. It is therefore important to know a few concepts around databases.
    First of all, understand that these concepts are independent of what actual DB used, it can be open source like MySQL or proprietary like MS-SQL Server.

    Imagine a simple application like which is set up like the figure below. What are the problems with setup?

    system design interview

    There are three problems to start with among many others: First, what if the only DB server goes down or we have to take it down for maintenance? It impacts the system’s availability. Second, on the other instance, what if the server crashes and become unrecoverable, all application data in it is also gone, this affects the application’s data availability and persistence. Third, what if there are many reads and writes request coming to the server. The only server will degrade in performance, simple read requests will take a lot of time waiting for complex writes to finish, impacting the performance of the overall system.

    What can we do? Let’s set up a new server, which is nothing but a replica of the original server and store everything which is on the original server. This process is called replication, the original server is called master and the copy server is called slave.

    System design DB replication

    How does it solve the problems: First, having a replicated server gives us the power to immediately switch to a new server in case the master goes down. The system will keep running as if nothing happened and nobody except your database administrators needs to know anything.
    Second, you have the exact copy of the data, so even you lose data on one server, you can always extract that from the other.
    Third, since the servers contain the exact same data, you can read from one server and write on another. This improves the performance of your read requests (which are usually more in any system) and is not blocked on write.

    So in essence, replicating your DB servers gives you system availability, protection against data loss and performance gain .

    Usually, in production systems, there are two masters, (one stand by) and many slaves attached to those masters. Also, you want to keep a server with replication delayed by a few hours (24 hours ideally). Just in case if data on master get corrupted ( developer run a wrong query, file system issues) and it gets replicated immediately to all the servers, all of the data is now corrupted. This delayed replicated server can give you sane data although you lose data of a day.

    System design replication of database

    Read more about replication in MySQL

    Where is the catch, where is the tradeoff here? We improved availability and performance, however, we have risked the consistency. Imagine a case, where data is written on the master. Due to some lag, data replication to slave did not happen quickly. If another service reads from the slave the same data, either it does not get the data or gets an obsolete piece of data.

    Replication improves availability and performance, however, risks consistency.

    A database is essentially a collection of tables which are nothing but rows and columns. The number of rows and columns in tables vary. Let’s you have a very popular application and half of the earth’s population is registered to your application. In this case, your user table has too many rows. On the other hand, you have 10K employee and you want to store everything about those employees in one table called employee, here you have a table which has too many columns.

    Before understanding partitioning, let’s understand why too many rows or columns in the table are problems for a system?
    With too many rows, index size will be large and search performance will degrade, this will impact the overall performance of the service for a specific user. Let’s say a few thousands of users are accessing your service which queries this table, one user query starts degrading the other and your system goes into this spiral of degrading performance. If this big data gets corrupted, it impacts the availability of service to all the users.

    With too many columns, you will be pulling a lot of data in every query even when you need just a few columns out of those. This will indirectly impact the performance and put a lot of unnecessary data on the network.

    What is the solution to too many rows or columns? The answer is partitioning. There are two types of partitioning:
    Horizontal partitioning
    In horizontal partitioning, a table is divided into multiple partitions, each partition stores rows of the table. These partitions can be stored in multiple database clusters. One of the examples is that all the users from Germany, Netherlands, and France in a table called EuropeUser whereas all the users in USA and Canada are stored in a table called NorthAmericaUser and so on for Asia, Africa, etc. If you want to know all the users of your service, all you need to unite all these tables.
    Gain is that now you have to search less number of rows to find a European user compared to earlier, which improves query performance and hence improves the response time of service. If table EuropeUser is corrupted or deleted, your service is still available in North America and Asia.

    system design horizontal partitioning

    Vertical partitioning
    In vertical partitioning, we divide the table into multiple partitions and each partition contains certain columns. For example, an employee table with id, name, email, picture, salary, organization can be partitioned into three one with columns: id, name, and email; second: id, picture; Third: id, org, and salary.

    vertical partitioning

    There are multiple ways you can partition a table:

    Range partitioning – selects a partition by determining if the partitioning key is inside a certain range. It distributes tuples based on the value intervals (ranges) of some attribute.

    List partitioning – a partition is assigned a list of values. If the partitioning key has one of these values, the partition is chosen. For example, all rows where the column Country is either Iceland, Norway, Sweden, Finland or Denmark could build a partition for the Nordic countries.

    Composite partitioning – allows for certain combinations of the above partitioning schemes, for example first applying a range partitioning and then a hash partitioning.
    Consistent hashing could be considered a composite of hash and list partitioning where the hash reduces the key space to a size that can be listed.

    Round-robin partitioning – the simplest strategy, it ensures uniform data distribution. With n partitions, the ith tuple in insertion order is assigned to partition (i mod n). This strategy enables the sequential access to a relation to be done in parallel.

    Hash partitioning – applies a hash function to some attribute that yields the partition number. This strategy allows exact-match queries on the selection attribute to be processed by exactly one node and all other queries to be processed by all the nodes in parallel.

    Partitioning improves your database availability and performance.

    We already know that horizontal partitioning splits a big table into multiple partitions, however, these partitions are stored in the same schema and on the same server. Sharding takes it one step beyond, where partitions can be stored on multiple servers and in different schemas. This division of data and its storage on multiple servers gives flexibility to store more data which does not fit on single servers and fast search responses on the data.

    Algorithmic sharding
    In algorithmic sharding, the client can determine a given partition’s database without any help. Usually, there is a partition key, we can derive cluster id using that partition key. A simple hash function Hash(key) can be used a sharding function. Example of such sharding implementation is Memcached.

    Dynamic sharding
    In dynamic sharding, an external locator service determines the location of data.To read and write data, clients need to consult the locator service first.

    Sharding a database table before it has been optimized locally causes premature complexity. Sharding should be used only when all other options for optimization are inadequate.


    This is true because adding sharding actual may reduce performance if your data is not uniformly distributed by creating hotspots. Also, it adds lot of operational overheads, adding a new node in the system may require re-sharding and moving data between nodes.

    Intersection of two arrays

    Intersection of two arrays

    Given two unsorted arrays of integers, find intersection of these two arrays. Intersection means common elements in the given two arrays. For example, A = [1,4,3,2,5,6] B = [3,2,1,5,6,7,8,10] intersection of A and B is [ 1,3,2,5,6 ].

    Sort array and then use binary search
    As given arrays are unsorted, sort one of the arrays, preferably the larger one. Then search each element of another array in the sorted array using binary search. If the element is present, put it into the intersection array.

    class Solution {
        public int[] intersection(int[] nums1, int[] nums2) {
            int len1 = nums1.length;
            int len2 = nums2.length;
            Set<Integer> result = new HashSet<>();
            for(int i=0; i<len2; i++){
                if(binarySearch(nums1, nums2[i]) != -1){
            int i = 0;
            int[] resultArray = new int[result.size()];
            for(Integer num : result){
                resultArray[i++] = num ;
            return resultArray;
        private int binarySearch(int[] a, int key) {
            for(int i=0; i<a.length; i++){
                if(a[i] == key) return i;
            return -1;

    The time complexity of binary search method to find intersection is O(nlogn) for sorting and then O(mlogn) for searching. Effective time complexity becomes O((n+m)logn) which is not optimal.

    Sort and use merge to find common elements
    Again in this method, sort two arrays first. Then use two pointers to scan both arrays simultaneously. (Please refer to merge part of merge sort ). The difference is we will put only common elements, instead of all.

    The time complexity of merge sort method is O(nlogn) + O(mlogm) for sorting and then O(m+n) for scanning both arrays. It is worst than the binary search method.

    Use hash
    Create a hash with key as elements from the smaller array (saves space). Then scan through other array and see if the element is present in hash. If yes, put into intersection array else do not.

    package AlgorithmsAndMe;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    public class IntersectionTwoArrays {
        public List<Integer> findIntersecton(int[] a, int[] b) {
            List<Integer> result = new ArrayList<>();
            Map<Integer, Boolean> existingElements = new HashMap<>();
            for (int i = 0; i < a.length; i++) {
                existingElements.put(a[i], true);
            for (int i = 0; i < b.length; i++) {
                if (existingElements.containsKey(b[i])) {
            return result;

    Test case

    package Test;
    import AlgorithmsAndMe.DuplicatesInArray;
    import AlgorithmsAndMe.IntersectionTwoArrays;
    import java.util.List;
    import java.util.Set;
    public class IntersectonTwoArraysTest {
        IntersectionTwoArrays intersectionTwoArrays
                 = new IntersectionTwoArrays();
        public void testIntersectionTwoArrays() {
            int [] a = {1,6,3};
            int [] b = {1,2,3};
            List<Integer> result = intersectionTwoArrays.findIntersecton(a,b);
            result.forEach(s -> System.out.println(s));

    This method has the complexity of O(n) where n is the number of elements in the larger array and extra space complexity of O(m) where m is the number of elements in the smaller array.

    These methods to find the intersection of two arrays do not work when there are duplicate elements in any of the array as they will be part of intersection array only once.

    Please share if there is something wrong or missing. we would love to hear from you.

    Minimizing maximum lateness

    Minimizing maximum lateness : Greedy algorithm

    Since we have chosen the greed, let continue with it for one more post at least. Today’s problem is to minimize maximum lateness of a task. Let me clarify the problem: given a processor which processes one process at a time and as always given a list of processes to be scheduled on that processor, with the intention that maximum late process should be minimized. Contrary to previous problems, this time, we are not provided with start time and end time, but we are given length of time ti process will run and deadline it has to meet di, fi is actual finish time of process completion.

    Lateness of a process is defined as
    li = max{0, fi − di}, i.e. the length of time past its deadline that it finishes.
    Goal here to schedule all tasks to minimize maximum lateness L = max li For example:

    minimize maximum lateness

    Minimizing maximum lateness : algorithm

    Let’s decide our optimization strategy. There is some order in which jobs can be decided: shortest job first, earliest deadline first, least slack time first.

    Let’s see if any of the above strategies work for the optimal solution. For shortest processing time first, consider example P1 = (1,100) P2 = (10, 10). If we schedule the shortest job first as in order (P1, P2), lateness will be 91, but if we take them as (P2, P1), lateness will be 0. So, clearly taking the shortest process first does not give us an optimal solution.

    Check for the smallest slack time approach. See if you can come up with some counterexample that it does not work.

    That leaves us with only one option, take the process which has the most pressing deadline, that is the one with the smallest deadline and yet not scheduled. If you have noticed, the example given for the problem statement is solved using this method. So, we know it works.

    1. Sort all job in ascending order of deadlines
    2. Start with time t = 0
    3. For each job in the list
      1. Schedule the job at time t
      2. Finish time = t + processing time of job
      3. t = finish time
    4. Return (start time, finish time) for each job

    Minimizing maximum lateness : implementation

    from operator import itemgetter
    jobs = [(1, 3, 6), (2, 2, 9), (3, 1, 8), (4, 4, 9), 
            (5, 3, 14), (6, 2, 15)] 
    def get_minimum_lateness():
    	schedule =[];
    	max_lateness = 0
    	t = 0;
    	sorted_jobs = sorted(jobs,key=itemgetter(2))
    	for job in sorted_jobs:
    		job_start_time = t
    		job_finish_time = t + job[1]
    		t = job_finish_time
    		if(job_finish_time > job[2]):
    			max_lateness =  max (max_lateness, (job_finish_time - job[2]))
    		schedule.append((job_start_time, job_finish_time))
    	return max_lateness, schedule
    max_lateness, sc = get_minimum_lateness();
    print "Maximum lateness will be :" + str(max_lateness)
    for t in sc:
    	print t[0], t[1]

    The complexity of implementation is dominated by sort function, which is O(nlogn), rest of processing takes O(n).

    Please share your suggestions or if you find something is wrong in comments. We would love to hear what you have to say. If you find this post interesting, please feel free to share or like.

    Coin change problem : Greedy algorithm

    Coin change problem : Greedy algorithm

    Today, we will learn a very common problem which can be solved using the greedy algorithm. If you are not very familiar with a greedy algorithm, here is the gist: At every step of the algorithm, you take the best available option and hope that everything turns optimal at the end which usually does. The problem at hand is coin change problem, which goes like given coins of denominations 1,5,10,25,100; find out a way to give a customer an amount with the fewest number of coins. For example, if I ask you to return me change for 30, there are more than two ways to do so like

    Amount: 30
    Solutions : 3 X 10  ( 3 coins ) 
                6 X 5   ( 6 coins ) 
                1 X 25 + 5 X 1 ( 6 coins )
                1 X 25 + 1 X 5 ( 2 coins )

    The last solution is the optimal one as it gives us a change of amount only with 2 coins, where as all other solutions provide it in more than two coins.

    Solution for coin change problem using greedy algorithm is very intuitive and called as cashier’s algorithm. Basic principle is : At every iteration in search of a coin, take the largest coin which can fit into remaining amount we need change for at the instance. At the end you will have optimal solution.

    Coin change problem : Algorithm

    1. Sort n denomination coins in increasing order of value.
    2. Initialize set of coins as empty. S = {}
    3. While amount is not zero:
    3.1 Ck is largest coin such that amount > Ck
    3.1.1 If there is no such coin return “no viable solution”
    3.1.2 Else include the coin in the solution S.
    3.1.3 Decrease the remaining amount = amount – Ck

    Coin change problem : implementation

    #include <stdio.h>
    int coins[] = { 1,5,10,25,100 };
    int findMaxCoin(int amount, int size){
    	for(int i=0; i<size; i++){
    	    if(amount < coins[i]) return i-1;
    	return -1;
    int findMinimumCoinsForAmount(int amount, int change[]){
    	int numOfCoins = sizeof(coins)/sizeof(coins[0]);
    	int count = 0;
    	    int k = findMaxCoin(amount, numOfCoins);
    	    if(k == -1)
                    printf("No viable solution");
                    amount-= coins[k];
    		change[count++] = coins[k];
    	return count;
    int main(void) {
    	int change[10]; // This needs to be dynamic
    	int amount = 34;
    	int count = findMinimumCoinsForAmount(amount, change);
    	printf("\n Number of coins for change of %d : %d", amount, count);
    	printf("\n Coins : ");
    	for(int i=0; i<count; i++){
    		printf("%d ", change[i]);
    	return 0;

    What will the time complexity of the implementation? First of all, we are sorting the array of coins of size n, hence complexity with O(nlogn). While loop, the worst case is O(amount). If all we have is the coin with 1-denomination. Overall complexity for coin change problem becomes O(n log n) + O(amount).

    Will this algorithm work for all sort of denominations? The answer is no. It will not give any solution if there is no coin with denomination 1. So be careful while applying this algorithm.

    Please share if you have any suggestion or if you want me to write on a specific topic. If you liked the post, share it!