Longest Common Subsequence

Longest common subsequence

A subsequence of a string is a set of characters drawn from the string in left to right order, not necessarily contiguous. For example, the string ABCDEG has ABC, ADG, EG and BCDEG as subsequences; whereas BDA is not a subsequence of the given string: even though all its characters are present in the string, they do not appear in the same order.
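
To make the definition concrete, here is a small illustrative helper (a sketch, not part of the original post) that checks whether t is a subsequence of s using two pointers: always advance in s, advance in t only on a match.

public class SubsequenceCheck {
    //Returns true if t is a subsequence of s
    public static boolean isSubsequence(String s, String t) {
        int j = 0;
        for (int i = 0; i < s.length() && j < t.length(); i++) {
            if (s.charAt(i) == t.charAt(j)) j++; //matched one more character of t
        }
        return j == t.length();
    }

    public static void main(String[] args) {
        System.out.println(isSubsequence("ABCDEG", "ADG")); //true
        System.out.println(isSubsequence("ABCDEG", "BDA")); //false, order matters
    }
}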


Given two strings X and Y, find the longest common subsequence (LCS) Z. For example, for X = ABCDSEFGD and Y = ACFEFXVGAB, the LCS Z would be ACEFG.

Longest common subsequence: line of thoughts

First of all, notice that it is an optimization problem. That is a hint that it may be a dynamic programming problem, but we are not sure yet.

Let's say the lengths of string 1 and string 2 are N and M. Can I know the longest common subsequence for lengths N and M if I already know the LCS for N-1 and M-1? The direct question is: can I divide the original problem into subproblems and solve those subproblems to get the answer for the original problem? In this case, the answer is yes. (This is the second hint towards dynamic programming: optimal substructure.)

How can we divide the problem into subproblems? Let the length of X be N and the length of Y be M. Start from the end of both strings. Check if X[N] == Y[M]. If yes, the problem reduces to finding the longest common subsequence of X[1..N-1] and Y[1..M-1], plus one for the matched character.

What if they are not equal? Then one by one we have to exclude a character from strings X and Y. Why?
First, we exclude the character from X and find the LCS of the remaining characters of X and all the characters of Y; the problem reduces to X[1..N-1] and Y[1..M]. Next, we exclude a character from Y; the problem reduces to X[1..N] and Y[1..M-1]. Either of the two choices can give us the longest common subsequence, so we take the maximum of both cases.

LCS(i, j) = 1 + LCS(i-1, j-1)               if X[i] == Y[j]
          = max(LCS(i-1, j), LCS(i, j-1))   if X[i] != Y[j]
          = 0                               if i == 0 or j == 0

Why is LCS(i, j) 0 when either i or j is 0? Because the longest common subsequence of two strings, when one of them is empty, is 0.

Can we implement the recursive function?

    public int longestCommonSubsequence(String s1, String s2, int i, int j){

        //If any of the strings is null, return 0
        if(s1 == null || s2 == null) return 0;

        //We reached the end of one of the strings
        if(i == s1.length() ||  j == s2.length())
            return 0;

        if(s1.charAt(i) ==  s2.charAt(j)){
            return  1 + longestCommonSubsequence(s1, s2, i+1, j+1);
        }

        return Integer.max(longestCommonSubsequence(s1, s2, i+1, j),
                longestCommonSubsequence(s1, s2, i, j+1));
    }

If we follow the execution cycle of the above code, we will see something like below

(figure: partial recursion tree of the recursive LCS calls)

It is evident from the partial tree that there are some problems which are solved again and again. This is the third hint (overlapping subproblems) that we can apply dynamic programming.

It will be more evident if you implement the recursive function with reverse traversal of the strings. In that implementation, the base case is when one of the strings is empty, and at that point, the LCS of the two strings is 0. If we take a two-dimensional table such that T[i][j] represents the longest common subsequence up to the ith and jth characters of strings S1 and S2, then T[0][j] = 0 and T[i][0] = 0.

T[i][j] = T[i-1][j-1] + 1               if X[i] == Y[j]
T[i][j] = max(T[i-1][j], T[i][j-1])     if X[i] != Y[j]
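
Before filling the table bottom-up, here is a minimal top-down sketch of that reverse traversal with memoization; the cache follows the same meaning as T above, with -1 marking entries not yet computed (an illustrative sketch, not the original implementation):

import java.util.Arrays;

public class LongestCommonSubsequenceMemo {

    public int longestCommonSubsequence(String s1, String s2){
        if(s1 == null || s2 == null) return 0;

        int[][] memo = new int[s1.length() + 1][s2.length() + 1];
        for(int[] row : memo) Arrays.fill(row, -1);
        return lcs(s1, s2, s1.length(), s2.length(), memo);
    }

    //LCS of prefixes s1[1..i] and s2[1..j], cached in memo[i][j]
    private int lcs(String s1, String s2, int i, int j, int[][] memo){
        if(i == 0 || j == 0) return 0; //one of the strings is empty
        if(memo[i][j] != -1) return memo[i][j];

        if(s1.charAt(i-1) == s2.charAt(j-1)){
            memo[i][j] = 1 + lcs(s1, s2, i-1, j-1, memo);
        } else {
            memo[i][j] = Integer.max(lcs(s1, s2, i-1, j, memo),
                                     lcs(s1, s2, i, j-1, memo));
        }
        return memo[i][j];
    }
}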

Dynamic programming implementation of LCS

package com.company;

/**
 * Created by sangar on 4.2.19.
 */
public class LongestCommonSubsequence {

    public int longestCommonSubsequenceDP(String s1, String s2){

        //If any of the strings is null, return 0
        if(s1 == null || s2 == null) return 0;

        int len1 = s1.length();
        int len2 = s2.length();

        int[][] table = new int[len1+1][len2+1];

        for (int i=0; i<=len1; i++){
            for (int j=0; j<=len2; j++) {
                if (j == 0 || i == 0) {
                    table[i][j] =  0;
                }

                else if (s1.charAt(i-1) == s2.charAt(j-1)) {
                    table[i][j] = 1 + table[i - 1][j - 1];
                } else {
                    table[i][j] = Integer.max(table[i - 1][j], table[i][j - 1]);
                }
            }
        }

        return table[len1][len2];
    }
}

The above implementation has time and space complexity of O(n*m), where n and m are the lengths of the two strings. Please share if there is anything wrong or missing.

What is dynamic programming?

What is Dynamic Programming or DP

Dynamic programming is an approach to solving a larger problem with the help of the results of smaller subproblems. It is a technique used to avoid computing the same subproblem multiple times in a recursive algorithm. I find a lot of students asking me: how do I know this problem is a dynamic programming problem? Is there a definite way to conclude whether a problem is a dynamic programming problem or not?

The first thing I would recommend you to read before going down is this beautiful explanation of dynamic programming to a 4 year old.

The first thing you will notice about dynamic programming problems (not all of them) is that they are optimization problems: either finding the minimum or the maximum of some entity. For example, find the minimum edit distance between two strings, or find the longest common subsequence. However, problems like the Fibonacci series are not exactly optimization problems; they are more like combinatorial problems. Still, this can be a good hint that a problem may be a DP problem.

Second, you will notice that the problem can be divided into a pattern like fn(n) = C + fn(n-k), where k can be anything between 1 and n.
This property is called optimal substructure, where an optimal solution to the subproblem leads to an optimal solution to the larger problem.
Once you get the equation, it is very easy to come up with a recursive solution to the problem. I would advise you to write the recursive solution and try to calculate the complexity of the solution. It will be exponential in big-O notation.

Then why did recursion work so well with a divide and conquer approach? The key point is that in divide and conquer, a problem is expressed in terms of subproblems that are substantially smaller, say half the size. For instance, mergesort sorts an array of size n by recursively sorting two subarrays of size n/2. Because of this sharp drop in problem size, the full recursion tree has only logarithmic depth and a polynomial number of nodes. In contrast, in a typical dynamic programming formulation, a problem is reduced to subproblems that are only slightly smaller than the original. For instance, fn(j) relies on fn(j − 1). Thus the full recursion tree generally has polynomial depth and an exponential number of nodes.
However, it turns out that most of these nodes are repeats, that there are not too many distinct subproblems among them. Efficiency is therefore obtained by explicitly enumerating the distinct subproblems and solving them in the right order.
Reference

This leads us to the third property, which is overlapping subproblems. Once you draw the execution tree of the recursive solution of the problem, it will appear that a lot of problems are being solved again and again at different levels of recursion.

The intuition behind dynamic programming is that we trade space for time: instead of calculating all the subproblems repeatedly, taking a lot of time but no space, we take up space to store the results of the subproblems to save time later. The typical characteristics of a dynamic programming problem are: being an optimization problem, the optimal substructure property, overlapping subproblems, trading space for time, and implementation via bottom-up tabulation or memoization.

Dynamic programming in action

Enough of theory, let's take an example and see how dynamic programming works on real problems. I will take a very commonly used but most effective problem to explain DP in action: the Fibonacci series. The Fibonacci series is a series of integers where each integer is the sum of the previous two integers. For example, 1,1,2,3,5,8,13,21 is a Fibonacci series of eight integers. Now, the question is: given a number n, output the integer which will be at the nth position in the Fibonacci series. For example, for n = 4 the output should be 3, and for n = 6 it should be 8.

First hint: it is a combinatorial problem, so maybe a DP problem. Second, it is already given that the current integer depends on the sum of the previous two integers, that is, f(n) = f(n-1) + f(n-2). This implies that the solution to subproblems leads to the solution of the bigger problem, which is the optimal substructure property.

Next step is to implement the recursive function.

 public int fibonacci (int n) {
    if (n < 2) //base case
        return 1;

    return fibonacci(n-1) + fibonacci(n-2);
 }

Great, next step is to draw the execution tree of this function. It looks like below for n = 6. It is apparent how many times the same problem is solved at different levels.

Recursive tree of Fibonacci series function

So, now we know three things about the Fibonacci problem: it is a combinatorial problem, there is optimal substructure, and there are overlapping subproblems. As in dynamic programming we favor space over time, we will try to use extra space to avoid recalculating subproblems.

The first way is to use a cache, which stores the value of fib(n) if it is already calculated. This is called memoization, or the top-down approach.

Map<Integer, Integer> cache = new HashMap<>();

public int fibonacci(int n){
    if (n < 2) //base case, same convention as the recursive version above
        return 1;

    if(cache.containsKey(n))
        return cache.get(n);

    cache.put(n, fibonacci(n - 1) + fibonacci(n - 2));

    return cache.get(n);
}

Another approach is bottom-up, where the smaller problems are solved in an order which helps us solve the bigger problems. Here also we store results, but in a different way (often called tabulation): we store the solutions of the smaller subproblems in a table and directly use them to build the solution.

public int fibonacci(int n){
    if (n < 2) //base case
        return 1;

    int[] fib = new int[n+1];
    fib[0] = fib[1] = 1;

    for(int i=2; i<=n; i++){
        fib[i] = fib[i-1] + fib[i-2];
    }
    return fib[n];
}

The above solution requires extra O(n) space; however, the time complexity is also reduced to O(n), with each subproblem solved only once.

Follow the longest increasing subsequence problem to see how we applied the same pattern while solving it.

Final thoughts
Where to apply dynamic programming: if your solution is based on optimal substructure and overlapping subproblems, then using the earlier calculated values will be useful so you do not have to recompute them. It is a bottom-up approach. Suppose you need to calculate fib(n); all you need to do is add the previously calculated values of fib(n-1) and fib(n-2).

Recursion : Basically subdividing your problem into smaller parts to solve it with ease, but keep in mind that it does not avoid recomputation if the same value was calculated previously in another recursive call.

Memoization : Storing previously calculated recursion values in a table is known as memoization. It avoids recomputation if a value has already been calculated by some previous call, so any value is calculated only once. Before calculating a value, we check whether it has already been calculated; if so, we return it from the table instead of recomputing. It is a top-down approach.
Reference: Answer by Endeavour

Please share if there is something wrong or missing. If you are preparing for an interview and need help with preparation, please book a free session with us to guide you through it.

0/1 knapsack problem using dynamic programming

0/1 Knapsack problem

0/1 Knapsack is a typical problem used to demonstrate the application of greedy algorithms as well as dynamic programming. There are cases where applying the greedy algorithm does not give an optimal solution. There are many flavors in which the knapsack problem can be asked.

1. A thief enters a museum and wants to steal artifacts from there. Every artifact has a weight and a value associated with it. The thief carries a knapsack (bag) which can hold only a specific weight. The problem is to find the combination of artifacts the thief should steal so that he gets maximum value while the weight of all taken artifacts is less than the capacity of his knapsack. The thief cannot take any artifact partially: either he takes it or leaves it. Hence the problem is 0/1 knapsack.

2. We have N files, each having a size Si. We have a total storage capacity of W bytes. For each file, the re-computation cost is Vi. The problem is to store files such that the combined size of all stored files is less than W and their total re-computation value is maximum. We can either store or leave a file; we cannot store a partial file. Hence this is a case of the 0/1 knapsack problem.

0/1 knapsack problem : Line of thoughts

The brute force method would try all subsets of the set of items whose weight adds up to at most the capacity of the knapsack and see which one gives the maximum value. The complexity of the brute force algorithm would be of exponential order, as there are 2^n possible subsets for n items.
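
As a baseline, here is a hedged sketch of that brute force idea: enumerate all 2^n subsets with a bitmask and keep the best value among those that fit (the names here are illustrative):

public class KnapsackBruteForce {
    public static int bestValue(int[] weights, int[] values, int W) {
        int n = weights.length;
        int best = 0;
        //each bitmask encodes one subset of the n items
        for (int mask = 0; mask < (1 << n); mask++) {
            int weight = 0, value = 0;
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) {
                    weight += weights[i];
                    value += values[i];
                }
            }
            if (weight <= W) best = Math.max(best, value);
        }
        return best;
    }

    public static void main(String[] args) {
        int[] val = {60, 100, 120};
        int[] wt = {10, 20, 30};
        System.out.println(bestValue(wt, val, 50)); //220
    }
}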

Can we do better? If we consider each item, there are two possibilities associated with it.
First, the current item is included in the optimal subset. Then we need to find, among the remaining N-1 items, the subset which optimizes the subproblem for weight W-wk. The value of this item is added to the candidate maximum value.

Second, the current item is not included in the optimal subset. In that case, we need to find, among the remaining N-1 items, the subset which optimizes the original problem. The value of the current item is not added to the candidate maximum value.

Inclusion of an item depends on two conditions:

  1. The weight of the item is less than or equal to the remaining capacity of the knapsack.
  2. Including the item increases the current maximum value achievable with the remaining N-1 items and W-Wk capacity.

Since every step reduces the problem to a smaller problem in terms of items or weight, a recursive solution would be our first refuge. To implement it, what are the base cases? First, we cannot add any item when the knapsack capacity is zero, i.e. W == 0. Second, no item can be taken if there are no items remaining, i.e. n == 0.

Recursive implementation of 0/1 knapsack problem

package com.company;
/**
	* Created by sangar on 19.8.18.
*/
public class KnapsackProblem {
	static int knapSack(int W, int[] weights, int[] val, int n) {
		/*
			If there is no item or weight that can be carried is
			zero, return zero
		*/
		if (n < 0 || W == 0)
			return 0;

		/* 
			If weight of the nth item is more than Knapsack 
			capacity W,then this item cannot be included
			in the optimal solution
		*/
		if (weights[n] > W)
			return knapSack(W, weights, val, n - 1);

		/* Consider two cases, including item and excluding item.*/
		else return Integer.max(
			(val[n]
				+ knapSack(W - weights[n], weights, val, n - 1)),
			(knapSack(W, weights, val, n - 1))
		);
	}

	public static void main(String args[]) {
		int[] val = {60, 100, 120};
		int[] wt = {10, 20, 30};
		int W = 50;
	
		int n = val.length;
		System.out.println(knapSack(W, wt, val, n - 1));
	}
}

If we look at the execution trace of the function, it looks like this.

(figure: recursion tree of the recursive knapSack calls for N = 3)

There are seven problems to be solved at the leaf level. For N = 3, there are 7 problems to be solved before we start optimizing for the max value. In general, for N items, 2^N subproblems will be solved. Hence, the complexity of the recursive implementation is O(2^N).

If we take another example, it will become evident that some subproblems are solved again and again. Overlapping subproblems is one of the criteria that should make us think about dynamic programming. Also, notice that the optimal solution to a smaller problem leads to the optimal solution to the bigger problem, which is the second condition for DP. This problem satisfies both conditions, so let's design a DP solution for it.

0/1 knapsack problem : Dynamic programming approach

We define a two-dimensional array V[N][W] where N is the number of items and W is the capacity. For 1 <= i <= N and 0 <= w <= W, V[i][w] represents the optimal solution for items I1, I2, ..., Ii with maximum weight w. If we can compute all the entries of this array, then the entry V[N][W] is the solution to our problem.

For i = 0 and w = 0, all values will be zero, so the first column and the first row are filled with zeros.
Using the recurrence below, we can fill the table bottom-up.

V[i, w] = max(V[i-1, w], V[i-1, w - w[i]] + v[i])
V[0, w] = 0 ; there are no items
V[i, 0] = 0 ; with no capacity, no items can be picked
package com.company;

/**
	* Created by sangar on 19.8.18.
*/
public class KnapsackProblem {
	public static int knapsackDP(int W, int[] weights, int[] val, int n) {
		int[][] V = new int[n+1][W + 1];
		for(int i = 1 ; i < V[0].length; i++){
			/*
				If weight of the first item is within
				capacity i, value val[0] can be achieved
			*/
			if(weights[0] <= i){
				V[0][i] = val[0];
			}else{
				V[0][i] = 0;
			}
		}

		//Loop for all items
		for (int i = 1; i < V.length; i++) {
			for (int j = 1; j < V[i].length; j++) {
				/*if an item weighs more than the allowed weight,
				that item cannot be picked. */
				if(weights[i] > j){
					V[i][j] = V[i-1][j];
				}else{
					V[i][j] = Math.max(V[i-1][j], 
							val[i] + V[i-1][j-weights[i]]);
				}
			}
		}
		return V[V.length-1][W];
	}

	public static void main(String args[]) {
		int[] val = {60, 100, 120};
		int[] wt = {10, 20, 30};
		int W = 50;

		int n = val.length;
		System.out.println(knapsackDP(W, wt, val, n - 1));
	}
}

One similar problem which can be solved with the same approach is finding the minimum number of coins to make change for a particular amount. I am skipping the whole analysis here; a sketch of the code follows.
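
Below is a minimal sketch of that coin change DP, assuming an unlimited supply of coins of each denomination and returning -1 when no combination makes the amount (names are illustrative):

import java.util.Arrays;

public class MinimumCoinChange {
    public static int minCoins(int[] coins, int amount) {
        //dp[i] = minimum number of coins needed to make amount i
        int[] dp = new int[amount + 1];
        Arrays.fill(dp, Integer.MAX_VALUE);
        dp[0] = 0;

        for (int i = 1; i <= amount; i++) {
            for (int coin : coins) {
                if (coin <= i && dp[i - coin] != Integer.MAX_VALUE) {
                    dp[i] = Math.min(dp[i], dp[i - coin] + 1);
                }
            }
        }
        return dp[amount] == Integer.MAX_VALUE ? -1 : dp[amount];
    }

    public static void main(String[] args) {
        int[] coins = {1, 5, 6, 8};
        System.out.println(minCoins(coins, 11)); //2 (5 + 6)
    }
}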

The complexity of the dynamic programming implementation of the knapsack problem is O(N*W), and the space complexity is also O(N*W). It is a rule of thumb that we trade space for time in dynamic programming.

Please share if there is something is wrong or missing. If you want to contribute to website, please reach out to us at communications@algorithmsandme.com

Longest increasing subsequence : Dynamic Programming

Longest increasing subsequence

Given an array of integers, find the longest increasing subsequence, i.e. a subsequence in which, for every pair of chosen indices i < j, a[i] < a[j]. For example, in the array {2,4,6,3,5,7,9}, the longest increasing subsequence has length 5: {2,4,6,7,9}.
More examples :

Input  : arr[] = {3, 10, 2, 1, 20}
Output : Length of LIS = 3
The longest increasing subsequence is 3, 10, 20

Input  : arr[] = {3, 2}
Output : Length of LIS = 1
The longest increasing subsequences are {3} and {2}

Input : arr[] = {50, 3, 10, 7, 40, 80}
Output : Length of LIS = 4
The longest increasing subsequence is {3, 7, 40, 80}

Longest increasing subsequence : Line of Thoughts

We have to find the longest increasing subsequence up to the last element of the array, i.e. the Nth element. The question is: does it depend on the LIS up to the (N-1)th element? The idea is to see whether any increasing subsequence already present up to the current index i can include A[i] and still remain increasing. To confirm that, check every element at index j such that 0 <= j < i and A[j] < A[i].
If element A[j] is less than A[i], then A[i] can be part of an increasing subsequence ending with element j. The length of such an increasing subsequence can be (length of the increasing subsequence ending at j) + 1. Check each such element and take the maximum length. Let's see an example and see how it works.

Before we start the solution, let's think about why we are applying dynamic programming here. First, because the solution to the bigger problem depends on optimal solutions to subproblems; and second, because if we do not store solutions to subproblems, we may end up calculating them again and again. Both the optimal substructure property and the overlapping subproblems property are satisfied for this problem, hence we will use dynamic programming to solve it.

Longest increasing subsequence : Implementation

Define an array LIS of size N; LIS[i] will represent the length of the longest increasing subsequence ending at element i.

LIS[i] = 1 + max(LIS[j]) for all  0<=j<i and A[j]<A[i] 
       = 1 if no such element exists where j< i and A[j]<A[i]

Below is C code to implement it.

#include <stdio.h>
#include <stdlib.h>

int maximumLIS(int a[], int end, int *lis){
    lis[end] = 0; /* initialize before scanning, memory from malloc is not zeroed */
    for (int i=0; i<end; i++){
        if( a[i] < a[end] && lis[i] > lis[end] )
            lis[end] = lis[i];
    }
    return lis[end];
}

int lis(int a[], int size){

    int *lis = (int *)malloc(sizeof(int)*size);
    lis[0] = 1;

    for(int i=1; i<size; i++){
    	lis[i] = 1 + maximumLIS(a,i,lis);
    }

    /* LIS can end at any index, so take the maximum over all of them */
    int result = lis[0];
    for(int i=1; i<size; i++){
        if(lis[i] > result) result = lis[i];
    }
    free(lis);

    return result;
}

int main(void) {
	int a[] = { 2,4,6,3,5,7,9 };
	int size = sizeof(a)/sizeof(a[0]);

	printf("Length of Longest increasing subsquence : %d" , lis(a, size));
	
	return 0;
}

Java implementation for the same algorithm

package com.company;

/**
 * Created by sangar on 7.1.18.
 */
public class LIS {

    private static int maximumLIS(int a[], int end, int [] LIS){
        for (int i=0; i<end; i++){
            if( a[i] < a[end] && LIS[i] > LIS[end] )
                LIS[end] = LIS[i];
        }
        return LIS[end];
    }

    private static int lis(int [] A){

        int [] LIS = new int[A.length];
        LIS[0] = 1;

        int result = LIS[0];
        for(int i=1; i<A.length; i++){
            LIS[i] = 1 + maximumLIS(A,i,LIS);
            //LIS can end at any index, keep the maximum seen so far
            result = Integer.max(result, LIS[i]);
        }

        return result;
    }
    public static void main(String[] args) {
        int[] A = {2,4,6,3,5,7,9};
        System.out.println("Longest increasing subsequence : " + lis(A));
    }
}

Let's take an example and see how this code works, given array A = {2,4,6,3,5,7,9}.

Initialize LIS[0] =1, that means there is an increasing subsequence of length 1 at index 0.
For i = 1, i.e. 4, check j from 0 to 0. A[0] < A[1], hence LIS[1] = 2 (LIS[0] + 1).

For i = 2, i.e. 6, check j from 0 to 1: LIS[0] = 1 and LIS[1] = 2. The max LIS[j] for such j is 2, so LIS[2] = 3 (LIS[1] + 1).
For i = 3, i.e. 3, check j from 0 to 2. Only A[0] is less than A[3], so the longest subsequence ending at i = 3 has length only 2: LIS[3] = 2.
For i = 4, i.e. 5, the max LIS is again 3, which includes {2,4,5} or {2,3,5}.
For i = 5, i.e. 7, the max LIS is 4, which includes {2,4,5,7} or {2,3,5,7} or {2,4,6,7}.
For i = 6, i.e. 9, the max LIS is 5, which includes {2,4,5,7,9} or {2,3,5,7,9} or {2,4,6,7,9}.

Therefore, the longest increasing subsequence has length 5 for the given array.

Other problems which are variants of the longest increasing subsequence and can be solved by finding the longest increasing subsequence are:

1. Given two river banks (visualization: two parallel lines), one bank has the numbers (1…n) written in sorted order. On the other bank, the numbers (1…n) are arranged randomly. A bridge can be formed from a point on bank 1 to the point with the same number on bank 2. Find the maximum number of non-intersecting bridges you can form.
Just find the longest increasing subsequence in the randomly ordered numbers and that will be the solution; see the sketch after this list.

2. Given a set of n types of rectangular 3-D boxes, where the ith box has height h(i), width w(i) and depth d(i) (all real numbers). You want to create a stack of boxes which is as tall as possible, but you can only stack a box on top of another box if the dimensions of the 2-D base of the lower box are each strictly larger than those of the 2-D base of the higher box.

3. Another problem which borrows heavily from this approach is finding the longest zigzag subsequence in an array.
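
A hedged sketch of the reduction in problem 1, where the arrangement on the second bank is a hypothetical example: the answer is simply the LIS length of that arrangement, computed with the same O(N^2) recurrence as above.

public class NonIntersectingBridges {
    //Plain O(N^2) LIS length, same recurrence as above
    static int lisLength(int[] a) {
        int[] lis = new int[a.length];
        int best = 0;
        for (int i = 0; i < a.length; i++) {
            lis[i] = 1;
            for (int j = 0; j < i; j++) {
                if (a[j] < a[i]) lis[i] = Math.max(lis[i], lis[j] + 1);
            }
            best = Math.max(best, lis[i]);
        }
        return best;
    }

    public static void main(String[] args) {
        int[] bank2 = {2, 5, 1, 3, 4}; //hypothetical arrangement on the second bank
        System.out.println(lisLength(bank2)); //3 bridges, e.g. numbers 1, 3 and 4
    }
}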

The algorithm to find the longest increasing subsequence works in O(N^2) time complexity with O(N) space complexity.

Please share if you find anything wrong or missing, we would love to hear from you. If you want to be part of editors and contributors team at Algorithms and Me, please drop us an email at communications@algorithmsandme.com

Longest palindrome substring dynamic programming

Find longest palindrome substring in string

Given a string, find the longest palindrome substring in that string. For example, in the string "ADCBABCDE", the longest palindrome substring is DCBABCD.

The brute force solution of the problem is pretty easy. The idea is to start from each character in the string and expand on the left and right side of that character as long as the characters are the same. Once the left and right characters differ, we check if the number of characters in the substring centered at that character is greater than the earlier maximum found for any previously processed character. If it is greater, we update the length and continue with subsequent characters till the end of the string.

Implementation

#include <stdio.h>
#include <string.h>

#define true 1
#define false 0

int longestPalindrome(char *s){

  int i,j,k, n;
  int longestEnd =0, longestStart=0;

  n = strlen(s);
/* This case handles odd length palindromes */
   for(i=0; i<n; i++){
       for(j=i-1, k=i+1; j>=0 && k<n; ){
   /* If characters are equal, update left and right index. */
           if(s[j] == s[k] ){
                k++;
                j--;
           }
           else
                break;
       }
  /* Check if current sub-string length is greater 
    than earlier max, If yes, update it */
      if(longestEnd - longestStart < k-j)
      {
          longestEnd = k;
          longestStart = j;
      }
 }
/* This case handles even length palindromes, centers start between indices 0 and 1 */
  for(i=0; i<n-1; i++){
       for(j=i, k=i+1; j>=0 && k<n; ){
            if(s[j] == s[k] ){
                k++;
                j--;
            }
            else
                break;
       }
       if(longestEnd - longestStart < k-j)
       {
            longestEnd = k;
            longestStart = j;
       }
  }
  return longestEnd - longestStart - 1;
}

int main()
{
    char str[] = "ABCDCBEA";
    printf("\nLength is: %d\n", longestPalindrome( str ) );
    return 0;
}

The complexity of this implementation is O(n^2).

Longest palindrome string: a dynamic programming approach

To apply dynamic programming to any problem, two conditions must be satisfied. First, the optimal substructure property: solutions to smaller subproblems lead to the solution of the larger problem. Second, the overlapping subproblems property: there are subproblems which are solved again and again in the recursive formulation.

The first property gives us the recurrence relationship, whereas the second property points us towards memoization.

Let's take an example and see if we can come up with an algorithm. We have to find whether a string of length 4 is a palindrome or not. For a string to be a palindrome, its first and last characters should be the same. There are two paths here: either the first and last characters are the same, or they are not.

If the first and last characters are not the same, we can safely say that the string is not a palindrome. If the first and last characters are the same, can we say the string is a palindrome? No, not yet. We have to check if the substring from the second character to the second-to-last character is a palindrome or not.


This is the optimal substructure property: if substring s[i+1..j-1] is a palindrome, then string s[i..j] is a palindrome if and only if s[i] == s[j].

Now, whether a substring like s[i+1..j-1] is a palindrome will be asked again and again for many combinations of starting index and length. This hints at the overlapping subproblems property. Can we precalculate this before and save it in memory?
We can store this information in a matrix M, where M[i][j] stores whether the substring starting at i with length j is a palindrome or not. Notice that here j means length and not an index in the original string.

M[i][1] will always be true, as a single character is a palindrome in itself. M[i][2] is true only if s[i] == s[i+1].

For M[i][3], the first character of the substring is s[i] and the last character is s[i+3-1]; M[i][3] can be true only if s[i] == s[i+2] and M[i+1][1] is true (the substring starting at i+1 with length 1, which is always a palindrome).

We fill the table bottom up, starting with substrings with length 1 and 2 from each character in the string.

M[i][j] = (s[i] == s[i+j-1] && M[i+1][j-2]) for j = 3 to n and i = 0 to n-j

Longest palindrome substring: DP implementation

package com.company;

/**
 * Created by sangar on 2.1.19.
 */
public class LongestPalindromeSubstring {


    public int longestPalindromeSubstring(String s){
        int n = s.length();

        int longestBegin = 0;
        int maxLen = 1;

        boolean[][] table  = new boolean[n+1][n+1];

        for (int i=0; i<=n; i++) {
            for (int j=0; j<=n; j++) {
                table[i][j] = false;
            }
        }

        //Each character is palindrome in itself
        for (int i=0; i<=n; i++) {
            table[i][1] = true;
        }

        //Two character substring is palindrome only if they are same
        for (int i = 0; i < n-1; i++) {
            if (s.charAt(i) == s.charAt(i+1)) {
                table[i][2] = true;
                longestBegin = i;
                maxLen = 2;
            }
        }

        for (int j=3; j<=n; j++) {
            for (int i=0; i<n-j+1; i++) {
                if (s.charAt(i) == s.charAt(i+j-1)
                        && table[i+1][j-2]) {
                    table[i][j] = true;
                    longestBegin = i;
                    maxLen = j;
                }
            }
        }

        System.out.println(s.substring(longestBegin, longestBegin+maxLen));
        return maxLen;
    }
}

Test cases

package test;

import com.company.LongestPalindromeSubstring;
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;
/**
 * Created by sangar on 2.1.19.
 */
public class LongestPalindromeSubstringTest {

    LongestPalindromeSubstring tester = new LongestPalindromeSubstring();

    @Test
    public void longestPalindromeSubstringTest() {
        assertEquals(4, tester.longestPalindromeSubstring("ABBA"));
    }

    @Test
    public void longestPalindromeSubstringOneCharTest() {
        assertEquals(1, tester.longestPalindromeSubstring("ACDB"));
    }

    @Test
    public void longestPalindromeSubstringInMidTest() {
        assertEquals(4, tester.longestPalindromeSubstring("CABBAD"));
    }

    @Test
    public void longestPalindromeSubstringOddLenTest() {
        assertEquals(5, tester.longestPalindromeSubstring("ABCBA"));
    }
}

The complexity of the dynamic programming approach is O(n^2); it uses extra memory of the order O(n^2) though.

Please share if there is something wrong or missing. If you are preparing for an interview and want help with dynamic programming techniques, please reach out to our team at communications@algorithmsandme.com

Merge k sorted arrays

Merge k sorted arrays

Given k sorted arrays, each having n elements, merge them into one sorted array of n*k elements. For example, given the 3 arrays {1, 4, 7}, {2, 5, 8} and {3, 6, 9}, the result array should be {1, 2, 3, 4, 5, 6, 7, 8, 9}.

Merge k sorted arrays

Since all the input arrays are sorted, the first element of the result array will be among the first elements of the input arrays. How can we find the minimum among all the elements plucked from the first index of each array? Easy: take those k elements (there are k arrays, so k first elements) and build a min heap. The root of the min heap is the least element among the first elements of all arrays, so it will be the first element of the result array.

Once we add the first element into the result array, we have to find the second element. The second element can come from the set of first elements of all input arrays, except the array from which the first element was taken; from that array, the candidate is its second element.

In order to know which array gave the minimum element at a particular time, we store additional information about the array and the index at which the minimum element was.

If i represents the array number and j represents the index of the minimum element within the ith array, then we add the (j+1)th element of that array to the min heap next and re-heapify. If j goes out of bounds for the ith array, we continue with a min heap of size k-1, and so on, till there are no elements left in the heap.

Follow the procedure until all n*k elements are processed; the result array will then be sorted.

Merge k sorted arrays: algorithm

  • Build min heap with the first element of all k arrays.
  • Pick the root of the min heap and put it in the result array.
  • If there are remaining elements in that array, put the next element at the root of the min heap and heapify again.
  • If all elements of an array are already processed, reduce the size of the min heap by 1.
  • Repeat steps 2, 3 and 4 till the min heap is empty.

Merge k sorted arrays: implementation

package com.company;

import java.util.PriorityQueue;

/**
 * Created by sangar on 2.12.18.
 */
public class MergeKSortedArrays {
    private class HeapNode{
        public int arrayNum;
        public int index;
        public int value;

        public HeapNode(int arrayNum, int index, int value){
            this.arrayNum = arrayNum;
            this.index = index;
            this.value = value;
        }
    }

    public int [] mergeKSortedArrays(int[][] arrays){

        if(arrays == null) return null;

        PriorityQueue<HeapNode> minHeap =
			new PriorityQueue<>(arrays.length,
                (HeapNode a,HeapNode b)-> a.value - b.value);

        int size = 0;
        for(int i =0; i<arrays.length; i++){
            size += arrays[i].length;
        }
        int[] result = new int[size]; // k * n

        //add first elements in the array to this heap
        for(int i=0; i<arrays.length; i++){
            minHeap.add(new HeapNode(i, 0, arrays[i][0]));
        }

        //Complexity O(n * k * log k)
        for(int i=0; i< size; i++){
            //Take the minimum value and put into result
            HeapNode node = minHeap.poll();

            if(node != null){
                result[i] = node.value;
                if(node.index + 1 < arrays[node.arrayNum].length) {
                    //Complexity of O(log k)
                    minHeap.add(new HeapNode(node.arrayNum,
                           node.index + 1,
                           arrays[node.arrayNum][node.index + 1]));
                }
            }
        }
        return result;
    }
}

Test cases

package test;

import com.company.MergeKSortedArrays;
import org.junit.jupiter.api.Test;

import java.util.Arrays;

import static org.junit.jupiter.api.Assertions.assertEquals;

/**
 * Created by sangar on 23.9.18.
 */
public class MergeKSortedArraysTest {

    MergeKSortedArrays tester = new MergeKSortedArrays();

    @Test
    public void mergeKSortedArraysTest() {

        int[][] input  ={
            { 1, 2, 3, 4 }, { 5, 6, 7, 8 }, { 9, 10, 11, 12 }
        };

        int[] expectedOutput = {1,2,3,4,5,6,7,8,9,10,11,12};

        int [] output = tester.mergeKSortedArrays(input);

        System.out.println(Arrays.toString(output));
        assertEquals(Arrays.toString(expectedOutput), 
					Arrays.toString(output));
    }

    @Test
    public void mergeKSortedArraysWithUnequalSizeTest() {

        int[][] input  ={
                { 1, 2 }, { 5, 6, 7}, { 9, 10, 11, 12 }
        };

        int[] expectedOutput = {1,2,5,6,7,9,10,11,12};

        int [] output = tester.mergeKSortedArrays(input);

        System.out.println(Arrays.toString(output));
        assertEquals(Arrays.toString(expectedOutput),
			Arrays.toString(output));
    }

    @Test
    public void mergeKSortedArraysWithNullTest() {

        int [] output = tester.mergeKSortedArrays(null);

        assertEquals(null, output);
    }
}

Complexity of code to merge k sorted arrays is O(n * k * log k) along with space complexity of O(k).

Please share if there is something wrong or missing. If you are preparing for an interview, please sign up to receive interview preparation kit for free.

Merge two sorted linked lists

Merge two sorted linked lists

The problem statement is simple: merge two sorted linked lists without using extra space. To refer to the basics of linked lists, please follow the post: Linked list data structure. This problem is commonly asked in telephonic rounds of Amazon and Microsoft. Let's take an example and understand what is required as a solution: given two sorted linked lists, the result should be a single sorted linked list containing all the nodes of both.

Merge two sorted linked lists : Thoughts

Consider the following two steps to merge sorted linked lists. First, figure out which node should be the head of the result list: compare the head nodes of the two given lists; whichever is smaller should be the head of the result list.

Second, compare two nodes, one from each list, and decide which should go next in the result linked list. Advance the pointer to the next node of the node which was added to the result list.

As no new node is allocated during this merge, we have to make sure that all the references are maintained when nodes are added to merged linked list.


We can start with one list as the merge list and add nodes from the second list at appropriate places in it. Let's say L1 is our merged list, and we always compare the node at L2 to see if it should be placed at the current position in L1. L1 grows as more nodes are sorted into the merge list.

We compare the first two nodes, L1 and L2, and decide that node(2) has to go into the merged list as head. If it were the head of L2, we would have swapped the heads of L1 and L2, so that L1 is still the head of the merged list. Why? Because we want L1 to always point to the last node of the merged list, representing the sorted merged part so far, while L2 points to the first node of the part that remains to be merged.

As L1 always points to the last node of merged linked list, next node to compare should be L1.next i.e node(4) and L2 i.e node(3).

As L1 follows the merged linked list, we want L1.next to point to node(3); however, doing that directly would lose the entire list following it. So we do it in four steps: store L1.next as temp; link L1.next to L2; point L2 to temp; and then move L1 to L1.next.

Node temp = L1.next;
L1.next = L2;
L2 = temp;
L1 = L1.next;

The next nodes to be compared are node(5), which is L1.next, and node(4), which is L2.

Comparing node 4 and 5 to add in sorted merge list

Of course node(4) has to be added to the merged linked list; what should we do? First, save L1.next in temp; temp now points to node(5). Second, point L1.next to L2, point L2 to temp, and at last move L1 to L1.next. The state of the two lists now looks as follows.


By this time you must have noticed that L1 and L2 do not each point to one input list all the time: L1 always points to the last node of the merged list, and L2 points to the first node of the remaining part which still needs to be merged.

Now, L1.next which is node(7) and L2 which is node(5) will be compared.

node(5) is to be added in merged sorted list. Again same set of steps. L1.next stored as temp, L1.next points to L2 i.e. node(5) and then L2 points to temp i.e. node(7)


Again, node(9), which is L1.next, will be compared to L2, i.e. node(7). Since node(7) is smaller, L1.next should point to L2.

At this point, L1.next, i.e. node(8), is less than L2; this is the simple case, where we just move L1 to L1.next and L2 remains as is.


The next two nodes follow the same pattern and are added to the merged sorted linked list.

At this point, a special condition occurs: L1.next is null. In this case, point L1.next to L2 and the two linked lists are merged.

Two sorted linked lists are merged into a sorted list

Merge two sorted linked lists : Implementation

#include <stdio.h>
#include <stdlib.h>

typedef struct node{
    int data;
    struct node *next;
} Node;
 
Node * createNode(int val){
  Node * temp = (Node *)malloc(sizeof(Node));
  if(temp){
    temp->data = val;
    temp->next = NULL;
  }
  return temp;
}
/* This function inserts node at the head of linked list */
void push(Node **headRef, int data){
	Node * newNode  = createNode(data);
	newNode->next = *headRef;
	*headRef  = newNode;
}

void printList(Node * head){
    while(head){
        printf("%d->" , head->data );
        head = head->next;
    }

    printf("NULL");
    printf("\n");
}
Node * MergeLists(Node *list1, Node *list2) {
  if (!list1) return list2;
  if (!list2) return list1;

  Node *head;
	//Chosing head of merged list
  if (list1->data < list2->data) {
    head = list1;
  } else {
    head = list2;
    list2 = list1;
    list1 = head;
  }
	
  while(list1->next && list2) {
    if (list1->next->data > list2->data) {
	//Step 1. Save the next pointer
      Node *tmp = list1->next;
	//Step 2. Change next pointer to point L2
      list1->next = list2;
	//Step 3. Move L2 to temp
      list2 = tmp;
    }
	//Step 4. Move L1 ahead
    list1 = list1->next;
  } 
  if (!list1->next) list1->next = list2;
  return head;
}
int main(){
        Node * L1 = NULL;
        Node * L2 = NULL;
        /* creating list 1 */
        push(&L1,7);
        push(&L1,6);
        push(&L1,4);
        push(&L1,3);
	
        /* creating list 2 */
        push(&L2,10);
        push(&L2,8);
        push(&L2,1);
 
        L1 = MergeLists(L1,L2); 
        printList(L1);
 
        return 0;
}

The complexity of this method to merge two sorted lists into one is O(n+m), where n and m are the numbers of nodes in the two sorted linked lists.

Recursive implementation to merge two sorted linked lists

#include<stdlib.h>
#include<stdio.h>
 
typedef struct node{
    int data;
    struct node *next;
} Node;
 
Node * mergeSort(Node *a, Node *b){
    Node *result = NULL;
    if(a ==  NULL)
        return b;
    else if(b == NULL)
        return a;

    /* For the first node, we would set the result to either a or b */
      if(a->data <= b->data){
         result = a;
        /* Result's next will point to smaller one in lists 
           starting at a->next  and b */
         result->next = mergeSort(a->next,b);
      }
      else {
        result = b;
       /*Result's next will point to smaller one in lists 
         starting at a and b->next */
        result->next = mergeSort(a,b->next);
      }
      return result;
}

Node * createNode(int val){
  Node * temp = (Node *)malloc(sizeof(Node));
  if(temp){
    temp->data = val;
    temp->next = NULL;
  }
  return temp;
}
/* This function inserts node at the head of linked list */
void push(Node **headRef, int data){
	Node * newNode  = createNode(data);
	newNode->next = *headRef;
	*headRef  = newNode;
}

void printList(Node * head){
    while(head){
        printf("%d->" , head->data );
        head = head->next;
    }

    printf("NULL");
    printf("\n");
}

/* Driver program to run above code */
int main(){
        Node * L1 = NULL;
        Node * L2 = NULL;
        /* creating list 1 */
        push(&L1,7);
        push(&L1,6);
        push(&L1,4);
        push(&L1,3);
        /* creating list 2 */
        push(&L2,10);
        push(&L2,8);
        push(&L2,1);
      
        L1 = mergeSort(L1,L2); 
        printList(L1);
        
        return 0;
}

Please share if there is something wrong or missing. If you want to take personalized coaching from our expert teachers, please signup for free demo class.

Find Kth smallest element in array

Kth smallest element in array

Given an unsorted array of integers, find the kth smallest element in that array. For example: if the input array is A = [3,5,1,2,6,9,7], the 4th smallest element is 5, because if you sort the array, it becomes A = [1,2,3,5,6,7,9] and you can easily see that the 4th element is 5.

This problem is commonly asked in Microsoft and Amazon interviews, as it has multiple layers and there are so many things that can be measured with this one problem.

Kth smallest element : Line of thought

First of all, in any interview, try to come up with a brute force solution. The brute force solution to find the kth smallest element in an array of integers would be to sort the array and return A[k-1] (k-1 because the array is zero-based indexed).

What is the complexity of the brute force solution? We have sorting algorithms like merge sort and heap sort which work in O(n log n); merge sort, however, uses O(n) additional space. Quick sort is another option; its problem is that its worst case complexity is O(n^2), which happens when the input is already completely sorted.
In our case, the input is given as unsorted, so we can expect quick sort to run at its average case complexity of O(n log n). The advantage of using quick sort is that there is no additional space complexity.

Optimising quick sort

Let's see how quick sort works and whether we can optimize the solution further.
The idea behind quick sort is to find the correct place for the selected pivot. Once the pivot is at its correct position, all the elements on the left side of the pivot are smaller and all on the right side are greater than the pivot. This step is called partitioning.

If after partitioning the pivot is at position j, can we say that the pivot is actually the jth smallest element of the array? What if j is equal to k? Well, problem solved, we found the kth smallest element.

If j is less than k, the left subarray is too small; we need to include more elements from the right subarray, therefore the kth smallest element is somewhere in the right subarray. We have already found the j smallest elements; all we need to find is the remaining k-j elements in the right subarray.

What if j is greater than k? In this case, we have to drop some elements from left subarray, so our search space would be left subarray after partition.

This algorithm (known as quickselect) has an average case complexity of O(n), with a worst case of O(n^2); practically, you do not need to sort the entire array before you find the k smallest elements.

Algorithm to find K smallest elements in array

  1. Select a pivot and partition the array with pivot at correct position j
  2. If position of pivot, j, is equal to k, return A[j].
  3. If j is less than k, discard array from start to j, and look for (k-j)th smallest element in right sub array, go to step 1.
  4. If j is greater than k, discard array from j to end and look for kth element in left subarray, go to step 1

Let's take an example and see how this algorithm works. A = [4, 2, 1, 7, 5, 3, 8, 10, 9, 6], and we have to find the fifth smallest element in array A.


Start with pivot as first index of array, so pivot = 0, partition the array into two parts around pivot such that all elements on left side of pivot element, i.e. A[pivot] are smaller and all elements on right side are greater than A[pivot].


In our example, array A will look like this after the pivot has found its correct position: [3, 2, 1, 4, 5, 7, 8, 10, 9, 6]. After partition, the correct position of the pivot is index 3.

If pivot == k-1 (array is represented as zero base index), then A[pivot] is kth smallest element. Since pivot (3) is less than k-1 (4), look for kth smallest element on right side of the pivot.

k remains as it is, as opposed to the k-j mentioned in the algorithm, because here the pivot position is given w.r.t. the entire array and not w.r.t. the subarray.

In the second iteration, we partition the right subarray; the pivot ends up at index 4 (index, not element). The array remains [3, 2, 1, 4, 5, 7, 8, 10, 9, 6]: after partition of the right subarray, the correct position of the pivot is index 4.

The pivot index (4) is equal to k-1 (5-1), so the 5th smallest element in array A is 5.

Implementation

package com.company;

/**
	* Created by sangar on 30.9.18.
*/
public class KthSmallest {
	private void swap(int[] a, int i, int j){
		int temp = a[i];
		a[i] = a[j];
		a[j] = temp;
	}
	private int partition(int[] a, int start, int end){
		int pivot = a[start];
		int i  = start+1;
		int j  = end;

		while(i < j){
			//bound check prevents running past the end when pivot is the maximum
			while(i <= end && a[i] < pivot) i++;
			while(a[j] > pivot) j--;

			if(i < j) {
				swap(a, i, j);
			}
		}
		swap(a, start, j);
		return j;
	}

	public int findKthSmallestElement(int a[], int start, 
				int end, int k){
		if(start < end){
		int p = partition(a, start, end);
		if(p == k-1){
			return a[p];
		}
		if(p > k-1)
			return findKthSmallestElement(a, start, p, k);
		if(p < k-1)
			return findKthSmallestElement(a, p+1, end, k);
		}
		return -1;
	}
}

Test cases

package test;

import com.company.KthSmallest;
import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertEquals;

/**
 * Created by sangar on 28.8.18.
 */
public class KthSmallestTest {

	KthSmallest tester = new KthSmallest();
	private int[] a = {4, 2, 1, 7, 5, 3, 8, 10, 9};
	@Test
	public void kthSmallest() {
		assertEquals(7, tester.findKthSmallestElement(a,0,8,6));
	}

	@Test
	public void firstSmallest() {
		assertEquals(1, tester.findKthSmallestElement(a,0,8,1));
	}

	@Test
	public void lastSmallest() {
		assertEquals(10, tester.findKthSmallestElement(a,0,8,9));
	}

	@Test
	public void kGreaterThanSize() {
		assertEquals(-1, tester.findKthSmallestElement(a,0,8,15));
	}
	@Test
	public void emptyArray() {
		int[] a = {};
		assertEquals(-1, tester.findKthSmallestElement(a,0,0,1));
	}

	@Test
	public void nullArray() {
		assertEquals(-1, tester.findKthSmallestElement(null,0,0,1));
	}
}

The complexity of using the quick sort partitioning approach to find the kth smallest element in an array of integers is O(n) on average, though the worst case remains O(n^2).

Kth smallest element using heaps

Imagine a case where there are a billion integers in the array and you have to find the 5 smallest elements. A complexity of O(n log n) is too costly for that use case. Also, the above algorithm does not take into consideration the disparity between k and n.

We want the top k elements. How about we choose those k elements randomly, call it set A, and then go through all the other n-k elements, call it set B, checking if an element from set B (n-k elements) can displace an element in set A (k elements)?

What is the condition for an element from set B to replace an element in set A? Well, if the new element is less than the maximum of set A, then the maximum of set A cannot be among the k smallest elements, right? The maximum element in set A is then replaced by the new element from set B.

Now the problem is how to quickly find the maximum of set A. A heap is the best data structure for that. What kind of heap: min heap or max heap? Max heap, as it stores the maximum of the set at its root.

Let's define concrete steps to find the k smallest elements using a max heap.

  1. Create a max heap of size k from first k elements of array.
  2. Scan all elements in array one by one.
    1. If the current element is less than the max (root) of the heap, remove the root, add the current element to the heap and heapify.
    2. If not, then go to next element.
  3. At the end, max heap will contain k smallest elements of array and root will be kth smallest element.

Let’s take an example and see if this algorithm works? Input array is shown below and we have to find 6th smallest element in this array.


Step 1 : Create a max heap with first 6 elements of array.

Create a max heap with set A

Step 2 : Take next element from set B and check if it is less than root of max heap. In this case, yes it is. Remove the root and insert the new element into max heap.

Element from set B removes root from max heap and added to max heap

Step 3 : The scan continues with 10; nothing happens, as the new element is greater than the root of the max heap. Same for 9. At 6, the root of the max heap is again greater than 6, so remove the root and add 6 to the max heap.

Again, new element from set B is less than root of max heap. Root is removed and new element is added.

The array scan is finished, so just return the root of the max heap, 6, which is the sixth smallest element in the given array.

	public int findKthSmallestElementUsingHeap(int a[], int k){
		//https://stackoverflow.com/questions/11003155/change-priorityqueue-to-max-priorityqueue
		PriorityQueue<Integer> maxHeap =
				new PriorityQueue<>(k, Collections.reverseOrder());

		if(a == null || k > a.length) return -1;
		//Create max with first k elements
		for(int i=0; i<k; i++){
			maxHeap.add(a[i]);
		}
		/*Keep updating max heap based on new element
		If new element is less than root, 
		remove root and add new element
		*/
		for(int i=k; i<a.length; i++){
			if(maxHeap.peek() > a[i]){
				maxHeap.remove();
				maxHeap.add(a[i]);
			}
		}
		return maxHeap.peek();
	}

Can you calculate the complexity of the above algorithm? heapify() has a complexity of O(log k) with k elements on the heap. In the worst case, we have to do heapify() for all n elements in the array, so the overall complexity of the algorithm becomes O(n log k). There is also additional space complexity of O(k) to store the heap.
Note that even when k is very small compared to n, this algorithm still has to scan the entire array.

We want the k smallest elements; if we pick the first k elements out of a min heap, will it solve the problem? I think so. Create a min heap of all n elements in place from the given array, and then pick the first k elements.
Creating the heap has complexity of O(n) (do more reading on it). All we need to do is delete from this heap k times, and each deletion triggers a heapify() with complexity O(log n) for an n element heap. So, the overall complexity would be O(n + k log n).
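
A sketch of this min heap variant in Java. One caveat: java.util.PriorityQueue does not expose an O(n) bulk heapify over a primitive array, so this illustrative version pays O(n log n) to build the heap; an array-based heapify would achieve the O(n + k log n) bound.

import java.util.PriorityQueue;

public class KthSmallestMinHeap {
    public static int kthSmallest(int[] a, int k) {
        if (a == null || k < 1 || k > a.length) return -1;

        PriorityQueue<Integer> minHeap = new PriorityQueue<>();
        for (int value : a) minHeap.add(value);

        //poll the k-1 smallest elements away, the next root is the kth smallest
        for (int i = 1; i < k; i++) minHeap.poll();
        return minHeap.peek();
    }

    public static void main(String[] args) {
        int[] a = {4, 2, 1, 7, 5, 3, 8, 10, 9, 6};
        System.out.println(kthSmallest(a, 5)); //5
    }
}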

Depending on what you want to optimize, select the appropriate method to find the kth smallest element in the array.

Please share if there is something wrong or missing. If you are interested in taking coaching sessions from our experienced teachers, please reach out to us at communications@algorithmsandme.com

Quick sort algorithm

Quick sort Algorithm

Quick sort, like merge sort, is a sorting algorithm under the divide and conquer paradigm. The basic idea of the algorithm is to divide the input around a pivot, sort the two smaller parts recursively, and finally get the original input sorted.

Selection of pivot

The entire idea of quick sort revolves around the pivot. The pivot is an element in the input around which the input is arranged in such a way that all elements on its left side are smaller and all elements on its right side are greater than the pivot. The question is how to select the pivot and put it into its correct position.

To make things simpler to start with, let’s assume first element of input is pivot element.

To put this pivot at its correct position, start with the next element after the pivot in the input space and find the first element which is greater than the pivot. Let that be the ith position.

At the same time, start from the end of the array and find the first element which is smaller than the pivot. Let that be the jth position.

If i and j have not crossed each other, i.e. i < j, then swap the elements at the ith and jth positions, and continue moving right to find an element greater than the pivot and moving left to find an element smaller than the pivot.
Once i and j cross each other, swap the pivot with the element at the jth position. After this step, the pivot will be at its correct position and the array will be divided into two parts: all elements on the left side will be less than the pivot, and all elements on the right side will be greater than the pivot.

Quick sort partition example

This is too much to process, I know! Let's take an example and see how it works. We have an array as follows.


Let’s select first element as pivot, pivot = 3.


Start from the next element after the pivot and move towards the right of the array, till we see the first element which is greater than the pivot, i.e. 3.

From the end of the array, move towards the left till you find an element which is less than the pivot.

Now, there are two indices, i and j, where A[i] > pivot and A[j] < pivot. Note that i and j have not yet crossed each other; hence, we swap A[i] with A[j].


Again, start with i+1 and follow the same rule: stop when you find an element greater than the pivot. In this case, 10 is greater than 3, hence we stop.

Similarly, move left from the end again, till we find an element which is less than the pivot. In this case, we end up at index = 2, which is element 1.

Since i > j, the paths have crossed. This time, instead of swapping the elements at i and j, swap the element at index j with the pivot.

After swapping the pivot with the jth index, the array is divided into two parts with the pivot as boundary: all elements on the left side of the pivot are smaller (they may not be sorted) and all elements on the right side are greater than the pivot (again, may not be sorted).


We apply the same partition process to the left and right subarrays, till the base condition is hit. In this case, the base condition is that there is only one element in the subarray to be partitioned.

Quick sort algorithm

quickSort(arr, start, end)
 1. If array has more than one element, i.e. (start < end):
    1.1 Find the correct place for the pivot:
        pivot = partition(arr, start, end)
    1.2 Apply the same function recursively to the left of the pivot index:
        quickSort(arr, start, pivot - 1)
        and to the right of the pivot index:
        quickSort(arr, pivot + 1, end)

Quick sort implementation

#include <stdio.h>

void swap(int a[], int i, int j){
	int temp = a[i];
	a[i] = a[j];
	a[j] = temp;
}

int partition(int a[], int start, int end){
	
	int pivot = a[start];
	int i  = start+1;
	int j  = end;
	
	while(i < j){
	    /* bound check prevents running past the end when pivot is the maximum */
	    while(i <= end && a[i] < pivot) i++;
	    while(a[j] > pivot) j--;

	    if(i < j) {
	        swap(a, i, j);
	    }
	}
	swap(a, start, j);
	return j;
}

void quickSort(int a[], int start, int end){
    if(start < end){
        int p = partition(a, start, end);
	quickSort(a,start, p-1);
	quickSort(a, p+1, end);
    }
}

int main(void) {
	int a[]= {4,3,2,5,6,8,1};
	int size = sizeof(a)/sizeof(a[0]);
	
	quickSort(a, 0, size-1);
	
	for(int i=0; i < size; i++){
		printf(" %d", a[i]);
	}
	return 0;
}

There is another implementation based on the Lomuto partition scheme, in which the last element is taken as the pivot. The implementation is compact, but its complexity is a bit higher than the original partition method in terms of the number of swaps.

#include<stdlib.h>
#include<stdio.h>
 
void swap(int *a, int *b){
    int temp = *a;
    *a = *b;
    *b = temp;
}
 
int partition(int a[], int low, int high)
{
    // set pivot as highest element
    int x  = a[high];
 
    //Current low points to previous of low of this part of array. 
    int i = low - 1;
 
    for (int j = low; j <= high-1; j++)
    {
    	/*Move in the array till current node data is 
        less than the pivot */
        if (a[j] <= x){
            //set the current low appropriately
            i++;
            swap(&a[i], &a[j]);
        }
    }
    //Now swap the next node of current low with pivot
 
    swap(&a[i+1], &a[high]);
 
    printf("\n Pivot : %d\n", a[i+1]);
    for(int j=0; j<=high; j++){
 
    	printf("%d ", a[j]);
    }
    //return current low as partitioning point.
    return i+1;
}
 
/* A recursive quicksort using the Lomuto partition scheme */
void quickSortUtil(int a[], int low, int high)
{
    if (low < high)
    {
        int p = partition(a,low, high);
        quickSortUtil(a,low, p-1);
        quickSortUtil(a, p+1, high);
    }
}
 
/* Driver program to run above code */
int main(){
 
    int a[] = {5,4,2,7,9,1,6,10,8};
 
    int size = sizeof(a)/sizeof(a[0]);
    quickSortUtil(a, 0, size-1);
 
    for(int i=0; i<size; i++){
    	printf("%d ", a[i]);
    }
    return 0;
}

Complexity analysis of quick sort algorithm

If the pivot splits the original array into two equal parts (which is the intention), the complexity of quick sort is O(n log n). However, the worst case of quick sort happens when the input array is already sorted in increasing or decreasing order. In this case, the array is partitioned into two subarrays, one of size 1 and the other of size n-1. The subarray with n-1 elements is again divided into subarrays of size 1 and n-2, and so on. To completely sort the array it splits n-1 times, and each split requires traversing up to n elements to find the correct position of the pivot. Hence the overall worst case complexity of quick sort comes out as O(n^2).

There is a very interesting question which tests your understanding of system basics: what is the space complexity of this algorithm? There is no apparent memory used. However, the recursive implementation internally puts stack frames on the stack for the partitioned indices, function call return addresses and so on. In the worst case, there can be n stack frames, hence the worst case space complexity of quick sort is O(n).

How can we reduce that? If the partition with the fewest elements is (recursively) sorted first, it requires at most O(log n) space. Then the other partition is sorted using tail recursion or iteration, which doesn't add to the call stack. This idea was described by R. Sedgewick, and it keeps the stack depth bounded by O(log n); hence the space complexity will be O(log n).

Quick sort with tail recursion

Quicksort(A, p, r)
{
 while (p < r)
 {
  q = Partition(A, p, r)
  Quicksort(A, p, q)
  p = q+1
 }
}
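
Here is a hedged Java sketch of the Sedgewick idea: recurse into the smaller partition and loop on the larger one, so the stack depth stays O(log n). The partition below is a Lomuto partition written for this sketch, not the exact C code above.

public class QuickSortLogSpace {
    //Lomuto partition: last element as pivot, returns its final index
    private static int partition(int[] a, int low, int high) {
        int pivot = a[high];
        int i = low - 1;
        for (int j = low; j < high; j++) {
            if (a[j] <= pivot) {
                i++;
                int t = a[i]; a[i] = a[j]; a[j] = t;
            }
        }
        int t = a[i + 1]; a[i + 1] = a[high]; a[high] = t;
        return i + 1;
    }

    public static void quickSort(int[] a, int start, int end) {
        while (start < end) {
            int p = partition(a, start, end);
            if (p - start < end - p) {
                quickSort(a, start, p - 1); //recurse on the smaller left side
                start = p + 1;              //iterate on the larger right side
            } else {
                quickSort(a, p + 1, end);   //recurse on the smaller right side
                end = p - 1;                //iterate on the larger left side
            }
        }
    }

    public static void main(String[] args) {
        int[] a = {5, 4, 2, 7, 9, 1, 6, 10, 8};
        quickSort(a, 0, a.length - 1);
        System.out.println(java.util.Arrays.toString(a));
    }
}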

Selection of Pivot
If the array is completely sorted, then the worst case behavior of quick sort is O(n^2), so there comes another problem: how can we select the pivot so that the two subarrays are of almost equal size? There are many solutions proposed.
1. Taking the median of the array as the pivot. How to select the median of an unsorted array is a problem we will look into separately, but yes, it guarantees two halves of the same size.
2. Selecting the pivot randomly, which makes the worst case very unlikely on any particular input; a small sketch follows.
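
A minimal sketch of option 2, meant as an extra method for the QuickSortLogSpace sketch above (it reuses that sketch's partition method): swap a randomly chosen element into the pivot slot before partitioning.

    //Assumes the partition(int[], int, int) method from the QuickSortLogSpace sketch above
    static int randomizedPartition(int[] a, int low, int high) {
        int r = java.util.concurrent.ThreadLocalRandom.current().nextInt(low, high + 1);
        //move the randomly chosen element into the pivot slot before partitioning
        int t = a[r]; a[r] = a[high]; a[high] = t;
        return partition(a, low, high);
    }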

Please leave your comment in case you find something wrong or you have some improved version.