Blog

What is dynamic programming?

What is Dynamic Programming or DP

Dynamic programming is an approach to solve a larger problem with the help of the results of smaller subproblems. It is a technique used to avoid computing multiple time the same subproblem in a recursive algorithm. I find a lot of students asking me question around, how do I know this problem is a dynamic programming problem? There is a definite way to arrive at the conclusion if a problem is a dynamic programming problem or not?

The first thing I would recommend you to read before going down is this beautiful explanation of dynamic programming to 4 years old.

The first thing you will notice about dynamic programming problems (not all problems) is they are optimization problem. Either it will be finding minimum or maximum of some entity. For example, find minimum edit between two strings or find longest common subsequence etc. However, problems like Fibonacci series are not exactly like an optimization problem, these are more like Combinatorial problems. Still, this can be a good hint that a problem can be a DP problem.

Second, you will notice that the problem can be divided into a pattern like an fn(n) = C + fn(n-k) where k can be anything between 1 and n.
This property is called optimum subproblem structure, where an optimum solution to the subproblem leads to the optimum solution to the larger problem.
Once you get the equation, it is very easy to come with a recursive solution to the problem. I would advise you to write the recursive solution and try to calculate the complexity of the solution. It will exponential in big-O notation.

Then why did recursion work so well with a divide and conquer approach? The key point is that in divide and conquer, a problem is expressed in terms of subproblems that are substantially smaller, say half the size. For instance, mergesort sorts an array of size n by recursively sorting two subarrays of size n/2. Because of this sharp drop in problem size, the full recursion tree has only logarithmic depth and a polynomial number of nodes. In contrast, in a typical dynamic programming formulation, a problem is reduced to subproblems that are only slightly smaller than the original. For instance, fn(j) relies on fn(j − 1). Thus the full recursion tree generally has polynomial depth and an exponential number of nodes.
However, it turns out that most of these nodes are repeats, that there are not too many distinct subproblems among them. Efficiency is therefore obtained by explicitly enumerating the distinct subproblems and solving them in the right order.
Reference

This will lead us to the third property, which is overlapping subproblems. Once, you draw the execution tree of the recursive solution of the problem, it will appear that a lot of problems are being solved again and again at different levels of recursion.

The intuition behind dynamic programming is that we trade space for time, i.e. to say that instead of calculating all the subproblems taking a lot of time but no space, we take up space to store the results of all the subproblems to save time later. The typical characteristics of a dynamic programming problem are optimization problems, optimal substructure property, overlapping subproblems, trade space for time, implementation via bottom-up/memoization.

Dynamic programming in action

Enough of theory, let’s take an example and see how dynamic programming works on real problems. I will take a very commonly used but most effective problem to explain DP in action. Problem is known as the Fibonacci series. Fibonacci series is a series of integers where each integer is the sum of previous two integers. For example, 1,1,2,3,5,8,13,17 is a Fibonacci series of eight integers. Now, the question is given a number n, output the integer which will be at the nth integer in Fibonacci series. For example for n = 4, the output should be 3 and for n=6, it should 8.

First hint: It is a combinatorial problem, so maybe a DP problem. Second, it is already given in the problem that current integer depends on the sum of previous two integers, that means f(n) = f(n-1) + f(n-2). This implies that the solution to subproblems will lead to a solution to the bigger problem which is optimal substructure property.

Next step is to implement the recursive function.

 public int fibonacci (int n) {
    if (n < 2) //base case
        return 1;

    return fibonacci(n-1) + fibonacci(n-2);
 }

Great, next step is to draw the execution tree of this function. It looks like below for n = 6. It is apparent how many times the same problem is solved at different levels.

what is dynamic programming
Recursive tree of Fibonacci series function

So, now we know three things about the Fibonacci problem: It is combinatorial problem, there is optimal substructure and there are overlapping subproblems. As in dynamic programming, we side with more space than time, we will try to use extra space to avoid recalculating subproblems.

The first way is to use a case, which stores the value of fab(n) if it is already calculated. This is called memoization or top-down approach.

Map<Integer, Integer> cache = new HashMap<>();

public int fibonacci(int n){
    if (n == 0)
       return 0;
    if (n == 1)
        return 1;

    if(cache.containsKey(n))
        return cache.get(n);

    cache.put(n, fibonacci(n - 1) + fibonacci(n - 2));

    return cache.get(n);
}

Another approach is bottom up, where the smaller problems are solved in an order which helps us with solving bigger problems. Here also, we use memoization but in a different way. We store the solution of smaller subproblems and directly use this to build the solution.

int[] fib = new int[n];
fib[0] = fib[1] = 1;
public int fibonacci(int n){
   for(int i=2; i<=n; i++){
       fib[n] = fib[n-1] + fib[n-2];
   }
   return fib[n];
}

Above solution requires extra O(n) space, however, the time complexity is also reduced to O(n) with each subproblem solved only once.

Follow longest increasing subsequence problem, how we have applied the same pattern while we solved the problem.

Final thoughts
Where to apply dynamic programming : If you solution is based on optimal substructure and overlapping sub problem then in that case using the earlier calculated value will be useful so you do not have to recompute it. It is bottom up approach. Suppose you need to calculate fib(n) in that case all you need to do is add the previous calculated value of fib(n-1) and fib(n-2)

Recursion : Basically subdividing you problem into smaller part to solve it with ease but keep it in mind it does not avoid re computation if we have same value calculated previously in other recursion call.

Memoization : Basically storing the old calculated recursion value in table is known as memoization which will avoid re-computation if its already been calculated by some previous call so any value will be calculated once. So before calculating we check whether this value has already been calculated or not if already calculated then we return the same from table instead of recomputing. It is also top down approach.
Reference: Answer by Endeavour

Please share if there is something wrong or missing. If you are preparing for an interview and need help with preparation, please book a free session with us to guide you through it.

0/1 knapsack problem using dynamic programming

0/1 Knapsack problem

0/1 Knapsack is typical problem which is used to demonstrate application of greedy algorithm as well as dynamic programming. There are cases when applying greedy algorithm does not give optimal solution. There are many flavors in which Knapsack problem can be asked.  

1. A thief enters a museum and want to steal artifacts from there. Every artifact has a weight and value associated with it. Thief carries a knapsack (bag) which can take only a specific weight. Problem is to find the combination of artifacts thief steals so that he gets maximum value and weight of all taken artifacts is less capacity of  knapsack he has. Thief cannot take any artifact partially. Either he takes it or leaves it. Hence the problem is 0/1 knapsack.

2. We have N files each having a size say Si. We have a total storage capacity of W bytes. For each file to be stored re-computation cost is Vi. Problem is to store as many files on storage that combined size of all files is less than W and their re-computation value is maximum. We can either store or leave a file, we cannot store partial file. Hence this is a case of 0/1 knapsack problem.

0/1 knapsack problem : Line of thoughts

Brute force method would try all subsets of set of items, whose weight adds up to maximum capacity of knapsack and see which one gives maximum value. Complexity of brute force algorithm would be of exponential order as there will be 2n possible subsets for n items.

Can we do better?  If we consider each item, there are two possibilities associated with it.
First, current item is included in optimal subset. Then we need to find out all the items in remaining N-1 items which can optimize the subproblem for weight W-wk. Value of this item is added to candidate maximum value.

Second, current item is not included in optimal subset. In that case, we need to find out items in remaining N-1 items which can optimize the the original problem. Value of current item is not added into candidate maximum value.

Inclusion depends on two conditions :

  1. Weight of the item is less than the total capacity of knapsack.
  2. Inclusion of item increases current max value with K-1 items with W-Wk weight.  

Since every steps reduces the problem to a smaller problem in terms of items of weight, recursive solution would be our first refuge. To implement this problem, what are the base cases? First, we cannot add any items to knapsack capacity is zero i.e. W == 0. Second, no item can be taken if there are no items remaining, i.e. n == 0.

Recursive implementation of 0/1 knapsack problem

package com.company;
/**
	* Created by sangar on 19.8.18.
*/
public class KnapsackProblem {
	static int knapSack(int W, int[] weights, int[] val, int n) {
		/*
			If there is no item or weight that can be carried is
			zero, return zero
		*/
		if (n &lt; 0 || W == 0)
			return 0;

		/* 
			If weight of the nth item is more than Knapsack 
			capacity W,then this item cannot be included
			in the optimal solution
		*/
		if (weights[n] &gt; W)
			return knapSack(W, weights, val, n - 1);

		/* Consider two cases, including item and excluding item.*/
		else return Integer.max(
			(val[n]
				+ knapSack(W - weights[n], weights, val, n - 1)),
			(knapSack(W, weights, val, n - 1))
		);
	}

	public static void main(String args[]) {
		int[] val = {60, 100, 120};
		int[] wt = {10, 20, 30};
		int W = 50;
	
		int n = val.length;
		System.out.println(knapSack(W, wt, val, n - 1));
	}
}

If we look at the execution trace of the function, it looks like this.

0/1 knapsack problem

There are seven problems to be solved at the leaf level. For N = 3, there are 7 problems to be solved before we start optimizing for the max value. For N, in general, it will take 2N subproblems to be solved. Hence, complexity of recursive implementation is O(2N).

If we take another example, it will become evident that there are some subproblems which are solved again and again. Overlapping subproblems is one of the criteria, we should be thinking about dynamic programming. Also, notice that optimal solution to a smaller problem leads to optimal solution to bigger problem, which is second condition for DP.  This problem satisfy both these conditions, hence let’s design DP solution for it.

0/1 knapsack problem : Dynamic programming approach

We define two dimensional array V[N,W] where N is number of items and W is capacity. For 1<= i <= n and  0<=w<=W, V[i,w] represents the optimal solution for items I1, I2, ..In with maximum weight of w.  If we can compute all the entries of this array, then the array entry V[N,W] is the solution to our problem

For i =0 and w=0, all values will be zero. So, first column and first row will be filled with all zero values.
Recursively, we can fill the table bottom up as follows.

V[i, w ] = max (V[i-1, w], V[i-1, w-w[i]) + V[i] )
V[0, w ] = 0; there are no items
V[i, 0 ] = 0; no items can be picked.
package com.company;

/**
	* Created by sangar on 19.8.18.
*/
public class KnapsackProblem {
	public static int knapsackDP(int W, int[] weights, int[] val, int n) {
		int[][] V = new int[n+1][W + 1];
		for(int i = 1 ; i &lt; V[0].length; i++){
			/*
				If weight of item is less than current value
				we can achieve minimum value V[i] with 1..i items
			*/
			if(weights[0] &lte; i){
				V[0][i] = val[0];
			}else{
				V[0][i] = 0;
			}
		}

		//Loop for all items
		for (int i = 1; i &lt; V.length; i++) {
			for (int j = 1; j &lt; V[i].length; j++) {
				/*if a weight is more than the allowed weight, 
				that weight cannot be picked. */
				if(weights[i] &gt; j){
					V[i][j] = V[i-1][j];
				}else{
					V[i][j] = Math.max(V[i-1][j], 
							val[i] + V[i-1][j-weights[i]]);
				}
			}
		}
		return V[V.length-1][W];
	}

	public static void main(String args[]) {
		int[] val = {60, 100, 120};
		int[] wt = {10, 20, 30};
		int W = 50;

		int n = val.length;
		System.out.println(knapsackDP(W, wt, val, n - 1));
	}
}

One similar problem which can be solved with same approach is minimum number of coins to be used to get change of a particular amount. I am skipping the whole analysis and directly pasting the code here.  

Complexity of the dynamic programming implementation of knapsack problem is O(N *W). Space complexity is again O(N*W). It is thumb rule that we trade space for time in dynamic programming.

Please share if there is something is wrong or missing. If you want to contribute to website, please reach out to us at communications@algorithmsandme.com

Longest increasing subsequence : Dynamic Programming

Longest increasing subsequence

Given an array of integers, find the longest increasing subsequence i.e a subsequence such that every element of subsequence satisfy this condition i < j  and a[i] < a[j].For example, in array {2,4,6,3,5,7,9} longest increasing subsequence is of length 5  = {2,4,6,7,9} longest increasing subsequence
More examples :

Input  : arr[] = {3, 10, 2, 1, 20}
Output : Length of LIS = 3
The longest increasing subsequence is 3, 10, 20

Input  : arr[] = {3, 2}
Output : Length of LIS = 1
The longest increasing subsequences are {3} and {2}

Input : arr[] = {50, 3, 10, 7, 40, 80}
Output : Length of LIS = 4
The longest increasing subsequence is {3, 7, 40, 80}

Longest common subsequence : Line of Thoughts

We have to find longest increasing subsequence till last element of array, i.e Nth element, question is does depend on LIS till N-1 element? Idea is to see if any increasing subsequence already present till current index i,  can include  A[i] and still remain increasing? To confirm that, check every element at index j such that j>=0 and j < i and A[j] < A[i].
If element A[j] is less than A[i], then A[i] can be part of increasing subsequence ending with element j. Length of such increasing subsequence can be (length of increasing subsequence ending at j )+1. Check for each such element and take maximum length.Let’s see an example and see how it works.

Before we start solution,, let’s think why are we applying dynamic programming here. First because, solution to bigger problem depends on optimal solution of subproblems and second, because if we do not store solutions to subproblems we may end up calculating them again and again. Both Optimal subproblem property and overlapping subproblem property are satisfied for this problem and hence we will use dynamic programming to solve it.

Longest increasing subsequence : Implementation

Define an array LIS of size N, LIS[i] will represent longest increasing subsequence length till element i.

LIS[i] = 1 + max(LIS[j]) for all  0<=j<i and A[j]<A[i] 
       = 1 if no such element exists where j< i and A[j]<A[i]

Below is C code to implement it.

#include <stdio.h>
#include <stdlib.h>

int maximumLIS(int a[], int end, int *lis){
    for (int i=0; i<end; i++){
        if( a[i] < a[end] && lis[i] > lis[end] )
            lis[end] = lis[i];
    }
    return lis[end];
}

int lis(int a[], int size){

    int *lis = (int *)malloc(sizeof(int)*size);
    lis[0] = 1;

    for(int i=1; i<size; i++){
    	lis[i] = 1 + maximumLIS(a,i,lis);
    }
    
    int result = lis[size-1];
    free(lis);
    
    return result;
}

int main(void) {
	int a[] = { 2,4,6,3,5,7,9 };
	int size = sizeof(a)/sizeof(a[0]);

	printf("Length of Longest increasing subsquence : %d" , lis(a, size));
	
	return 0;
}

Java implementation for the same algorithm

package com.company;

/**
 * Created by sangar on 7.1.18.
 */
public class LIS {

    private static int maximumLIS(int a[], int end, int [] LIS){
        for (int i=0; i<end; i++){
            if( a[i] < a[end] && LIS[i] > LIS[end] )
            LIS[end] = LIS[i];
        }
        return LIS[end];
    }

    private static int lis(int [] A){

        int [] LIS = new int[A.length];
        LIS[0] = 1;

        for(int i=1; i<A.length; i++){
            LIS[i] = 1 + maximumLIS(A,i,LIS);
        }

        return LIS[A.length - 1];
    }
    public static void main(String[] args) {
        int[] A = {2,4,6,3,5,7,9};
        System.out.println("Longest increasing sybsequence : " + lis(A));
    }
}

Let’s take an example and see how this code works? For example, given array A = {2,4,6,3,5,7,9}

Initialize LIS[0] =1, that means there is an increasing subsequence of length 1 at index 0.
For i = 1 i.e 4, check for j=0 to 1, excluding index 1. A[0] < A[1], hence LIS length is 2 (LIS[0] + 1 ).

For i = 2, i.e. 6 , check j = 0 to 2.  and check that LIS[0] = 1 and LIS[1] =2. Max LIS[j] for j=0 to  2 is 2. LIS[2] = 3 (LIS[1] +1).
For i =3 i.e 3, check from 0 to 3, Max LIS till index 3 will be LIS[3] = 2 because only A[0] is less than A[3]. Hence longest subsequence ending with i = 3 will have length only of 2.  LIS[3] = 2
For i = 4, i.e.5 Max LIS is again 3 which includes {2,4,5} or {2,3,5}
For i = 5, i.e 7, Max LIS is 4 which includes {2,4,5,7} or {2,3,5,7} or {2,4,6,7}
For i = 6, i.e 9, Max LIS is 5 which includes {2,4,5,7,9} or {2,3,5,7,9} or {2,4,6,7,9}

Therefore, longest increasing subsequence is 6 for given array.

Other problems which are variance of longest increasing subsequence and can be solved by finding longest increasing subsequence are :

1. Given two river banks (visualization : two parallel lines), one bank has numbers written (1….n) in sorted order. On the other bank the numbers (1…n) are arranged randomly. A bridge can be formed from the ith point from bank 1 to ith point in bank 2. Find the max number of non-intersecting bridges you can form?
Just find longest increasing subsequence in non ordered number and that will be the solution.

2. Given a set of n types of rectangular 3-D boxes, where the ith box has height h(i), width w(i) and depth d(i) (all real numbers). You want to create a stack of boxes which is as tall as possible, but you can only stack a box on top of another box if the dimensions of the 2-D base of the lower box are each strictly larger than those of the 2-D base of the higher box.

3. Another problem which borrow heavily from approach is find longest zigzag subsequence in an array

Algorithm to find longest increasing subsequence works in O(N^2) in time complexity with O(N) space complexity.

Please share if you find anything wrong or missing, we would love to hear from you. If you want to be part of editors and contributors team at Algorithms and Me, please drop us an email at communications@algorithmsandme.com

Longest palindrome substring

Longest palindrome substring

Given a string, find the longest palindrome substring in that string. For example, in the string given below, the longest palindrome substring is DCBABCD

longest palindrome substring dp

longest palindrome substring dynamic programming

Brute force solution to the problem is pretty easy. Idea is to start from each character in the string and go on checking on the left and right side of that character until they are same. Once the left and right side characters differ, we check if the number of characters in substring centered character is greater than earlier such substring for any other character processed earlier. If it is greater, then we update the length and look for subsequent characters till the end of the string.

Implementation

    private String longestPalindrome(String s) {

        int longest = Integer.MIN_VALUE;
        int start = 0; int end = -1;
        
        for(int i=0; i<s.length(); i++){
            int currentLength = 1;
            for(int j= i-1, k= i+1; j>=0 && k < s.length() 
                              && s.charAt(j) == s.charAt(k); j--, k++){
                    currentLength+= 2;
            }
            if(currentLength >= longest) {
                longest = currentLength;
                start = i - currentLength / 2;
                end = i + currentLength / 2;
            }
        }

        for(int i=0; i<s.length(); i++){
            int currentLength = 0;
            for(int j= i, k= i+1; j>=0 && k < s.length()
                          && s.charAt(j) == s.charAt(k); j--, k++){
                    currentLength+=  2;
            }

            if(currentLength >= longest) {
                start = (i - currentLength/2) + 1;
                end = (i + currentLength /2 );
                longest = currentLength;
            }
        }

        return s.substring(start,end+1);
    }

The complexity of the implementation is O(n2).

Longest palindrome string: a dynamic programming approach

To apply dynamic programming to any problem, there are two conditions which must satisfy: First, the optimal subproblem property meaning that solution to smaller subproblems leads to the solution to the larger problem. Second, overlapping subproblem substructure meaning there are subproblems which are solved again and again in a recursive way.

The first property gives us the recurrence relationship, whereas the second property points us towards memoization.

Let’s take an example and see if we can come up with an algorithm. We have to find if a string of length 4 is a palindrome or not. To string to be a palindrome, it’s first and the last character should be the same. There are two paths here, either the first and last characters are same or they are not.

If the first and last characters are not the same, we can safely say that string is not a palindrome. If first and last characters are same, can we say the string is a palindrome? No, not yet. We have to check if the substring from the second character to the second last character is a palindrome or not.

longest palindrome substring

This is optimum subproblems property here. If substring s[i+1..j-1] is a palindrome, then string s[i..j] is palindrome if and only if s[i] = s[j].

Now, substring s[i+1..j-1] will be calculated for all the indices k where 0<=k<=i and for the indices k where j <=k<=length. This hints at overlapping substructure property. Can we precalculate this stuff before and save it in memory?
We can store this information in a matrix M, where M[i][j] stores if a substring starting at i and length j is palindrome or not. Notice that meaning of j is length and not an index in original string.

M[i][1] will be always true as a character is a palindrome in itself. M[i][2] is true only if s[i]=S[i+1]

For M[i][3], the first character of the substring will be s[i], the last character will be s[i+3-1], if s[i] = s[i+2] and if M[i+1][2] is true (substring starting at i+1 and length 2 is palindrome or not), then only M[i][3] can be true.

We fill the table bottom up, starting with substrings with length 1 and 2 from each character in the string.

M[i][j] = (s[i] == [j+i-1] && M[i+1][j-2]) for i = 3 to n-j+1 and j = 1 to n

Longest palindrome substring: DP implementation

class Solution {
    public String longestPalindromeSubstring(String s) {
        if(s.length() == 0 ) return "";

        int longest = 0;
        int start = 0;
        int end = 0;
        boolean[][] table = new boolean[s.length()][s.length()];
        table[0][0] = true;

        for (int i = 1; i < s.length(); i++) {
            //All single characters are palindrome in itself
            table[i][i] = true;
   
            //All two characters are palindrome if two characters are equal
            table[i - 1][i] = s.charAt(i - 1) == s.charAt(i);
            if(table[i-1][i] && longest <= 2){
                longest = 2;
                start = i-1;
                end = i;
            }

        }

        for (int L = 3; L <= s.length(); L++) {
            for (int i = 0; i <= s.length()-L; i++) {
                int j = i + L - 1;
                table[i][j] = table[i + 1][j - 1] 
                               && (s.charAt(i) == s.charAt(j));

                if(table[i][j] && longest <= L){
                    longest = L;
                    start = i;
                    end = j;
                }
            }
        }

        return s.substring(start, end+1);

    }
}

Test cases

package test;

import com.company.LongestPalindromeSubstring;
import org.junit.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;
/**
 * Created by sangar on 2.1.19.
 */
public class LongestPalindromeSubstringTest {

    LongestPalindromeSubstring tester = new LongestPalindromeSubstring();

    @Test
    public void longestPalindromeSubstringTest() {
        assertEquals(4, tester.longestPalindromeSubstring("ABBA"));
    }

    @Test
    public void longestPalindromeSubstringOneCharTest() {
        assertEquals(1, tester.longestPalindromeSubstring("ACDB"));
    }

    @Test
    public void longestPalindromeSubstringInMidTest() {
        assertEquals(4, tester.longestPalindromeSubstring("CABBAD"));
    }

    @Test
    public void longestPalindromeSubstringOddLenTest() {
        assertEquals(5, tester.longestPalindromeSubstring("ABCBA"));
    }
}

The complexity of the dynamic programming approach is much better than brute force algorithm and is O(n2), It used extra memory though in order O(n2).

You can test this solution on the leetcode problem

Please share if there is something wrong or missing. If you are preparing for an interview and want help with dynamic programming techniques, please reach out to our team at communications@algorthmsandme.com

Merge k sorted arrays

Merge k sorted arrays

Given k sorted arrays each having n elements, merge k sorted arrays into one n*k element array in sorted order. For example, given 3 arrays are as below

merge k sorted arrays
merge k sorted arrays

Result array should be like

Merge k sorted arrays

Merge k sorted arrays

Since all the input arrays are sorted, the first element in result array will be among the first elements of input arrays. How can we find the minimum among all the elements plucked from the first index of each array ? Easy, take those k elements (there are k arrays, so k first elements) and build a min heap. The root of the min heap the least element among the first elements of all arrays, so it will be the first element in the result array.

Once, we add the first element into the result array, we have to find the second element. Second element can be from the set of first elements of all input arrays except one array from which the first element of result array was added. So, we will take second element of that array.

In order to know which array gave the minimum element at a particular time, we will store additional information of about array and index at which minimum element was.

If i represents the array number, and j represents the index of the minimum number in heap in ith array, then we add (j+1)th element to the min heap next and re-heapify. If j goes out of bound for ith array, we take min heap with k-1 size and go on, till we have no elements left in heap.

Follow the procedure for (n-1)*k times. When all array elements are processed, result array will be in the sorted array.

Merge k sorted arrays: algorithm

  • Build min heap with the first element of all k arrays.
  • Pick the root of min element and put it in the result array.
  • If there are remaining elements in the array,  put next element at the root of min heap and heapify again
  • If all elements are already of an array are processed, reduce the size of min heap by 1.
  • Repeat step 2, 3 and 4 till min heap is empty.

Merge k sorted arrays: implementation

package com.company;

import java.util.PriorityQueue;

/**
 * Created by sangar on 2.12.18.
 */
public class MergeKSortedArrays {
    private class HeapNode{
        public int arrayNum;
        public int index;
        public int value;

        public HeapNode(int arrayNum, int index, int value){
            this.arrayNum = arrayNum;
            this.index = index;
            this.value = value;
        }
    }

    public int [] mergeKSortedArrays(int[][] arrays){

        if(arrays == null) return null;

        PriorityQueue<HeapNode> minHeap =
			new PriorityQueue<>(arrays.length,
                (HeapNode a,HeapNode b)-> a.value - b.value);

        int size = 0;
        for(int i =0; i<arrays.length; i++){
            size += arrays[i].length;
        }
        int[] result = new int[size]; // k * n

        //add first elements in the array to this heap
        for(int i=0; i<arrays.length; i++){
            minHeap.add(new HeapNode(i, 0, arrays[i][0]));
        }

        //Complexity O(n * k * log k)
        for(int i=0; i< size; i++){
            //Take the minimum value and put into result
            HeapNode node = minHeap.poll();

            if(node != null){
                result[i] = node.value;
                if(node.index + 1 < arrays[node.arrayNum].length) {
                    //Complexity of O(log k)
                    minHeap.add(new HeapNode(node.arrayNum,
                           node.index + 1,
                           arrays[node.arrayNum][node.index + 1]));
                }
            }
        }
        return result;
    }
}

Test cases

package test;

import com.company.MergeKSortedArrays;
import org.junit.jupiter.api.Test;

import java.util.Arrays;

import static org.junit.jupiter.api.Assertions.assertEquals;

/**
 * Created by sangar on 23.9.18.
 */
public class MergeKSortedArraysTest {

    MergeKSortedArrays tester = new MergeKSortedArrays();

    @Test
    public void mergeKSortedArraysTest() {

        int[][] input  ={
            { 1, 2, 3, 4 }, { 5, 6, 7, 8 }, { 9, 10, 11, 12 }
        };

        int[] expectedOutput = {1,2,3,4,5,6,7,8,9,10,11,12};

        int [] output = tester.mergeKSortedArrays(input);

        System.out.println(Arrays.toString(output));
        assertEquals(Arrays.toString(expectedOutput), 
					Arrays.toString(output));
    }

    @Test
    public void mergeKSortedArraysWithUnequalSizeTest() {

        int[][] input  ={
                { 1, 2 }, { 5, 6, 7}, { 9, 10, 11, 12 }
        };

        int[] expectedOutput = {1,2,5,6,7,9,10,11,12};

        int [] output = tester.mergeKSortedArrays(input);

        System.out.println(Arrays.toString(output));
        assertEquals(Arrays.toString(expectedOutput),
			Arrays.toString(output));
    }

    @Test
    public void mergeKSortedArraysWithNullTest() {

        int [] output = tester.mergeKSortedArrays(null);

        assertEquals(null, output);
    }
}

Complexity of code to merge k sorted arrays is O(n * k * log k) along with space complexity of O(k).

Please share if there is something wrong or missing. If you are preparing for an interview, please sign up to receive interview preparation kit for free.

Heaps : Algorithms

Heap

Heap is a kind of data structures based on complete tree principle. By definition, a complete binary search tree of N levels has at least 2^N-1 nodes in it. 
There two properties heap holds :
1. Structural property : This means every level of heap will be completely full except the last level.
Last level will be filled from left to right order.
2. Heap property : It means parent node will be either greater or smaller than its children nodes.

There are two kinds of heaps based on second property:

Max Heap : It maintains the property that every parent node is greater than its children. Root of the max heap is greatest element.

Heap
Max Heap

Min Heap : It maintains the property that every parent node is less than its children. Root of the min heap is least element.

min heap
Min Heap

Usually, heaps are implemented with an array, which eases traversing from child to parent and parent to child; and viewed as tree. Height of a heap with N elements will be O(logN). 

If we want to find child of a parent node i, it would be 2i or 2i+1.

If we want to find parent of a node say i, it would be [i/2].

Given that multiply and divide by 2 can be very efficiently implemented by left and right shifting, these operations are very efficient.

Maximum elements in a heap with height h will be 2^h -1 while minimum number of elements in heap with same height will be 2^(h-1) +1.(One more than nodes with height h-1)

Watch video on heaps : Insertion and deletion in heaps

[youtube=http://www.youtube.com/watch?v=Fx_Bk7-zuxI&w=320&h=266]

Some procedures on heaps

1. Insertion in heap

Insertion in heap happens at the end of heap. Once node is added, heapify property is restored using heapify routines.

Let’s work out an example
Current state of heap is as following, all nodes are satisfying max heap property.

Current Max heap
Now node 10 is added to the heap. We need to check if new node violates max heap property. 
We need to check that parent of currently added node is greater than it. In this case, that does not hold.
So we will swap parent node with current node.
Max heap property violated

After swapping nodes as explained in last step, we should check for if new parent in this case node 14, satisfies heap property. If not, we again swap the largest child and repeat step till we are at root node.. In our case, node 14 satisfies max heap property so we stop there. 

restored heap property

Above mentioned steps are self explanatory and there is no need of algorithm. In above heapify method we are moving down the tree, in insert operation we will move up.

Code

Complexity of this procedure is for O(logN).

2. Deletion in heap

Deletion of node is usually done at root. For deletion of root, replace root node with the last node in heap and then do down shift. As new node replacing root may violate heap property, we need to check with its children. In Max heap, check if new node is greater than both its children. If not then swap it with largest child and then again repeat it till node replacing root finds its valid place.

Algorithm (This is for max heapify)
  1. Let i be the index which we doubt that might violate heap property.
  2. left = 2i, right = 2i+1, largest = a[i]
  3. Check if left child is with in array limits, if yes, check if it is greater than the a[i].
  4. If left is greater than a[i], then largest till now is left, largest <– left.
  5. Check if right is with in array limit, if yes, check if it is greater than largest, then change the largest, largest <– right.
  6. Swap largest and a[i].
  7. Now, repeat step 1 to 6 with largest, till we see an element which does not violate heap property.

 Let’s take an example:

Current heap


Now let’s say we delete node 14. 

First step is to replace it with last node. In this case 6 will replace 14.

Last node replacing root

Node 6 violates max heap property. So, find the largest child of 6, node 12 in this case.Swap node 6 and node 12.

Heap after swapping

Again repeat step with node 6 and node 7. Final heap will be like

Final heap

Code

3. Building a heap from a given array of elements.

This operation can easily be done using heapify methods explained above. 

Algorithm

  1. Start from the middle element of the array, let’s say i
  2. Heapify with given index.
  3. Decrease index by one. Repeat step 2 till we reach first element.

Code

Complexity of this procedure is O(N).How this complexity becomes O(N) while for adding each node will need log N time to heapify and if there are N nodes, it should be O(N log N).
Idea here that nodes at the lower most level do not move at all. Nodes at one level above lower most, move at most by one level and so on.
If height of heap is h, number of node will be 2^h. So 2^h/2 = 2^h-1 nodes never move.
2^h-2 nodes move by one level and so on.

4. Heapsort

Heap sort combines property of both insertion sort (no extra space required) and merge sort (time complexity being O(NlogN)). 
It can be easily achieved by using above two procedures.

Algorithm

  1. Build MAX heap from the given array.
  2. Swap first and last element of the array. Last element is now at its proper position.
  3. Decrease the effective size of heap to be heapify.
  4. Heapify with first element of the array.
  5. Repeat step 2 , 3 and 4 till we reach the second element of the array.

Code

Complexity of above procedure is O(NlogN).

Strings : Permutations and combinations of string

Permutations and combinations of string

Permutation is arrangement of objects in various positions where order of objects will matter i.e. ab is not same as ba.

Analysis

Let us start with an example  : S  = “ABC”  What are the possible permutation?
ABC      ACB     CAB    CBA     BCA     BCA    

Fix the first character and find permutation of N-1 characters.
We fix A, permutation of BC are BC and CB.
Now place A at all the places possible, in this case it will be at start, middle and at the end.
BC leads to ABC, BAC, BCA
CB leads to ACB, CAB, CBA.
It can be easily code with the help of recursion.
Take each character starting with 0, let say current character is Kth character in the string, we need to fit this character in all the substrings of length N-K (N is length of string).  
We first swap the Kth character with the a ith position where i starts from K and reaches to N (position in remaining substring). Before going for all position of Kth character, we will move forward and check for K+1 character for its all positions in N-K-1 length substring.
Base case would be when we reach at the end of string. Hence remaining substring would be of length zero. Return.

So when at penultimate character of string, we have already varied all the positions of last character. Similarly when at Kth character back, we would have adjusted all permutations of N-K characters following it.

If we see closely at code, we would have each character of string as each position of string at some point of time as we are swapping each character with every position when moving forward.
When there are duplicate characters in string, just sort the string and check if are processing duplicate character. If yes then just skip the character.

Code to print permutations of string

Similar logic can be applied to combination problem also where we have to first output the string with the character and then without it. Following code does that.

Code to print combinations of string

Complexity of algorithm to print permutations and combinations of string is O(N!) as we need to generate N! permutations.

Remove particular combination of characters

Remove particular combination of characters from string.

Given a string, we need to remove a particular combination of characters, for example, if s = “aacbbdccaac” and if combination is “ac” then our output will be “abbdcca”. If s = “aacaaccd” and if combination is “ac”, then output should be “acd”. Following will be the execution  aacaaccd ——-> aacd ——>ad 
Note that this has to be done in linear time and without any extra space.

Here there are two subparts of the problem: One, we need to keep track we have already encountered the first character in the combination of characters needed to be removed.
Second we need to keep track of the position where the next character should be placed in the string if it is not to be eliminated. First problem can be easily solved by maintaining a state machine, where we have two states and one moves from one state to another based on the input. In current example, we have states like state “INITIAL” and “MOVED”, Whenever we encountered “a”, we move from INITIAL to MOVED. Then if we get “c” in “MOVED” state we are sure that we have encountered the whole pattern and they need to be removed. If we are in “MOVED” state and if we don’t get “c”, we move back to “INITIAL” state.

For the second part, we can simply follow the approach we have used in this post, that is to maintain two pointers, one to point to the character to be processed, and other to point position where current character to be placed if it is not be eliminated.
The problem can be extended with multiple characters in pattern, with change in state machine accordingly.Same state machine method can be used to count number of words in a given string. Whenever we encounter the separator, we move to OUT state and as soon as we see first character in OUT state we move to IN state till we again see separator.

#define INITIAL 1
#define MOVED 2
char * remove_pattern(char *s, char * pattern){
    int state = INITIAL;
    if(s == NULL) return NULL;
    if(pattern == NULL) return s;

    char * source = s;
    char * destination  = s;
    while(*source != ''){

        /*If character is not equal to the first character of the pattern,
          copy it as such */
          if(state == INITIAL && *source != *pattern){
               *destination = *source;
               destination++;
          }
          else if( *source == *pattern && state == INITIAL){
              /* Move state to MOVED */
              state = MOVED;
          }else if(state == MOVED && *source != *(pattern + 1)){
                /* If next character is not as per pattern, 
                   copy previous character first. */
                *destination =  *pattern;
                destination++;
                /* If next character is first character of the pattern,
                   move state to INITIAL */
                   if(*source != *pattern){
                       *destination  = *source;
                       destination++;
                       state =  INITIAL;
                   }
             }
         /* If we hit the pattern, start again and move state to INITIAL */
          else if(state == MOVED && *source == *(pattern + 1)){
                state = INITIAL;
          }
          /* After moving the characters, check if they 
            don't make pattern again */
          if((int)(destination-s) >= 2
              && *(destination -2) == *pattern
              && *(destination-1) == *(pattern +1)){
              destination -=2;
          }
          source++;
    }
    *destination = '';
    return s;
}

Test cases
1. When pattern is present
s= “aaacacacbbb” pattern = “ac”

2. When pattern is not pattern
s =”aaaabdcaaa” pattern  = “ac”
3. When movement leads to pattern again
s= “abacaccdc” pattern =”ac”

4. When string is NULL or pattern is NULL s= NULL pattern = NULL
5. When return string is empty s = “acacacac” pattern =”ac”

6. Just first character in pattern s= “aaaaaaaaa” pattern =”ac”

The complexity of the above code is O(N) and no extra space used.

Strings : Remove duplicates from a string

A string is a collection of characters terminated by a special character ”. String can have duplicate characters in it. Today’s problem requires purging of those duplicate characters from string and return the string.

Problem statement 

Given a string S, remove all duplicate characters from it. 
For example,
S = “aaabbacbccd”, then output should be “abcd”. Note the output is not  “d”, that means we need to maintain once instance of the duplicate character. 

Analysis

There are two parts involved in this problem.
First, we need to keep track whether we have already processed a particular character.
Second we need to properly place the character in destination string.

For the first part hashing is the perfect technique. We can create a hash with key as character (total 256 characters) and set the value when we first encounter the character. Next time when we encounter the character, we would find the value set against that character and hence we can skip that character.

For second part we can have an auxiliary string to store the non-duplicate characters.
We can do it in-place too.
Keep two indexes, one which keeps track the character being processed in input string, other which points to the next place in the input string where the current character can be put if it is not processed already.

Code

Test cases

1. When input string contains duplicate characters
S  = “aaabaaabbbcccd”

2. When input string does not contain duplicate characters
S = “abcdefg”

3. Input string with no characters
S = “”

4 Input string pointer is NULL
S = NULL;

Execution of these test cases can be seen here.

Complexity Analysis 

Complexity of code is O(N) in time and constant memory for hash as it does not depend on the size of the input string.

Strings : Toeknize a string separated by delimiter

Continuing the previous post, let’s look at the second problem.

Problem statement

Find all tokens in a string which are separated by given delimiter.

Analysis

We have to traverse the string and for each character, check if that character is present in delimiter string. If it is, then it is end of the token started after previous encounter of the delimiter. Here forward we should scan all characters till the time we first encounter a character which is not a part of given delimiter string, that would be start of the next token.

Let’s look at it step by step.

Step 1 : Assume we have string as s and delimiter string as delim.
Search in s the first character which is not present in delim.
If there is no such byte, we are done, return NULL.
If there is such byte, then that would be start of our first token.

Step 2: Scan through the found token.
From the character found in step 1, scan through s till we again encounter a character present in delim.
If there is no such character, we are done, we have only one token and return the start position of that token.

If we have such character, that would be end of the token.

Great we have found one token, How about next one? 

For next token to be generated, we need to keep track where to start looking from in the string as we would have scanned a part of it in the previous tokens. So we keep track of the pointer where to start looking from.

Step 3 : Once we have found the end of token in step 2, again scan out all subsequent characters which are part of delim. Take the index of first non-delimiter character. This would be the starting point of our next token search.

Implementation note : To distinguish between whether it is first call to the function or subsequent call, we pass NULL pointer to the source string parameter indicating that it is subsequent call and not initial call.

Similar approach can be used to remove all characters which are present in a string from another string.

Code

Test cases

1. String and delimiter with one characters
S = “aaab_dddcc”
delim = “_”

2. String and delimiter with multiple characters
S = “aaab_dd?dcc”
delim = “_?”

3. Empty string
S = “”
delim = “_”

4. Empty delimiter
S = “aaab_dddcc”
delim = “”

5. String starting with delimiters
S = ___aaaaabbb
s = “_”

6. String ending with delimiters
S =”aaaa___”
s = “_”

7. String with no delimiter character
S = “aaaaaaaa”
s = “_”

8. String with all character as delimiter characters
S =”________”
s =”_”

Complexity analysis

Complexity of above code is O(N * M), where N is length of string and M is length of delimiter string.