Longest common substring

Longest Common Substring

Given two strings A and B, find the longest common substring in them. For example, if A = "DataStructureandAlgorithms" and B = "Algorithmsandme", then the longest common substring in A and B is "Algorithms". The figure below shows the longest common substring.

[Figure: longest common substring of A and B]

The brute force solution is to find all substrings of one string and check whether any of them is a substring of the second string, keeping track of the longest one found while comparing. There can be n^2 substrings of a string of length n, and checking whether a string is a substring of another takes O(m) operations, where m is the length of the second string. Hence, the overall complexity of this method is O(n^2 * m). A rough sketch of this idea is shown below.
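A rough sketch of this brute force idea in Java (the method name is illustrative, not from the original post):

static int longestCommonSubstringBruteForce(String A, String B) {
    int longest = 0;
    for (int i = 0; i < A.length(); i++) {
        for (int j = i + 1; j <= A.length(); j++) {
            String candidate = A.substring(i, j);   // one of the O(n^2) substrings of A
            if (candidate.length() > longest && B.contains(candidate)) {
                longest = candidate.length();
            }
        }
    }
    return longest;
}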

Can we do better than that?

Longest common substring : Line of thoughts

We have to find the longest common substring in strings of length M and N. Can we find the longest common substring considering lengths M-1 and N-1 and then derive it for M and N? Yes, we can. The length either grows by one if the last characters are equal, or resets to zero if they are not. Why so?

First, see why we need to reset to zero when the characters are different. This is because we are looking for a common substring, which means the characters must be consecutive; a mismatching character restarts the entire search, because no common substring can include those two different characters.

What if the characters are the same? In that case we increment the length by one, because the longest common substring ending at N-1 and M-1 is either 0 or some number based on how many consecutive common characters there were till N-1 and M-1.

What will be the longest common substring when one of the strings is empty? Its length will be zero.

Do you see the recursion here? Let's write the recurrence relation and then implement it.

LCS(i,j) = 1 + LCS(i-1, j-1) if A[i] == B[j]
         = 0                 otherwise

This recurrence has the optimal substructure property: the solution to the problem depends on solutions to its subproblems. There are also subproblems which will be calculated again and again, called overlapping subproblems. These two properties are required for dynamic programming. To avoid recalculating subproblems, we use memoization: create a two dimensional array called LCS with dimensions (n+1) x (m+1). LCS[i][j] represents the length of the longest common substring ending at A[i-1] and B[j-1]. Since the solution for (i-1, j-1) is required before the solution for (i, j), this matrix is filled bottom up.

Longest common substring using dynamic programming

How to fill LCS[i][j]?

1. Check if A[i-1] is equal to B[j-1]
   1.1 If yes, LCS[i][j] = 1 + LCS[i-1][j-1]
       (because the new character extends the common substring, if any,
        ending at A[0...i-2] and B[0...j-2])
   1.2 If the characters are not the same, LCS[i][j] = 0
       (because if the characters differ, there cannot be any common
        substring ending at A[i-1] and B[j-1]).
A small worked example of these rules is shown below.
Implementation

#include <stdio.h>
#include <string.h>

int max(int a, int b){
	return a>b ? a:b;
}
 int longestCommonSubstring(char * A, char * B){
     int lenA = strlen(A);
     int lenB = strlen(B);
     int LCS[lenA+1][lenB+1];

     for (int i=0; i<= lenA; i++){
         LCS[i][0] = 0;
     }

     for (int j=0; j <= lenB; j++){
         LCS[0][j] = 0;
     }
	
     int maxLength = 0;
     for (int i=1; i<= lenA; i++){
        for (int j=1; j <= lenB; j++){
            if (A[i-1] == B[j-1]){
                LCS[i][j] = 1 + LCS[i-1][j-1];		
                maxLength = max( maxLength, LCS[i][j] );
            } 
            else {
               LCS[i][j] = 0;
            }
         }
     }
     return maxLength;
}

int main(void) {
    char *a = "ABCDEFGSE";
    char *b = "EBCDEFGV";
	
    printf("\n Longest common substring : %d",
			longestCommonSubstring(a,b));
    return 0;
}
package com.company;

/**
 * Created by sangar on 5.1.18.
 */
public class LCS {

    public  static int longestCommonSubstring(String A, String B){
        int lenA = A.length();
        int lenB = B.length();

        int [][] LCS = new int[lenA+1][lenB+1];

        for (int i=0; i<=lenA; i++){
            LCS[i][0] = 0;
        }

        for (int j=0; j<=lenB; j++){
            LCS[0][j] = 0;
        }

        int maxLength = 0;
        for (int i=1; i<=lenA; i++){
            for (int j=1; j<=lenB; j++){
                if (A.charAt(i-1) == B.charAt(j-1)){
                    LCS[i][j] = 1 + LCS[i-1][j-1];
                    maxLength = Integer.max(maxLength, LCS[i][j]);
                }
                else {
                    LCS[i][j] = 0;
                }
            }
        }

        // Print the filled table to see how the memoization builds up
        for (int i=0; i<=lenA; i++){
            System.out.println();
            for (int j=0; j<=lenB; j++){
                System.out.print(" " + LCS[i][j]);
            }
        }
        return maxLength;
    }

    public static void main(String[] args) {
	    String a = "ABCDEFGS";
	    String b = "EBCDEFG";

        System.out.println("Longest common substring :" +
                longestCommonSubstring(a,b));
    }
}

Time complexity of the dynamic programming approach to find the length of the longest common substring of two strings is O(n*m), and space complexity is O(n*m), where n and m are the lengths of the two given strings.

[Figure: longest common substring using dynamic programming]

In the next post, we will discuss the suffix tree method to find the longest common substring, which is more efficient than the DP solution and can easily be generalized to multiple strings.

This solution is very similar to the longest common subsequence problem. The difference between the two problems is that a subsequence is a collection of characters which may or may not be contiguous in the string, whereas for a substring the characters must be contiguous. Based on this difference, our solution varies a bit.

Please share if you find something wrong or missing. If you want to contribute to the site, please contact us; we would be happy to publish your work and will pay you for it as well.

Interleaved string

Interleaved string

Given strings A, B and C, find whether string C is an interleaved string of A and B.

C is said to be interleaved if it contains all characters of A and B and the order of characters within each string is maintained. For example, C = "abxycdz" is an interleaved string of A = "abcd" and B = "xyz".

Interleaved string : Line of thoughts

First, the length of C must be the length of A plus the length of B; if not, return false (why?). Let's start with the first characters of C, A and B. If the first characters of C and A match, move to the second character of C and A, keeping B at its first character.
If the above condition is not true, check if the first characters of C and B match; if so, move to the second character of C and B while A remains at its first character.
Repeat these steps with the new current characters of the strings, as long as the character of C matches a character of A or B.
If both conditions are false, return false.

With each match, the problem reduces to a smaller one: C of length N-1 with one of A or B shortened by one character.
From this description, we can figure out that recursion can be used to solve the problem.

isInterleaved(A, B, C) = isInterleaved(A+1, B, C+1), if the current character of C matches the current character of A
                      || isInterleaved(A, B+1, C+1), if the current character of C matches the current character of B

What should be the base case?
If we reach the end of C, we have considered all its characters; return true if all characters of the other two strings have also been consumed, otherwise return false.

Recursive implementation of interleaved string of two strings

#include <stdio.h>
#include <string.h>
#define true 1
#define false 0

int isInterleaved(char *c, char *a, char *b){

    if(!(*c) && !(*a) && !(*b))
        return true;

    if(*c == '\0'){
        return false;
    }
	 // if character of a and c match
    return ((*c == *a) && isInterleaved(c+1,a+1,b)) || 
    		// if character of b and c match
            ((*c == *b) && isInterleaved(c+1,a,b+1)); 
}

int main(void) {
	char *c = "abxycdz";
	char *a = "abcd";
	char *b = "xyz";
	if(isInterleaved(c,a,b)){
		printf("\n String is interleaved");
	}
	else{
		printf("\n String is not interleaved");
	}
	return 0;
}

Iterative implementation

#include <stdio.h>
#include <string.h>
#define true 1
#define false 0

int isInterleaved(char *c, char *a, char *b){

    while(*c != '\0'){
    	if(*c == *a){
            a++;
        }
        else if(*c == *b){
            b++;
        }
        else{
            return false;
        }
        c++;
    }
    if(*a != '\0' || *b != '\0')
        return false;

    return true;
}

int main(void) {
	char *c = "abxycdz";
	char *a = "abcd";
	char *b = "xyz";
	if(isInterleaved(c,a,b)){
		printf("\n String is interleaved");
	}
	else{
		printf("\n String is not interleaved");
	}
	return 0;
}

The iterative implementation does not work for inputs where strings A and B have common characters. For example, for A = XXY, B = XXZ and C = XXZXXY it returns false, even though C is a valid interleaving.

Complexity of the above code is linear, O(N), N being the length of string C, whereas complexity of the recursive solution is O(2^N); however, the recursive solution does not fail in the above-mentioned case.

Dynamic programming approach for interleaved string

If we look closely, we can see that many subproblems are calculated again and again. Let's look at the recursion tree for the input A = XXY, B = XXZ and C = XXZXXY.
[Figure: recursion tree for interleaved strings]

The idea is to store the results of smaller subproblems, so that we do not calculate them again and again.

We create a two dimensional table where Table(i,j) is true only if C[0..i+j-1] is an interleaved string of A[0..i-1] and B[0..j-1].
An empty string is an interleaving of two empty strings, so
Table[0][0] = true

If one of the strings is empty:
Table(i,0) = (A[i] == C[i]) && Table(i-1,0); that is, if C was an interleaving of A till i-1 characters, it remains so for i characters if the ith character of C matches the ith character of A. Note that B is empty here.
Similarly, if string A is empty, Table(0,j) = (B[j] == C[j]) && Table(0,j-1), with the same argument as above.

With these base cases, we can fill table bottom up as follows

Table(i,j) = Table(i-1,j) if (A[i] == C[i+j]) && (B[j] != C[i+j])
Table(i,j) = Table(i,j-1) if (B[j] == C[i+j]) && (A[i] != C[i+j])

Table(i,j) = Table(i-1,j) || Table(i,j-1) if (A[i] == C[i+j]) && (B[j] == C[i+j])
#include <stdio.h>
#include <string.h>
#define true 1
#define false 0
int isInterleaved(char *c, char *a, char *b){

    int lenA = strlen(a);
    int lenB = strlen(b);
    int i,j;

    int Table[lenA+1][lenB+1];
    // Initialization
	for(i=0; i<=lenA; i++){
        for(j=0; j<=lenB; j++){
        	Table[i][j] = false;
        }
	}
    for(i=0; i<=lenA; i++){
        for(j=0; j<=lenB; j++){
        	// Both strings are empty
            if(i==0 && j==0)
                Table[i][j] = true;
            // string A is empty, compare characters in C and B
            else if(i==0){
                if(c[j-1] == b[j-1])
                    Table[i][j] = Table[i][j-1];
            }
            // string B is empty, compare characters in C and A
            else if(j==0){
                if(c[i-1] == a[i-1])
                    Table[i][j] = Table[i-1][j];
            }
            // Both strings are not empty
            //1. If character of A matches with character of C
            // but not of B
            else if (a[i-1] == c[i+j-1] && b[j-1] != c[i+j-1]){
                Table[i][j] = Table[i-1][j];
            }
            //2. If character of B matches with character of C
            // but not of A
            else if (a[i-1] != c[i+j-1] && b[j-1] == c[i+j-1]){
                Table[i][j] = Table[i][j-1];
            }
            //3. If character of A matches with character of C
            // and character of B also matches with C
            else if (a[i-1] == c[i+j-1] && b[j-1] == c[i+j-1]){
                Table[i][j] = Table[i-1][j] || Table[i][j-1];
            }
        }
    }
    return Table[lenA][lenB];
}

int main(void) {
	char *c = "abxycdz";
	char *a = "abcd";
	char *b = "xyz";
	if(isInterleaved(c,a,b)){
		printf("\n String is interleaved");
	}
	else{
		printf("\n String is not interleaved");
	}
	return 0;
}

Complexity of the above code is O(n * m), where n and m are the lengths of strings A and B.

Please share if there is something wrong or missing. If you want to contribute to the website, please write to us at communications@algorithmsandme.com

Largest sum contiguous subarray

Largest sum subarray (Kadane’s algorithm)

Given an array of integers (positive and negative), find largest sum subarray, that is contiguous elements in array, which add up to maximum sum. This problem is solved using Kadane’s algorithm. For example, for array {-1, 3, -5, 4, 6, -1, 2, -7, 13, -3}, largest sum subarray will be {4,6,-1,2,-7,13} with sum = 17.

What will be the brute force solution, without considering time complexity? Well, scan all the subarrays of the array and take the one which has the maximum sum. How many such subarrays can there be? If the size of the array is n, there are n * (n + 1) / 2 subarrays, hence the complexity of the brute force solution is O(n^2).

package com.company;

/**
	* Created by sangar on 20.8.18.
	*/
public class KadaneAlgorithm {
	public static int largestSumSubarray (int[] a){
		int maxSum = Integer.MIN_VALUE;

		for(int i=0; i < a.length; i++){
			int currentSum = 0;
			for(int j=i; j < a.length; j++){
				currentSum+= a[j];
				if(maxSum < currentSum){
					maxSum = currentSum;
				}
			}
		}
		return maxSum;
	}
	public static void main(String args[]) {
		int[] a = {-1, 3, -5, 4, 6, -1, 2, -7, 13, -3};
		System.out.println(largestSumSubarray(a));
	}
}

Maximum sum subarray with dynamic programming

Dynamic programming builds solutions from the bottom up by breaking each problem down into smaller problems that you solve first. Recursion also breaks problems down into subproblems but does this from the top down. One advantage of dynamic programming over recursion is that it prevents possible duplicate solutions of the same subproblem, which uses less CPU and generally makes your code run faster.

Code Chef Wiki

How can we create the solution bottom up? Consider a case where there is only one element in the array. What will be the largest sum subarray? It is the element itself.
Assume now that there is one more element in the same array; what is the condition for including the second element in the largest sum subarray? Obviously, that adding the second element increases the previous sum.

Also, there is a possibility that the second element itself is the largest sum subarray or it starts from the second element. If the first number was negative and the second one is positive, the second number itself is bigger than the sum, no matter what. So, the second element becomes the current sum.

The rule is: at each index j, take the maximum of (the accumulated sum till the previous index plus the current element) and the current element itself.

current_sum  = max ( current_sum + A[j], A[j])

Now that you have the current sum at index j, check whether it is greater than the global maximum so far. If yes, replace the global maximum with the current sum.

if(maxSum < currentSum )
	maxSum = currentSum;

Let's see how this algorithm works with an example. We have the array [-1, 4, 2, -1].

  1. Set maxSum and currentSum equal to the first element of the array (-1) to start.
  2. Now, look at the next element, which is 4. The subarray under consideration is [-1, 4]; the new currentSum is either the current element itself or the current element plus the previous currentSum (-1 + 4 = 3).
  3. Since 4 > 3, currentSum becomes 4, and since 4 > -1, we update maxSum as well.
  4. For the next element (2), currentSum is either 2 or the current element plus the previous currentSum (4 + 2 = 6); since 6 > 2, currentSum becomes 6 and maxSum is updated to 6.
  5. The last element is -1; currentSum becomes max(-1, 6 + (-1)) = 5, which is less than maxSum, so maxSum stays 6.

Hope this example helps to understand the beauty of this Kadane’s algorithm to find largest sum subarray in an array.

Kadane's algorithm, in the common variant that resets the running sum to zero, does not work when all elements of the array are negative. What we can do is scan all elements of the array before applying the algorithm and check whether there is at least one positive number; during this pass, also keep track of the largest number seen. If all elements are negative, just return that largest number. (The implementation in the next section seeds the sums with the first element and therefore handles this case on its own.) A minimal sketch of the guarded variant is shown below.
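This sketch is only illustrative; the method name is not from the original post.

static int largestSumWithGuard(int[] a) {
    // pre-scan: check for a positive element and remember the largest element
    int largestElement = a[0];
    boolean hasPositive = false;
    for (int x : a) {
        largestElement = Math.max(largestElement, x);
        if (x > 0) hasPositive = true;
    }
    if (!hasPositive) return largestElement; // all elements <= 0: best subarray is the single largest element

    // zero-reset variant of Kadane's algorithm, safe now that a positive element exists
    int maxSum = 0, currentSum = 0;
    for (int x : a) {
        currentSum = Math.max(0, currentSum + x);
        maxSum = Math.max(maxSum, currentSum);
    }
    return maxSum;
}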

Largest sum subarray : Kadane’s algorithm implementation

package com.company;

/**
	* Created by sangar on 20.8.18.
*/
public class KadaneAlgorithm {
	public static int kadaneAlgorithm (int[] a){
		int maxSum = a[0];
		int currentSum = a[0];
		
		for(int i=1; i<a.length; i++) {
			currentSum = Integer.max(a[i], currentSum + a[i]);
			if (maxSum < currentSum) {
				maxSum = currentSum;
			}
		}
		return maxSum;
	}
	public static void main(String args[]) {
		int[] a = {-1, 3, -5, 4, 6, -1, 2, -7, 13, -3};
		System.out.println(kadaneAlgorithm(a));
	}
}

Complexity of finding largest sum subarray in an array is O(N) in time and O(1) in space.

Please share if there is something wrong or missing. If you are interested in contributing to website, please reach out to us at communications@algorithmsandme.com

References:

  • https://www.codechef.com/wiki/tutorial-dynamic-programming
  • https://www.hackerrank.com/challenges/maxsubarray/problem

Minimum edit distance between two strings

Minimum edit distance between two strings

Minimum edit distance between two strings is the minimum number of operations one needs to perform on one string so that it transforms into the other. Operations allowed are insertion of a character, deletion of a character and substitution of a character. For example, S1 = EXPONENTIAL, S2 = POLYNOMIAL.

[Figure: minimum edit distance between two strings]

From the above example we can see that we have to find the best possible alignment of the two strings. However, there are so many possible alignments of two strings that it would be very costly to consider each and every alignment and look for the best one.

Can we break the problem into smaller, easier to solve subproblems? The problem at hand is to find the minimum edit distance between strings X[1…n] and Y[1…m], where n and m are the lengths of the two strings. Consider a prefix of each string, X[1…i] and Y[1…j]; let's find the edit distance for these prefixes and call it Edit(i,j). Finally we need to calculate Edit(n,m). At each character we have three choices; let's consider each case one by one:

The first choice is to delete character X[i], which costs us 1. Now we have i-1 characters of X and j characters of Y to consider, which is nothing but Edit(i-1, j).

The second choice is to insert a character into X, which also costs 1. We have added an extra character without processing any of the original characters of X, but character Y[j] now matches the newly inserted character, so it no longer needs to be considered. The problem reduces to Edit(i, j-1).

The third choice is to replace character X[i] with Y[j]. The cost of replacing the character is 1 if X[i] != Y[j]; however, if X[i] == Y[j], the cost is 0. In either case, the problem reduces to Edit(i-1, j-1).

We do not know in advance which choice to pick, so we try all of them at each character and pick the one which gives us the minimum value. The original problem can be defined in terms of subproblems as follows:

Edit(i,j) = min( 1 + Edit(i, j-1),
                 1 + Edit(i-1, j),
                 Edit(i-1, j-1)      if X[i] == Y[j],
                 1 + Edit(i-1, j-1)  if X[i] != Y[j]
               )

What will be the base case? If both strings are of length zero, cost will be 0.
If one string is of length 0, then cost will be length of other string.

Recursive implementation of edit distance problem

#include <stdio.h>
#include <string.h>

int min(int a, int b) {
	return a > b ? b : a;
}

int editDistance(char *s1, char *s2, int length1, int length2){

	// Trace to observe how often the same subproblems recur
	printf("\nlength1 = %d, length2 = %d", length1, length2);

	if(length1 == 0 && length2 == 0) return 0;

	if(length1 == 0) return length2;

	if(length2 == 0) return length1;

	int isCharacterEqual = s1[length1-1] == s2[length2-1] ? 0 : 1;

	return min( min( 1 + editDistance(s1, s2, length1-1, length2),
					 1 + editDistance(s1, s2, length1, length2-1) ),
				isCharacterEqual + editDistance(s1, s2, length1-1, length2-1)
	);
}
int main(){
	char *s = "EXPONENTIAL";
	char *d = "POLYNOMIAL";
	printf("Minimum distance between two strings is : %d",
		editDistance(s,d, strlen(s), strlen(d)));
	
	return 0;
}

If we look at the execution trail, it is evident that we are solving same subproblems again and again.

[Figure: execution trail of the recursive solution to find minimum edit distance]

Now we know two things. First, the optimal solution to the original problem depends on optimal solutions to subproblems (see the recursive relation above). Second, there are overlapping subproblems, which are recalculated again and again. How can we avoid solving the same problem again? Store the result for later use. That concept is called memoization and is used in dynamic programming.

To implement the above formula with dynamic programming, a two dimensional table is required where Table(i,j) stores Edit(i,j), and every cell can be calculated bottom up. At the end, Table(n,m) gives the final minimum edit distance. It does not matter whether we fill the table row wise or column wise; when we reach cell (i,j), all the required cells are already filled in. To start with, Table[i][0] = i and Table[0][j] = j. Why? Look at the base case of the recursive relation. A small worked example is shown below.
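For instance, for the small strings X = "CAT" and Y = "CUT" (not the strings used in the code below), the table fills as:

          ""  C   U   T
      ""   0  1   2   3
      C    1  0   1   2
      A    2  1   1   2
      T    3  2   2   1

Table(3,3) = 1: a single substitution (A replaced by U) transforms CAT into CUT.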

[Figure: minimum edit distance between two strings using dynamic programming]

Edit distance between two strings : Dynamic programming implementation

int editDistance(char *s1, char *s2){
	int n = strlen(s1);
	int m = strlen(s2);

	int minimumDistance = 0;
	int currentMinimum  = 0;
	int Table[n+1][m+1];

	memset(Table, 0, sizeof(Table));

	//Initialization
	for(int i=0; i<=n; i++)
		Table[i][0] = i;

	for(int i=1; i<=m; i++)
		Table[0][i] = i;

	for(int i=1; i<=n; i++){
		for(int j=1; j<=m; j++){
			//Case 3 : Possibility 1 : If X[i] == Y[j]
			if(s1[i-1] == s2[j-1]){
				currentMinimum = Table[i-1][j-1];
			}
			//Case 3 : Possibility 2 : If X[i] != Y[j]
			else{
				currentMinimum = Table[i-1][j-1] + 1;
			}
			//Case 1 : Deletion of character from S1
			if(Table[i][j-1] > Table[i-1][j]){
				minimumDistance = Table[i-1][j] + 1;
			}
			//Case 2 : Addition of character to S1
			else {
				minimumDistance = Table[i][j-1] + 1;
			}
			if(currentMinimum < minimumDistance){
				minimumDistance = currentMinimum;
			}
			Table[i][j] = minimumDistance;
		}
	}
	return Table[n][m];
}

The complexity of the algorithm to find the minimum edit distance between two strings is O(n*m), with extra space complexity of O(n*m), where n and m are the lengths of the two strings.

Please share if there is something wrong or missing. If you are interested in contributing to website, please reach out to us on communications@algorithmsandme.com

Coin change problem

Coin change problem

Given a number S and coins of values V = {V1, V2, V3, V4}, find the number of ways change can be made for S using these coins. We have an infinite supply of each coin. Commonly, this problem is known as the coin change problem.

For example, for S = 4 and coins {1,2,3}, there are four solutions: {1,1,1,1}, {1,1,2}, {2,2}, {1,3}. So the output should be 4.


Coin change problem : Line of thought

As always, let's start with the brute force solution. We have two choices for a coin: either we include the coin in the solution, or we do not include it. If coin m is included in the solution, the problem reduces to finding change for value N - V(m), still using all m coins (notice that we can use the same coin again for the reduced problem).
If the coin is not included in the solution (for example, when the value to be changed is less than the denomination of the coin), the value to be changed stays the same but we lose one coin; the problem becomes: change value N with m-1 coins.
If we look at it, this is a simple recursive formulation.

C(N,m) = C(N,m-1) + C(N- V(m), m)


When do we stop? How do we know that there is no solution going forward? If change is required for value zero, there is only one solution possible, which is to include no coins.

C(N,m) = 1 if N == 0

Also, what is the solution if we have no coins left and there is still value to be changed? There is no solution then.

C(N,m) = 0 if N > 0 and m = 0

Also, if value required is less than zero, then there is again no solution, given that all coins have positive values.

C(N,m) = 0 if N < 0

We see that the problem reduces to smaller problems, and there are points where we can stop going forward. That is a perfect setup for a recursive solution.

Coin change problem : recursive implementation

package com.company;

/**
 * Created by sangar on 5.5.18.
 */
public class CoinChange {

    public static int coinChange(int value, int[] coins, int consideredCoin){
        /* If value to be changed is negative, since coins are positives,
        there is no way to change them
         */
        if(value < 0) return 0;

        /*When there may or may not be coins, and no value is
        required, there is one solution which is not to include any
        coin */
        if(value == 0) return 1;

        /* When there is value required but no coins left,
           there is no solution
         */
        if(value > 0 && consideredCoin == coins.length) return 0;
                // When coin is included
        return coinChange(value-coins[consideredCoin],coins, consideredCoin)
               //When coin is not included
            + coinChange(value,coins, consideredCoin + 1);
    }

    public  static void main(String[] args){
        int value = 4;
        int[] coins = {1,2,3};

        int count = coinChange(value, coins, 0);
        System.out.println(count);
    }
}

Complexity of the recursive implementation of the coin change solution is exponential, O(2^m), where m is the number of coins.
For every coin we have 2 options, either include it or exclude it, so if we think in terms of binary, it is 0 (exclude) or 1 (include). For example, if we have 2 coins, the options are [00, 01, 10, 11], that is 2^2 of them. For m coins, it will be 2^m. For each of these options we check whether that selection makes up the change which is required.
The reason for this exponential complexity is that we are calculating the same smaller subproblems again and again.

[Figure: recursion tree for the coin change problem]

In the discussion so far, we see two properties of the solution. First, there is optimal substructure: optimal solutions to subproblems give the optimal solution to the bigger problem; this is known as the optimal substructure property. Second, subproblems are overlapping. These two conditions are necessary for applying the dynamic programming approach.
To avoid calculating the same subproblem again and again, we can use simple memoization. Let's create a two dimensional table Coins, where Coins(i, j) represents the number of ways in which change for value i can be made using the first j coins.
If the jth coin is included, the number of ways is Coins(i - v[j], j) (the same coin can be used again); if the jth coin is not included, the number of ways is Coins(i, j-1). Adding both of them gives us

Coins(i,j) = Coins(i - v[j], j) + Coins(i, j-1)

For i = 0, Coins(i,j) = 1 for all j; for j = 0, Coins(i,j) = 0 for all i > 0.
We can start bottom up and fill the table as per the formula above. Once we have filled the entire table, Coins(N,m) is the answer to the problem.

package com.company;

/**
 * Created by sangar on 5.5.18.
 */
public class CoinChange {

    public static int coinChangeDP(int value, int[] coins){

        int[][] numWays = new int[value+1][coins.length+1];

        for(int i = 0; i <= coins.length; i++ ){
            numWays[0][i] = 1;
        }

        for(int i = 1; i <= value; i++ ){
            numWays[i][0] = 0;
        }

        for(int i = 1; i <= value; i++){
            for(int j = 1; j <= coins.length; j++){
                numWays[i][j] = (i -coins[j-1] >= 0 ? numWays[i-coins[j-1]][j] : 0 )
                                 + numWays[i][j-1];
            }
        }

        return numWays[value][coins.length];
    }

    public  static void main(String[] args){
        int value = 4;
        int[] coins = {1,2,3};

        int count = coinChangeDP(value, coins);

        System.out.println(count);
    }
}

Example: for Value = 4 and coins {1,2,3}, below is the bottom-up filling of the table.
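Rows are values 0 to 4; columns are the coins considered so far (none, {1}, {1,2}, {1,2,3}):

   value |  {}  {1}  {1,2}  {1,2,3}
     0   |   1   1     1       1
     1   |   0   1     1       1
     2   |   0   1     2       2
     3   |   0   1     2       3
     4   |   0   1     3       4

The bottom right cell, Coins(4,3) = 4, matches the four combinations listed earlier.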

Complexity of the dynamic programming implementation of the coin change problem is O(N*m), where N is the value and m is the number of coins. Additional O(N*m) space is also required for memoization.

Please share if there is something missing or wrong. If you want to share your knowledge with thousands of learners across the world, please reach out to us on communications@algorithmsandme.com

Number of binary search trees with N nodes

Number of binary search trees with n nodes

Given a number N, calculate the number of binary search trees with N nodes that can be formed using the numbers 1 to N as nodes. For example, with N = 3, there are 5 different trees which can be created, as shown below.

[Figure: the five binary search trees possible with 3 nodes]

To solve the problem, let's reduce it. What if there is only one node, i.e. N = 1? There is only one tree possible, with the given node as root and no children.

[Figure: the single binary search tree possible with 1 node]

How about N = 2, that is, two nodes (1 and 2)? There are two binary search trees possible:

[Figure: the two binary search trees possible with 2 nodes]

Now let's take N = 3. Five BSTs are possible, as shown in the example figure above.

One observation is that every node becomes the root at least once. When a number becomes the root, all elements greater than it form the right subtree and all numbers less than it form the left subtree.

For every node i as root, all nodes on its left side (from 1 to i-1) form the left subtree; the root of the left subtree can be any number from 1 to i-1.
For node i as root, all nodes on its right side (from i+1 to N) form the right subtree; the root of the right subtree can be any number from i+1 to N.

Calculate the number of left subtrees possible with node i as root, call it l. Then calculate the number of right subtrees possible with i as root, call it r. Each of the l left subtrees can be combined with each of the r right subtrees, so the total number of trees with node i as root is (l * r).

Consider each node as the root node and calculate the number of trees possible; add them together and we arrive at our solution.

Number of binary search trees with n nodes : Implementation

package com.company.BST;

/**
 * Created by sangar on 11.5.18.
 */
public class NumberOfBST {

    public static int numberOfTrees(int n){
        if(n <= 1) return 1;

        int sum = 0;
        int left = 0, right = 0;

        for(int i=1; i<=n; i++){
            left = numberOfTrees(i-1);
            right = numberOfTrees(n-i);
            sum +=  (left * right);
        }
        return sum;
    }

    public static void main (String[] args){
        System.out.print("Number of tress possible :" + numberOfTrees(3));
    }
}

A simple way to check whether the program's answer is correct is to calculate the Catalan number; here we need the Nth Catalan number. Mathematically, the nth Catalan number can be represented as C(n) = (2n)! / ((n+1)! * n!).
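
A small cross-check sketch, using the recurrence C(0) = 1 and C(i+1) = C(i) * 2 * (2i + 1) / (i + 2) (the helper name is illustrative, not part of the original implementation):

static long catalan(int n) {
    long c = 1;
    for (int i = 0; i < n; i++) {
        // the intermediate product is always divisible by (i + 2)
        c = c * 2 * (2 * i + 1) / (i + 2);
    }
    return c;
}

For example, catalan(3) = 5 and catalan(4) = 14, which should match the outputs of numberOfTrees(3) above and numberOfTreesDP(4) below.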

Looking at the execution of the above implementation, we see that some subproblems are solved again and again.

[Figure: execution of the recursive solution for the number of BSTs with n nodes]

We know that optimal solutions to subproblems give the optimal solution to the original problem. Also, there are overlapping subproblems which are solved multiple times. These two conditions are perfect for thinking in terms of dynamic programming. To avoid solving the same problem multiple times, we use a technique called memoization, where we store solutions to subproblems which are already solved.

Let's say T[i] represents the number of binary search trees possible with i nodes.
What will be T[0]? As there is one tree possible, the empty tree with zero nodes, T[0] = 1. What about T[1]? We already saw that only one BST is possible with one node, so T[1] = T[0] = 1.

T[i] = Sum ( T[j] * T[i-j-1] ) for j = 0 to i-1

package com.company.BST;

/**
 * Created by sangar on 11.5.18.
 */
public class NumberOfBST {

    public static int numberOfTrees(int n){
        if(n <= 1) return 1;

        int sum = 0;
        int left = 0, right = 0;

        for(int i=1; i<=n; i++){
            left = numberOfTrees(i-1);
            right = numberOfTrees(n-i);
            sum +=  (left * right);
        }
        return sum;
    }

    public static int numberOfTreesDP(int n){
        if(n <= 1) return 1;

        int[] T = new int[n+1];
        T[1] = T[0] = 1;

        for(int i=2; i<=n; i++){
            int sum = 0;
            for(int j=0; j<i; j++){
                sum += T[j] * T[i-j-1];
            }
            T[i] = sum;
        }
        return T[n];
    }

    public static void main (String[] args){
        System.out.print("Number of tress possible :" + numberOfTreesDP(4));
    }
}

Complexity of the dynamic programming approach is O(n^2), along with additional space complexity of O(n).

For more dynamic programming problems, please refer to: What is DP?

Please share if there is something missing or wrong. If you want to contribute and share your knowledge with thousands of learners across world, please reach out to us on communications@algorithmsandme.com

Longest Arithmetic Progression

Longest Arithmetic Progression

Given a set of integers in sorted order, find length of longest arithmetic progression in that set.

An arithmetic progression is a set of numbers in which the difference between two consecutive numbers is constant. The mathematical formula for an arithmetic progression is

T(n) = a + (n - 1)d, where a is the first element, T(n) is the nth element and d is the constant difference.

1,2,3 is AP with d = 1
3,7,11,15 is AP with d = 4

Let’s define longest arithmetic progression problem in detail first.  Problem statement is to find longest sequence of indices, 0 < i1 < i2 < … < ik < n such that sequence A[i1], A[i2], …, A[ik] is an arithmetic progression.

Longest arithmetic progression : thoughts

What will be the brute force solution? In any arithmetic progression, the difference between any two consecutive elements is the same as the difference between the first and second elements. So we can pick each pair of numbers from the set as the first two elements of an AP, then scan the remaining array to find all numbers which satisfy the condition.

There are n(n-1)/2 such pairs for a set of n elements, and for each pair we linearly scan the array for more elements of the AP. The overall complexity of the brute force algorithm to find the length of the longest arithmetic progression is therefore O(n^3).

Can we do better than cubic complexity? Let's understand a simpler problem first. Given three numbers, what is the most efficient way to find if they form an arithmetic progression?

A[i], A[j] and A[k] form an AP if 2* A[j] = A[i] + A[k] where k>j and i<j.

For example, 1,2,3 are AP as 2*2 = 1 + 3. Also, 7,11,15 is AP as 2*11 = 15 +7.

How can we use this information to find whether there is an arithmetic progression of 3 numbers in a set of integers? This is very similar to the problem of finding a pair of numbers in a sorted array which sums up to X. We have to find i and k such that A[i] + A[k] = 2*A[j], for each j with 0 < j < n-1.

  • For each 0<j<n-1:
  • Initialize i as j-1 and k as j+1
    • If A[i] + A[k] is equal to 2*A[j], then we are done.
    • If A[i] + A[k] > 2*A[j], then decrease i by 1.
    • Else if A[i] + A[k] < 2*A[j], then increment k by 1.

This answers the question whether there exist three numbers in the set which form an AP; a minimal sketch of the check is given below.
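
A minimal sketch of this two-pointer check, assuming the input array is sorted in increasing order (the method name is illustrative):

static boolean hasThreeTermAP(int[] a) {
    int n = a.length;
    for (int j = 1; j < n - 1; j++) {         // fix the middle element
        int i = j - 1, k = j + 1;
        while (i >= 0 && k < n) {
            int sum = a[i] + a[k];
            if (sum == 2 * a[j]) return true; // a[i], a[j], a[k] form an AP
            if (sum > 2 * a[j]) i--;          // need a smaller left value
            else k++;                         // need a larger right value
        }
    }
    return false;
}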

If the set contains two or more elements, the minimum length of the longest AP will be 2. Why? Any element always forms an AP of length 2 with the last element of the set. Can we combine all this to come up with a solution for the original problem?

Let's say L[i][j] stores the length of the longest arithmetic progression with A[i] and A[j] as its first two elements, where i < j. If j is the last index (j = n-1), then L[i][j] = 2; that means the rightmost column of the matrix will be all 2s. Why?

Now, if we fix j and find i and k such that A[i], A[j] and A[k] form an AP, then

L[i][j] = 1 + L[j][k].

This recurrence relation means that we must have L[j][k] before L[i][j]. Since i < j < k, the table L is filled from the bottom right to the top left.

Algorithm to find length of longest arithmetic progression

  1. For j = n-1, set L[i][j] = 2 for all 0 <= i < j; the rightmost column is filled with 2s.
  2. For j = n-2 down to 1, do the steps below:
    • Find all i and k such that A[i], A[j] and A[k] form an AP (using the algorithm given above).
      • Fill L[i][j] = 1 + L[j][k].
      • Check if L[i][j] is longer than the current maximum length; if yes, update it.
    • As a slight optimization, if A[i] + A[k] is greater than 2*A[j], we can safely fill L[i][j] with 2.
    • If i >= 0 remains even after k reaches n, fill all remaining L[i][j] with 2.
#include<stdlib.h>
#include<stdio.h>

#define max(a,b) (a>b) ? a:b

int longestArithmeticProgression(int a[], int n){
	int i,j,k;
	int Table[n][n];
	int longestAP = 2;
	
	for(i=0;i<n; i++)
		Table[i][n-1] =2;
		
	for(j= n-2; j>=1; j-- ){
		i = j-1;
		k = j+1;
		
		while(i>=0 && k<n){
			if(2* a[j] > a[i] + a[k]){
				k++; // Table[j][k]is already filled 
			}
			else if (2* a[j] < a[i] + a[k]){
             /*Table[i][j] needs to be filled before we move up */
             	Table[i][j] =2; 
             	i--;
            }
            else{
            	Table[i][j] = Table[j][k] +1;
             	longestAP = max(longestAP, Table[i][j]);
             	i--;
             	k++;
            }
        }
        while(i>=0){
        	Table[i][j] =2; 
        	i--;
        }
    }
    return longestAP;
}

int main(){
	int array[] = {1,7,10,13,16,19};
	int n = sizeof(array)/sizeof(array[0]);
	printf("Lenght of longest arithemetic progration is : %d",
	          longestArithmeticProgression(array,n));
     return 0;
}

Complexity of the dynamic programming approach to find the length of the longest arithmetic progression is O(n^2), with additional space complexity of O(n^2).

Reference
http://www.cs.uiuc.edu/~jeffe/pubs/pdf/arith.pdf

Please share if there is something wrong or missing. Reach out to us at communications@algorithmsandme.com if you are interested in taking personalized coaching sessions.

Count all possible paths in maze

Count all possible paths in maze

Counting all possible paths in a maze is one more grid problem which is asked in Amazon and Microsoft interviews and can be solved using dynamic programming. Before going into the details of the solution, let's understand the problem. The problem statement is: find the total number of paths in a given maze or grid to reach the rightmost bottom cell from the leftmost top cell. You can move right, down and diagonally, but not left. For example, one of the paths to reach the bottom-most cell is shown below.

[Figure: one possible path to reach the destination]

All possible paths in grid : Recursive approach

A typical property of a maze problem is that it reduces to a smaller problem as soon as we make one move. Another problem which uses the same concept is Minimum cost path in grid. Once a move is made, we solve the smaller subproblem, in this case figuring out how many paths are possible from the new cell to the destination cell. Also, once we have decided to move in a particular direction (say right) from a cell, that does not mean we can skip counting the paths possible by going in the other directions (down and diagonal). Hence for each cell, we count the possible paths if we move right, the possible paths if we move down and the possible paths if we move diagonally, and add them up.

Since the problem reduces to a smaller problem with each move, think of applying recursion. What will be the base case for the recursion?
The base case is when we reach the rightmost bottom cell. If we take i and j as the row and column of the maze, the base case would be

(i == m && j == n) return 1

The recursive formulation for the maze problem, counting paths from cell (i,j) to the destination, would be

    count(i,j) = count(i+1, j) + count(i, j+1) + count(i+1, j+1)

Recursive implementation would be

#include <stdio.h>

int PossiblePaths(int i,int j, int m, int n){
	if(i > m || j > n) return 0; 
	
	if(i == m && j == n) return 1;
	
	return PossiblePaths(i+1,j, m,n) 
			+ PossiblePaths(i, j+1, m,n) 
			+ PossiblePaths(i+1, j+1,m,n);
}

int main(void) {
	
	int m = 4;
	int n = 4;
	printf("\n Number of paths in maze : %d",PossiblePaths(0,0,m,n) );
	return 0;
}

Let’s see how the execution happens. We take a 3×3 maze so m and n equal to 3. To start with we take i and j  equal to 0.

[Figure: execution tree of the recursive implementation]

From the figure above, we can see that some subproblems are calculated again and again; these repetitions increase as we go down the tree.
Two basic conditions should be satisfied before we apply dynamic programming:

1. There should be optimal substructure: the original problem reduces to smaller problems, and solving the smaller problems leads to the solution of the bigger problem.
2. There should be overlapping subproblems, which calls for tabulating the results of subproblems so they can be reused later.

All possible paths in grid : DP implementation

To store solutions of subproblems, we use a two dimensional table.
Each cell Table(i,j) stores the number of paths possible to reach cell (i,j). Our answer will be Table(m,n).
Cell (i,j) can be reached either from cell (i-1,j) (moving down), from cell (i,j-1) (moving right) or from cell (i-1,j-1) (moving diagonally).

So Table(i,j) can be calculated as Table(i,j) = Table(i-1,j) + Table(i,j-1) + Table(i-1,j-1).
There is exactly one way to reach any cell in the first row or the first column, hence Table(i,0) = Table(0,j) = 1. A small worked example is shown below.
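As an illustration (the code below uses m = n = 4), the table for a 3×3 grid (indices 0 to 2 in each dimension) fills as:

   1   1   1
   1   3   5
   1   5  13

Table(2,2) = 13, so there are 13 such paths in a 3×3 grid.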

#include<stdlib.h>
#include<stdio.h>

int PossiblePaths(int m,int n){
	int Table[m+1][n+1];
	int i,j;
	
	for(i=0;i<=m; i++){
		Table[i][0] =1;
	}
	for(i=0;i<=n; i++){
		Table[0][i] =1;
	}
	for(i=1; i<=m; i++ ){
		for(j=1; j<=n; j++){
			Table[i][j] = Table[i-1][j] + Table[i][j-1] + Table[i-1][j-1];
		}
	}
	return Table[m][n];
}

int main(){
   printf("%d",PossiblePaths(4,4));
   return 0;
}

Space optimized version (Thanks to Jakube).

#include<stdlib.h>
#include<stdio.h>
 
int PossiblePaths(int m,int n){
	int Table[n+1];
 
	for(int i=0;i<=n; i++){
		Table[i] = 1;
	}
	for(int i=1; i<=m; i++ ){
		int diagonalSum = 1;
		for(int j=1; j<=n; j++){
			int temp = Table[j];
			Table[j] = Table[j] +  Table[j-1] +  diagonalSum;
			diagonalSum = temp;
		}
	}
	return Table[n];
}
 
int main(){
   printf("%d",PossiblePaths(4,4));
   return 0;
}

The dynamic programming approach to count all possible paths in a grid takes extra O(m*n) memory but reduces the exponential time complexity to O(m*n).

Please share if there is something wrong or missing, we would love to hear for you. If you want to contribute to website, please reach out to us on communications@algorithmsandme.com

Balanced partition problem

Balance partition problem

Given a set of integers, partition them into two parts such that the difference between the sums of the two parts is minimum. This problem is known as the balanced partition problem. For example, for array A = {1,7,4,11}, two subsets can be {1,11} and {7,4}; these have a difference of 1, which is the minimum difference we can get by splitting this array.

Mathematically, you have a set of n integers, each in the range 0 to K. Partition these integers into two subsets such that you minimize |S1 − S2|, where S1 and S2 denote the sums of the elements in each of the two subsets.

The balanced partition problem can be asked in many other ways. For instance, given a list of 22 players and their strengths, divide those 22 players into two teams so that both teams are balanced.
Another version: you have n candies, each with a value associated with it, and you want to distribute those candies between two kids as equally as possible.

No matter what version is asked, the approach remains the same.

Balance partition problem: thoughts

The brute force method is to list all the subsets of the given set and find the sum of each of them, then scan through these sums and find the two closest ones. For a set of n elements, there can be 2^n subsets; therefore the complexity of this brute force solution is already exponential.

Let me tweak the balanced partition problem a bit. Suppose we ask whether there are two subsets of the set of integers such that the difference between their sums is zero.
Essentially, this is a special case of the original problem. If the difference between the sums of the two subsets is zero, then the sum of each subset must be exactly half the sum of all elements in the set.

So the problem reduces to a smaller one: is there a subset of the integers which adds up to half the sum of all integers in the set? This is the subset sum problem, which we have already solved.

How can we use the information provided by the subset sum problem above?
Let's say S is the sum of all the integers in the set; S/2 is half of that sum. We have to find a subset with sum i such that S/2 - i is minimum.

Whether or not there is a subset with sum i in the set is given by solving the subset sum problem. Among the sums i which are achievable with subsets of the set, find the one closest to S/2. Its complement is then the subset whose sum is least greater than half of the total, and the difference between the two gives the minimal possible difference between the two partitions.

So, the expression would be

min(S/2 - i) where T[n][i] = True and 0 <= i <= S/2

Why did we take 0 <= i <= S/2? Because we want the partition to be balanced, so i cannot be more than half of the total sum in any case.

Balanced partition problem: implementation

package com.company;

/**
 * Created by sangar on 25.11.18.
 */
public class BalancedPartition {
    public int findBalancePartition(int[] a){

        // Calculate sum of all the elements in set 
        int S = 0;
        for (int i=0; i<a.length; i++)
            S += a[i];

        boolean T[][] = new boolean[a.length + 1][S + 1];

        /* Initialize first column as true. 
            0 sum is possible with all elements. 
        */
        for (int i=0; i<=a.length; i++)
            T[i][0] = true;

        /*  Initialize top row, except dp[0][0], 
            as false. With 0 elements, no other 
            sum except 0 is possible
        */
        for (int i=1; i<=S; i++)
            T[0][i] = false;

        
        for (int i = 1; i <= a.length; i++) {
            for (int j = 1; j <= S; j++) {
                // If ith element is excluded 
                T[i][j] = T[i - 1][j];

                // If ith element is included 
                if (a[i - 1] <= j)
                    T[i][j] |= T[i - 1][j - a[i - 1]];
            }
        }

        // Initialize difference of two sums. 
        int diff = Integer.MAX_VALUE;

        for (int j = S/2; j >= 0; j--) {
            // Find the largest j <= S/2 for which a subset with sum j is achievable
            if (T[a.length][j] == true)
            {
                diff = S - 2 * j;
                break;
            }
        }
        return diff;
    }
}

Once we get the nearest sum, we can always backtrack through the table and find the elements of the subset itself; a small sketch is given below. In fact, this problem has now reduced to the 0/1 knapsack problem, where the maximum value we can get from the set of integers is j.
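
A minimal sketch of that backtracking step, assuming access to the same table T and the sum j found above (the helper name is illustrative, not from the original class):

import java.util.ArrayList;
import java.util.List;

// Walk the table upwards: if the sum was already reachable without a[i-1],
// skip the element; otherwise it must belong to the subset.
static List<Integer> recoverSubset(int[] a, boolean[][] T, int targetSum) {
    List<Integer> subset = new ArrayList<>();
    int j = targetSum;
    for (int i = a.length; i > 0 && j > 0; i--) {
        if (T[i - 1][j]) continue;   // a[i-1] not needed to reach sum j
        subset.add(a[i - 1]);        // a[i-1] is part of the subset
        j -= a[i - 1];
    }
    return subset;
}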

Complexity of splitting a set into two balanced partitions is O(n * S) with a space complexity of O(n * S), where S is the sum of all elements in the set.

Please reach out if there is anything wrong or missing in the post. If you are preparing for an interview, please signup for free interview preparation kit.

Minimum jumps to reach at end

Minimum jumps to reach end of array

Given an array of integers, find the minimum number of jumps needed to reach the end of the array. The condition is that from index i you can jump at most a[i] indices ahead.

For example, in following array, minimum jumps required are 2.

[Figure: original array]

At index 1, we can jump 0, 1 or 2 indices ahead. If we jump 2 indices, we would require two more jumps (of 1 and 1) to reach 4, so the total number of jumps would be 3.

If you jump the maximum at the start, more jumps are required towards the end.

However, if we jump only 1 index ahead, the next a[i] allows us to jump 3 indices ahead; doing so, we reach the end of the array. So the minimum number of jumps to reach the end of the array is 2.

[Figure: not starting with the maximum jump saves one jump to reach the end]

Minimum number of jumps : thought process

What would be the brute force method to solve this? At each index, try all possible jumps and take the combination which gives the minimum number of jumps. This method has exponential complexity, which we do not want.

What is the original problem? It's minJumps(start, end). Of all the jumps possible from start, say we go to index k; how does the problem reduce? Well, now we have to find the minimum number of jumps from k to end. How do we decide on k? We try all values of k from start+1 to start + a[start].

minJumps(start, end) = Min ( minJumps(k, end) )
for all k reachable from start 

Now that we have a clear recursive relationship, what should be the base case? When start == end, we have reached the end and no more jumps are required, so we return 0; each recursive call then adds 1 for the jump it has taken.

package com.company;

/**
 * Created by sangar on 10.10.18.
 */
public class MinimumJumps {

    public int minimumNumberOfJump(int[] a, int start, int end){
        //If start == end, we reached the end, return 0.
        if(start == end) return 0;

        //if current element is 0, you cannot jump to end at all
        if(a[start] == 0) return Integer.MAX_VALUE;

        int minimumJumps = Integer.MAX_VALUE;

        for(int k=start+1; k<=start+a[start] && k<=end; k++){
            /*
            For each K from start+1 to end, find the minimum jumps.
             */
            int jumps = minimumNumberOfJump(a,k,end);
            if(jumps != Integer.MAX_VALUE && jumps + 1 < minimumJumps){
                minimumJumps  = jumps + 1;
            }
        }
        return minimumJumps;
    }
}

Test cases for above function

package test;

import com.company.MinimumJumps;
import org.junit.jupiter.api.Test;

import static org.junit.Assert.assertEquals;

/**
 * Created by sangar on 23.9.18.
 */
public class MinimumJumpTest {

    MinimumJumps tester = new MinimumJumps();

    @Test
    public void baseTest() {

        int[] a = {1, 3, 5, 8, 9, 2, 6, 7, 6, 8, 9};
        assertEquals(3,
			tester.minimumNumberOfJump(a,0, a.length-1));
    }

    @Test
    public void arrayContainsZeroTest() {

        int[] a = {1, 3, 0, 0, 0, 2, 6, 7, 6, 8, 9};
        assertEquals(Integer.MAX_VALUE, 	  
			tester.minimumNumberOfJump(a,0, a.length-1));
    }

    @Test
    public void nullArrayTest() {

        assertEquals(0, tester.minimumNumberOfJump(null,0, 0));
    }

    @Test
    public void arrayWithTwoElementsTest() {

        int[] a = {1, 0};
        assertEquals(1,
			tester.minimumNumberOfJump(a,0, a.length-1));
    }
}

Let’s see execution trace of above function for an input.

[Figure: execution trace; nodes in red are recalculated]

From the above execution tree, we notice that some subproblems are calculated again and again. This is typically known as overlapping subproblems.
Also, the optimal solution to a subproblem leads us to the optimal solution of the original problem, which is the optimal substructure property. These two properties are a must for applying dynamic programming to a problem.

What if we store the minimum number of jumps required to reach a particular index? To reach the first index, 0 jumps are required. Jump[i] represents the minimum number of jumps needed to reach index i; the solution for the end of the array is Jump[n-1]. How do we fill this array? For each i, go over j = 0 to i-1 and check if j + a[j] >= i; if yes, update jump[i] = min(jump[i], jump[j] + 1).

Minimum number of jumps: dynamic programming approach

package com.company;

/**
 * Created by sangar on 10.10.18.
 */
public class MinimumJumps {

    public int minimumNumberOfJumpDP(int[] a){

        if(a == null || a.length == 0) return 0;

        if(a[0] == 0) return Integer.MAX_VALUE;

        int[] jump = new int[a.length];

        //no jumps required for first element
        jump[0] = 0;

        for(int i=1; i<a.length;i++){
            jump[i] = Integer.MAX_VALUE;

            for(int j=0; j<i; j++){
                if(j+a[j]>=i && jump[j] != Integer.MAX_VALUE ){
                    jump[i] = Integer.min(jump[i], 1 + jump[j]);
                }
            }
        }
        return jump[a.length-1];
    }
}

Complexity of the dynamic programming approach to find the minimum number of jumps to reach the end of an array is O(n^2), with space complexity of O(n).

If you are interested in solving this problem in O(n) time, please visit the Stack Overflow discussion; a rough sketch of the usual greedy idea is given below.
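
This is only a sketch of the common greedy approach (not from the original post): keep track of the farthest index reachable with the current number of jumps, and take another jump only when the current range is exhausted.

static int minimumJumpsGreedy(int[] a) {
    if (a == null || a.length <= 1) return 0;
    if (a[0] == 0) return Integer.MAX_VALUE;    // cannot move at all

    int jumps = 1;              // at least one jump is needed from index 0
    int currentEnd = a[0];      // farthest index reachable with 'jumps' jumps
    int farthest = a[0];        // farthest index reachable with one more jump

    for (int i = 1; i < a.length; i++) {
        if (i > farthest) return Integer.MAX_VALUE; // index i is unreachable
        farthest = Math.max(farthest, i + a[i]);
        if (i == a.length - 1) return jumps;
        if (i == currentEnd) {  // current jump range exhausted, take another jump
            jumps++;
            currentEnd = farthest;
        }
    }
    return jumps;
}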

Please share if there is something wrong or missing. If you are interested in taking coaching from one of our experienced teachers, please reach out to us at communications@algorithmsandme.com