Minimum edit distance between two strings

The minimum edit distance between two strings is the minimum number of operations needed to transform one string into the other. The allowed operations are insertion of a character, deletion of a character and substitution of a character. For example, String S1 = EXPONENTIAL, String S2 = POLYNOMIAL.

[Figure: minimum edit distance between two strings]

From the above example, we can see that we have to find the best possible alignment of the two strings. However, there are so many possible alignments of two strings that it would be very costly to consider each one and look for the best.

Can we break the problem into smaller, easier subproblems? The problem at hand is to find the minimum edit distance between strings X[1…n] and Y[1…m], where n and m are the lengths of the two strings. Consider the prefixes X[1…i] and Y[1…j], and let us call the edit distance between these prefixes Edit(i,j). Finally, we need to calculate Edit(n,m). At each character we have three choices. Let’s consider each case one by one:

The first choice is to delete character X[i], which costs 1. Now we have i-1 characters in X and j characters in Y to consider, which is nothing but Edit(i-1,j).

The second choice is to insert a character into X, which costs 1. We pick the inserted character so that it matches Y[j], so Y[j] is taken care of, but none of the original characters of X has been processed. The problem reduces to Edit(i, j-1).

The third choice is to replace character X[i] with Y[j]. The cost of replacing is 1 if X[i] != Y[j]; if X[i] == Y[j], the cost is 0. In either case, the problem reduces to Edit(i-1, j-1).

We do not know which one to pick to start with, so we will try all of them on each character and pick the one which gives us the minimum value. The original problem can be defined in terms of subproblems as follows:

Edit(i,j) = min ( 1 + Edit(i,j-1),
                  1 + Edit(i-1,j),
                  Edit(i-1,j-1)      if X[i] == Y[j],
                  1 + Edit(i-1,j-1)  if X[i] != Y[j]
                )

What will be the base case? If both strings are of length zero, cost will be 0.
If one string is of length 0, then cost will be length of other string.

Recursive implementation of edit distance problem

#include <stdio.h>
#include <string.h>

int min(int a, int b) {
	int min = a > b ? b : a;
	return min;
}

int editDistance(char *s1, char *s2, int length1, int length2){
	
	printf("\nlength1 = %d, length2 = %d" , length1, length2);
	
	if(length1 == 0 && length2 == 0) return 0;
	
	if(length1 == 0) return length2;
	
	if(length2 == 0) return length1;
	
	/* Strings are 0-indexed, so the last characters of the current
	   prefixes are at length1-1 and length2-1 */
	int isCharacterEqual = s1[length1-1] == s2[length2-1] ? 0 : 1;
	return min( min(( 1 + editDistance(s1,s2, length1-1, length2)),
					( 1 + editDistance(s1,s2,length1, length2-1))
				),
				(isCharacterEqual + editDistance(s1,s2, length1-1,
				length2-1))
	);
}
int main(){
	char *s = "EXPONENTIAL";
	char *d = "POLYNOMIAL";
	printf("Minimum distance between two strings is : %d",
		editDistance(s,d, strlen(s), strlen(d)));
	
	return 0;
}

If we look at the execution trace, it is evident that we are solving the same subproblems again and again.

[Figure: execution trace of the recursive solution to find minimum edit distance]

Now we know two things. First, the optimal solution to the original problem depends on optimal solutions of its subproblems (see the recursive relation above). Second, there are overlapping subproblems, which are recalculated again and again. How can we avoid solving the same problem again? Store the result for later use. This technique is called memoization and is used in dynamic programming.
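As a minimal sketch of the memoization idea, the recursion above can cache Edit(i,j) the first time it is computed. The class name EditDistanceMemo and the memo array are illustrative assumptions, not part of the original listing, and Java is used here only for illustration.

```java
import java.util.Arrays;

class EditDistanceMemo {

    // memo[i][j] caches Edit(i, j); -1 means "not computed yet".
    private static int edit(String x, String y, int i, int j, int[][] memo) {
        if (i == 0) return j;   // insert the remaining j characters
        if (j == 0) return i;   // delete the remaining i characters
        if (memo[i][j] != -1) return memo[i][j];

        // Strings are 0-indexed, so prefix lengths i and j end at i-1 and j-1.
        int substitution = x.charAt(i - 1) == y.charAt(j - 1) ? 0 : 1;
        int best = Math.min(
                Math.min(1 + edit(x, y, i - 1, j, memo),          // delete X[i]
                         1 + edit(x, y, i, j - 1, memo)),         // insert Y[j]
                substitution + edit(x, y, i - 1, j - 1, memo));   // replace or match
        memo[i][j] = best;
        return best;
    }

    static int editDistance(String x, String y) {
        int[][] memo = new int[x.length() + 1][y.length() + 1];
        for (int[] row : memo) Arrays.fill(row, -1);
        return edit(x, y, x.length(), y.length(), memo);
    }

    public static void main(String[] args) {
        System.out.println(editDistance("EXPONENTIAL", "POLYNOMIAL"));
    }
}
```

Each (i,j) pair is now solved at most once, so the number of recursive calls is bounded by the size of the memo table instead of growing exponentially.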

To implement the above formula with dynamic programming, a two-dimensional table is required, where Table(i,j) stores Edit(i,j) and every cell can be calculated bottom-up. At the end, Table(n,m) gives the final minimum edit distance. It does not matter whether we fill the table row-wise or column-wise; when we reach cell (i,j), all the cells it depends on are already filled in. To start with, Table[i,0] = i and Table[0,j] = j. Why? Look at the base case of the recursive relation.

[Figure: minimum edit distance between two strings using dynamic programming]

Edit distance between two strings : Dynamic programming implementation

int editDistance(char *s1, char *s2){
	int n = strlen(s1);
	int m = strlen(s2);

	int minimumDistance = 0;
	int currentMinimum  = 0;
	int Table[n+1][m+1] ;

	memset(Table,0, sizeof(Table));
	
	//Initialization
	for(int i=0; i<=n; i++)
		Table[i][0] =i;

	for(int i=1; i<=m; i++)
		Table[0][i] = i;

	for(int i=1; i<=n; i++){
		for(int j=1; j<=m; j++){
			//Case 3 : Possibility 1 :If X[i] == Y[j] 
			if(s1[i-1] == s2[j-1]){
				currentMinimum = Table[i-1][j-1];
			}
			//Case 3 : Possibility 2 :If X[i] != Y[j] 
			else{
				currentMinimum =  Table[i-1][j-1] + 1;
			}
			//Case 1 : Deletion of character from S1 
			if(Table[i][j-1] > Table[i-1][j]){
				minimumDistance = Table[i-1][j] + 1;
			}
			//Case 2 : Addition of character on S1 
			else {
				minimumDistance = Table[i][j-1] + 1;
			}
			if(currentMinimum < minimumDistance){
				minimumDistance = currentMinimum;
			}
			Table[i][j] = minimumDistance;
		}
	}
	/* Table(n,m) holds the distance between the complete strings */
	return Table[n][m];
}

The complexity of the dynamic programming algorithm to find the minimum edit distance between two strings is O(nm), with an extra space complexity of O(nm), where n and m are the lengths of the two strings.

Please share if there is something wrong or missing. If you are interested in contributing to website, please reach out to us on [email protected]

Coin change problem

Given a value N and coins of values V = {v1, v2, ..., vm}, find the number of ways change can be made for N using these coins. We have an infinite supply of each coin. This problem is commonly known as the coin change problem. For example,

Input:
N = 4; V = {1,2,3}
Output:
4
Explanation:
There are four solutions: {1,1,1,1},{1,1,2},{2,2},{1,3}

[Figure: coin change problem]

Thoughts

As always, let’s start with a brute force solution. We have two choices for a coin: either we include the coin in the solution, or we do not. If coin m is included in the solution, the problem reduces to finding change for value N-V(m) using the same m coins. Notice that we can again use all the coins for the reduced problem too.
If coin m is not included in the solution, for example when the value to change is less than the denomination of the coin, there is no reduction in the value to be changed, but we lose one coin. Hence the problem becomes: change value N with m-1 coins.
If we look at it, it is a simple recursive formulation.

C(N,m) = C(N,m-1) + C(N- V(m), m)

[Figure: coin change problem recursion]

When do we stop then? When do we know that there is no more solution going forward? If change is required for value zero, there is only one solution possible, which is to include no coin.

C(N,m) = 1 if N == 0

Also, what is the solution if we do not have any coins with us at all and there is value to be changed. There is no solution then.

C(N,m) = 0 if N > 0 and m <= 0

Also, if the value required is less than zero, then there is again no solution, given that all coins have positive values.

C(N,m) = 0 if N < 0

We see that problem reduces to smaller problems and then there are points when we can stop going forward. That’s a perfect condition for a recursive solution.

Recursive implementation

package com.company;

/**
 * Created by sangar on 5.5.18.
 */
public class CoinChange {

    public static int coinChange(int value, int[] coins, int consideredCoin){
        /* If the value to be changed is negative, since coins are positives,
        there is no way to change them
         */
        if(value < 0) return 0;

        /*When there may or may not be coins, and no value is
        required, there is one solution which is not to include any
        coin */
        if(value == 0) return 1;

        /* When there is the value required by no coins left,
           there is no solution
         */
        if(value > 0 && consideredCoin == coins.length) return 0;
                // When coin is included
        return coinChange(value-coins[consideredCoin],coins, consideredCoin)
               //When coin is not included
            + coinChange(value,coins, consideredCoin + 1);
    }

    public  static void main(String[] args){
        int value = 4;
        int[] coins = {1,2,3};

        int count = coinChange(value, coins, 0);
        System.out.println(count);
    }
}

The complexity of the recursive implementation of the coin change solution is exponential, roughly O(2^m) where m is the number of coins.

For every coin, we have 2 options: either we include it or we exclude it. In binary terms, that is 0 (exclude) or 1 (include). For example, if we have 2 coins, the options are [ 00, 01, 10, 11 ], so 2^2. For m coins, there are 2^m options. Because a coin can be included again after it has been used once, the actual recursion tree can be even deeper than this estimate. For each of these options we check whether the selection makes the required change.
The reason for this exponential complexity is that we are calculating the same smaller subproblems again and again.

[Figure: coin change problem recursive tree]

In the discussion so far, we see two properties of the solution. First, there is optimal substructure: solutions to subproblems combine into the solution of the bigger problem. Second, subproblems overlap. These two conditions are necessary for applying a dynamic programming approach.

To avoid calculating the same subproblem again and again, we can use simple memoization. Let’s create a two-dimensional table Coins. Coins(i, j) represents the number of ways in which change for i can be made using the first j coins.
If the jth coin is included, the number of ways will be Coins(i - v[j], j); the second index stays j because the coin can be used again.

If the jth coin is not included, the number of ways will be Coins(i, j-1). Adding both of them gives us

Coins(i,j) = Coins(i - v[j], j) + Coins(i, j-1).

For i = 0, Coins(0,j) = 1 for all j; for j = 0, Coins(i,0) = 0 for all i > 0.
We can start bottom-up and fill the table as per the formula above. Once we have filled the entire table, Coins(N,m) will be the answer to the problem.

Dynamic programming implementation

package com.company;

/**
 * Created by sangar on 5.5.18.
 */
public class CoinChange {

    public static int coinChangeDP(int value, int[] coins){

        int[][] numWays = new int[value+1][coins.length+1];

        for(int i = 0; i <= coins.length; i++ ){
            numWays[0][i] = 1;
        }

        for(int i = 1; i <= value; i++ ){
            numWays[i][0] = 0;
        }

        for(int i = 1; i <= value; i++){
            for(int j = 1; j <= coins.length; j++){
                numWays[i][j] = (i -coins[j-1] >= 0 ? numWays[i-coins[j-1]][j] : 0 )
                                 + numWays[i][j-1];
            }
        }

        return numWays[value][coins.length];
    }

    public  static void main(String[] args){
        int value = 4;
        int[] coins = {1,2,3};

        int count = coinChangeDP(value, coins);

        System.out.println(count);
    }
}

Example: for Value = 4 and coins {1,2,3}, below is the bottom up filling of table.
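The filled table can be reproduced with a small sketch that uses the same recurrence as coinChangeDP but returns the whole table instead of the last cell. The class and method names (CoinChangeTable, buildTable) are hypothetical helpers introduced only for this illustration.

```java
import java.util.Arrays;

class CoinChangeTable {

    // Same recurrence as coinChangeDP above, but returns the whole table.
    static int[][] buildTable(int value, int[] coins) {
        int[][] numWays = new int[value + 1][coins.length + 1];

        // Value 0 can always be changed in exactly one way: use no coin.
        for (int j = 0; j <= coins.length; j++) numWays[0][j] = 1;

        for (int i = 1; i <= value; i++) {
            for (int j = 1; j <= coins.length; j++) {
                int withCoin = i - coins[j - 1] >= 0 ? numWays[i - coins[j - 1]][j] : 0;
                numWays[i][j] = withCoin + numWays[i][j - 1];
            }
        }
        return numWays;
    }

    public static void main(String[] args) {
        for (int[] row : buildTable(4, new int[]{1, 2, 3})) {
            System.out.println(Arrays.toString(row));
        }
        // Rows are values 0..4, columns are "first j coins" for j = 0..3:
        // [1, 1, 1, 1]
        // [0, 1, 1, 1]
        // [0, 1, 2, 2]
        // [0, 1, 2, 3]
        // [0, 1, 3, 4]
    }
}
```

The bottom-right cell, 4, is the answer for value 4 with coins {1,2,3}.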

The complexity of dynamic programming implementation of the coin change problem is O(Nm) where N is value and m is the number of coins. Additional O(Nm) space is also required for memoization.


Number of binary search trees with N nodes


Given a number N, calculate the number of binary search trees that can be formed using the numbers 1 to N as nodes. For example, with N = 3, there are 5 different trees, as shown below.

[Figure: binary search trees with 3 nodes]

To solve the problem, let’s reduce it. What if there is only one node, i.e. N = 1? There is only one tree possible, with the given node as root and no children.

[Figure: binary search tree with 1 node]

How about N = 2, that is, two nodes (1 and 2)? There are two binary search trees possible.

[Figure: binary search trees with 2 nodes]

Now let’s take N = 3. Five BSTs are possible, as shown in the figure in the example above.

One observation is that every node becomes the root at least once. When a number becomes the root, all numbers greater than it form the right subtree and all numbers less than it form the left subtree.

For every node i as root, all nodes on its left side (from 1 to i-1) form the left subtree. The root of the left subtree can be any number from 1 to i-1.
For node i as root, all nodes on its right side, from i+1 to N, form the right subtree. The root of the right subtree of node i can be any number from i+1 to N.

Calculate the number of left subtrees possible with node i as root and call it l. Then calculate the number of right subtrees possible with i as root and call it r. For each left subtree, there are r right subtrees with the given root. So the total number of trees which can be formed using the given node as root is (l * r).

Consider each node as the root and calculate the number of trees possible. Add them together and we arrive at our solution.

Number of binary search trees with n nodes : Implementation

package com.company.BST;

/**
 * Created by sangar on 11.5.18.
 */
public class NumberOfBST {

    public static int numberOfTrees(int n){
        if(n <= 1) return 1;

        int sum = 0;
        int left = 0, right = 0;

        for(int i=1; i<=n; i++){
            left = numberOfTrees(i-1);
            right = numberOfTrees(n-i);
            sum +=  (left * right);
        }
        return sum;
    }

    public static void main (String[] args){
        System.out.print("Number of trees possible :" + numberOfTrees(3));
    }
}

A simple way to check whether the answer of the program is correct is to calculate the Catalan number; here we need the Nth Catalan number. Mathematically, the Catalan number can be represented as

C(n) = (2n)! / ((n+1)! * n!)
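As a hedged sketch of that cross-check, the nth Catalan number can be computed without factorials using the recurrence C(0) = 1, C(k+1) = C(k) * 2(2k+1) / (k+2), which follows from the closed form (2n)! / ((n+1)! * n!). The class name CatalanCheck is an illustrative assumption.

```java
class CatalanCheck {

    // Computes the nth Catalan number iteratively.
    // The division is exact at every step; long overflows for large n,
    // so this sketch is only meant for small inputs.
    static long catalan(int n) {
        long c = 1;   // C(0)
        for (int k = 0; k < n; k++) {
            c = c * 2 * (2 * k + 1) / (k + 2);
        }
        return c;
    }

    public static void main(String[] args) {
        // Compare these values with numberOfTrees(n) for the same n.
        for (int n = 0; n <= 5; n++) {
            System.out.println(n + " -> " + catalan(n));
        }
    }
}
```

For n = 3 this gives 5, which matches the five trees counted in the example.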

Look at the execution of the above implementation and see that some subproblems are solved again and again.

[Figure: recursion tree for number of BSTs with n nodes]

We know that the solution to the original problem is built from the solutions to its subproblems. Also, there are overlapping subproblems which are solved multiple times. These two conditions are perfect for thinking in terms of dynamic programming. To avoid solving the same problem multiple times, we use a technique called memoization, where we store the solutions to subproblems which are already solved.

Let’s say T[i] represents the number of binary search trees with i nodes.
What will be T[0]? As there is one tree possible, the empty tree with zero nodes, T[0] = 1. What about T[1]? We already saw that only one BST is possible with one node, so T[1] = T[0] = 1.

T[i] = Sum ( T[j] * T[i-j-1] ) for j = 0 to i-1

package com.company.BST;

/**
 * Created by sangar on 11.5.18.
 */
public class NumberOfBST {

    public static int numberOfTrees(int n){
        if(n <= 1) return 1;

        int sum = 0;
        int left = 0, right = 0;

        for(int i=1; i<=n; i++){
            left = numberOfTrees(i-1);
            right = numberOfTrees(n-i);
            sum +=  (left * right);
        }
        return sum;
    }

    public static int numberOfTreesDP(int n){
        if(n <= 1) return 1;

        int[] T = new int[n+1];
        T[1] = T[0] = 1;

        for(int i=2; i<=n; i++){
            int sum = 0;
            for(int j=0; j<i; j++){
                sum += T[j] * T[i-j-1];
            }
            T[i] = sum;
        }
        return T[n];
    }

    public static void main (String[] args){
        System.out.print("Number of trees possible :" + numberOfTreesDP(4));
    }
}

The complexity of the dynamic programming approach is O(n^2), along with an additional space complexity of O(n).

For more dynamic programming problem please refer : What is DP?


Longest Arithmetic Progression


Given a set of integers in sorted order, find the length of the longest arithmetic progression in that set.

An arithmetic progression is a sequence of numbers in which the difference between two consecutive numbers is constant. The mathematical formula for an arithmetic progression is

T(n) = a + (n - 1) d where a is the first element, T(n) is the nth element and d is the constant difference.

1,2,3 is AP with d = 1
3,7,11,15 is AP with d = 4

Let’s define the longest arithmetic progression problem in detail first. The problem is to find the longest sequence of indices 0 <= i1 < i2 < … < ik < n such that the sequence A[i1], A[i2], …, A[ik] is an arithmetic progression.

Longest Arithmetic Progression  thoughts

What will be the brute force solution? In any arithmetic progression, the difference between any two consecutive elements should be the same as the difference between the first and second elements. We can pick each pair of numbers from the set as the first two elements of an AP, then scan the remaining array to find all numbers which satisfy the condition.

There are n(n-1)/2 pairs for a set of n elements, and for each pair we linearly scan the array for more elements of the AP. The overall complexity of the brute force algorithm to find the length of the longest arithmetic progression is O(n^3).
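The brute force idea can be sketched as follows. This is a minimal illustration that assumes the input is sorted in increasing order with distinct values; the class name LongestApBruteForce is hypothetical.

```java
class LongestApBruteForce {

    // Tries every pair (i, j) as the first two elements of an AP and scans
    // the rest of the sorted array for the values that continue it.
    static int longestAp(int[] a) {
        int n = a.length;
        if (n <= 2) return n;

        int best = 2;
        for (int i = 0; i < n - 1; i++) {
            for (int j = i + 1; j < n; j++) {
                int d = a[j] - a[i];
                int length = 2;
                int next = a[j] + d;   // value the progression needs next
                for (int k = j + 1; k < n; k++) {
                    if (a[k] == next) {
                        length++;
                        next += d;
                    }
                }
                best = Math.max(best, length);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(longestAp(new int[]{1, 7, 10, 13, 16, 19}));
    }
}
```

The three nested loops make the cubic cost visible: two loops choose the pair and the third scans for the remaining elements.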

Can we do better than cubic complexity? Let’s understand a simpler problem first. Given three numbers, what is the most efficient way to find if they form an arithmetic progression?

A[i], A[j] and A[k] form an AP if 2* A[j] = A[i] + A[k] where k>j and i<j.

For example, 1,2,3 are AP as 2*2 = 1 + 3. Also, 7,11,15 is AP as 2*11 = 15 +7.

How can we use this information to find if there is an arithmetic progression of 3 numbers in a set of integers? This is very similar to the problem of finding a pair of numbers in a sorted array which sum up to X. We have to find i and k such that A[i] + A[k] = 2*A[j], where 0<j<n-1.

  • For each 0<j<n-1:
  • Initialize i as j-1 and k as j+1
    • If A[i] + A[k] is equal to 2*A[j], then we are done.
    • If A[i] + A[k] > 2*A[j], then decrease i by 1.
    • Else if A[i] + A[k] < 2*A[j], then increment k by 1.

This will give answer to question if there exist three numbers in set which form AP.
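The steps above can be sketched as a short function. This is an illustrative version that assumes a sorted input array with 0-based indices; the names ArithmeticTriple and hasArithmeticTriple are not from the original article.

```java
class ArithmeticTriple {

    // Returns true if some A[i], A[j], A[k] with i < j < k form an AP.
    // For each middle element a[j], i moves left and k moves right,
    // exactly as in the pair-sum search on a sorted array.
    static boolean hasArithmeticTriple(int[] a) {
        int n = a.length;
        for (int j = 1; j < n - 1; j++) {
            int i = j - 1;
            int k = j + 1;
            while (i >= 0 && k < n) {
                int sum = a[i] + a[k];
                if (sum == 2 * a[j]) return true;
                if (sum > 2 * a[j]) {
                    i--;   // sum too large, take a smaller left element
                } else {
                    k++;   // sum too small, take a larger right element
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasArithmeticTriple(new int[]{1, 7, 10, 15, 27, 29}));
    }
}
```

Each choice of j costs O(n) because i and k only move in one direction, so the whole check runs in O(n^2).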

If the set contains two or more elements, the length of the longest AP is at least 2. Why? Any number always forms an AP of length 2 with the last element of the set. Can we combine all this to come up with the solution for the original problem?

Let’s say L[i][j] stores the length of the longest arithmetic progression with A[i] and A[j] as the first two elements of the AP, where i < j. If j == n, then L[i][j] = 2, which means the last column of the matrix will be all 2. Why? Because no element comes after A[n] to extend the progression.

Now, if we fix j, we find i and k such that A[i], A[j] and A[k] form AP, then

L[i][j] = 1 + L[j][k].

This recurrence relation means that we must have L[j][k] before L[i][j]. As per relationship, i<j<k, hence the table L will be filled from bottom right to top left.

Algorithm to find length of longest arithmetic progression

  1. For j = n, L[i][j] = 2 for 0<i<n; the last column is filled with 2.
  2. Fix j from n-1 down to 1 and for each j do the steps below
    • Find all i and k such that A[i], A[j] and A[k] form an AP, using the algorithm given above.
      • Fill L[i][j] = 1 + L[j][k]
      • Check if L[i][j] is longer than the current max length; if yes, update it.
    • Slight change for optimization: if A[i] + A[k] is greater than 2*A[j], we can safely fill L[i][j] as 2
    • If i > 0 still holds after k has gone past n, fill all the remaining L[i][j] = 2.

#include<stdlib.h>
#include<stdio.h>

/* Fully parenthesized so the macro expands safely inside expressions */
#define max(a,b) (((a) > (b)) ? (a) : (b))

int longestArithmeticProgression(int a[], int n){
	int i,j,k;
	int Table[n][n];
	int longestAP = 2;
	
	for(i=0;i<n; i++)
		Table[i][n-1] =2;
		
	for(j= n-2; j>=1; j-- ){
		i = j-1;
		k = j+1;
		
		while(i>=0 && k<n){
			if(2* a[j] > a[i] + a[k]){
				k++; // Table[j][k]is already filled 
			}
			else if (2* a[j] < a[i] + a[k]){
             /*Table[i][j] needs to be filled before we move up */
             	Table[i][j] =2; 
             	i--;
            }
            else{
            	Table[i][j] = Table[j][k] +1;
             	longestAP = max(longestAP, Table[i][j]);
             	i--;
             	k++;
            }
        }
        while(i>=0){
        	Table[i][j] =2; 
        	i--;
        }
    }
    return longestAP;
}

int main(){
	int array[] = {1,7,10,13,16,19};
	int n = sizeof(array)/sizeof(array[0]);
	printf("Length of longest arithmetic progression is : %d",
	          longestArithmeticProgression(array,n));
     return 0;
}

The complexity of the dynamic programming approach to find the length of the longest arithmetic progression is O(n^2), with an additional space complexity of O(n^2).

Reference
http://www.cs.uiuc.edu/~jeffe/pubs/pdf/arith.pdf


Count all possible paths in maze


Count all possible paths in a maze is one more grid problem which is asked in Amazon and Microsoft interviews and can be solved using dynamic programming. Before going into the details of the solution, let’s understand the problem. The problem is to find the total number of paths in a given maze or grid from the top-left cell to the bottom-right cell. You can move right, down and diagonally, but not left. For example, one of the paths to reach the bottom-right cell is shown below.

[Figure: one possible path to reach the destination in a grid]

All possible paths in grid : Recursive approach

A typical property of a maze problem is that it reduces to a smaller problem as soon as we make one move. Another problem which uses the same concept is Minimum cost path in grid. In any such problem, once a move is made, we solve the smaller subproblem; in this case, we figure out how many paths are possible from the new cell to the destination cell. Also, once we have decided to move in a particular direction (let’s say right) from a cell, we still need to count the paths possible by going in the other directions (down and diagonal). Hence, for each cell, we count the possible paths if we move right, the possible paths if we move down and the possible paths if we move diagonally, and add them up.

Since the problem reduces to a smaller problem with each move, think of applying recursion. What will be the base case for the recursion?
The base case is when we reach the bottom-right cell. If we take i and j as the row and column of the maze, the base case would be

(i == m && j ==n) return 1

The recursive formulation for the maze problem, starting from cell (0,0), would be

    count(i,j) =  count(i+1, j) + count(i, j+1) + count(i+1, j+1)

Recursive implementation would be

#include <stdio.h>

int PossiblePaths(int i,int j, int m, int n){
	if(i > m || j > n) return 0; 
	
	if(i == m && j == n) return 1;
	
	return PossiblePaths(i+1,j, m,n) 
			+ PossiblePaths(i, j+1, m,n) 
			+ PossiblePaths(i+1, j+1,m,n);
}

int main(void) {
	
	int m = 4;
	int n = 4;
	printf("\n Number of paths in maze : %d",PossiblePaths(0,0,m,n) );
	return 0;
}

Let’s see how the execution happens. We take a 3×3 maze so m and n equal to 3. To start with we take i and j  equal to 0.

[Figure: execution of the above recursive implementation]

From the above figure, see that some subproblems are calculated again and again. The repetition increases as we go down the tree.
Two basic conditions should be satisfied before we apply dynamic programming:

1. There should be optimal substructure: the original problem reduces to smaller ones, and solving the smaller problems leads to the solution of the bigger problem.
2. There should be overlapping subproblems, which call for tabulating the results of subproblems so they can be reused later.
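Before moving to a bottom-up table, the overlap can be removed from the recursive version directly by caching results. The following is a minimal sketch under the same conventions as the recursive listing (cells run from (0,0) to (m,n)); the class name MazePathsMemo and the memo array are illustrative assumptions.

```java
class MazePathsMemo {

    // memo[i][j] caches the number of paths from (i, j) to (m, n).
    // 0 works as "not computed yet" because every cell inside the grid
    // has at least one path to the destination.
    private static long countPaths(int i, int j, int m, int n, long[][] memo) {
        if (i > m || j > n) return 0;     // stepped outside the grid
        if (i == m && j == n) return 1;   // reached the destination
        if (memo[i][j] != 0) return memo[i][j];

        memo[i][j] = countPaths(i + 1, j, m, n, memo)        // move down
                   + countPaths(i, j + 1, m, n, memo)        // move right
                   + countPaths(i + 1, j + 1, m, n, memo);   // move diagonally
        return memo[i][j];
    }

    static long countPaths(int m, int n) {
        return countPaths(0, 0, m, n, new long[m + 1][n + 1]);
    }

    public static void main(String[] args) {
        System.out.println(countPaths(4, 4));
    }
}
```

Every cell is now expanded only once, so the work is proportional to the number of cells rather than to the number of paths.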

All possible paths in grid : DP implementation

To store the solutions of subproblems, we use a two-dimensional table.
Each cell Table(i,j) stores the number of paths which reach cell (i,j). Our answer will be Table(m,n).
Cell (i,j) can be reached either from (i-1,j) (moving down), from cell (i,j-1) (moving right) or from cell (i-1,j-1) (moving diagonally).

So Table(i,j) can be calculated as Table(i,j) = Table(i-1,j) + Table(i,j-1) + Table(i-1,j-1)
Also, there is only one way to reach any cell in the first column and only one way to reach any cell in the first row. Hence, Table(i,0) = Table(0,j) = 1

#include<stdlib.h>
#include<stdio.h>

int PossiblePaths(int m,int n){
	int Table[m+1][n+1];
	int i,j;
	
	for(i=0;i<=m; i++){
		Table[i][0] =1;
	}
	for(i=0;i<=n; i++){
		Table[0][i] =1;
	}
	for(i=1; i<=m; i++ ){
		for(j=1; j<=n; j++){
			Table[i][j] = Table[i-1][j] + Table[i][j-1] + Table[i-1][j-1];
		}
	}
	return Table[m][n];
}

int main(){
   printf("%d",PossiblePaths(4,4));
   return 0;
}

Space optimized version (Thanks to Jakube).

#include<stdlib.h>
#include<stdio.h>
 
int PossiblePaths(int m,int n){
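	/* Table[j] holds the current row; diagonalSum (declared per row
	   below) carries the value of the previous row at column j-1 */
	int Table[n+1];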
 
	for(int i=0;i<=n; i++){
		Table[i] = 1;
	}
	for(int i=1; i<=m; i++ ){
		int diagonalSum = 1;
		for(int j=1; j<=n; j++){
			int temp = Table[j];
			Table[j] = Table[j] +  Table[j-1] +  diagonalSum;
			diagonalSum = temp;
		}
	}
	return Table[n];
}
 
int main(){
   printf("%d",PossiblePaths(4,4));
   return 0;
}

The dynamic programming approach to find all possible paths in a grid takes extra O(mn) memory but reduces the exponential time complexity to O(mn). The space optimized version needs only O(n) extra memory.


Balanced partition problem

Given a set of integers, partition those integers into two parts where the difference between the two parts is minimum. This problem is known as balanced partition problem. For example,

Input:
A = [1,7,4,11], 
Output:
1
Explanation:
Two subsets can be: {1,11} and {7,4}, two have a difference of 1, which is the minimum difference we can get by splitting this array.
Mathematically, you have a set of n integers each in the range 0, . . . , K. Partition these integers into two subsets such that you minimize |S1 − S2|, where S1 and S2 denote the sums of the elements in each of the two subsets.

Balance partition problem can be asked in many other ways, for instance, given a list of 22 players and their strengths, divide those 22 players into two teams so that both teams are balanced. Another version can be that you have n candy, each candy has a value associated with it. You want to distribute those candies between two kids as equally as possible.

No matter what version is asked, the approach remains the same.

Balance partition problem: thoughts

The brute force method is to list all the subsets of the given set and find the sum of each one of them. Then scan through the sums of all the subsets and find the two closest ones. For a set of n elements, there can be 2^n subsets. Therefore, the complexity of this brute force solution is already exponential.

Let me tweak balance partition problem a bit. We find if there are two subsets of the set of integers such that the difference between sum of these two subsets is zero. Essentially, this is a special case of the original problem. If the difference between the sum of two subsets is zero that means the sum of both subsets should be exactly equal to half of the sum of all elements in the set.

So problem reduces to a smaller problem that is to find if there is a subset of integers which add up to half the sum of all integers in the set? This is the subset sum problem which we have already solved. 

How can we use the information provided by the subset sum problem above? Let’s say S is the sum of all the integers in the set, so S/2 is half of that sum. We have to find a subset with sum i such that S/2 - i is minimum.

Whether or not there is a subset with sum i is given by solving the subset sum problem. Among the sums i which are possible with subsets of the set, find the one closest to S/2. The remaining elements form the other subset, whose sum S - i is the smallest possible sum at or above S/2, so the difference between the two subsets, S - 2i, is the minimum possible.

So,  expression would be as

min(S/2 - i) where T[n][i] = True and i>=0 and i<=S/2

Why did we take i >= 0 and i <= S/2? Because we want the partition to be balanced, and the smaller of the two parts can never sum to more than half of the total.

Balanced partition problem implementation

package com.company;

/**
 * Created by sangar on 25.11.18.
 */
public class BalancedPartition {
    public int findBalancePartition(int[] a){

        // Calculate sum of all the elements in set 
        int S = 0;
        for (int i=0; i<a.length; i++)
            S += a[i];

        boolean T[][] = new boolean[a.length + 1][S + 1];

        /* Initialize the first column as true. 
            0 sum is possible with all elements. 
        */
        for (int i=0; i<=a.length; i++)
            T[i][0] = true;

        /*  Initialize top row, except T[0][0], 
            as false. With 0 elements, no other 
            sum except 0 is possible
        */
        for (int i=1; i<=S; i++)
            T[0][i] = false;

        
        for (int i = 1; i <= a.length; i++) {
            for (int j = 1; j <= S; j++) {
                // If ith element is excluded 
                T[i][j] = T[i - 1][j];

                // If ith element is included 
                if (a[i - 1] <= j)
                    T[i][j] |= T[i - 1][j - a[i - 1]];
            }
        }

        // Initialize difference of two sums. 
        int diff = Integer.MAX_VALUE;

        for (int j = S/2; j >= 0; j--) {
            // Find the largest achievable sum j that does not exceed S/2
            if (T[a.length][j] == true)
            {
                diff = S - 2 * j;
                break;
            }
        }
        return diff;
    }
}

Once we get the nearest sum, we can always backtrack through the table and find the elements of the subset itself. In fact, this problem is now reduced to the 0/1 knapsack problem, where the maximum value we can get from the set of integers is j.
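One way the backtracking could look is sketched below. It rebuilds the same boolean table (limited to sums up to S/2) and then walks it backwards to collect one subset with the nearest sum. The class and method names are hypothetical, and the sketch assumes non-negative elements.

```java
import java.util.ArrayList;
import java.util.List;

class BalancedPartitionSubset {

    // Returns one subset whose sum is the largest achievable value <= S/2.
    static List<Integer> closestHalfSubset(int[] a) {
        int n = a.length;
        int S = 0;
        for (int v : a) S += v;
        int half = S / 2;

        // T[i][j] is true if some subset of the first i elements sums to j.
        boolean[][] T = new boolean[n + 1][half + 1];
        for (int i = 0; i <= n; i++) T[i][0] = true;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= half; j++) {
                T[i][j] = T[i - 1][j];
                if (a[i - 1] <= j) T[i][j] |= T[i - 1][j - a[i - 1]];
            }
        }

        int j = half;
        while (!T[n][j]) j--;   // T[n][0] is always true, so this stops

        // Walk back through the table: if sum j was not reachable without
        // element i, then element i must be part of the subset.
        List<Integer> subset = new ArrayList<>();
        for (int i = n; i > 0 && j > 0; i--) {
            if (!T[i - 1][j]) {
                subset.add(a[i - 1]);
                j -= a[i - 1];
            }
        }
        return subset;
    }

    public static void main(String[] args) {
        System.out.println(closestHalfSubset(new int[]{1, 7, 4, 11}));
    }
}
```

The elements that are not returned form the second part of the partition.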

The complexity of splitting a set into two balanced partitions is O(n * S) with a space complexity of O(n * S), where n is the number of elements and S is the sum of all elements in the array.

Minimum jumps to reach at end

Minimum jumps to reach end of array

Given an array of integers, find the minimum number of jumps to reach the end of the array. The condition is that you can jump at most a[i] indices from index i.

For example, in the following array, the minimum number of jumps required is 2.

[Figure: original array]

At index 1, we can jump 0, 1 or 2 indices ahead. If we jump 2 indices, we would require two more jumps (at 1 and 1) to reach 4. So the total number of jumps would be 3.

[Figure: jumping the maximum at the start requires more jumps in the end]

However, if we jump only 1 index ahead, the next A[i] will allow us to jump 3 indices ahead; doing so, we reach the end of the array. So the minimum number of jumps to reach the end of the array is 2.

[Figure: minimum jumps required; not starting with the maximum jump actually saves one jump to reach the end]

Minimum number of jumps : thought process

What would be the brute force method to solve this? At each index, try all possible jumps and take the combination which gives the minimum number of jumps. This method has exponential complexity, which we do not want.

What is the original problem? It’s minJumps(start, end). Of all the jumps possible from start, let’s say we go to index k; how does the problem reduce? Well, now we have to find the minimum number of jumps from k to end. How do we decide on k? We try all k values from start+1 to start + a[start].

minJumps(start, end) = 1 + Min ( minJumps(k, end) )
for all k reachable from start

Now that we have a clear recurrence, what should be the base case? When start == end, we have already arrived, so we return 0. It follows that when end itself is reachable from start, only one jump is required.

package com.company;

/**
 * Created by sangar on 10.10.18.
 */
public class MinimumJumps {

    public int minimumNumberOfJump(int[] a, int start, int end){
        //If start == end, we reached the end, return 0.
        if(start == end) return 0;

        //if current element is 0, you cannot jump to end at all
        if(a[start] == 0) return Integer.MAX_VALUE;

        int minimumJumps = Integer.MAX_VALUE;

        for(int k=start+1; k<=start+a[start] && k<=end; k++){
            /*
            For each K from start+1 to end, find the minimum jumps.
             */
            int jumps = minimumNumberOfJump(a,k,end);
            if(jumps != Integer.MAX_VALUE && jumps + 1 < minimumJumps){
                minimumJumps  = jumps + 1;
            }
        }
        return minimumJumps;
    }
}

Test cases for above function

package test;

import com.company.MinimumJumps;
import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertEquals;

/**
 * Created by sangar on 23.9.18.
 */
public class MinimumJumpTest {

    MinimumJumps tester = new MinimumJumps();

    @Test
    public void baseTest() {

        int[] a = {1, 3, 5, 8, 9, 2, 6, 7, 6, 8, 9};
        assertEquals(3,
			tester.minimumNumberOfJump(a,0, a.length-1));
    }

    @Test
    public void arrayContainsZeroTest() {

        int[] a = {1, 3, 0, 0, 0, 2, 6, 7, 6, 8, 9};
        assertEquals(Integer.MAX_VALUE, 	  
			tester.minimumNumberOfJump(a,0, a.length-1));
    }

    @Test
    public void nullArrayTest() {

        assertEquals(0, tester.minimumNumberOfJump(null,0, 0));
    }

    @Test
    public void arrayWithTwoElementsTest() {

        int[] a = {1, 0};
        assertEquals(1,
			tester.minimumNumberOfJump(a,0, a.length-1));
    }
}

Let’s see execution trace of above function for an input.

Nodes in red are re-calculated

From the above execution tree, we notice that some subproblems are calculated again and again. This is known as overlapping subproblems.
Also, the optimal solution to a subproblem leads us to the optimal solution of the original problem, which is the optimal substructure property. These two properties are a must for applying dynamic programming to a problem.
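
One way to avoid the repeated work is to cache the answer for each start index the first time it is computed. The following is a minimal sketch of that idea, not code from the original post: the class name MinimumJumpsMemo and the method minimumNumberOfJumpMemo are hypothetical, and the cache uses -1 to mean "not computed yet" while Integer.MAX_VALUE keeps its meaning of "end not reachable".

```java
import java.util.Arrays;

// Hypothetical class, for illustration only.
class MinimumJumpsMemo {

    // Minimum jumps from index 0 to the last index,
    // or Integer.MAX_VALUE if the end cannot be reached.
    static int minimumNumberOfJumpMemo(int[] a) {
        if (a == null || a.length == 0) return 0;
        int[] memo = new int[a.length];
        Arrays.fill(memo, -1); // -1 marks "not computed yet"
        return minJumps(a, 0, memo);
    }

    private static int minJumps(int[] a, int start, int[] memo) {
        int end = a.length - 1;
        if (start == end) return 0;
        if (memo[start] != -1) return memo[start]; // reuse cached answer

        int best = Integer.MAX_VALUE;
        // Try every index k reachable from start with a single jump.
        for (int k = start + 1; k <= end && k - start <= a[start]; k++) {
            int jumps = minJumps(a, k, memo);
            if (jumps != Integer.MAX_VALUE && jumps + 1 < best) {
                best = jumps + 1;
            }
        }
        memo[start] = best;
        return best;
    }

    public static void main(String[] args) {
        int[] a = {1, 3, 5, 8, 9, 2, 6, 7, 6, 8, 9};
        System.out.println(minimumNumberOfJumpMemo(a));
    }
}
```

Each index is solved once and each solve scans at most n candidates, so this sketch does O(n^2) work in the worst case with O(n) extra space.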

What if we store the minimum number of jumps required to reach a particular index? To reach the first index, the number of jumps required is 0. Jump[i] represents the minimum number of jumps needed to reach index i. The solution for reaching the end of the array would be Jump[n-1]. How do we fill this array? For each i, go through j = 0 to i-1 and check if j+a[j] >= i; if yes, update jump[i] = min (jump[i], jump[j]+1).

Minimum number of jumps: dynamic programming approach

package com.company;

/**
 * Created by sangar on 10.10.18.
 */
public class MinimumJumps {

    public int minimumNumberOfJumpDP(int[] a){

        if(a == null || a.length == 0) return 0;

        //a single element needs no jump; otherwise a 0 at the start blocks us
        if(a.length > 1 && a[0] == 0) return Integer.MAX_VALUE;

        int[] jump = new int[a.length];

        //no jumps required for first element
        jump[0] = 0;

        for(int i=1; i<a.length;i++){
            jump[i] = Integer.MAX_VALUE;

            for(int j=0; j<i; j++){
                if(j+a[j]>=i && jump[j] != Integer.MAX_VALUE ){
                    jump[i] = Integer.min(jump[i], 1 + jump[j]);
                }
            }
        }
        return jump[a.length-1];
    }
}

Complexity of the dynamic programming approach to find the minimum number of jumps to reach the end of an array is O(n^2) with space complexity of O(n).

If you are interested in solving this problem in O(n) time, please see the Stack Overflow discussion.
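
For reference, one commonly used linear-time idea is a greedy scan that tracks how far the current number of jumps can reach. The sketch below is an assumption about what such a solution can look like and is not necessarily the approach from the linked discussion. The class and method names are hypothetical, array values are assumed to be non-negative, and Integer.MAX_VALUE again signals that the end is unreachable.

```java
// Hypothetical class, for illustration only.
class MinimumJumpsGreedy {

    static int minimumNumberOfJumpGreedy(int[] a) {
        if (a == null || a.length <= 1) return 0;

        int jumps = 0;       // jumps taken so far
        int currentEnd = 0;  // farthest index reachable with `jumps` jumps
        int farthest = 0;    // farthest index reachable with one more jump

        for (int i = 0; i < a.length - 1; i++) {
            farthest = Math.max(farthest, i + a[i]);

            if (i == currentEnd) {
                // Range of the current jump is used up; we must jump again.
                if (farthest == i) return Integer.MAX_VALUE; // stuck here
                jumps++;
                currentEnd = farthest;
                if (currentEnd >= a.length - 1) break;
            }
        }
        return jumps;
    }

    public static void main(String[] args) {
        int[] a = {2, 3, 1, 1, 4};
        System.out.println(minimumNumberOfJumpGreedy(a)); // prints 2
    }
}
```

The scan visits each index once, so it runs in O(n) time with O(1) extra space.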

Subset sum problem

Given a set of integers, find if there is a subset with a sum equal to S where S is an integer.

This problem is commonly known as a subset sum problem.

For example, in set = [2,4,5,3], if S = 6, the answer should be True as there is a subset [2,4] which sums up to 6. However, for the same set, if S = 15, the answer would be False as there is no subset which adds up to 15.

Here, we need to find all elements (present in the set) whose sum does not exceed S. This condition is similar to what we have in the knapsack problem. In the Knapsack problem, we had a limited capacity and we could take anything which does not exceed the limit. However there is a difference, in the knapsack problem, we were allowed to take items less than capacity if the value was greater than all other combinations. In the subset sum problem, we need to discard subsets that have the sum less than S.

What was the basic strategy for solving the knapsack problem? Yes, take each item and check if it fits in the constraint. If it does, we take that item and reduce the problem to a subproblem with n-1 items and a reduced capacity of C-w, where w is the weight of the item included.
If the chosen item does not satisfy the constraint, we ignore it and reduce the problem to n-1 items and capacity C. The same principle applies here too.

Subset sum problem: Algorithm

The algorithm to find a subset with a given sum using recursion is as follows:
For each integer in the set, there are two options:
1. Include current integer in solution.
2. Do not include the integer in solution

The choice we make above will reduce the original problem to subproblem with n-1 elements and either S-v or S as sum expected for those n-1 elements.

When do we stop reducing problems into sub-problems?

If the expected sum for the remaining elements in a sub-problem becomes zero at any point, a subset with the given sum is found. If we have visited all elements of the set and still not achieved the expected sum S (in this case, we have no elements left while the expected sum is still not 0), there is no subset.

With the help of some bookkeeping, we can also print all subsets with the given sum.

Subset with given sum recursive implementation

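#include<stdlib.h>
#include<stdio.h>

#define true 1
#define false 0

/* arr[0..n] are the elements still to be considered;
   subset[0..count-1] holds the elements picked so far. */
int isSubsetSum(int arr[], int n, int sum,
                int subset[], int count){
  int i;
  if(sum == 0) {
       /* Found a subset, print it */
       printf("\n");
       for(i = 0; i < count; i++)
           printf("%d  ",  subset[i]);
       return true;
  }
  /* No elements left and the sum is still not reached */
  if(n < 0)  return false;

  /* Either include arr[n] in the subset or skip it */
  subset[count] =  arr[n];
  return
         isSubsetSum(arr, n-1, sum-arr[n], subset, count + 1)
         || isSubsetSum(arr, n-1, sum, subset, count );
}

int main(){

  int set[] = {1,3,5,4,6};
  int n  =  sizeof(set)/sizeof(set[0]);
  int K = 10;
  int subset[n];
  /* Pass the index of the last element and an empty subset */
  printf("\nIs there subset with Sum = %d : %s",
          K, isSubsetSum(set, n-1, K, subset, 0) ? "Yes": "No");
  return 0;
}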

The complexity of the algorithm to find a subset in an array with a given sum is O(2^N), as in the worst case it has to go through all the subsets (of length 1 to N) of the set.
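
The bookkeeping mentioned earlier can be extended to collect every subset with the given sum instead of stopping at the first one. Below is a minimal Java sketch of that idea. The class SubsetSumAll and its methods are hypothetical names, not part of the original post. It follows the same include/exclude choice, but records a subset only after every element has been considered, so that no subset is reported twice.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class, for illustration only.
class SubsetSumAll {

    static List<List<Integer>> subsetsWithSum(int[] arr, int sum) {
        List<List<Integer>> result = new ArrayList<>();
        collect(arr, 0, sum, new ArrayList<>(), result);
        return result;
    }

    private static void collect(int[] arr, int index, int remaining,
                                List<Integer> current,
                                List<List<Integer>> result) {
        if (index == arr.length) {
            // All elements decided; keep the subset if the sum is met.
            if (remaining == 0) result.add(new ArrayList<>(current));
            return;
        }
        // Option 1: include arr[index] in the subset.
        current.add(arr[index]);
        collect(arr, index + 1, remaining - arr[index], current, result);
        current.remove(current.size() - 1);

        // Option 2: do not include arr[index].
        collect(arr, index + 1, remaining, current, result);
    }

    public static void main(String[] args) {
        int[] set = {1, 3, 5, 4, 6};
        for (List<Integer> subset : subsetsWithSum(set, 10)) {
            System.out.println(subset);
        }
    }
}
```

Since every include/exclude combination is visited, this sketch is still exponential in the number of elements.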

Subset sum problem dynamic programming approach

Two conditions which are must for application of dynamic programming are present in the above problem. The optimal solution to subproblem actually leads to an optimal solution for the original problem. At the same time, we are solving subproblems, again and again, so overlapping subproblems. How can we use dynamic programming here then?
Let’s say we want to find if there is a subset of the first j elements of the set which adds up to sum i. What if there are no elements in the set, that is, j = 0? In that case, no sum except 0 is possible. If we store it in a table, T[0][0] = True, and T[i][0] = False for all i > 0 and i <= S.

What if we have to find S = 0 for given j elements? In that case, it is always possible to find a set with zero elements (empty subset) which adds up to zero. Therefore, T[0][j] = True for all j >=0 and j <= n.

When would T[i][j] be True otherwise? It can be true in two conditions:

  1. If there is a subset with j-1 elements which already adds up to sum i.
  2. If there is a subset with j-1 elements which add up to i-a[j]. This means adding a[j] to that subset will give us a subset of j elements with sum i.
T[i][j] = T[i-a[j]][j-1] || T[i][j-1]

Make sure that i – a[j] >= 0. This recurrence relation fills up the table, and the value T[S][n] tells us if there is a subset with sum S in the set of n integers.

#include<stdlib.h>
#include<stdio.h>
#include<stdbool.h>

int isSubsetSum(int arr[], int n, int sum)
{
  /* The value of subset[i][j] will be true if there is a
  subset of set[0..j-1] with sum equal to i */
  int subset[sum+1][n+1];
  int i,j;

  /* If sum ==0, there will be empty set satisfying that condition
  hence row with sum = 0 will have all true values. */
  for (i = 0; i <= n; i++)
    subset[0][i] = true;

  /* If sum is not 0 and set is empty, no subset is there */
  for (i = 1; i <= sum; i++)
    subset[i][0] = false;

  for (i = 1; i <= sum; i++)
  {
    for ( j = 1; j <= n; j++)
    {
        /* If there is subset with j-1 elements, copy value */
        subset[i][j] = subset[i][j-1];

        /* Now if the latest element can be added */
        if (i >= arr[j-1])
            subset[i][j] = subset[i][j]
                           ||subset[i-arr[j-1]][j-1];
    }
  }
  return subset[sum][n];
}

/* Driver program */
int main(){

  int set[] = {1,3,5,4,6};
  int n  =  sizeof(set)/sizeof(set[0]);
  int K = 10;
  printf("Is there subset with Sum = %d : %s",
          K, isSubsetSum(set, n, K) ? "Yes" : "No");
  return 0;
}

Dynamic programming implementation of subset problem has the time complexity of O(nS) and uses O(nS) extra space.

There is another problem based on the subset sum problem called: Partition Equal Subset Sum

class Solution {
    public boolean canPartition(int[] nums) {
        
        int sum = 0;
        
        for(int i=0; i<nums.length; i++){
            sum+=nums[i];
        }
        
        if(sum % 2 == 1) return false;
        
        int rSum = sum/2;
        
        return isSubset(nums, rSum);
    }
    
    private boolean isSubset(int [] a, int sum){
        
        boolean [][] dp = new boolean[a.length+1][sum+1];
        
        for(int i=0; i<=a.length; i++){
            dp[i][0] = true;
        }
        
        for(int i=1; i<=sum; i++){
            dp[0][i] = false;
        }
        
        for(int i=1; i<=a.length; i++){
            for(int j=1; j<=sum; j++){
                dp[i][j] = dp[i-1][j];
                
                if(j >= a[i-1]){
                    dp[i][j] = dp[i][j] || dp[i-1][j-a[i-1]];
                }
            }
        }
            
        return dp[a.length][sum];
    }
}

Word break problem

This problem is commonly asked in Google and Amazon interviews. We all know that if a string typed into the Google search box does not make sense, Google breaks it into meaningful words and asks whether we meant those words instead of a single word. This post discusses how we can find out if a given string can be broken into meaningful dictionary words. For example, if I type algorithmsandme and the given dictionary is [“algorithms”, “and”, “me”], the string can be broken into meaningful words, but the string algorithmsorme cannot. You can find this problem for practice at leetcode.

Word break problem : thoughts

We start with the first character of the string and check if the character itself is a word in the dictionary. If yes, then our problem reduces to a smaller problem, that is, to check whether the substring from index 1 to s.length is breakable or not.
If not, then we check the first two characters, then three characters and so on until we have checked the whole string. With every character included, the problem reduces in size but remains the same, which makes it an ideal case for a recursive implementation.

package AlgorithmsAndMe;

import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class WordBreak {

    public boolean wordBreak(String s, List<String> wordDict) {
        return wordBreakUtil(s, wordDict, 0);
    }

    private boolean wordBreakUtil(String s, 
                                   List<String> wordDict, 
                                   int index) {

        if (index == s.length()) return true;

        boolean isBreakable = false;
        for(int i=index; i<s.length(); i++) {
            isBreakable = isBreakable 
                   || wordDict.contains(s.substring(index, i+1))
                    && wordBreakUtil(s, wordDict, i + 1);
        }

        return isBreakable;
    }
}

If you notice, we are solving the same subproblems again and again in the recursive function wordBreakUtil. How can we avoid those repeated calculations? The best way is to save the already solved problems in a cache, so that we can check the cache to see if a problem is already solved. If yes, we do not solve it again and use the cached value. This is called a top-down approach, and it uses memoization to avoid repeated subproblems.

package AlgorithmsAndMe;

import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class WordBreak {

    public boolean wordBreak(String s, List<String> wordDict) {
        int [] table =  new int[s.length()];
        for(int i=0; i<s.length(); i++){
            table[i] = -1;
        }
        return wordBreakUtilTopDown(s, wordDict, 0, table);
    }

    private boolean wordBreakUtilTopDown(String s, 
                            List<String> wordDict,
                            int index,
                            int[] table) {

        if (index == s.length()) return true;

        if(table[index] < 0) {
            boolean isBreakable = false;
            for (int i = index; i < s.length(); i++) {
                isBreakable = isBreakable 
                        || wordDict.contains(s.substring(index, i + 1))
                        && wordBreakUtilTopDown(s, wordDict, i + 1, table);
            }
            table[index] = isBreakable ? 1 : 0;
        }
        return table[index] == 1 ? true : false;
    }
  }

If you run the first solution, it will exceed the time limit on leetcode; the second implementation, however, should be accepted with 4ms as the time to run. Now you can appreciate the efficiency gained by memoization.

Word break problem using dynamic programming

In the last two implementations, two things are evident: first, the optimal solution of a subproblem leads to the optimal solution of the original problem. Second, there are overlapping subproblems. These are two must have conditions for applying dynamic programming. We already saw the memoization and top-down approach of DP to avoid repeated solving of subproblems. How can we do it bottom up?

What if we store whether the string up to index i is breakable? What will be the base case? The string before index 0 is always breakable, as it is the empty string, so table[0] is always true. To check if the string up to index i is breakable, we look at every index j from 0 to i-1 up to which the string is breakable. If there is such a j, we check whether the substring from index j to i is in the dictionary; if it is, table[i] becomes true.

package AlgorithmsAndMe;

import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class WordBreak {

    public boolean wordBreak(String s, List<String> wordDict) {
        return wordBreakUtilBottomUp(s, wordDict);
    }

    private boolean wordBreakUtilBottomUp(String s, List<String> wordDict){

        if(s == null || s.length() == 0) return false;

        boolean[] table  = new boolean[s.length()+1];

        table[0] = true;
        for (int i = 1; i <= s.length(); i++) {
            for (int j = i - 1; j >= 0; j--) {
                //prefix till j is breakable and s[j..i) is a word
                if (table[j] && wordDict.contains(s.substring(j, i))) {
                    table[i] = true;
                    break;
                }
            }
        }
        return table[s.length()];
    }
}

The time complexity of the above implementation of the word break problem is O(n^2) substring checks, not counting the cost of each dictionary lookup.
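
Since wordDict is a List, every contains call scans the whole dictionary. A minimal sketch of the same bottom-up table with the dictionary copied into a HashSet first is shown below. The class name WordBreakWithSet is hypothetical, and returning false for an empty string simply mirrors the convention used in the code above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical class, for illustration only.
class WordBreakWithSet {

    static boolean wordBreak(String s, List<String> wordDict) {
        if (s == null || s.isEmpty()) return false;

        // Copy the dictionary once so that lookups are hash based.
        Set<String> words = new HashSet<>(wordDict);

        // table[i] is true if the first i characters can be broken into words.
        boolean[] table = new boolean[s.length() + 1];
        table[0] = true; // empty prefix

        for (int i = 1; i <= s.length(); i++) {
            for (int j = i - 1; j >= 0; j--) {
                if (table[j] && words.contains(s.substring(j, i))) {
                    table[i] = true;
                    break; // one valid split is enough
                }
            }
        }
        return table[s.length()];
    }

    public static void main(String[] args) {
        List<String> dict = Arrays.asList("algorithms", "and", "me");
        System.out.println(wordBreak("algorithmsandme", dict)); // prints true
        System.out.println(wordBreak("algorithmsorme", dict));  // prints false
    }
}
```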

If you want to store all the sentences which can be generated by breaking a given string, below is the code.

package AlgorithmsAndMe;

import java.util.*;

public class WordBreak2 {

    public List<String> wordBreak(String s, List<String> wordDict) {
        Map<String, List<String>> map = new HashMap<>();
        return wordBreakUtil2(s, wordDict, map);
    }

    private List<String> wordBreakUtil2(String s,
                                        List<String> wordDict,
                                        Map<String, List<String>> map) {

        if(map.containsKey(s)){
            return map.get(s);
        }

        List<String> result = new ArrayList<String>();
        if (wordDict.contains(s)){
            result.add(s);
        }

        for(int i=1; i<=s.length(); i++) {
            String prefix = s.substring(0, i);
            if(wordDict.contains(prefix)){
                List<String> returnStringsList = wordBreakUtil2(s.substring(i), wordDict, map);

                for(String returnString :returnStringsList ){
                    result.add(prefix + " " + returnString);
                }
            }
        }
        map.put(s,result);

        return result;
    }
}

Longest Common Subsequence

A subsequence of a string is a sequence of characters picked from the string in left to right order, not necessarily contiguous. For example, string ABCDEG has ABC, ADG, EG and BCDEG as subsequences, whereas BDA is not a subsequence of the given string: even though all the characters are present in the string, they do not appear in the same order.

Given two strings X and Y, find longest common subsequence (LCS) Z. For example, X = ABCDSEFGD Y = ACFEFXVGAB; LCS Z would be ACEFG.

Longest common subsequence: line of thoughts

First of all, notice that it is an optimization problem, it is a hint that it may be a dynamic programming problem but we are not sure yet.

Let’s say that the lengths of string 1 and string 2 are N and M. Can I find the longest common subsequence for lengths N and M if I already know the LCS for N-1 and M-1? Put directly, can I divide the original problem into subproblems and solve those subproblems to get the answer to the original problem? In this case, the answer is yes. (This is the second hint towards a dynamic programming application: optimal substructure.)

How can we divide the problem into subproblems? The length of X is N and the length of Y is M. Start from the end of both strings. Check if X[N] == Y[M]. If yes, that character is part of the LCS, and the problem reduces to finding the longest common subsequence in X[1..N-1] and Y[1..M-1] and adding 1 to it.

What if they are not equal? Then one by one we have to exclude character from string X and Y. Why?
First, we exclude the character from the X and find LCS in remaining characters of X and all the characters of Y. The problem reduces to X[1..N-1] and Y[1..M]. Next, we exclude a character from Y, the problem reduces to X[1..N] and Y[1..M-1]. Any of the two choices can give us the longest common subsequence, so we would take maximum from both the cases.

LCS(i,j) = 1 + LCS(i-1, j-1)                if X[i] == Y[j]
         = MAX (LCS(i-1, j), LCS(i, j-1))   if X[i] != Y[j]
         = 0                                if i or j is 0

It is interesting to see why LCS(i,j) is 0 when either i or j is 0: the longest common subsequence of two strings has length 0 when one of the strings is empty.

Can we implement the recursive function?

    public int longestCommonSubsequence(String s1, String s2, int i, int j){

        //If any of the strings is null, return 0
        if(s1 == null || s2 == null) return 0;

        //We reached the end of one of the strings
        if(i == s1.length() ||  j == s2.length())
            return 0;

        //Characters match, so they are part of the LCS
        if(s1.charAt(i) ==  s2.charAt(j)){
            return  1 + longestCommonSubsequence(s1, s2, i+1, j+1);
        }

        //Skip a character from either string and take the better result
        return Integer.max(longestCommonSubsequence(s1, s2, i+1, j),
                longestCommonSubsequence(s1, s2, i, j+1));
    }

If we follow the execution cycle of the above code, we will see something like below

longest common subsequence lcs

It is evident from the partial tree that there are some problems which are solved again and again. This is the third hint (overlapping subproblems) that we can apply dynamic programming.

It will be more evident if you implement the recursive function with reverse traversal of the strings. In that implementation, the base case is when one of the strings is empty, and at that point the LCS of the two strings is 0. If we take a two-dimensional table such that T[i][j] represents the length of the longest common subsequence of the first i characters of S1 and the first j characters of S2, then T[0][j] = 0 and T[i][0] = 0.

T[i][j] = T[i-1][j-1] + 1 if X[i] = Y[j]
T[i][j] = max(T[i-1][j], T[i][j-1]) if X[i] != Y[j]
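
The reverse traversal with a cache can be sketched as follows. This is an illustrative version and not code from the original post: the class LcsMemo and its methods are hypothetical names, i and j are the lengths of the prefixes being compared, and -1 marks a table entry that has not been computed yet.

```java
import java.util.Arrays;

// Hypothetical class, for illustration only.
class LcsMemo {

    static int lcs(String x, String y) {
        if (x == null || y == null) return 0;
        int[][] memo = new int[x.length() + 1][y.length() + 1];
        for (int[] row : memo) Arrays.fill(row, -1); // -1 = not computed
        return lcs(x, y, x.length(), y.length(), memo);
    }

    // Length of the LCS of the first i characters of x
    // and the first j characters of y.
    private static int lcs(String x, String y, int i, int j, int[][] memo) {
        if (i == 0 || j == 0) return 0; // one prefix is empty
        if (memo[i][j] != -1) return memo[i][j];

        int result;
        if (x.charAt(i - 1) == y.charAt(j - 1)) {
            result = 1 + lcs(x, y, i - 1, j - 1, memo);
        } else {
            result = Math.max(lcs(x, y, i - 1, j, memo),
                              lcs(x, y, i, j - 1, memo));
        }
        memo[i][j] = result;
        return result;
    }

    public static void main(String[] args) {
        System.out.println(lcs("ABCDSEFGD", "ACFEFXVGAB")); // prints 5
    }
}
```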

Dynamic programming implementation of LCS

package com.company;

/**
 * Created by sangar on 4.2.19.
 */
public class LongestCommonSubsequence {

    public int longestCommonSubsequenceDP(String s1, String s2){

        //If any of the strings is null, return 0
        if(s1 == null || s2 == null) return 0;

        int len1 = s1.length();
        int len2 = s2.length();

        int[][] table = new int[len1+1][len2+1];

        for (int i=0; i<=len1; i++){
            for (int j=0; j<=len2; j++) {
                if (j == 0 || i == 0) {
                    table[i][j] =  0;
                }

                else if (s1.charAt(i-1) == s2.charAt(j-1)) {
                    table[i][j] = 1 + table[i - 1][j - 1];
                } else {
                    table[i][j] = Integer.max(table[i - 1][j], table[i][j - 1]);
                }
            }
        }

        return table[len1][len2];
    }
}

The above implementation has time and space complexity of O(n × m), where n and m are the lengths of the two strings. Please share if there is anything wrong or missing.
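
The table gives only the length of the LCS. To recover the subsequence itself, one option is to walk back through the filled table from T[n][m]. The following is a minimal sketch of that walk, assuming the same table definition as above. The class LcsString is a hypothetical name, and when several subsequences share the maximum length it returns just one of them.

```java
// Hypothetical class, for illustration only.
class LcsString {

    static String lcs(String s1, String s2) {
        if (s1 == null || s2 == null) return "";
        int n = s1.length();
        int m = s2.length();

        // Fill the length table; row 0 and column 0 stay 0.
        int[][] table = new int[n + 1][m + 1];
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                if (s1.charAt(i - 1) == s2.charAt(j - 1)) {
                    table[i][j] = 1 + table[i - 1][j - 1];
                } else {
                    table[i][j] = Math.max(table[i - 1][j], table[i][j - 1]);
                }
            }
        }

        // Walk back from the bottom-right corner, collecting matches.
        StringBuilder sb = new StringBuilder();
        int i = n;
        int j = m;
        while (i > 0 && j > 0) {
            if (s1.charAt(i - 1) == s2.charAt(j - 1)) {
                sb.append(s1.charAt(i - 1));
                i--;
                j--;
            } else if (table[i - 1][j] >= table[i][j - 1]) {
                i--; // the value came from the row above
            } else {
                j--; // the value came from the column to the left
            }
        }
        // Characters were collected from the end, so reverse them.
        return sb.reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(lcs("ABCDSEFGD", "ACFEFXVGAB")); // prints ACEFG
    }
}
```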