Scheduling weighted jobs

Suppose we are given n jobs j1, j2, j3, …, jn with start times s1, s2, …, sn and finish times f1, f2, f3, …, fn. Each job has a value vi associated with it. The problem is to schedule a subset of these jobs such that all scheduled jobs are compatible and the total value is maximum. Two jobs are said to be compatible if their execution times do not overlap.

For example, we have four jobs as shown below:

[Figure: scheduling weighted jobs]

In the above figure, the maximum value is achieved by scheduling job 1 and job 4, which gives a value of 250. Notice that there is one more schedule with compatible jobs (job 1, job 2 and job 3); however, the value of that schedule is only 170, which is less than what we got with the earlier schedule.

Scheduling weighted jobs : Line of thoughts

It is tempting to use a greedy algorithm here, since the problem is very similar to interval scheduling. However, the greedy algorithm works for this problem only when all jobs have equal value. Since the values of the jobs differ here, the greedy algorithm fails.
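To see the failure concretely, here is a minimal sketch that applies the earliest-finish-time rule from interval scheduling to the four jobs used in the implementations in this post. The class name `GreedyCounterExample` and the `{start, end, value}` array encoding are assumptions made for illustration; they are not part of the original code.

```java
import java.util.Arrays;

class GreedyCounterExample {

    // Each job is {start, end, value}. Greedy rule: always take the next
    // compatible job that finishes first, ignoring its value.
    static int greedyByFinishTime(int[][] jobs) {
        int[][] sorted = jobs.clone();
        Arrays.sort(sorted, (a, b) -> Integer.compare(a[1], b[1]));

        int total = 0;
        int lastEnd = Integer.MIN_VALUE;
        for (int[] job : sorted) {
            if (job[0] >= lastEnd) {   // starts no earlier than the previous pick ends
                total += job[2];
                lastEnd = job[1];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        int[][] jobs = { {1, 3, 50}, {3, 5, 20}, {6, 9, 100}, {3, 12, 200} };
        // Greedy takes the three short jobs and rejects the job worth 200
        // because it overlaps them.
        System.out.println(greedyByFinishTime(jobs));
    }
}
```

On this input the greedy rule collects 170, while the schedule of job 1 and job 4 is worth 250.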

Let’s consider a brute force solution. First of all, sort all jobs by finish time in increasing order. Then, for each job, decide whether including it in the schedule or excluding it gives the maximum value. When we include a job, we must check that it is compatible with the other jobs included in the schedule. To determine compatibility quickly, we pre-calculate an array P such that

p(j) = largest index i < j such that job i is compatible with j.

For the jth job to be compatible with the ith job (i < j), the start time of the jth job must be greater than or equal to the finish time of the ith job.

For example: p(8) = 5, p(7) = 3, p(2) = 0.

[Figure: scheduling weighted jobs]

Now, let OPT(j) denote the maximum value we can gain from jobs 1 to j. As mentioned above, there are two cases:

Case 1: OPT selects job j. In this case we cannot use the incompatible jobs {p(j) + 1, p(j) + 2, …, j-1}, and the solution must include an optimal solution to the problem consisting of the remaining compatible jobs 1, 2, …, p(j).

Case 2: OPT does not select job j. In this case the solution must include an optimal solution to the problem consisting of jobs 1, 2, …, j-1.

For case 1, we already have P[j] calculated. With P[j] prepared, we know that we don’t have to check any job between P[j] and j, as all of them conflict with the current job. The recursive formula for calculating the maximum value for n jobs will be:

OPT( j) = 0 if j = 0 
          max { vj + OPT( p(j) ), OPT(j-1)} otherwise

Scheduling weighted jobs : Recursive solution

package com.company;

import java.util.Arrays;

/**
 * Created by sangar on 4.5.18.
 */
public class ScheduleWeightedJobs {

    public static int optimalScheduling(Job[] jobs, int[] nonConflictJobs, int j){
        if(j == -1){
            return 0;
        }

        return Integer.max(optimalScheduling(jobs, nonConflictJobs, nonConflictJobs[j]) + jobs[j].getValue(),
                            optimalScheduling(jobs, nonConflictJobs, j-1));
    }

    public static void main(String[] args) {

        Job[] jobs = new Job[4];
        jobs[0] = new Job(1, 3, 50);
        jobs[1] = new Job(3, 5, 20);
        jobs[2] = new Job(6, 9, 100);
        jobs[3] = new Job(3, 12, 200);

        Arrays.sort(jobs, (o1, o2) -> o1.getEndTime() - o2.getEndTime());

        int[] nonConflictingJobs = new int[jobs.length];

        for (int j = 0; j < jobs.length; j++) {
            nonConflictingJobs[j] = -1;
            for(int i = j-1; i >= 0; i--) {
                if(jobs[i].getEndTime() <= jobs[j].getStartTime()) {
                    nonConflictingJobs[j] = i;
                    break;
                }
            } 
        }

        int maxValue = optimalScheduling(jobs,nonConflictingJobs, jobs.length-1);

        System.out.println(maxValue);
    }
}

This recursive algorithm has exponential complexity because many subproblems are calculated repeatedly. For example,
[Figure: Schedule weighted jobs]

The recursive execution tree for the above problem would look like
[Figure: weighted jobs scheduling]

If we revisit the problem, it has two properties. First, it has optimal substructure, which means an optimal solution to the bigger problem is built from optimal solutions to its subproblems. Second, it has overlapping subproblems: from the figure, we can see that some subproblems are re-calculated. The typical way to avoid this repetition is to store the solutions to subproblems; this method is called memoization. It is a kind of cache where results of subproblems are stored and looked up whenever required.
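The caching idea can be sketched as a top-down variant of the recursive function. This is an illustrative sketch only: it takes the job values and the precomputed compatible-job indices as plain `int[]` arguments (an assumption, to avoid depending on the `Job` class), the names `MemoizedScheduling`, `opt` and `maxValue` are hypothetical, and it assumes job values are non-negative so that -1 can mark an unsolved entry.

```java
import java.util.Arrays;

class MemoizedScheduling {

    // values[j]: value of job j (jobs sorted by finish time)
    // p[j]: index of the last job compatible with job j, or -1 if none
    static int opt(int[] values, int[] p, int j, int[] memo) {
        if (j < 0) return 0;                  // no jobs left to consider
        if (memo[j] != -1) return memo[j];    // subproblem already solved

        int include = values[j] + opt(values, p, p[j], memo);
        int exclude = opt(values, p, j - 1, memo);
        memo[j] = Math.max(include, exclude);
        return memo[j];
    }

    static int maxValue(int[] values, int[] p) {
        int[] memo = new int[values.length];
        Arrays.fill(memo, -1);                // -1 marks "not computed yet"
        return opt(values, p, values.length - 1, memo);
    }
}
```

Each subproblem is now solved once and then read from `memo`, so the number of recursive calls grows linearly with the number of jobs.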

This is a typical case for applying dynamic programming.

Scheduling weighted jobs : Dynamic programming implementation

package com.company;

import java.util.Arrays;

/**
 * Created by sangar on 4.5.18.
 */
public class ScheduleWeightedJobs {

    public static int optimalSchedulingDP(Job[] jobs, int[] nonConflictJobs){
        if(jobs.length == 0){
            return 0;
        }
        int[] optimalValue = new int[jobs.length];

        optimalValue[0] = jobs[0].getValue();

        for(int i = 1; i < jobs.length; i++){
            //nonConflictJobs[i] is -1 when no earlier job is compatible with job i
            int valueBefore = nonConflictJobs[i] == -1 ? 0 : optimalValue[nonConflictJobs[i]];
            optimalValue[i] = Integer.max(valueBefore + jobs[i].getValue(),
                                optimalValue[i-1]);
        }
        return optimalValue[jobs.length-1];
    }

    public static void main(String[] args) {

        Job[] jobs = new Job[4];
        jobs[0] = new Job(1, 3, 50);
        jobs[1] = new Job(3, 5, 20);
        jobs[2] = new Job(6, 9, 100);
        jobs[3] = new Job(3, 12, 200);

        Arrays.sort(jobs, (o1, o2) -> o1.getEndTime() - o2.getEndTime());

        int[] nonConflictingJobs = new int[jobs.length];

        for (int j = 0; j < jobs.length; j++) {
            nonConflictingJobs[j] = -1;
            for(int i = j-1; i >= 0; i--) {
                if(jobs[i].getEndTime() <= jobs[j].getStartTime()) {
                    nonConflictingJobs[j] = i;
                    break;
                }
            }
        }

        int maxValue = optimalSchedulingDP(jobs,nonConflictingJobs);

        System.out.println(maxValue);
    }
}

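The run time complexity of this dynamic programming implementation is O(n^2). Sorting takes O(n log n), computing the nonConflictingJobs array with the nested loop takes O(n^2) in the worst case, and filling the table of maximum values takes O(n).
If the input is already sorted by finish time and the compatible-job indices are available, filling the table takes only O(n). Note that we need additional O(n) space for storing the results of subproblems.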
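The quadratic step is the search for the last compatible job. Because the jobs are sorted by finish time, that search can be done with binary search instead of a linear scan. Below is a minimal sketch under stated assumptions: jobs are passed as parallel `starts` and `ends` arrays already sorted by end time, and the class and method names (`CompatibleJobs`, `computeP`) are hypothetical.

```java
class CompatibleJobs {

    // starts[j], ends[j] describe job j; jobs must already be sorted by end time.
    // Returns p where p[j] is the largest i < j with ends[i] <= starts[j], or -1.
    static int[] computeP(int[] starts, int[] ends) {
        int n = starts.length;
        int[] p = new int[n];
        for (int j = 0; j < n; j++) {
            int lo = 0, hi = j - 1, best = -1;
            while (lo <= hi) {
                int mid = lo + (hi - lo) / 2;
                if (ends[mid] <= starts[j]) {
                    best = mid;      // compatible; try to find a later one
                    lo = mid + 1;
                } else {
                    hi = mid - 1;    // conflicts; look at earlier jobs
                }
            }
            p[j] = best;
        }
        return p;
    }
}
```

With this change each lookup costs O(log n), so the whole approach (sort, compute p, fill the table) would run in O(n log n).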

How about finding the solution itself, that is, which jobs actually give us the optimal value? This requires some post-processing. The algorithm is as follows:

Find-Solution(j) :
 if (j = 0)
     output nothing
 else if (vj + Table[p(j)] > Table[j-1])
     print j
     Find-Solution(p(j))
 else
     Find-Solution(j-1)
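The post-processing step can be sketched in Java as follows. This is an illustration, not the post's implementation: it uses 0-based job indices, takes values and compatible-job indices as plain arrays, and builds a table of size n+1 so that `table[0]` plays the role of the "no jobs" case in the pseudocode. The names `ScheduleReconstruction` and `selectedJobs` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ScheduleReconstruction {

    // values[j] and p[j] use 0-based job indices (jobs sorted by finish time);
    // p[j] == -1 means no earlier job is compatible with job j.
    static List<Integer> selectedJobs(int[] values, int[] p) {
        int n = values.length;
        // table[k] = best value using the first k jobs, so table[0] = 0.
        int[] table = new int[n + 1];
        for (int k = 1; k <= n; k++) {
            int j = k - 1;
            table[k] = Math.max(values[j] + table[p[j] + 1], table[k - 1]);
        }

        List<Integer> chosen = new ArrayList<>();
        int k = n;
        while (k > 0) {
            int j = k - 1;
            if (values[j] + table[p[j] + 1] > table[k - 1]) {
                chosen.add(j);      // job j is part of the optimal schedule
                k = p[j] + 1;       // continue with the jobs compatible with j
            } else {
                k--;                // job j was skipped
            }
        }
        Collections.reverse(chosen);   // report jobs in finish-time order
        return chosen;
    }
}
```

For the four example jobs sorted by finish time, the walk back through the table selects the first and the last job, which is the schedule worth 250.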

Please share if there is something wrong or missing. If you are interested in contributing to algorithms and me, please drop a mail

Box stacking problem 

Consider that we have been given 3-dimensional boxes. Each box has a width, depth and height (wi, di, hi). The box stacking problem is to stack these boxes in such a way that we achieve maximum height. There is one condition attached: a box can be placed on top of another only if both of its base dimensions, width and depth, are less than those of the box it is stacked on. There is no restriction on height; a tall box can be placed on a short box.

With these conditions in place, given n boxes, we are effectively building a pyramid of boxes with maximum height.

This problem is closely related to longest increasing subsequence.

Recurrence relation for box stacking problem

The solution involves understanding the rotation aspect of the boxes. To keep rotation from complicating the solution, we list all rotations of each box as individual boxes. Therefore, each box has three representations. For example, for a box with dimensions a, b, c such that a > b > c:

representation 1 : h=a, w=b, d=c; 
representation 2 : h=b, w=a, d=c; 
representation 3 : h=c, w=a, d=b

Without loss of generality, we can avoid representations where wi < di. Now that we have three representations for each box, our input grows to 3N boxes, and the solution works on these 3N boxes. There is a catch here: this solution works only when multiple instances of each box are available, so that two different orientations of the same box can be used in the same stack.
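As a small illustration of the three representations, the sketch below lists them for one box, always keeping width >= depth. The `{height, width, depth}` array layout and the names `BoxRotations` and `rotations` are assumptions made for this example only.

```java
import java.util.ArrayList;
import java.util.List;

class BoxRotations {

    // Returns the three orientations of a box with sides a, b, c
    // as {height, width, depth}, each normalised so that width >= depth.
    static List<int[]> rotations(int a, int b, int c) {
        // First entry of each row is the side chosen as the height.
        int[][] choices = { {a, b, c}, {b, a, c}, {c, a, b} };
        List<int[]> result = new ArrayList<>();
        for (int[] choice : choices) {
            int height = choice[0];
            int width = Math.max(choice[1], choice[2]);
            int depth = Math.min(choice[1], choice[2]);
            result.add(new int[] { height, width, depth });
        }
        return result;
    }
}
```

For a box with sides 7, 6 and 4 this yields (h=7, w=6, d=4), (h=6, w=7, d=4) and (h=4, w=7, d=6).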

Finding the sort order

Another problem is these boxes which are given to us are not ordered in any form. However, to stack boxes, we need to consider them in some order. As height does not affect stacking order, we can ignore it. Now, we have to consider only two dimensions.

Let’s say we order boxes by base area in decreasing order. Why does this work? Consider two boxes with different base areas. It is impossible for the box with the larger base area to be stacked on top of the box with the smaller base area: there are only two base dimensions, so at least one of them must be larger than the corresponding dimension of the box with the smaller base area. Therefore, after sorting, a box can only be placed on top of boxes that appear before it in the order.

Let H(i) be the height of the tallest stack that has box i on top. To model the recurrence for H(i), put box i on a box j such that wi < wj and di < dj and H(j) is maximum over all j less than i.

H(i) = hi + max( H(j) for all j < i such that wi < wj && di < dj ), where the max is 0 if no such j exists

Finally, output will be the maximum of all H[i].

Box stacking problem : Dynamic programming implementation

#include <iostream>
#include <algorithm>
using namespace std;

typedef struct box {
    int width;
    int depth;
    int height;
} Box;

bool boxComparator(Box b1, Box b2) {
    return ( b1.depth * b1.width > b2.depth * b2.width );
}
 
int findMaxHeightBoxStack(Box boxes[], int n)
{
    int H[n];
    for(int i=0; i<n; i++){
        H[i] = boxes[i].height;
    }
    for(int i=1; i<n; i++){
        for(int j=i-1; j>=0; j--){
            // Box i can sit on box j only if both base dimensions are strictly smaller
            if(boxes[j].width > boxes[i].width
               && boxes[j].depth > boxes[i].depth
               && H[i] < H[j] + boxes[i].height){
                H[i] = H[j] + boxes[i].height;
            }
        }
    }
	
	int maxHeight = 0 ;
	for(int i=0; i<n; i++){
		if(maxHeight < H[i]){
			maxHeight = H[i];
		}
	}
	return maxHeight;
}

int boxStacking(Box boxes[], int n)
{
	
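    Box orientations[3*n]; // three rotations per box, each with width >= depth
    int index = 0;
    for(int i=0; i<n; i++){
        // original height, base normalised so that width >= depth
        orientations[index].height = boxes[i].height;
        orientations[index].width = max(boxes[i].width, boxes[i].depth);
        orientations[index].depth = min(boxes[i].width, boxes[i].depth);
        index++;

        // original width becomes the height
        orientations[index].height = boxes[i].width;
        orientations[index].width = max(boxes[i].height, boxes[i].depth);
        orientations[index].depth = min(boxes[i].height, boxes[i].depth);
        index++;

        // original depth becomes the height
        orientations[index].height = boxes[i].depth;
        orientations[index].width = max(boxes[i].height, boxes[i].width);
        orientations[index].depth = min(boxes[i].height, boxes[i].width);
        index++;
    }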
	n = 3*n;

    sort(orientations, orientations+n, boxComparator);
	return findMaxHeightBoxStack( orientations, n);
}
 
// Driver program
int main()
{
    Box boxes[] = { {4, 6, 7}, {1, 2, 3}, {4, 5, 6}, {10, 12, 32} };
    int n = sizeof(boxes)/sizeof(boxes[0]);
    cout << "Maximum height is " << boxStacking(boxes, n);
    return 0;
}

The implementation is quite simple: we need only a one-dimensional array H[]. The orientations are sorted by base area in decreasing order before the array is filled.

The time complexity of the algorithm to find the maximum height is O(n^2) and the space complexity is O(n).

This problem can be extended to boxes with K dimensions instead of 3. The approach stays the same; only the number of orientations changes.


Reference: https://people.cs.clemson.edu/~bcdean/dp_practice/dp_5.swf

Longest Increasing Subsequence in O(nlogn)

In the last post, longest increasing subsequence, we discussed brute force and dynamic programming based solutions. The complexity of the brute force solution is exponential, whereas for the dynamic programming approach it is O(n^2). The question is: can we find the longest increasing subsequence in O(nlogn) complexity?

Let’s revisit the problem statement: Given an array of integers, find the length of the longest increasing subsequence. An increasing subsequence contains elements A[i] and A[j] only if i < j and A[i] < A[j].
For example,

Input:
[2,4,5,3,1,6,7], 
Output:
5
Explanation:
The increasing subsequences are [2,4,5,6,7], [2,3,6,7], [1,6,7] and many more. The longest subsequence here has a length of 5.

The basic idea behind the solution is to keep track of all active subsequences at a given point in time. Based on the current number being considered, update these active lists. To understand this process, let’s work out an example.

A = {2,8,7}
Monotonically increasing subsequences are {2,8} and {2,7}

What if we add another element, 11 in this?

A = {2,8,7,11}
Monotonically increasing subsequences are {2,8,11} and {2,7,11}

What if a new element 9 is added to the array? What happens now? If we add it to the subsequences, the length of the longest subsequence remains 3.

A = {2,8,7,11,9}
Monotonically increasing subsequences are {2,8,9} and {2,7,9}

The decision to take for each element being considered is whether we create new active subsequences with length 3 with element 9 in them or continue with 11. If the next element is 10 we know that adding 9 to subsequence leads us to longer subsequences rather than keeping 11.

How do we decide when to replace and when to continue with the old element in the list of subsequences?
We append a new number A[i] to a subsequence if A[i] > E, where E is the last element of that subsequence,
and we replace a number with A[i] if A[i] < E: the number replaced is the smallest A[j] in the subsequence that is greater than A[i].

What if A[i] is smaller than all elements in the present list of subsequences? In this case, we have to create a new list and add A[i] into it. The invariant is to maintain lists of increasing sequences and update them based on the next number.
Each time a new element is to be added, scan all the lists of subsequences in decreasing order of their length. The following algorithm shows how to add/replace the new elements in the existing lists or to create a new list with it.

1. If A[i] is the smallest among all end candidates of active lists, start a new active list with A[i] of length 1.
2. If A[i] is largest among all end candidates of active lists, clone the largest active list, and append A[i] to it.
3. If A[i] is in between, find the list with the largest end number that is smaller than A[i]. Clone and append A[i] to this list.
4. Discard all other lists of the same length as that of this modified list.

LIS in nlogn example

Let’s take an example and see how it works with an array A = [ 0, 8, 4, 12, 2, 10, 6, 14].

For A[0], there are no active lists of subsequences, we will create a new one.

Next, we go to A[1] which is 8. A[i] is greater than the ends of all the current lists, we will take the longest one and append A[1] to it.

For A[2] with value 4, A[i] is less than the end of one of the list and greater than the end of other. We will find the list which has end less than A[i], in this case, the first list containing [0]. Clone it and append A[2] to it and discard all other lists of the same length.

For A[3] with value 12, it is the same case as A[1] since it is greater than all the ends of the current lists, we will clone the longest available list and append it to that.

A[4] with value 2, it has the same case as A[2], Clone the one with largest end which is less than A[4], append A[4] to it and discard all same length lists.

A[5] with value 10. Same as A[4]. Clone, extend, and discard all the same length subsequences.
Lists = [ [0], [0, 2], [0,2,10] ] and [0, 4, 12] is discarded.

A[6] is 6. Same as A[5] We will clone the list which has end smaller than A[6], extend it, and discard all other lists which have the same length.
Lists = [ [0], [0, 2], [0,2,6] ] and [0, 2, 10] is discarded.

Following the same approach, we will go through all the numbers in the given array. The longest increasing subsequence in the given array is [ 0,2,6,14] with a length of 4.

It seems like a lot needs to be done just to maintain the lists, and significant space is required to store all of them. We can optimize this: observe that we use only the ends of the lists and their sizes. We do not care what came before them in a list. So, can we store the ends of all the lists in an auxiliary array and do operations on them? The size of this array in the worst case will be n.

To append to the list, add another element to the auxiliary array. To replace, overwrite the smallest number which is greater than or equal to the current number. To find that number, we can use the binary search algorithm.

To find the length of the longest subsequence, keep track of the length of the auxiliary array because this will be the length of LIS.

Longest increasing subsequence in O(nlogn) : Implementation

    public int lengthOfLIS(int[] nums) {
        
        if(nums == null || nums.length == 0) return 0;
        
        int [] dp = new int[nums.length]; 
        int len = 0;
        
        for(int num : nums){
            int index = Arrays.binarySearch(dp, 0, len, num);
            
            if(index < 0)
                index = -(index+1);
            dp[index] = num;
            
            if(index == len)
                len++;
        }
        
        return len;
    }

The complexity of this algorithm is O(nlogn) as for each element in the array, it requires O(logn) time to find the ceiling of it and put it at the correct position.
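The function above returns only the length. If the subsequence itself is needed, one common extension is to remember, for each element, which element precedes it in its best run. The sketch below is a variation on the idea and not the post's code: it stores indices instead of values in the auxiliary array and adds a hypothetical `prev` array, while keeping the binary search and the O(nlogn) bound.

```java
class LisWithPath {

    static int[] longestIncreasingSubsequence(int[] nums) {
        int n = nums.length;
        // tailIndex[k] = index in nums of the smallest tail of an increasing run of length k+1
        int[] tailIndex = new int[n];
        // prev[i] = index of the element before nums[i] in its run, or -1
        int[] prev = new int[n];
        int len = 0;

        for (int i = 0; i < n; i++) {
            // Binary search for the first tail that is >= nums[i].
            int lo = 0, hi = len;
            while (lo < hi) {
                int mid = lo + (hi - lo) / 2;
                if (nums[tailIndex[mid]] < nums[i]) lo = mid + 1;
                else hi = mid;
            }
            prev[i] = (lo > 0) ? tailIndex[lo - 1] : -1;
            tailIndex[lo] = i;
            if (lo == len) len++;
        }

        // Walk the predecessor links back from the tail of the longest run.
        int[] lis = new int[len];
        int k = (len > 0) ? tailIndex[len - 1] : -1;
        for (int pos = len - 1; pos >= 0; pos--) {
            lis[pos] = nums[k];
            k = prev[k];
        }
        return lis;
    }
}
```

For A = [0, 8, 4, 12, 2, 10, 6, 14] this returns [0, 2, 6, 14], the same subsequence reached in the walk-through above.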

This article has taken some inspiration from: http://stackoverflow.com/questions/6129682/longest-increasing-subsequenceonlogn and the comments provided by readers under these articles.

What are the problems you can solve with the longest increasing subsequence?
1. Russian doll envelopes.
2. Box stacking problem.
3. Bridges across the river.


Longest alternating Subsequence

In this post, we will discuss another dynamic programming problem: the longest alternating subsequence, also called the longest zigzag subsequence.

A sequence of numbers is called an alternating sequence if the differences between successive numbers strictly alternate between positive and negative values. In other words, an alternating subsequence is one whose elements alternately increase and decrease, that is, they satisfy one of the conditions below:

x1 < x2 > x3 < x4 > x5 < … xn
or
x1 > x2 < x3 > x4 < x5 > … xn

A sequence with fewer than two elements is trivially a zigzag subsequence.

For example, 1,9,3,9,1,6 is a zigzag sequence because the differences (8,-6,6,-8,5) are alternately positive and negative. In contrast, 1,6,7,4,5 and 1,9,4,4,5 are not zigzag sequences, first sequence is not because its first two differences are positive and second because its last difference is zero.
Coming to the problem of the day: Given an array of integers, find longest alternating subsequence.
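The definition can be checked mechanically. Below is a minimal sketch of such a check, useful for validating the examples above; the names `ZigzagCheck` and `isZigzag` are made up for this illustration, and the code assumes differences between neighbours fit in an `int`.

```java
class ZigzagCheck {

    // True if consecutive differences strictly alternate in sign.
    static boolean isZigzag(int[] seq) {
        int prevDiff = 0;   // 0 means "no previous difference yet"
        for (int i = 1; i < seq.length; i++) {
            int diff = seq[i] - seq[i - 1];
            if (diff == 0) {
                return false;   // equal neighbours break the pattern
            }
            if (prevDiff != 0 && (diff > 0) == (prevDiff > 0)) {
                return false;   // two moves in the same direction
            }
            prevDiff = diff;
        }
        return true;
    }
}
```

It accepts 1,9,3,9,1,6 and rejects both 1,6,7,4,5 and 1,9,4,4,5, matching the examples.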

We have already seen a similar problem, longest increasing subsequence in an array. That problem is solved using a dynamic programming approach. To apply dynamic programming, we need two properties: first, optimal substructure, that is, the solution of the original problem depends on optimal solutions of its subproblems; and second, overlapping subproblems, so that we can save computation by memoization.

Do these two properties exist in this problem? Does the longest zigzag subsequence ending at index i have anything to do with the longest zigzag subsequence ending at j, where j is less than i? Also, note that an alternating subsequence can start either by decreasing first and then increasing, or by increasing first and then decreasing.

To add the ith element as the next element in a subsequence, consider two cases. First, the ith element can be greater than the last element of the longest zigzag subsequence ending at j, where j < i. In this case, we are looking for all such j where A[j] < A[i]. Another criterion for j is that A[j] must be less than the element before it in that subsequence; that is, at j we need exactly the opposite condition to the one at i.

Second, the ith element can be less than the last element of the longest zigzag subsequence ending at j, where j < i. In this case, we are looking for all such j where A[j] > A[i]. Another criterion for j is that A[j] must be greater than the element before it in that subsequence; that is, at j we again need exactly the opposite condition to the one at i.
For each i we will store the results for both cases.

Let’s say increase[i] describes the length of the longest zigzag subsequence (LZS) for the first case and decrease[i] describes it for the second case.

  increase[i] = max(decrease[j] + 1) for all j< i && A[j] < A[i]
  decrease[i] = max(increase[j] + 1) for all j< i && A[j] > A[i]

Longest alternating subsequence dynamic programming approach

Before going through the implementation, it helps to go through Longest increasing subsequence using dynamic programming.
Implementation-wise, both the increase and decrease arrays can be one two-dimensional array Table[][]. Table[i][0] represents the length of the longest zigzag subsequence ending at i in which A[i] is greater than the element before it.

Similarly, Table[i][1] represents the length of the longest zigzag subsequence ending at i in which A[i] is less than the element before it.

Table(i,0) = max(Table(j,1) + 1); 
             for all j < i and A[j] < A[i] 
Table(i,1) = max(Table(j,0) + 1); 
             for all j < i and A[j] > A[i]

What will be length of longest zigzag subsequence for index i?

Result =  max (Table(i,0), Table(i,1))

Longest alternating subsequence : Implementation

#include <stdio.h>
#include <stdlib.h>
 
int max(int a, int b) {  return (a > b) ? a : b; }
 
int longestZigzagSubsequence(int A[], int n)
{
    int Table[n][2];
 
    for (int i=0; i<n; i++){
    	Table[i][0] = 1; 
    	Table[i][1] = 1;
    }
 
    int result = 1;
 
    for (int i=1; i<n; i++) {
        for (int j=0; j<i; j++){
            /* If A[i] is greater than last element in subsequence,
               then check with Table[j][1] */
            if (A[j] < A[i] && Table[i][0] < Table[j][1] + 1)
                Table[i][0] = Table[j][1] + 1;
            /* If A[i] is smaller than last element in subsequence,
               then check with Table[j][0] */
            if (A[j] > A[i] && Table[i][1] < Table[j][0] + 1)
                Table[i][1] = Table[j][0] + 1;
        }
 
        /* Pick maximum of both values at index i  */
        if (result < max(Table[i][0], Table[i][1]))
            result = max(Table[i][0], Table[i][1]);
    }
 
    return result;
}
The complexity of the dynamic programming approach to find the longest alternating subsequence is O(n^2), using O(n) extra space.


Longest Substring Without Repeating Characters

Given a string, find the longest substring without repeating characters in it. For example,

Input:
S = "abcaabaca" 
Output:
3
Explanation:
The longest substring without repeating characters will be "abc"

Input: 
"bbbbb"
Output:
1
Explanation:
The answer is "b", with a length of 1.

A brute force solution is to scan all substrings of the given string and find the longest one with no repeating characters. A string of size n has n(n+1)/2 non-empty substrings, and checking each of them for unique characters takes up to n comparisons in the worst case. So, the worst-case complexity of this algorithm is O(n^3), with additional space of O(n). The code is simple enough.

package com.company;

import java.util.HashMap;

/**
 * Created by sangar on 1.1.18.
 */
public class NonRepeatingCharacters {

    static boolean allUniqueCharacters(String s, int start, int end) {

        HashMap<Character, Boolean> characters = new HashMap<>();

        //substring(start, end) excludes the character at index end
        for (char c : s.substring(start, end).toCharArray()) {
            if(characters.containsKey(c)) return false;
            characters.put(c, Boolean.TRUE);
        }
        return true;
    }

    static int longestSubstringWithoutRepeatingCharacters(String s) {
        int len = s.length();
        int maxLength = 0;

        for (int i =0; i < len; i++){
            //j is an exclusive end index, so it must be allowed to reach len
            for (int j=i+1; j<=len; j++){
                int length = j-i;
                if (allUniqueCharacters(s, i, j)){
                    maxLength = Integer.max(maxLength, length);
                }
            }
        }
        return maxLength;
    }

    public static void main(String[] args) {
        String s = "abcdabcbb";
        System.out.println(longestSubstringWithoutRepeatingCharacters(s));
    }
}

Sliding window approach

A sliding window is an abstract concept commonly used in array/string problems. A window is a range of elements in an array/string which is defined by start and end indices. A sliding window is a window which “slides” its two boundaries in a certain direction.
Read fundamentals and template for a sliding window to understand more about it and how it is applied to problems.

In the brute force approach, we repeatedly checked each substring for unique characters. Do we need to check each substring? If the substring s[i,j-1] contains no repeating characters, then when adding the jth character we only need to check whether that character is already present in s[i,j-1]. Since we scan the substring to ascertain the uniqueness of each new character, the complexity of this algorithm is O(n^2).
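The scan-based idea in the paragraph above can be sketched as follows. This is one possible reading of it, not the post's code: the window start is moved just past the duplicate found by the scan, and `String.indexOf(char, fromIndex)` performs the linear scan of the window. The class and method names are hypothetical.

```java
class LongestUniqueSubstringScan {

    static int longestUniqueSubstring(String s) {
        int maxLength = 0;
        int start = 0;
        for (int end = 0; end < s.length(); end++) {
            // Linear scan of the window s[start..end] for the character at end.
            // The result is never -1 because s.charAt(end) itself matches.
            int dup = s.indexOf(s.charAt(end), start);
            if (dup < end) {
                start = dup + 1;   // drop everything up to and including the duplicate
            }
            maxLength = Math.max(maxLength, end - start + 1);
        }
        return maxLength;
    }
}
```

Each of the n steps scans at most the current window, which gives the O(n^2) bound without any extra data structure.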

How about optimizing the scanning part? What if a hash is used to store characters which are already seen in substring s[i,j-1]. In that case, checking the uniqueness of a new character is done in O(1) and overall algorithm complexity becomes linear.

 public  static int longestSubstringWithoutRepeatingCharacters(String s) {
        int len = s.length();
        HashMap<Character, Boolean> characters = new HashMap<>();

        int maxLength = 0;
        int start = 0;
        int  end = 0;
        while (start < len && end < len) {
            //Check only the last character.
            if(!characters.containsKey(s.charAt(end))){
                characters.put(s.charAt(end), Boolean.TRUE);
                end++;
            }
            else {
                int currentLength = end-start;
                maxLength = Integer.max(maxLength, currentLength);
                //Move start of window one position ahead.
                characters.remove(s.charAt(start));
                start++;
            }
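        }
        //The window that reaches the end of the string is never closed by a duplicate, so count it here.
        maxLength = Integer.max(maxLength, end - start);
        return maxLength;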
    }

If a character is already present in substring s[i,j-1], it cannot be added to the current substring. Find the length of the substring (j-i) and compare it with the current maximum length; if it is greater, the maximum length of the longest substring without repeating characters becomes (j-i).
Finally, move the start of the window forward one position at a time until the duplicate is dropped from it.

Below is an example execution of the above code.
[Figure: longest substring without repeating characters]

Longest substring without repeating characters : 3

There is a small optimization that lets us skip more characters when a repeating character is found, instead of skipping one at a time. Store the index of each character seen in substring s[i,j-1]. While processing the jth character, if it is already in the hash, we know the index k where that character last occurred. No substring that contains both k and j can have all unique characters. So, we skip all indices from i to k and start from k+1 instead of i+1 as in the above method.

Longest substring without repeating characters : Optimized code

  public static int longestSubstringWithoutRepeatingCharacters3(String s) {
        int len = s.length();
        HashMap<Character, Integer> characters = new HashMap<>();

        int maxLength = 0;

        for (int start=0, end = 0; end <len; end++) {
            if (characters.containsKey(s.charAt(end))) {
                //find the index of duplicate character.
                int currentIndex = characters.get(s.charAt(end));
                //move start past the duplicate, but never backwards.
                start = Integer.max(currentIndex + 1, start);
            }
            //window s[start..end] is inclusive on both ends.
            int currentLength = end - start + 1;
            maxLength = Integer.max(maxLength, currentLength);
            //Update new location of duplicate character
            characters.put(s.charAt(end), end );
        }
        return maxLength;
    }

The complexity of finding the longest substring without repeating characters is hence O(n), with additional space complexity of O(n).

Minimum cost path in matrix

Given a 2D matrix, Cost[][], where Cost[i][j] represents the cost of visiting cell (i,j), find the minimum cost path to reach cell (n,m), where any cell can be reached from its left (by moving one step right) or from its top (by moving one step down).
For example, the minimum cost to reach (3,3) would be 16, by following the path ((0,0), (1,0), (2,0), (3,0), (3,1), (3,2), (3,3)).

[Figure: minimum cost path]

Minimum cost path : line of thoughts

This problem is similar to Finding possible paths in grid. As mentioned there, the grid problem reduces to smaller sub-problems once choice at the cell is made, but here move will be in the reverse direction. To find minimum cost at cell (i,j), first find the minimum cost to the cell (i-1, j) and cell (i, j-1). Take the minimum of those two and add the cost of the cell (i,j) which will be the minimum cost to reach (i,j).
Solution(n) = Cost of choice + Solution(n-1).

CostToMove(i,j) = Min(CostToMove(i-1,j), CostToMove (i, j-1)) + Cost(i,j)

The above equation can be implemented as a recursive function. What should be the terminating condition for the recursion? It is the starting cell, i.e. i=0 and j=0.

findCost(i,j, cost) = cost(i,j) + Min( findCost(i-1,j, cost), findCost(i,j-1, cost))

Recursive implementation

#include <stdio.h>
#include <stdlib.h>

int min( int a, int b){
    if(a > b) return b;
    return a;
}

int findMinimumCostPath(int Cost[3][3], int i, int j){
	
    if( i == 0 && j == 0 )
        return Cost[0][0];
	
    if( i == 0 )
	return findMinimumCostPath(Cost,i, j-1)
               + Cost[i][j];
    if( j == 0) 
    	return findMinimumCostPath(Cost,i-1, j)
               + Cost[i][j];
    	
    return min(findMinimumCostPath(Cost,i-1, j),
               findMinimumCostPath(Cost,i, j-1))
               + Cost[i][j];
}
int main()
{
    int M,N; 
    
    M = N = 3; 
    int Cost[3][3] = {
        1,3,4,
    	5,3,2,
        3,4,5
    };
    printf("Minimum cost of path : %d" ,
         findMinimumCostPath(Cost, M-1, N-1));
    
}
Another way to implement the same function in Java

    private int minPathSumUtilRec(int[][] grid, int i, int j){
        
        int m = grid.length;
        int n = grid[0].length;
        
        if(i>m-1 || j > n-1 ){
            return Integer.MAX_VALUE;
        }
        
        if(i == m-1 && j == n-1 ) 
            return grid[i][j];
    
        return Math.min(minPathSumUtilRec(grid, i+1, j), 
                        minPathSumUtilRec(grid, i, j+1)) + grid[i][j];
        
    }

This solution exceeds the time limit on the LeetCode submission for Minimum cost path in the matrix, because we follow each possible path. The number of paths in the matrix with the given conditions is exponential and hence the complexity of the recursive method is exponential too. Can we do better than that?

We saw that the problem reduces to subproblem at every cell and the optimal solution to a bigger problem depends on the optimal solution of subproblem, which is known as Optimal sub-structure. This is one of the conditions to apply dynamic programming.

There are several subproblems which are solved again and again in the recursive solution, i.e. there are overlapping subproblems. How can we avoid re-solving them? The first thing we can do is store the results of already solved subproblems and reuse them when required. The implementation below uses a two-dimensional table to store the minimum cost of reaching the destination from cell (i,j) and returns the stored value whenever the problem for cell (i,j) comes up again.

Top down approach using cache

    //Top down approach
    private int minPathSumUtilTopDown(int[][] grid, int i, int j, int[][] table){
        
        int m = grid.length;
        int n = grid[0].length;
        
        if(i>m-1 || j > n-1 ){
            return Integer.MAX_VALUE;
        }
        
        if(i == m-1 && j == n-1 ) 
            return grid[i][j];
        
        //If solution is already present, return it.
        //This relies on the caller filling table with Integer.MAX_VALUE first.
        if(table[i][j] < Integer.MAX_VALUE) return table[i][j];
    
        table[i][j] = Math.min(minPathSumUtilTopDown(grid, i+1, j, table), 
                      minPathSumUtilTopDown(grid, i, j+1, table))
             + grid[i][j];
        
        return table[i][j];
        
    }

The above solution falls within the time limit when tested on Leetcode.
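The memoized helper treats Integer.MAX_VALUE as "not solved yet", so the table has to be filled with that value before the first call. Below is a minimal sketch of such a driver. The class name MinPathTopDown and the wrapper minPathSum are hypothetical names chosen for this illustration, and the grid is assumed to be non-empty with non-negative costs.

```java
import java.util.Arrays;

class MinPathTopDown {

    // Hypothetical wrapper: prepares the cache and starts from the origin cell.
    static int minPathSum(int[][] grid) {
        int[][] table = new int[grid.length][grid[0].length];
        for (int[] row : table) {
            Arrays.fill(row, Integer.MAX_VALUE); // MAX_VALUE marks "not solved yet"
        }
        return solve(grid, 0, 0, table);
    }

    // Same recursion as the top-down helper: cost from (i,j) to the bottom-right cell.
    private static int solve(int[][] grid, int i, int j, int[][] table) {
        int m = grid.length, n = grid[0].length;
        if (i > m - 1 || j > n - 1) return Integer.MAX_VALUE;
        if (i == m - 1 && j == n - 1) return grid[i][j];
        if (table[i][j] < Integer.MAX_VALUE) return table[i][j];
        table[i][j] = Math.min(solve(grid, i + 1, j, table),
                               solve(grid, i, j + 1, table)) + grid[i][j];
        return table[i][j];
    }

    public static void main(String[] args) {
        int[][] cost = { {1, 3, 4}, {5, 3, 2}, {3, 4, 5} };
        System.out.println(minPathSum(cost)); // prints 14
    }
}
```

For the 3 x 3 grid used in the C example, the cheapest path is 1, 3, 3, 2, 5, which adds up to 14.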

Another approach is the bottom-up approach, where we start with a solution to the smaller problem and build up a solution to the bigger problem. Create a 2-dimensional array to save solutions of subproblems. Each cell MinCost[i][j] will store the minimum cost of a path from the origin to cell (i,j).

Topmost row in the array is peculiar as any cell in that row can be reached only from the left cell.

MinCost(0,j) = MinCost(0,j-1) + Cost[0][j]

Similarly, cells in the leftmost column can only be reached from the top cell.

MinCost(i,0) = MinCost(i-1,0) + Cost[i][0]

For all other cells,

MinCost(i,j) = Min( MinCost(i-1,j), MinCost(i,j-1) ) + Cost[i][j]

Since, the solution of (i-1, j) and (i, j-1) is a prerequisite for the solution of (i,j), this filling method is called bottom-up.

Minimum cost path: Dynamic programming implementation

#include <stdio.h>
#include <stdlib.h>

int min( int a, int b){
    if(a > b) return b;
    return a;
}

int findMinimumCostPath(int Cost[3][3], int M, int N){
    //declare the minCost matrix	
    int MinCost[M][N]; 

    MinCost[0][0] = Cost[0][0];

    // initialize first row of MinCost matrix
    for (int i=1; i<N; i++){
        MinCost[0][i] = MinCost[0][i-1] + Cost[0][i];
    }

    for (int i=1; i<M; i++){
        MinCost[i][0] = MinCost[i-1][0] + Cost[i][0];
    }
    
    for (int i=1;i<M; i++){
    	for (int j=1; j<N; j++){
           MinCost[i][j] = min(MinCost[i-1][j],
                           MinCost[i][j-1]) + Cost[i][j];
        }
    }

    return MinCost[M-1][N-1];
    
}

Complexity of the dynamic programming approach to find the minimum cost path in a grid is O(M x N) for an M x N grid, i.e. O(n^2) for a square grid, with additional space complexity of O(M x N).

Extend this problem by actually finding a path that leads to the destination. The solution is simple: start from the destination cell, as that will be part of the final path anyway, and keep moving to either the cell to the left or the cell on top, whichever has the smaller value in the MinCost table, till you reach the origin cell.
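The walk back from the destination can be sketched as follows. This is only an illustration under a few assumptions: the class name MinCostPathTrace and method findPath are hypothetical, the grid is non-empty, and ties between the top and left cell are broken in favour of the top cell.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class MinCostPathTrace {

    // Builds the MinCost table bottom-up, then walks back from the destination.
    static List<int[]> findPath(int[][] cost) {
        int m = cost.length, n = cost[0].length;
        int[][] minCost = new int[m][n];
        minCost[0][0] = cost[0][0];
        for (int j = 1; j < n; j++) minCost[0][j] = minCost[0][j - 1] + cost[0][j];
        for (int i = 1; i < m; i++) minCost[i][0] = minCost[i - 1][0] + cost[i][0];
        for (int i = 1; i < m; i++)
            for (int j = 1; j < n; j++)
                minCost[i][j] = Math.min(minCost[i - 1][j], minCost[i][j - 1]) + cost[i][j];

        List<int[]> path = new ArrayList<>();
        int i = m - 1, j = n - 1;
        while (i > 0 || j > 0) {
            path.add(new int[] {i, j});
            if (i == 0) j--;                 // top row: only the left cell is available
            else if (j == 0) i--;            // left column: only the top cell is available
            else if (minCost[i - 1][j] <= minCost[i][j - 1]) i--; // top is cheaper (or tied)
            else j--;                        // left is cheaper
        }
        path.add(new int[] {0, 0});
        Collections.reverse(path);           // origin first, destination last
        return path;
    }

    public static void main(String[] args) {
        int[][] cost = { {1, 3, 4}, {5, 3, 2}, {3, 4, 5} };
        for (int[] cell : findPath(cost)) {
            System.out.print("(" + cell[0] + "," + cell[1] + ") ");
        }
        // prints (0,0) (0,1) (1,1) (1,2) (2,2)
    }
}
```

Note that the comparison is made on the accumulated MinCost values, not on the raw cell costs.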

One more variant of this problem adds the flexibility to move diagonally as well as left to right and top to bottom. Nothing changes in the solution except that we take the minimum of three cells instead of two (left, top, and diagonal).
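A minimal sketch of the diagonal variant, assuming a non-empty grid. The class name MinCostPathDiagonal is a hypothetical name for this illustration, and the only change from the two-direction version is the extra dp[i-1][j-1] term.

```java
class MinCostPathDiagonal {

    static int minCost(int[][] cost) {
        int m = cost.length, n = cost[0].length;
        int[][] dp = new int[m][n];
        dp[0][0] = cost[0][0];
        // First row and first column still have a single predecessor.
        for (int j = 1; j < n; j++) dp[0][j] = dp[0][j - 1] + cost[0][j];
        for (int i = 1; i < m; i++) dp[i][0] = dp[i - 1][0] + cost[i][0];
        for (int i = 1; i < m; i++) {
            for (int j = 1; j < n; j++) {
                // Minimum of diagonal, top and left predecessors.
                int best = Math.min(dp[i - 1][j - 1], Math.min(dp[i - 1][j], dp[i][j - 1]));
                dp[i][j] = best + cost[i][j];
            }
        }
        return dp[m - 1][n - 1];
    }

    public static void main(String[] args) {
        int[][] cost = { {1, 3, 4}, {5, 3, 2}, {3, 4, 5} };
        System.out.println(minCost(cost)); // prints 9 (path 1, 3, 5 along the diagonal)
    }
}
```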

Please share if there is something wrong or missing. If you want to contribute, please write to us 

Boolean Parenthesization Problem

Boolean Parenthesization problem

Given a boolean expression, a string with True or False as operands and a boolean operator (and &, or |, xor ^) between each pair of operands, find the number of ways in which this expression can be parenthesized so that it evaluates to True. This is known as the Boolean Parenthesization problem. To understand the problem better, let’s take some examples.
Expression :

T ^ F & T

Two ways :

((T ^ F) & T) and (T ^ (F & T))

boolean parenthesization problem

T | T & F ^ T

Four ways :

((T|T)&(F^T)), (T|(T&(F^T))), (((T|T)&F)^T)
and (T|((T&F)^T))

boolean-parenthesization

Boolean Parenthesization problem : Line of thoughts

What will be the most trivial Boolean expression? Of course, an expression with only one Boolean value T or Boolean value F.

In how many ways can this expression be parenthesized so that it evaluates to True? That depends on the operand.

For the Boolean value T, there is one way, (T); whereas for F, there is no way we can parenthesize it to evaluate to True. An expression can evaluate to either True or False.

Let’s say T(i,j) is the number of ways the expression from i to j can be parenthesized so that it evaluates to True. Similarly, F(i,j) is the number of ways the expression evaluates to False. In the base case there is only one operand, either T or F, hence i = j and the following equations hold true

T(i,i) = 1 if operand is T
         0 if operand is F

F(i,i) = 0 if operand is T
         1 if operand is F

How to calculate T(i,j) for an expression with more than one operand and operators between them? This is similar to the matrix chain multiplication problem. We put parentheses at all possible positions and count in how many ways the two resulting sub-expressions evaluate to True or False. Once we have the counts for each sub-expression, we can combine them based on the operator at the split.

For the expression from index i to index j, pick every k such that i <= k < j, where operator k sits between operand k and operand k+1, and find the number of ways the expressions from i to k and from k+1 to j evaluate to True and to False. Once these numbers are determined, the number of ways for expression i to j can be calculated based on the operator between expression i to k and expression k+1 to j.

When Boolean operator is & (AND)

When can expression (i,j) be True if it is of the form Expression(i,k) & Expression(k+1,j)? Only if Expression(i,k) and Expression(k+1,j) are both True. Hence, for a given k, the expression can be True in T(i,k) * T(k+1,j) ways, where T(i,k) is the number of ways Expression(i,k) is True and T(k+1,j) is the number of ways Expression(k+1,j) is True. Adding up over all possible values of k, the expression becomes

T(i,j)  = Summation ( T(i,k) * T(k+1,j) ) for all k such that i <= k < j

How about expression (i,j) evaluating to False? Simple enough, at least one of the two sub-expressions should evaluate to False.

If Total(i,j) represents the total number of ways an expression can be parenthesized irrespective of the outcome being True or False, then a split at k can be completed in Total(i,k) * Total(k+1,j) ways, so

Total(i,j) = Sum ( Total(i,k) * Total(k+1,j) ) for all k such that i <= k < j
or
Total(i,j) = T(i,j) + F(i,j)

If we take the number of ways a split can evaluate to True out of the total for that split, we get the number of ways it can evaluate to False. Hence, the equation below

F(i,j) = Sum ( Total(i,k) * Total(k+1,j) - T(i,k) * T(k+1,j) ) for all k such that i <= k < j
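Total depends only on how many operands the sub-expression has, not on their values or on the operators. The sketch below counts all parenthesizations by summing over the size of the left part of every split. The class name TotalParenthesizations and the method countAll are hypothetical, and at least one operand is assumed.

```java
class TotalParenthesizations {

    // total[len] = number of ways to fully parenthesize an expression with len operands.
    // Assumes numOperands >= 1.
    static long countAll(int numOperands) {
        long[] total = new long[numOperands + 1];
        total[1] = 1; // a single operand can be written in exactly one way
        for (int len = 2; len <= numOperands; len++) {
            // The split leaves `left` operands on one side and len - left on the other.
            for (int left = 1; left < len; left++) {
                total[len] += total[left] * total[len - left];
            }
        }
        return total[numOperands];
    }

    public static void main(String[] args) {
        System.out.println(countAll(3)); // prints 2, e.g. T ^ F & T
        System.out.println(countAll(4)); // prints 5, e.g. T | T & F ^ T
    }
}
```

This matches the examples above: T ^ F & T has two parenthesizations in total (both True), and T | T & F ^ T has five, of which four evaluate to True.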

When Boolean operator | (OR)

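When the operator is OR, the whole expression is True if any one of the sub-expressions is True. It is easier to count the opposite case: in how many ways can both Exp(i,k) and Exp(k+1,j) be False?

Following the same logic as for AND being True, it can be derived that

F(i,j) = Summation ( F(i,k) * F(k+1,j) ) for all k such that i <= k < j

The overall expression is True whenever the two sub-expressions are not both False. Hence,

T(i,j) = Sum ( Total(i,k) * Total(k+1,j) - F(i,k) * F(k+1,j) ) for all k such that i <= k < j

In the same vein, T(i,j) and F(i,j) when the operator is XOR will be

T(i,j) = Sum ( T(i,k) * F(k+1,j) + F(i,k) * T(k+1,j) ) for all k such that i <= k < j
F(i,j) = Sum ( T(i,k) * T(k+1,j) + F(i,k) * F(k+1,j) ) for all k such that i <= k < j

When an expression mixes operators, each split k contributes according to the operator at that split. The solution to the Boolean parenthesization problem is T(1,N).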
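Before moving to a table, the recurrences can be written directly as a memoized recursion. This is a sketch under assumptions, not the post's implementation: the class name, the method names count and waysTrue, and the 0-based indexing are all chosen for this illustration, and operands are assumed to be the characters 'T' and 'F'.

```java
class BooleanParenthesizationRecursive {

    // Returns {ways to evaluate to true, ways to evaluate to false} for operands i..j (0-based).
    static long[] count(String operands, String operators, int i, int j, long[][][] memo) {
        if (memo[i][j] != null) return memo[i][j];
        long t = 0, f = 0;
        if (i == j) {
            if (operands.charAt(i) == 'T') t = 1; else f = 1;
        } else {
            for (int k = i; k < j; k++) { // operator k sits between operand k and k+1
                long[] left = count(operands, operators, i, k, memo);
                long[] right = count(operands, operators, k + 1, j, memo);
                long all = (left[0] + left[1]) * (right[0] + right[1]);
                switch (operators.charAt(k)) {
                    case '&': t += left[0] * right[0]; f += all - left[0] * right[0]; break;
                    case '|': f += left[1] * right[1]; t += all - left[1] * right[1]; break;
                    case '^': t += left[0] * right[1] + left[1] * right[0];
                              f += left[0] * right[0] + left[1] * right[1]; break;
                    default: throw new IllegalArgumentException("unknown operator");
                }
            }
        }
        return memo[i][j] = new long[] {t, f};
    }

    static long waysTrue(String operands, String operators) {
        int n = operands.length();
        return count(operands, operators, 0, n - 1, new long[n][n][])[0];
    }

    public static void main(String[] args) {
        System.out.println(waysTrue("TFT", "^&"));   // prints 2
        System.out.println(waysTrue("TTFT", "|&^")); // prints 4
    }
}
```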

Implementation : Boolean parenthesization problem

package com.company;

/**
 * Created by sangar on 31.12.17.
 */
public class BooleanParenthesis {

    public static int calculateNumberOfWays(String operators, String operands){
        int numOperands = operands.length();


        int[][] F = new int[numOperands][numOperands];
        int[][] T = new int [numOperands][numOperands];

        // Base case: a single operand is True or False in exactly one way
        for (int i=0; i<numOperands; i++){
            F[i][i] = (operands.charAt(i) == 'F')? 1: 0;
            T[i][i] = (operands.charAt(i) == 'T')? 1: 0;
        }

        for (int L=1; L<numOperands; L++) {
            for (int i=0; i<numOperands-L; ++i){
                int j = i+L;
                T[i][j] = F[i][j] = 0;
                for (int k=i; k<j; k++){
                    int totalIK = T[i][k] + F[i][k];
                    int totalKJ = T[k+1][j] + F[k+1][j];
                    if (operators.charAt(k) == '&') {
                        T[i][j] += T[i][k]*T[k+1][j];
                        F[i][j] += (totalIK *totalKJ - T[i][k]*T[k+1][j]);
                    }
                    if (operators.charAt(k) == '|'){
                        F[i][j] += F[i][k]*F[k+1][j];
                        T[i][j] += (totalIK*totalKJ - F[i][k]*F[k+1][j]);
                    }
                    if (operators.charAt(k) == '^'){
                        T[i][j] += F[i][k]*T[k+1][j] + T[i][k]*F[k+1][j];
                        F[i][j] += T[i][k]*T[k+1][j] + F[i][k]*F[k+1][j];
                    }
                }
            }
        }
        return T[0][numOperands-1];
    }

    public static void main(String[] args) {

        String operands = "TTFT";
        String operators = "|&^";

        System.out.println("Number of ways to parenthesize expression : " +
                calculateNumberOfWays(operators, operands));

    }
}

Complexity of the dynamic programming approach to find the number of ways to parenthesize a Boolean expression so that it evaluates to True is O(n^3), and space complexity is O(n^2).


Matrix chain multiplication-dynamic programming

What is matrix chain multiplication in general? To read up on that, please refer to Wikipedia. However, today’s problem is not about actually multiplying a chain of matrices, but about finding the optimal order in which to multiply them so as to minimize the number of scalar multiplications.

To be able to multiply two matrices, it is required that the number of columns in the first matrix is equal to the number of rows of the second matrix.

If we multiply a matrix of dimension M x N by another matrix of dimension N x P, we get a matrix of dimension M x P.

matrix chain multiplication

How many scalar multiplications need to be done to multiply M x N to N x P matrix? It’s M x N x P.
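To see where M x N x P comes from, here is a small sketch of the schoolbook multiplication that counts every scalar multiplication it performs. The class and method names are hypothetical, and the matrices are assumed to be non-empty and compatible.

```java
class ScalarMultiplicationCount {

    // Multiplies an m x n matrix by an n x p matrix and counts scalar multiplications.
    static long countMultiplications(int[][] a, int[][] b) {
        int m = a.length, n = b.length, p = b[0].length;
        int[][] c = new int[m][p];
        long count = 0;
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < p; j++) {
                for (int k = 0; k < n; k++) {
                    c[i][j] += a[i][k] * b[k][j]; // one scalar multiplication per (i, j, k)
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // 2 x 3 times 3 x 4: 2 * 3 * 4 = 24 scalar multiplications
        System.out.println(countMultiplications(new int[2][3], new int[3][4])); // prints 24
    }
}
```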

Given N matrices with dimensions, find the optimal way to multiply these matrices, in order to minimize the total number of scalar multiplications.

Matrix chain multiplication : line of thoughts

Before going further, let’s understand some basics of matrix multiplication

  1. Matrix multiplication is associative i.e.  A* (B*C) = (A*B) *C
  2. It is not commutative i.e.  A * B is in general not equal to B * A
  3. To multiply two matrices, they should be compatible i.e. the number of columns in the first matrix should be equal to the number of rows of the second matrix.

Since matrix multiplication is associative, this problem reduces to figuring out where to put parentheses around the matrices so that the total number of scalar multiplications is the least.
Let’s take an example to understand. We have matrices A, B, C and D with dimensions 10 x 20, 20 x 30, 30 x 40 and 40 x 50 respectively.

How can we solve this problem manually? The given chain of matrices is ABCD. There are three ways to split the chain into two parts: (A) x (BCD), (AB) x (CD) or (ABC) x (D).

Either way, we have smaller problems to solve now. If we take the first split, the cost of multiplying ABCD is the cost of A (zero, as it is a single matrix) + the cost of (BCD) + the cost of multiplying A x (BCD).

The same holds for the other two splits. The answer will be the minimum over all three possible splits.

To get the cost of ABCD, we first need the cost of BCD. (BCD) can be split in two ways: (B) x (CD) or (BC) x (D).

We will continue with (B) x (CD). (CD) can be split in only one way, (C) x (D), the cost of which is nothing but M x N x P, where C is a matrix of dimension M x N and D is a matrix of dimension N x P.

Similarly, the cost of (BC) is M x N x P, where B is M x N and C is N x P. The figure below shows the detailed calculation for each possible split and gives the final answer as the minimum over all possible splits.

matrix chain multiplication brute force
Notice that for N matrices, there are N-1 ways to split the chain. This manual calculation can easily be implemented as a recursive solution.
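For the ABCD example (10 x 20, 20 x 30, 30 x 40, 40 x 50), the cost of each of the three top-level splits can be checked with a short sketch. The class name ChainSplitCosts and the helpers best and splitCost are hypothetical names for this illustration. Matrix t is assumed to have dimensions p[t-1] x p[t].

```java
class ChainSplitCosts {

    // Minimum cost to multiply matrices i..j, where matrix t has dimensions p[t-1] x p[t].
    static int best(int[] p, int i, int j) {
        if (i == j) return 0; // a single matrix needs no multiplication
        int min = Integer.MAX_VALUE;
        for (int k = i; k < j; k++) {
            min = Math.min(min, splitCost(p, i, j, k));
        }
        return min;
    }

    // Cost when the last multiplication joins (i..k) and (k+1..j).
    static int splitCost(int[] p, int i, int j, int k) {
        return best(p, i, k) + best(p, k + 1, j) + p[i - 1] * p[k] * p[j];
    }

    public static void main(String[] args) {
        int[] p = {10, 20, 30, 40, 50};
        System.out.println("(A) x (BCD) : " + splitCost(p, 1, 4, 1)); // 74000
        System.out.println("(AB) x (CD) : " + splitCost(p, 1, 4, 2)); // 81000
        System.out.println("(ABC) x (D) : " + splitCost(p, 1, 4, 3)); // 38000
        System.out.println("Minimum     : " + best(p, 1, 4));         // 38000
    }
}
```

So for this chain the cheapest order is ((AB)C)D with 38000 scalar multiplications.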

Recursive implementation

package com.company;

/**
 * Created by sangar on 31.12.17.
 */
public class MCM {

    public static int matrixChainMultiplication(int[] P, int i, int j){
        int min = Integer.MAX_VALUE;

        if(i==j) return 0; // A single matrix needs no multiplication

        // Try every split point k: (i..k) x (k+1..j)
        for(int k=i; k<j; k++){
            int count = matrixChainMultiplication(P,i, k)
                    + matrixChainMultiplication(P, k+1, j)
                    +   P[i-1]*P[k]*P[j];

            min =  Integer.min(count, min);
        }

        return min;
    }

    public static void main(String[] args) {
        int arr[] = new int[] {1, 2, 3, 4, 3};
        int n = arr.length;

        // Matrices are numbered 1..n-1; matrix i has dimensions arr[i-1] x arr[i]
        System.out.println("Minimum number of multiplications is "+
                matrixChainMultiplication(arr, 1, n-1));
    }
}

While implementing, we will get the matrices as an array P where P[i-1] and P[i] represent the dimensions (rows and columns) of matrix i.

However, the complexity of this implementation of matrix chain multiplication is exponential in time and hence it is of no use for very large inputs. Also, if you look at the calculation tree, there are many subproblems that are solved again and again. This is called overlapping subproblems. What can be done to avoid calculating subproblems again and again? Save the results somewhere, using memoization.

time complexity of matrix chain multiplication

If we save the cost of multiplying matrices i to j, we can refer back to it when needed. This technique is called memoization in dynamic programming.
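A minimal sketch of that idea applied to the recursive solution: the recursion is unchanged, but each (i, j) result is cached the first time it is computed. The class name MCMMemo and the use of -1 as the "not computed" marker are assumptions of this sketch. P is assumed to describe at least one matrix.

```java
import java.util.Arrays;

class MCMMemo {

    static int minCost(int[] p) {
        int n = p.length;
        int[][] memo = new int[n][n];
        for (int[] row : memo) {
            Arrays.fill(row, -1); // -1 marks "not computed yet"
        }
        return solve(p, 1, n - 1, memo);
    }

    // Minimum cost to multiply matrices i..j; matrix t has dimensions p[t-1] x p[t].
    private static int solve(int[] p, int i, int j, int[][] memo) {
        if (i == j) return 0;
        if (memo[i][j] != -1) return memo[i][j]; // reuse the saved result
        int min = Integer.MAX_VALUE;
        for (int k = i; k < j; k++) {
            int cost = solve(p, i, k, memo) + solve(p, k + 1, j, memo)
                     + p[i - 1] * p[k] * p[j];
            min = Math.min(min, cost);
        }
        return memo[i][j] = min;
    }

    public static void main(String[] args) {
        System.out.println(minCost(new int[] {1, 2, 3, 4, 3})); // prints 30
    }
}
```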

For a split at k, the cost of multiplying matrices Ai to Aj is

Cost (Ai, Aj) = Cost(Ai,Ak) + Cost(Ak+1,Aj )+(P[i-1] * P[k] * P[j])

The idea is to find the k for which Cost(Ai, Aj) becomes minimum. If M[i,j] represents the minimum cost to multiply matrix i to matrix j, then for a given k,

M[i,j]  = M[i,k] + M[k+1,j] + (P[i-1] * P[k] * P[j])

When calculating M[i,j], M[i,k] and M[k+1,j] should already be available; this is called bottom-up filling of a matrix. Also, M[i,i] = 0, as the cost of multiplying a single matrix is 0.

Since we are taking the bottom-up approach to fill our solution matrix, we start by grouping two matrices at a time, then 3, then 4, until we reach the full chain of N matrices. We start with length L = 2 and go on to solve the table entries for length N. M[1,N] will give us the final cost.

To find minimum cost(i,j), we need to find a K such that expression

Cost (Ai, Aj) = Cost(Ai,Ak) + Cost(Ak+1,Aj )+(P[i-1] * P[k] * P[j])

becomes minimum, hence

M[i,j] = min (M[i,j], M[i,k] + M[k+1,j] + P[i-1] * P[k] * P[j])

Matrix chain multiplication leetcode implementation

package com.company;

/**
 * Created by sangar on 31.12.17.
 */
public class MCM {

    public  static  int matriChainMultiplicationDP(int[] P){
        int n = P.length;

        int[][] M = new int[n][n];

        for(int i=0; i<n; i++){
            for(int j=0; j<n; j++){
                M[i][j] = 0;
            }
        }

        for(int L=2; L<n; L++){
            /* For every position i, we check every chain of len L */
            for(int i=1; i<n-L+1; i++){
                int j = i+L-1;
                M[i][j] = Integer.MAX_VALUE;

                /* For matrix i to j, check every split K */
                for(int k=i; k<j; k++){
                    int temp = M[i][k] + M[k+1][j] + P[i-1] * P[k] * P[j];
                    /* Check if the current count is less than minimum */
                    M[i][j] = Integer.min(temp, M[i][j]);
                }
            }
        }

        return M[1][n-1];
    }
    public static void main(String[] args) {
        int arr[] = new int[] {1, 2, 3, 4, 3};
        int n = arr.length;

        System.out.println("Minimum number of multiplications is "+
                matriChainMultiplicationDP(arr));
    }
}

Let’s run through an example and understand how this code works.

P = {10, 20,30,40,30}, 
dimensions of matrix [1] = 10X20, 
dimensions of matrix [2] = 20X30, 
dimensions of matrix [3] = 30X40,
dimensions of matrix [4] = 40X30

It is important to understand what M[i][j] represents. In our case, M[i][j] represents the minimum cost to multiply the chain of matrices from matrix i to matrix j.
With this representation, we can safely say that M[i][i] is 0, as there is no cost to multiply only one matrix.

Start with for loop with L=2.
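To follow the order in which the table is filled for P = {10, 20, 30, 40, 30}, the same loops can be instrumented to print each entry as it is computed. This is a sketch with a hypothetical class name MCMTableTrace and method fill. The loops mirror the DP implementation above.

```java
class MCMTableTrace {

    // Fills the cost table chain length by chain length and prints every entry.
    static int[][] fill(int[] p) {
        int n = p.length;
        int[][] m = new int[n][n]; // m[i][i] stays 0
        for (int len = 2; len < n; len++) {
            for (int i = 1; i + len - 1 < n; i++) {
                int j = i + len - 1;
                m[i][j] = Integer.MAX_VALUE;
                for (int k = i; k < j; k++) {
                    m[i][j] = Math.min(m[i][j],
                            m[i][k] + m[k + 1][j] + p[i - 1] * p[k] * p[j]);
                }
                System.out.println("L=" + len + "  M[" + i + "][" + j + "] = " + m[i][j]);
            }
        }
        return m;
    }

    public static void main(String[] args) {
        fill(new int[] {10, 20, 30, 40, 30});
        // L=2  M[1][2] = 6000
        // L=2  M[2][3] = 24000
        // L=2  M[3][4] = 36000
        // L=3  M[1][3] = 18000
        // L=3  M[2][4] = 48000
        // L=4  M[1][4] = 30000
    }
}
```

For example, M[1][3] is the smaller of 0 + 24000 + 10*20*40 = 32000 (split after matrix 1) and 6000 + 0 + 10*30*40 = 18000 (split after matrix 2).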

complexity of matrix chain multiplication
leetcode matrix chain multiplication

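M[1][n-1], where n is the length of P (so the chain has n-1 matrices), will be the solution to the matrix chain multiplication problem.

Time complexity of matrix chain multiplication using dynamic programming is O(n^3), as there are O(n^2) table entries and each one tries up to n split points. Space complexity is O(n^2).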

Reference
http://www.personal.kent.edu/~rmuhamma/Algorithms/MyAlgorithms/Dynamic/chainMatrixMult.htm


Longest common substring

Longest Common Substring

Given two strings A and B, find the longest common substring in them. For example, if A = “DataStructureandAlgorithms” and B = “Algorithmsandme”, then the longest common substring of A and B is “Algorithms”. The figure below shows the longest common substring.

longest common substring

The brute force solution is to find all substrings of one string and check whether any of them is a substring of the second string, keeping track of the longest one found while comparing. There can be n^2 substrings of a string of length n, and checking whether a string is a substring of another takes about m more operations, where m is the length of the second string. Hence, the overall complexity of this method is O(n^2 m).
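A minimal sketch of this brute force, using String.contains for the substring check. The class and method names are hypothetical. It skips candidates that cannot beat the best length found so far, which does not change the worst case.

```java
class LongestCommonSubstringBruteForce {

    // Tries every substring of a and keeps the longest one that also occurs in b.
    static int longest(String a, String b) {
        int best = 0;
        for (int start = 0; start < a.length(); start++) {
            for (int end = start + 1; end <= a.length(); end++) {
                // Only check candidates longer than the best found so far.
                if (end - start > best && b.contains(a.substring(start, end))) {
                    best = end - start;
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(longest("DataStructureandAlgorithms", "Algorithmsandme")); // prints 10
    }
}
```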

Can we do better than that?

Longest common substring : Line of thoughts

We have to find the longest common substring of strings of length M and N. Can we find the answer for prefixes of length M-1 and N-1 and then derive the answer for M and N? Yes, we can. The length of the common substring ending at the last characters either grows by one if the last characters are equal, or resets to zero if they are not. Why so?

First, see why we need to reset to zero when the characters are different. We are looking for a common substring, which means the characters must be consecutive; any mismatch restarts the entire search, because no common substring can end at those two different characters.

What if the characters are the same? In that case we increment by one, because the common substring ending at M-1 and N-1 has length zero or some number, depending on how many consecutive common characters there were up to M-1 and N-1, and the new pair extends it.

What will be the longest common substring when one of the strings is empty? Its length will be zero.

So, do you see the recursion here? Let’s write the recurrence relation and then implement it.

LCS(i,j) = 1+LCS(i-1, j-1) if A[i] = B[j] 
         =  0 otherwise

This recurrence has the optimal substructure property: the solution to the problem depends on solutions to subproblems. Also, there are subproblems which will be calculated again and again, which is called overlapping subproblems. These two properties are required for dynamic programming. To avoid recalculating subproblems, we will store their results; for that, create a two-dimensional array called LCS with dimensions n and m. LCS[i][j] represents the length of the longest common substring that ends exactly at A[i] and B[j]; the answer to the problem is the largest value in the table. And since the solution for i-1 and j-1 is required before the solution for i and j, this matrix will be filled bottom-up.

Longest common substring using dynamic programming

How to fill LCS[i][j]?

1. Check if A[i] is equal to B[j] 
   1.1 If yes, LCS[i][j] = 1 + LCS[i-1][j-1]
       (Because the new character is added to the already common substring,
        if any, ending at A[i-1] and B[j-1].)
   1.2 If both characters are not the same, LCS[i][j] = 0
       (Because if the characters are not the same, there cannot be any
        common substring ending at A[i] and B[j].)

Implementation

#include <stdio.h>
#include <string.h>

int max(int a, int b){
	return a>b ? a:b;
}
 int longestCommonSubstring(char * A, char * B){
     int lenA = strlen(A);
     int lenB = strlen(B);
     int LCS[lenA+1][lenB+1];

     for (int i=0; i<= lenA; i++){
         LCS[i][0] = 0;
     }

     for (int j=0; j <= lenB; j++){
         LCS[0][j] = 0;
     }
	
     int maxLength = 0;
     for (int i=1; i<= lenA; i++){
        for (int j=1; j <= lenB; j++){
            // Row i / column j of the table correspond to A[i-1] and B[j-1]
            if (A[i-1] == B[j-1]){
                LCS[i][j] = 1 + LCS[i-1][j-1];		
                maxLength = max( maxLength, LCS[i][j] );
            } 
            else {
               LCS[i][j] = 0;
            }
         }
     }
     return maxLength;
}

int main(void) {
    char *a = "ABCDEFGSE";
    char *b = "EBCDEFGV";
	
    printf("\n Longest common substring : %d",
			longestCommonSubstring(a,b));
    return 0;
}

Java implementation

package com.company;

/**
 * Created by sangar on 5.1.18.
 */
public class LCS {

    public  static int longestCommonSubstring(String A, String B){
        int lenA = A.length();
        int lenB = B.length();

        // One extra row and column for the empty prefix; Java zero-fills the array
        int [][] LCS = new int[lenA+1][lenB+1];

        int maxLength = 0;
        for (int i=1; i<=lenA; i++){
            for (int j=1; j<=lenB; j++){
                // Row i / column j correspond to A.charAt(i-1) and B.charAt(j-1)
                if (A.charAt(i-1) == B.charAt(j-1)){
                    LCS[i][j] = 1 + LCS[i-1][j-1];
                    maxLength = Integer.max(maxLength, LCS[i][j]);
                }
                else {
                    LCS[i][j] = 0;
                }
            }
        }
        return maxLength;
    }

    public static void main(String[] args) {
	    String a = "ABCDEFGS";
	    String b = "EBCDEFG";

        System.out.println("Longest common substring :" +
                longestCommonSubstring(a,b));
    }
}

Time complexity of dynamic programming approach to find length of longest common substring in two string is O(n*m) and space complexity is O(n*m) where n and m are lengths of two given strings.

longest common substring dynamic programming

In the next post, we will discuss the suffix tree method to find the longest common substring, which is more optimized than the DP solution and can easily be generalized to multiple strings.

This solution is very similar to the one for the longest common subsequence. The difference between the two problems is that a subsequence is a collection of characters which may or may not be contiguous in the string, whereas for a substring the characters must be contiguous. Based on this difference, our solution varies a bit.
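The difference shows up in a single line of the recurrence: on a mismatch the substring table resets to zero, while the standard subsequence table carries forward the best value seen so far. The sketch below puts the two side by side. The class and method names are hypothetical, and the subsequence recurrence is the textbook one rather than something taken from this post.

```java
class SubstringVsSubsequence {

    // Longest common substring: a mismatch resets the run to 0.
    static int substring(String a, String b) {
        int[][] dp = new int[a.length() + 1][b.length() + 1];
        int best = 0;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                    best = Math.max(best, dp[i][j]);
                } // else dp[i][j] stays 0
            }
        }
        return best;
    }

    // Longest common subsequence: a mismatch keeps the best of the two neighbours.
    static int subsequence(String a, String b) {
        int[][] dp = new int[a.length() + 1][b.length() + 1];
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }
        return dp[a.length()][b.length()];
    }

    public static void main(String[] args) {
        System.out.println(substring("ABXCD", "ABYCD"));   // prints 2 ("AB" or "CD")
        System.out.println(subsequence("ABXCD", "ABYCD")); // prints 4 ("ABCD")
    }
}
```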


Interleaved string

Interleaved string

Given strings A, B and C, find if string C is an interleaved string of A and B.

C is said to be interleaved if it contains all characters of A and B and the order of characters within each respective string is maintained. For example, C = “abxycdz” is an interleaved string of A = “abcd” and B = “xyz”.

Interleaved string : Line of thoughts

First check that the length of C equals the length of A + the length of B. If it does not, return false, because C must use every character of A and B exactly once. Let’s say we start with the first characters of C, A and B. If the first characters of C and A match, move to the second character of C and A, and keep B at its first character.
If the first characters of C and B match, move to the second character of C and B instead, while A remains at its first character. If C matches both A and B, both options have to be explored.
Once done, repeat the above steps with the new first characters of the strings, for as long as the character in C matches the character in A or B.
If neither condition holds, return false.

Each step reduces the problem to a smaller one: C with length N-1 and one of A or B with length M-1.
From the above description, we can figure out that recursion can be used to solve this problem.

isInterleaved(A,B,C)  = isInterleaved(A+1, B, C+1) if the current character of C matches the current character of A
|| isInterleaved(A, B+1, C+1) if the current character of C matches the current character of B

What shall be the base case?
If we reach the end of C, we have considered all of its characters; we can return true if all characters in the other two strings have been consumed as well. If not, return false.

Recursive implementation of interleaved string of two strings

#include <stdio.h>
#include <string.h>
#define true 1
#define false 0

int isInterleaved(char *c, char *a, char *b){

    if(!(*c) && !(*a) && !(*b))
        return true;

    if(*c == '\0'){
        return false;
    }
	 // if character of a and c match
    return ((*c == *a) && isInterleaved(c+1,a+1,b)) || 
    		// if character of b and c match
            ((*c == *b) && isInterleaved(c+1,a,b+1)); 
}

int main(void) {
	char *c = "abxycdz";
	char *a = "abcd";
	char *b = "xyz";
	if(isInterleaved(c,a,b)){
		printf("\n String is interleaved");
	}
	else{
		printf("\n String is not interleaved");
	}
	return 0;
}

Iterative implementation

#include <stdio.h>
#include <string.h>
#define true 1
#define false 0

int isInterleaved(char *c, char *a, char *b){

    while(*c != '\0'){
    	if(*c == *a){
            a++;
        }
        else if(*c == *b){
            b++;
        }
        else{
            return false;
        }
        c++;
    }
    if(*a != '\0' || *b != '\0')
        return false;

    return true;
}

int main(void) {
	char *c = "abxycdz";
	char *a = "abcd";
	char *b = "xyz";
	if(isInterleaved(c,a,b)){
		printf("\n String is interleaved");
	}
	else{
		printf("\n String is not interleaved");
	}
	return 0;
}

The iterative implementation will not work for input where strings A and B have common characters. For example, for A = XXY, B = XXZ and C = XXZXXY it returns false, although C is an interleaving of A and B.
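The failure is easy to reproduce. The sketch below ports both ideas to Java, with hypothetical class and method names: the greedy scan always takes the character from A when it can, while the recursion also tries B when the choice via A does not lead to a solution.

```java
class InterleaveGreedyVsRecursive {

    // Greedy scan: always prefers A when both A and B could supply the next character.
    static boolean greedy(String c, String a, String b) {
        int i = 0, j = 0;
        for (int k = 0; k < c.length(); k++) {
            if (i < a.length() && c.charAt(k) == a.charAt(i)) i++;
            else if (j < b.length() && c.charAt(k) == b.charAt(j)) j++;
            else return false;
        }
        return i == a.length() && j == b.length();
    }

    // Recursion: tries A first and falls back to B if that choice fails later on.
    static boolean recursive(String c, String a, String b, int k, int i, int j) {
        if (k == c.length()) return i == a.length() && j == b.length();
        if (i < a.length() && c.charAt(k) == a.charAt(i)
                && recursive(c, a, b, k + 1, i + 1, j)) {
            return true;
        }
        return j < b.length() && c.charAt(k) == b.charAt(j)
                && recursive(c, a, b, k + 1, i, j + 1);
    }

    public static void main(String[] args) {
        System.out.println(greedy("XXZXXY", "XXY", "XXZ"));             // prints false
        System.out.println(recursive("XXZXXY", "XXY", "XXZ", 0, 0, 0)); // prints true
    }
}
```

The greedy scan consumes both X characters from A and then gets stuck on Z, whereas the recursion backtracks and takes XXZ from B first.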

The complexity of the above code is linear, O(N), N being the length of string C, whereas the complexity of the recursive solution is O(2^N); however, the recursive solution does not fail in the above-mentioned case.

Dynamic programming approach for interleaved string

If we look closely, we can see that there are many sub problems which are being calculated again and again. Let’s look at recursion tree for input A = XXY and B = XXZ and C = XXZXXY
Interleaved strings

We get the idea that we need to store the results of smaller subproblems, so that we do not calculate them again and again.

We create a two-dimensional table. Table(i,j) = true only if the first i+j characters of C are an interleaving of the first i characters of A and the first j characters of B (characters are counted from 1 here, so A[i] is the ith character of A).
An empty string is an interleaving of two empty strings, so
Table[0][0] = true

If one of the strings is empty:
Table(i,0) = A[i] == C[i] && Table(i-1, 0), that is to say, if the first i-1 characters of C were an interleaving of A alone, then it stays true for the ith character if it matches the ith character of A. Note that B is empty here.
Again, if string A is empty, then Table(0,j) = B[j] == C[j] && Table(0, j-1), by the same argument.

With these base cases, we can fill the table bottom-up as follows

Table(i,j) = Table(i-1,j)  if (A[i] == C[i+j]) && (B[j] != C[i+j])
Table(i,j) = Table(i,j-1)  if (B[j] == C[i+j]) && (A[i] != C[i+j])
Table(i,j) = Table(i-1,j) || Table(i, j-1)  if (A[i] == C[i+j]) && (B[j] == C[i+j])
Table(i,j) = false  if neither A[i] nor B[j] matches C[i+j]

#include <stdio.h>
#include <string.h>
#include <stdbool.h>

int isInterleaved(char *c, char *a, char *b){

    int lenA = strlen(a);
    int lenB = strlen(b);
    int i,j;

    // C must use every character of A and B exactly once
    if((int)strlen(c) != lenA + lenB)
        return false;

    // Table[i][j] is true if the first i+j characters of c interleave
    // the first i characters of a and the first j characters of b
    bool Table[lenA+1][lenB+1];

    for(i=0; i<=lenA; i++){
        for(j=0; j<=lenB; j++){
            // Both strings are empty
            if(i==0 && j==0)
                Table[i][j] = true;
            // string A is empty, compare characters in C and B
            else if(i==0)
                Table[i][j] = (c[j-1] == b[j-1]) && Table[i][j-1];
            // string B is empty, compare characters in C and A
            else if(j==0)
                Table[i][j] = (c[i-1] == a[i-1]) && Table[i-1][j];
            // Both strings are not empty: the next character of C may
            // come from A, from B, or from either one
            else
                Table[i][j] =
                    (a[i-1] == c[i+j-1] && Table[i-1][j]) ||
                    (b[j-1] == c[i+j-1] && Table[i][j-1]);
        }
    }
    return Table[lenA][lenB];
}

int main(void) {
	char *c = "abxycdz";
	char *a = "abcd";
	char *b = "xyz";
	if(isInterleaved(c,a,b)){
		printf("\n String is interleaved");
	}
	else{
		printf("\n String is not interleaved");
	}
	return 0;
}

Complexity of the above code is O(M x N) in both time and space, where M and N are the lengths of strings A and B.
