Is subsequence

Given a string s and a string t, check if s is subsequence of t. A subsequence of a string is a new string which is formed from the original string by deleting some (can be none) of the characters without disturbing the relative positions of the remaining characters. (ie, “ace” is a subsequence of “abcde” while “aec” is not). For example:

Input:
s="ace" ,t= "abcde"
Output: 
True

Input:
s="aec" t="abcde"
Output: 
False

Approach

This problem looks similar to edit distance problem which can be solved using dynamic programming. The only difference is only deletions are allowed in this problem.

Recursive approach

What happens if we take a character at index i from string s and a character at index from string t? There are two possibilities: either these characters are equal or they are not equal. If the characters are equal, then we found the corresponding character in the target string, so we have to look for a remaining substring from index i+1 in s in substring in t from index j+1
What if the characters are not equal? In that case, we keep looking in substring of t from index j+1, but the index of s does not change, it remains at i, because we have not found the match for this character yet in the target string.

Implementation note: s.substring(1) actually get the substring of the string from the second character to the last character and achieves the increase in the indices as mentioned above.

 private boolean isSubsequenceUtil(String s, String t){
        
        if(s.length() == 0) return true;
        
        if(t.length() == 0) return false;
        
        return (s.charAt(0) == t.charAt(0)
                 && isSubsequenceUtil(s.substring(1), t.substring(1)))
            || isSubsequenceUtil(s, t.substring(1));
                
    }

If you run the above code in Leetcode, it will give you Time Limit Exceeded error, that becomes obvious when we draw the execution tree of the function, we will notice that we are solving the same problem multiple times.

Top down approach

The first thing we should do it to avoid solving the subproblem, again and again, that can be achieved using memorization. We introduce a caching place to store if we have solved the problem already and if yes, what was the result. If we already have solved the problem, we will just use the result and not solve the problem again. Below is the memorization code.

 private boolean isSubsequenceUtil(String s, String t, boolean[][] isKnown,
                                      boolean[][] value,
                                      int sIndex, int tIndex){
        
        if(s.length() == sIndex) return true;
        
        if(t.length() == tIndex) return false;
        
        if(isKnown[sIndex][tIndex]){
            return value[sIndex][tIndex];
        }
        
        value[sIndex][tIndex] = (s.charAt(sIndex) == t.charAt(tIndex)
                 && isSubsequenceUtil(s, t, isKnown, value, 
                             sIndex+1, tIndex+1))
            || isSubsequenceUtil(s, t, isKnown, value, sIndex, tIndex+1);
                
        isKnown[sIndex][tIndex] = true;
                                 
        return value[sIndex][tIndex];
    }

This code passes all the test cases at Leetcode. This approach is called a top-down approach in dynamic programming, where we start with the top, keep solving smaller problems until we find a solution for the original problem.

Bottom up approach

Since the optimal solution to the subproblems leads to optimal solution to the original problem, we can apply another approach called the bottom-up approach. In this approach, we start from the smallest possible problem and build our solution upwards.

What is the smallest problem possible in this case? If string s is empty and target string t is not empty, s is definitely a subsequence although an empty one.
So, if I create a two dimensional array where rows represent the number of characters in t and column represent number of characters in s, then I can initialize the first column (which represents zero length of the source string) as

for(int i=0; i<=t.length(); i++){
   dp[i][0] = true;
}

Other way around, what if string t is empty and string s is not empty, then there is no way possible string s can be subsequence of string t, this can be filled in the table as follows:

for(int i=0; i<=t.length(); i++){
    dp[0][i] = false;
}

What if we move up a bit, let’s say we have i characters in string t and j characters in string s. We compare the i-th character with j-th character and see if there are equal?
If they are equal and if string t with i-1 characters already has string s with j-1 characters as subsequence, then we can mark that string t with i characters has string s with j characters as subsequence too.

If the characters are not equal, then either string s with j characters should already be subsequence in i-1 characters of string t, or it cannot be a subsequence in i characters either.

dp[i][j]= t[i] == s[j] && dp[i-1][j-1] 
dp[i][j]= dp[i-1][j]

If we go through each character in both string. One implementation note, i and j represent the length of strings and not the index, that is why we compare characters at index i-1 and j-1 when solving for dp[i][j]. Also if length of t is less than length of s there is no way s can be subsequence of t.

     private boolean isSubsequenceUtilDP(String s, String t){
        
        boolean [][] dp = new boolean[t.length()+1][s.length()+1];
         
        for(int i=0; i<=t.length(); i++){
            dp[i][0] = true;
        }
        for(int j=1;j<=s.length(); j++){
            dp[0][j] = false;
        }
        
        for(int i=1; i<=t.length(); i++){
            for(int j=1; j<=s.length(); j++){
                if(i < j) dp[i][j] = false;
                  
                else {
                    dp[i][j] = t.charAt(i-1) == s.charAt(j-1) ?
                        dp[i-1][j-1]
                        : dp[i-1][j];
                }
            }
        }
                                 
        return dp[t.length()][s.length()];
    }

The time complexity of dynamic programming solution is O(n * m) where n and m are length of string t and s respectively. Also, there is additional space complexity of O(n * m).

We can reduce this complexity by using stack data structure as the relative position of characters in source and target string should remain same and matched characters can be deleted.

Using Stack

  1. Push the characters of source string in reverse order onto the stack.
  2. Iterate through the characters of the target string, and check if the character matches with the top element of the stack. If it matches, pop the element from the stack.
  3. Obtain the end result – If size of stack is zero, then it returns true, else false.
public boolean isSubsequence(String s, String t) {
         if(s.length()==0 && t.length() > 0)
            return true;

        if(s.length() > t.length())
            return false;

       Stack<Character> stack=new Stack<>();
        for(int i=s.length()-1;i >=0;i--)
            stack.push(s.charAt(i));

        for(int i=0; i < t.length();i++){
            if(!stack.isEmpty() && t.charAt(i)==stack.peek())
                stack.pop();
        }
        return stack.isEmpty();
    }

Time Complexity of the stack based solution is O(n + m) and the space Complexity isO(m), where n and m are length of string t and s respectively

The space complexity can be further reduced to O(1) by using two pointers approach.

Two Pointers

Take two pointers i which is used to traverse string t and j, which is used to traverse strings. Increment j whenever there is a match and compare the value of j with the length of source string at the end.

public boolean isSubsequence(String s, String t) 
{
        if(s.length() > t.length())
            return false;
        int i = 0, j = 0;
        while(i < t.length() && j < s.length()){
            if(t.charAt(i) == s.charAt(j))
                j++;
            i++;
        }
        if(j == s.length())
            return true;
        return false;
}

The time Complexity is O(n + m) and the space Complexity is O(1)

Please write to us if something is missing or wrong, we will be happy to fix it.

Longest common substring

Longest Common Substring

Given two string A and B, find longest common substring in them. For example, A = “DataStructureandAlgorithms” and B=“Algorithmsandme”, then longest common substring in A and B is “Algorithms”. Below figure shows longest common substring.

longest common substring

Brute force solution is to find all substrings of one string and check any of these substring are substring of second string, while comparing, keep track of the longest one we found. There can be n2substring for a string with length n and to find if a string is substring of another, it takes another m operations, where m is length of second string. Hence, overall complexity of this method is O(n2m).

Can we do better than that?

Longest common substring : Line of thoughts

We have to find longest common substring in strings of length M and length N. Can we find longest common substring till length M-1 and N-1 and then derive longest common substring for M and N?  Yes, we can find. The length either grows by one if last characters are equal or reset to zero if last characters are not equal. Why so?

First see why we need to reset to zero when characters are different. This because we are looking for common substring which means characters should be consecutive, any different character restart the the entire search because with those two  different characters, there can’t be any common substring.

What if characters are same? In that case we increment by one, because, longest common substring in N-1 and M-1 would be either 0 or some number based on how any consecutive common characters were till N-1 and M-1.

What will be longest common substring when one of the strings is empty? It will be zero.

So, do you see recursion here? So, let’s write recursion relation and then implement it.

LCS(i,j) = 1+LCS(i-1, j-1) if S[i] = T[j] 
         =  0 otherwise

This recursion relation has optimal subproblem property that solution to the problem actually depends on solutions to subproblems. Also, there are subproblems which will be calculated again and again, which is called overlapping subproblems. These two properties are required for dynamic programming. To not to calculate subproblems, we will use memoization, for that  create a two dimensional array called LCS with dimensions as n and m. LCS[i][j] represents the length of longest common substring in A[0..i] and B[0..j]. And since solution for i-1 and and j-1 is required before solution of i and j, this matrix will be filled bottom up.

Longest common substring using dynamic programming

How to fill LCS[i][j]?

1. Check if A[i] is equal to B[j] 
   1.1 If yes, LCS[i][j] = 1 + LCS[i-1][j-1]
( Because new character is added to already common substring, 
     if any, till A[0...i-1] and B[0,,j-1])
   1.2 if both characters are not same, LCS[i][j] = 0,
       ( Because if characters are not same, there cannot be any
         common substring including A[i] and B[j].

Implementation

#include <stdio.h>
#include <string.h>

int max(int a, int b){
	return a>b ? a:b;
}
 int longestCommonSubstring(char * A, char * B){
     int lenA = strlen(A);
     int lenB = strlen(B);
     int LCS[lenA+1][lenB+1];

     for (int i=0; i<= lenA; i++){
         LCS[i][0] = 0;
     }

     for (int j=0; j <= lenB; j++){
         LCS[0][j] = 0;
     }
	
     int maxLength = 0;
     for (int i=1; i<= lenA; i++){
        for (int j=1; j <= lenB; j++){
            if (A[i] == B[j]){
                LCS[i][j] = 1 + LCS[i-1][j-1];		
                maxLength = max( maxLength, LCS[i][j] );
            } 
            else {
               LCS[i][j] = 0;
            }
         }
     }
     return maxLength;
}

int main(void) {
    char *a = "ABCDEFGSE";
    char *b = "EBCDEFGV";
	
    printf("\n Longest common substring : %d",
			longestCommonSubstring(a,b));
    return 0;
}
package com.company;

/**
 * Created by sangar on 5.1.18.
 */
public class LCS {

    public  static int longestCommonSubstring(String A, String B){
        int lenA = A.length();
        int lenB = B.length();

        int [][] LCS = new int[lenA][lenB];

        for (int i=0; i<lenA; i++){
            LCS[i][0] = 0;
        }

        for (int j=0; j<lenB; j++){
            LCS[0][j] = 0;
        }

        int maxLength = 0;
        for (int i=1; i<lenA; i++){
            for (int j=1; j<lenB; j++){
                if (A.charAt(i) == B.charAt(j)){
                    LCS[i][j] = 1 + LCS[i-1][j-1];
                    maxLength = Integer.max(maxLength, LCS[i][j]);
                }
                else {
                    LCS[i][j] = 0;
                }
            }
        }

        for (int i=0; i<lenA; i++){
            System.out.println();
            for (int j=0; j<lenB; j++){
                System.out.print(" " + LCS[i][j]);
            }
        }
        return maxLength;
    }

    public static void main(String[] args) {
	    String a = "ABCDEFGS";
	    String b = "EBCDEFG";

        System.out.println("Longest common substring :" +
                longestCommonSubstring(a,b));
    }
}

Time complexity of dynamic programming approach to find length of longest common substring in two string is O(n*m) and space complexity is O(n*m) where n and m are lengths of two given strings.

longest common substring dynamic programming

In next post, we will discuss suffix tree method to find LCS which is more optimized than DP solution and can be easily be generalized for multiple strings.

This solution is very similar to Longest common subsequence. Difference between two problems is that a subsequence is collection of characters, which may or may not be contiguous in string, where for a substring, characters must be contiguous. Based on this difference, out solution will vary a bit.

Please share if you find something wrong or missing. If you want to contribute to site, please refer contact us. We would be happy to publish your work and in turn will pay you too.