Longest Common Subsequence


A subsequence of a string is a set of its characters taken in left-to-right order, not necessarily contiguous. For example, the string ABCDEG has subsequences ABC, ADG, EG, and BCDEG, whereas BDA is not a subsequence of it: all of its characters are present in the string, but they do not appear in the same order.
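
The definition translates directly into a greedy left-to-right scan. Below is a minimal sketch, assuming a hypothetical helper isSubsequence (not part of the original article), that checks the examples above.

```java
class SubsequenceCheck {

    // Returns true if every character of candidate appears in text
    // in the same left-to-right order (not necessarily contiguous).
    static boolean isSubsequence(String candidate, String text) {
        int k = 0; // index of the next candidate character to match
        for (int i = 0; i < text.length() && k < candidate.length(); i++) {
            if (text.charAt(i) == candidate.charAt(k)) {
                k++; // matched; move on to the next candidate character
            }
        }
        return k == candidate.length();
    }

    public static void main(String[] args) {
        String s = "ABCDEG";
        for (String c : new String[]{"ABC", "ADG", "EG", "BCDEG", "BDA"}) {
            System.out.println(c + " -> " + isSubsequence(c, s));
        }
    }
}
```

Running main should report true for ABC, ADG, EG and BCDEG, and false for BDA, because no A appears after the D in ABCDEG.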


Given two strings X and Y, find their longest common subsequence (LCS) Z. For example, for X = ABCDSEFGD and Y = ACFEFXVGAB, the LCS Z is ACEFG.

Longest common subsequence: line of thought

First, notice that this is an optimization problem. That is a hint that it may be solvable with dynamic programming, but we cannot be sure yet.

Let's say the lengths of the two strings X and Y are N and M. Can we find the LCS of X[1..N] and Y[1..M] if we already know the LCS of shorter prefixes, such as X[1..N-1] and Y[1..M-1]? In other words, can we divide the original problem into subproblems and combine their solutions to get the answer to the original problem? Here the answer is yes. (This is the second hint towards dynamic programming: optimal substructure.)

How can we divide the problem into subproblems? Let the length of X be N and the length of Y be M. Start from the end of both strings and check whether X[N] == Y[M]. If yes, that character can be taken as the last character of an LCS, and the problem reduces to finding the LCS of X[1..N-1] and Y[1..M-1], plus one for the matched character.

What if they are not equal? Then a common subsequence cannot use both X[N] and Y[M] as its last character (they would have to be the same character), so at least one of them can be dropped. We do not know which one, so we try both.
First, exclude the last character of X and find the LCS of the remaining characters of X and all the characters of Y: the problem reduces to X[1..N-1] and Y[1..M]. Next, exclude the last character of Y: the problem reduces to X[1..N] and Y[1..M-1]. Either choice can give the longest common subsequence, so we take the maximum of the two.

LCS(i, j) = 0                                  if i == 0 or j == 0
LCS(i, j) = 1 + LCS(i-1, j-1)                  if X[i] == Y[j]
LCS(i, j) = max(LCS(i-1, j), LCS(i, j-1))      if X[i] != Y[j]

Why is LCS(i, j) 0 when either i or j is 0? Because when one of the strings is empty, the two strings have no character in common, so their longest common subsequence has length 0.

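Can we implement this as a recursive function? The version below works forward on suffixes instead of backward on prefixes: it is called as longestCommonSubsequence(s1, s2, 0, 0), uses 0-based indexes as Java strings do, and moves towards the end of both strings. This is the mirror image of the recurrence above.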

    public int longestCommonSubsequence(String s1, String s2, int i, int j){

        //If either string is null, return 0
        if(s1 == null || s2 == null) return 0;

        //We reached the end of one of the strings
        if(i == s1.length() ||  j == s2.length())
            return 0;

        //Current characters match: count it and move forward in both strings
        if(s1.charAt(i) ==  s2.charAt(j)){
            return  1 + longestCommonSubsequence(s1, s2, i+1, j+1);
        }

        //Mismatch: skip a character of s1 or of s2 and keep the better result
        return Integer.max(longestCommonSubsequence(s1, s2, i+1, j),
                longestCommonSubsequence(s1, s2, i, j+1));
    }

If we trace the execution of the above code, we get a recursion tree like the one below.

[Figure: partial recursion tree of longestCommonSubsequence calls]

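The partial tree shows that some subproblems are solved again and again. For example, when s1[i] != s2[j], the call (i, j) spawns both (i+1, j) and (i, j+1); if those also mismatch, each of them in turn calls (i+1, j+1). This is the third hint (overlapping subproblems) that we can apply dynamic programming.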
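To see the overlap concretely, here is a sketch of a hypothetical OverlapDemo class (not from the original article). It reruns the same forward recursion and counts how many times each (i, j) pair is visited. It assumes two strings with no common character, so every call takes the mismatch branch.

```java
import java.util.HashMap;
import java.util.Map;

class OverlapDemo {

    // How many times the naive recursion visits each (i, j) pair, keyed as "i,j".
    static final Map<String, Integer> visits = new HashMap<>();

    // Same forward recursion as longestCommonSubsequence, plus visit counting.
    static int lcs(String s1, String s2, int i, int j) {
        visits.merge(i + "," + j, 1, Integer::sum);
        if (i == s1.length() || j == s2.length()) return 0;
        if (s1.charAt(i) == s2.charAt(j)) return 1 + lcs(s1, s2, i + 1, j + 1);
        return Math.max(lcs(s1, s2, i + 1, j), lcs(s1, s2, i, j + 1));
    }

    public static void main(String[] args) {
        // No common character: every call takes the "not equal" branch.
        int length = lcs("ABCD", "EFGH", 0, 0);
        System.out.println("LCS length = " + length);
        System.out.println("subproblem (1,1) computed " + visits.get("1,1") + " times");
        System.out.println("subproblem (2,2) computed " + visits.get("2,2") + " times");
    }
}
```

For "ABCD" and "EFGH" it should report length 0, with (1,1) computed 2 times and (2,2) computed 6 times. With no matches, each interior (i, j) is reached once per monotone path from (0, 0), which is C(i+j, i) times, so the repetition grows quickly with the string lengths.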

The overlap is even more evident if you implement the recursive function with reverse traversal of the strings, exactly as in the recurrence LCS(i, j). In that implementation, the base case is when one of the strings is empty, and there the LCS of the two strings is 0. If we take a two-dimensional table T such that T[i][j] holds the length of the LCS of the first i characters of string S1 (X) and the first j characters of string S2 (Y), then T[0][j] = 0 and T[i][0] = 0 for all i and j. The remaining entries follow the recurrence:

T[i][j] = T[i-1][j-1] + 1 if X[i] = Y[j]
T[i][j] = max(T[i-1][j], T[i][j-1]) if X[i] != Y[j]
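Here is a minimal top-down sketch of this reverse traversal, using a hypothetical class LcsTopDown (not the author's code). lcs(i, j) follows the recurrence on prefixes of length i and j, and a memo table stores each T[i][j] the first time it is computed, so every subproblem is solved only once.

```java
import java.util.Arrays;

class LcsTopDown {

    // memo[i][j] caches the LCS length of the first i characters of s1
    // and the first j characters of s2; -1 means "not computed yet".
    private final int[][] memo;
    private final String s1;
    private final String s2;

    LcsTopDown(String s1, String s2) {
        this.s1 = s1;
        this.s2 = s2;
        memo = new int[s1.length() + 1][s2.length() + 1];
        for (int[] row : memo) Arrays.fill(row, -1);
    }

    int lcs(int i, int j) {
        if (i == 0 || j == 0) return 0;          // one prefix is empty
        if (memo[i][j] != -1) return memo[i][j]; // reuse a solved subproblem
        int result;
        if (s1.charAt(i - 1) == s2.charAt(j - 1)) {
            // X[i] == Y[j] in the 1-indexed recurrence
            result = 1 + lcs(i - 1, j - 1);
        } else {
            result = Math.max(lcs(i - 1, j), lcs(i, j - 1));
        }
        memo[i][j] = result;
        return result;
    }

    public static void main(String[] args) {
        String x = "ABCDSEFGD";
        String y = "ACFEFXVGAB";
        System.out.println(new LcsTopDown(x, y).lcs(x.length(), y.length()));
    }
}
```

On the example strings this should print 5, the length of ACEFG. Top-down memoization fills only the entries the recursion actually reaches, at the cost of a recursion depth of up to N + M. The bottom-up table in the next section avoids recursion altogether.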

Dynamic programming implementation of LCS

package com.company;

/**
 * Created by sangar on 4.2.19.
 */
public class LongestCommonSubsequence {

    public int longestCommonSubsequenceDP(String s1, String s2){

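        //If either string is null, return 0
        if(s1 == null || s2 == null) return 0;

        int len1 = s1.length();
        int len2 = s2.length();

        //table[i][j] = length of the LCS of the first i characters of s1
        //and the first j characters of s2
        int[][] table = new int[len1+1][len2+1];

        for (int i=0; i<=len1; i++){
            for (int j=0; j<=len2; j++) {
                //Base case: one of the prefixes is empty
                if (j == 0 || i == 0) {
                    table[i][j] =  0;
                }
                //Java strings are 0-indexed, so X[i] in the recurrence is s1.charAt(i-1)
                else if (s1.charAt(i-1) == s2.charAt(j-1)) {
                    table[i][j] = 1 + table[i - 1][j - 1];
                } else {
                    table[i][j] = Integer.max(table[i - 1][j], table[i][j - 1]);
                }
            }
        }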

        return table[len1][len2];
    }
}
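
The method above returns only the length, while the problem statement asks for Z itself. The following sketches the usual walk-back, using a hypothetical static helper lcsString that is not part of the original class. It rebuilds the same table, starts at table[len1][len2], emits a character on every match, and otherwise moves to the neighbour holding the larger value.

```java
class LcsReconstruct {

    // Builds the same DP table, then walks back from table[len1][len2]
    // to recover one longest common subsequence.
    static String lcsString(String s1, String s2) {
        int len1 = s1.length(), len2 = s2.length();
        int[][] table = new int[len1 + 1][len2 + 1]; // row 0 and column 0 stay 0
        for (int i = 1; i <= len1; i++) {
            for (int j = 1; j <= len2; j++) {
                if (s1.charAt(i - 1) == s2.charAt(j - 1)) {
                    table[i][j] = 1 + table[i - 1][j - 1];
                } else {
                    table[i][j] = Math.max(table[i - 1][j], table[i][j - 1]);
                }
            }
        }

        // Walk back: a match contributes a character; otherwise follow
        // the neighbour the max came from.
        StringBuilder sb = new StringBuilder();
        int i = len1, j = len2;
        while (i > 0 && j > 0) {
            if (s1.charAt(i - 1) == s2.charAt(j - 1)) {
                sb.append(s1.charAt(i - 1));
                i--;
                j--;
            } else if (table[i - 1][j] >= table[i][j - 1]) {
                i--; // dropping the last character of s1 keeps the optimum
            } else {
                j--; // dropping the last character of s2 keeps the optimum
            }
        }
        return sb.reverse().toString(); // characters were collected back to front
    }

    public static void main(String[] args) {
        System.out.println(lcsString("ABCDSEFGD", "ACFEFXVGAB"));
    }
}
```

For X = ABCDSEFGD and Y = ACFEFXVGAB this should print ACEFG. When several different LCSs exist, the tie-break (>= prefers dropping a character of s1) decides which one is returned.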

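The above implementation fills an (N+1) x (M+1) table with constant work per cell, so it takes O(N*M) time and O(N*M) space, i.e. O(n^2) when both strings have length n. By contrast, the plain recursive version without memoization can take exponential time in the worst case. Please share if there is anything wrong or missing.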
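
If only the length is needed, each row of the table depends only on the row above it, so two rows are enough. The following is a sketch under that assumption (the class name LcsTwoRows is hypothetical).

```java
class LcsTwoRows {

    // Same recurrence as longestCommonSubsequenceDP, but keeps only the previous
    // and current rows, so the extra space is O(M) instead of O(N*M).
    static int lcsLength(String s1, String s2) {
        int len2 = s2.length();
        int[] prev = new int[len2 + 1]; // row i-1
        int[] curr = new int[len2 + 1]; // row i
        for (int i = 1; i <= s1.length(); i++) {
            curr[0] = 0; // empty prefix of s2
            for (int j = 1; j <= len2; j++) {
                if (s1.charAt(i - 1) == s2.charAt(j - 1)) {
                    curr[j] = 1 + prev[j - 1];
                } else {
                    curr[j] = Math.max(prev[j], curr[j - 1]);
                }
            }
            int[] tmp = prev; // the current row becomes the previous one
            prev = curr;
            curr = tmp;
        }
        return prev[len2]; // after the final swap, prev holds the last row
    }

    public static void main(String[] args) {
        System.out.println(lcsLength("ABCDSEFGD", "ACFEFXVGAB"));
    }
}
```

Time stays O(N*M). Passing the shorter string as s2 brings the extra space down to O(min(N, M)). The walk-back that recovers Z itself, however, needs the full table.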