Longest common subseuence

A subsequence of a string is set of all the characters which are left to right order and not necessarily contiguous. For example, string ABCDEG has ABC, ADG, EG, BCDEG subsequences; whereas BDA is not a subsequence of the given string, even though all the characters are present in the string, they do not appear in the same order.

Given two strings X and Y, find longest common subsequence (LCS) Z. For example, X = ABCDSEFGD Y = ACFEFXVGAB; LCS Z would be ACEFG.

Longest common subsequence: line of thoughts

First of all, notice that it is an optimization problem, it is a hint that it may be a dynamic programming problem but we are not sure yet.

Let’s say that the length of the string 1 and the string of 2 are N and M. Can I know the longest common subsequence in length N and M if I already know the LCS in N-1 and M-1? The direct question is can I divide the original problem into subproblems and solve those subproblems to get the answer for original problem? In this case, the answer is yes. (This is the second hint towards dynamic programming application, optimal subproblem substructure).

How can we divide the problem into subproblems? The length of X is N and length of Y as M. Start from the end of both strings. Check if X[N] == Y[M]. If yes, the problem reduces to find the longest common subsequence in X[1..N-1] and Y[1..M-1].

What if they are not equal? Then one by one we have to exclude character from string X and Y. Why?
First, we exclude the character from the X and find LCS in remaining characters of X and all the characters of Y. The problem reduces to X[1..N-1] and Y[1..M]. Next, we exclude a character from Y, the problem reduces to X[1..N] and Y[1..M-1]. Any of the two choices can give us the longest common subsequence, so we would take maximum from both the cases.

LCS(i,j)  =  1 + LCS[i-1, j-1] if X[i] == Y[j]   =   MAX (LCS[i-1,j], LCS[i, j-1]) if X[i] != Y[j] =   0 if i or j is 0

Interesting to see why LCS(i,j) is 0 when either i or j is 0? Because the longest common subsequence in two strings, when one string is empty is 0.

Can we implement the recursive function?

public int longestCommonSubsequence(String s1, String s2, int i, int j){

//If any of the string is nul return 0
if(s1 == null || s2 == null) return 0;

//We reached at the end of one of the string
if(i == s1.length() ||  j == s2.length())
return 0;

if(s1.charAt(i) ==  s2.charAt(j)){
return  1 + longestCommonSubsequence(s1, s2, i+1, j+1);
}

return Integer.max(longestCommonSubsequence(s1, s2, i+1, j),
longestCommonSubsequence(s1, s2, i, j+1));

If we follow the execution cycle of the above code, we will see something like below

It is evident from the partial tree that there are some problems which are solved again and again. This is the third hint (overlapping subproblems) that we can apply dynamic programming.

It will be more evident if you implement the recursive function with reverse traversal of the strings. In that implementation, the base case will be when one of the string is empty, and at that point, LCS of two strings will be 0. If we take a two dimensional table such that T[i][j] represents longest common subsequence till ith and jth characters of string S1 and S2 then T[0][i] = 0 and T[i][0] = 0.

T[i][j] = T[i-1][j-1] + 1 if X[i] = Y[j] T[i][j] = max(T[i-1][j], T[i][j-1]) if X[i] != Y[j]

Dynamic programming implementation of LCS

package com.company;

/**
* Created by sangar on 4.2.19.
*/
public class LongestCommonSubsequence {

public int longestCommonSubsequenceDP(String s1, String s2){

//If any of the string is nul return 0
if(s1 == null || s2 == null) return 0;

int len1 = s1.length();
int len2 = s2.length();

int[][] table = new int[len1+1][len2+1];

for (int i=0; i<=len1; i++){
for (int j=0; j<=len2; j++) {
if (j == 0 || i == 0) {
table[i][j] =  0;
}

else if (s1.charAt(i-1) == s2.charAt(j-1)) {
table[i][j] = 1 + table[i - 1][j - 1];
} else {
table[i][j] = Integer.max(table[i - 1][j], table[i][j - 1]);
}
}
}

return table[len1][len2];
}
}

Above implementation has time and space complexity of O(n2). Please share if there is anything wrong or missing.

What is dynamic programming?

Dynamic programming is an approach to solve a larger problem with the help of the results of smaller subproblems. It is a technique used to avoid computing multiple time the same subproblem in a recursive algorithm. I find a lot of students asking me question around, how do I know this problem is a dynamic programming problem? There is a definite way to arrive at the conclusion if a problem is a dynamic programming problem or not?

The first thing I would recommend you to read before going down is this beautiful explanation of dynamic programming to 4 years old.

The first thing you will notice about dynamic programming problems (not all problems) is they are optimization problem. Either it will be finding minimum or maximum of some entity. For example, find minimum edit between two strings or find longest common subsequence etc. However, problems like Fibonacci series are not exactly like an optimization problem, these are more like Combinatorial problems. Still, this can be a good hint that a problem can be a DP problem.

Second, you will notice that the problem can be divided into a pattern like an fn(n) = C + fn(n-k) where k can be anything between 1 and n.
This property is called optimum subproblem structure, where an optimum solution to the subproblem leads to the optimum solution to the larger problem.
Once you get the equation, it is very easy to come with a recursive solution to the problem. I would advise you to write the recursive solution and try to calculate the complexity of the solution. It will exponential in big-O notation.

Then why did recursion work so well with a divide and conquer approach? The key point is that in divide and conquer, a problem is expressed in terms of subproblems that are substantially smaller, say half the size. For instance, mergesort sorts an array of size n by recursively sorting two subarrays of size n/2. Because of this sharp drop in problem size, the full recursion tree has only logarithmic depth and a polynomial number of nodes. In contrast, in a typical dynamic programming formulation, a problem is reduced to subproblems that are only slightly smaller than the original. For instance, fn(j) relies on fn(j − 1). Thus the full recursion tree generally has polynomial depth and an exponential number of nodes.
However, it turns out that most of these nodes are repeats, that there are not too many distinct subproblems among them. Efficiency is therefore obtained by explicitly enumerating the distinct subproblems and solving them in the right order.
Reference

This will lead us to the third property, which is overlapping subproblems. Once, you draw the execution tree of the recursive solution of the problem, it will appear that a lot of problems are being solved again and again at different levels of recursion.

The intuition behind dynamic programming is that we trade space for time, i.e. to say that instead of calculating all the subproblems taking a lot of time but no space, we take up space to store the results of all the subproblems to save time later. The typical characteristics of a dynamic programming problem are optimization problems, optimal substructure property, overlapping subproblems, trade space for time, implementation via bottom-up/memoization.

Dynamic programming in action

Enough of theory, let’s take an example and see how dynamic programming works on real problems. I will take a very commonly used but most effective problem to explain DP in action. Problem is known as the Fibonacci series. Fibonacci series is a series of integers where each integer is the sum of previous two integers. For example, 1,1,2,3,5,8,13,17 is a Fibonacci series of eight integers. Now, the question is given a number n, output the integer which will be at the nth integer in Fibonacci series. For example for n = 4, the output should be 3 and for n=6, it should 8.

First hint: It is a combinatorial problem, so maybe a DP problem. Second, it is already given in the problem that current integer depends on the sum of previous two integers, that means f(n) = f(n-1) + f(n-2). This implies that the solution to subproblems will lead to a solution to the bigger problem which is optimal substructure property.

Next step is to implement the recursive function.

public int fibonacci (int n) {
if (n < 2) //base case
return 1;

return fibonacci(n-1) + fibonacci(n-2);
}

Great, next step is to draw the execution tree of this function. It looks like below for n = 6. It is apparent how many times the same problem is solved at different levels.

So, now we know three things about the Fibonacci problem: It is combinatorial problem, there is optimal substructure and there are overlapping subproblems. As in dynamic programming, we side with more space than time, we will try to use extra space to avoid recalculating subproblems.

The first way is to use a case, which stores the value of fab(n) if it is already calculated. This is called memoization or top-down approach.

Map<Integer, Integer> cache = new HashMap<>();

public int fibonacci(int n){
if (n == 0)
return 0;
if (n == 1)
return 1;

if(cache.containsKey(n))
return cache.get(n);

cache.put(n, fibonacci(n - 1) + fibonacci(n - 2));

return cache.get(n);
}

Another approach is bottom up, where the smaller problems are solved in an order which helps us with solving bigger problems. Here also, we use memoization but in a different way. We store the solution of smaller subproblems and directly use this to build the solution.

int[] fib = new int[n];
fib[0] = fib[1] = 1;
public int fibonacci(int n){
for(int i=2; i<=n; i++){
fib[n] = fib[n-1] + fib[n-2];
}
return fib[n];
}

Above solution requires extra O(n) space, however, the time complexity is also reduced to O(n) with each subproblem solved only once.

Follow longest increasing subsequence problem, how we have applied the same pattern while we solved the problem.

Final thoughts
Where to apply dynamic programming : If you solution is based on optimal substructure and overlapping sub problem then in that case using the earlier calculated value will be useful so you do not have to recompute it. It is bottom up approach. Suppose you need to calculate fib(n) in that case all you need to do is add the previous calculated value of fib(n-1) and fib(n-2)

Recursion : Basically subdividing you problem into smaller part to solve it with ease but keep it in mind it does not avoid re computation if we have same value calculated previously in other recursion call.

Memoization : Basically storing the old calculated recursion value in table is known as memoization which will avoid re-computation if its already been calculated by some previous call so any value will be calculated once. So before calculating we check whether this value has already been calculated or not if already calculated then we return the same from table instead of recomputing. It is also top down approach.

Please share if there is something wrong or missing. If you are preparing for an interview and need help with preparation, please book a free session with us to guide you through it.

0/1 knapsack problem using dynamic programming

0/1 Knapsack is a typical problem that is used to demonstrate the application of greedy algorithms as well as dynamic programming. There are cases when applying the greedy algorithm does not give an optimal solution. There are many flavors in which Knapsack problem can be asked.

1. A thief enters a museum and wants to steal artifacts from there. Every artifact has a weight and value associated with it. Thief carries a knapsack (bag) which can take only a specific weight. Problem is to find the combination of artifacts thief steals so that he gets maximum value and weight of all taken artifacts is less capacity of knapsack he has. A thief cannot take any artifact partially. Either he takes it or leaves it. Hence the problem is 0/1 knapsack.

2. We have N files each having a size say Si. We have a total storage capacity of W bytes. For each file to be stored re-computation cost is Vi. Problem is to store as many files on storage that combined size of all files is less than W and their re-computation value is maximum. We can either store or leave a file, we cannot store a partial file. Hence this is a case of 0/1 knapsack problem.

0/1 knapsack problem : Line of thoughts

Brute force method would try all subsets of a set of items, whose weight adds up to the maximum capacity of knapsack and see which one gives maximum value. The complexity of the brute force algorithm would be of exponential order as there will be 2n possible subsets for n items.

Can we do better? If we consider each item, there are two possibilities associated with it.
First, current item is included in optimal subset. Then we need to find out all the items in remaining N-1 items which can optimize the subproblem for weight W-wk. Value of this item is added to candidate maximum value.

Second, current item is not included in optimal subset. In that case, we need to find out items in remaining N-1 items which can optimize the the original problem. Value of current item is not added into candidate maximum value.

Inclusion depends on two conditions :

1. Weight of the item is less bthan the total capacity of knapsack.
2. Inclusion of item increases current max value with K-1 items with W-Wk weight.

Since every steps reduces the problem to a smaller problem in terms of items of weight, recursive solution would be our first refuge. To implement this problem, what are the base cases? First, we cannot add any items to knapsack capacity is zero i.e. W == 0. Second, no item can be taken if there are no items remaining, i.e. n == 0.

Recursive implementation of 0/1 knapsack problem

package com.company;
/**
* Created by sangar on 19.8.18.
*/
public class KnapsackProblem {
static int knapSack(int W, int[] weights, int[] val, int n) {
/*
If there is no item or weight that can be carried is
zero, return zero
*/
if (n < 0 || W == 0)
return 0;

/*
If weight of the nth item is more than Knapsack
capacity W,then this item cannot be included
in the optimal solution
*/
if (weights[n] > W)
return knapSack(W, weights, val, n - 1);

/* Consider two cases, including item and excluding item.*/
else return Integer.max(
(val[n]
+ knapSack(W - weights[n], weights, val, n - 1)),
(knapSack(W, weights, val, n - 1))
);
}

public static void main(String args[]) {
int[] val = {60, 100, 120};
int[] wt = {10, 20, 30};
int W = 50;

int n = val.length;
System.out.println(knapSack(W, wt, val, n - 1));
}
}

If we look at the execution trace of the function, it looks like this.

There are seven problems to be solved at the leaf level. For N = 3, there are 7 problems to be solved before we start optimizing for the max value. For N, in general, it will take 2N subproblems to be solved. Hence, complexity of recursive implementation is O(2N).

If we take another example, it will become evident that there are some subproblems that are solved again and again. Overlapping subproblems is one of the criteria, we should be thinking about dynamic programming. Also, notice that the optimal solution to a smaller problem leads to the optimal solution to a bigger problem, which is the second condition for DP. This problem satisfy both these conditions, hence let’s design DP solution for it.

0/1 knapsack problem : Dynamic programming approach

We define two dimensional array V[N,W] where N is number of items and W is capacity. For 1<= i <= n and 0<=w<=W, V[i,w] represents the optimal solution for items I1, I2, ..In with maximum weight of w. If we can compute all the entries of this array, then the array entry V[N,W] is the solution to our problem

For i =0 and w=0, all values will be zero. So, the first column and first row will be filled with all zero values.
Recursively, we can fill the table bottom-up as follows.

V[i, w ] = max (V[i-1, w], V[i-1, w-w[i]) + V[i] )
V[0, w ] = 0; there are no items
V[i, 0 ] = 0; no items can be picked.
package com.company;

/**
* Created by sangar on 19.8.18.
*/
public class KnapsackProblem {
public static int knapsackDP(int W, int[] weights, int[] val, int n) {
int[][] V = new int[n+1][W + 1];
for(int i = 1 ; i < V[0].length; i++){
/*
If weight of item is less than current value
we can achieve minimum value V[i] with 1..i items
*/
if(weights[0] <= i){
V[0][i] = val[0];
}else{
V[0][i] = 0;
}
}

//Loop for all items
for (int i = 1; i < V.length; i++) {
for (int j = 1; j < V[i].length; j++) {
/*if a weight is more than the allowed weight,
that weight cannot be picked. */
if(weights[i] > j){
V[i][j] = V[i-1][j];
}else{
V[i][j] = Math.max(V[i-1][j],
val[i] + V[i-1][j-weights[i]]);
}
}
}
return V[V.length-1][W];
}

public static void main(String args[]) {
int[] val = {60, 100, 120};
int[] wt = {10, 20, 30};
int W = 50;

int n = val.length;
System.out.println(knapsackDP(W, wt, val, n - 1));
}
}

One similar problem which can be solved with the same approach is the minimum number of coins to be used to get a change of a particular amount.

Complexity of the dynamic programming implementation of knapsack problem is O(N *W). Space complexity is again O(N*W). It is thumb rule that we trade space for time in dynamic programming.

Please share if there is something is wrong or missing. If you want to contribute to the website, please reach out to us at [email protected]

Longest increasing subsequence

Given an array of integers, find the longest increasing subsequence i.e a subsequence such that every element of subsequence satisfy this condition: i < j  and a[i] < a[j].

Input:
[3, 10, 2, 1, 20 ]
Output:
3
Explanation:
The longest increasing subsequence is 3, 10, 20

Input:
[ 3, 2 ]
Output:
1
Explanation:
The longest increasing subsequences are {3} and {2}

Input:
[ 50, 3, 10, 7, 40, 80 ]
Output:
4
Explanation:
The longest increasing subsequence is {3, 7, 40, 80}

Longest common subsequence : thought process

If we have to find the longest increasing subsequence till the last element of an array, i.e nth element, the question is does that depend on LIS till n-1 elements before it? Idea is to see if any increasing subsequence already present till current index i,  can include  A[i] and still remain increasing? To confirm that, check every element at index j such that j>=0 and j < i and A[j] < A[i].

If A[j] is less than A[i], then A[i] can be part of increasing subsequence ending with element j. The length of such increasing subsequence can be (length of increasing subsequence ending at j )+1. Check for each such element and take maximum length. Let’s see an example and see how it works.

Before we start a solution, let’s think why are we applying dynamic programming here. First because, the solution to a bigger problem depends on the optimal solution of subproblems and second, because if we do not store solutions to subproblems we may end up calculating them again and again. Both Optimal subproblem property and overlapping subproblem property are satisfied with this problem and hence we will use dynamic programming to solve it.

LIS[i] will represent the longest increasing subsequence length till element i .

LIS[i] = 1 + max(LIS[j]) for all  0<=j<i and A[j]<A[i]
= 1 if no such element exists where j< i and A[j]<A[i]

class Solution {
public int lengthOfLIS(int[] nums) {

if(nums == null || nums.length == 0) return 0;

int [] LIS = new int[nums.length];

Arrays.fill(LIS, 1);

int max = 1;

for(int i=0; i<nums.length; i++){
for(int j=i-1; j>=0; j--){
if(nums[j] < nums[i] ){
LIS[i] = Math.max(LIS[i], LIS[j] + 1);
}
}
max = Math.max(LIS[i], max);
}
return max;
}
}

Let’s take an example and see how this code works? For example, given array A = {2,4,6,3,5,7,9}

Initialize LIS[0] =1, that means there is an increasing subsequence of length 1 at index 0.
For i = 1 i.e 4, check for j=0 to 1, excluding index 1. A[0] < A[1], hence LIS length is 2 (LIS[0] + 1 ).

For i = 2, i.e. 6 , check j = 0 to 2.  and check that LIS[0] = 1 and LIS[1] =2. Max LIS[j] for j=0 to  2 is 2. LIS[2] = 3 (LIS[1] +1).
For i =3 i.e 3, check from 0 to 3, Max LIS till index 3 will be LIS[3] = 2 because only A[0] is less than A[3]. Hence longest subsequence ending with i = 3 will have length only of 2.  LIS[3] = 2
For i = 4, i.e.5 Max LIS is again 3 which includes {2,4,5} or {2,3,5}
For i = 5, i.e 7, Max LIS is 4 which includes {2,4,5,7} or {2,3,5,7} or {2,4,6,7}
For i = 6, i.e 9, Max LIS is 5 which includes {2,4,5,7,9} or {2,3,5,7,9} or {2,4,6,7,9}

Therefore, the longest increasing subsequence is 6 for a given array.

Other problems which are a variance of longest increasing subsequence and can be solved by finding the longest increasing subsequence are :

1. Given two river banks (visualization: two parallel lines), one bank has numbers written (1….n) in sorted order. On the other bank, the numbers (1…n) are arranged randomly. A bridge can be formed from the ith point from bank 1 to ith point in bank 2. Find the max number of non-intersecting bridges you can form?
Just find the longest increasing subsequence in unordered number and that will be the solution.

2. Given a set of n types of rectangular 3-D boxes, where the ith box has height h(i), width w(i), and depth d(i) (all real numbers). You want to create a stack of boxes that are as tall as possible, but you can only stack a box on top of another box if the dimensions of the 2-D base of the lower box are each strictly larger than those of the 2-D base of the higher box.

3. Another problem which borrow heavily from approach is find longest zigzag subsequence in an array

Algorithm to find longest increasing subsequence works in O(n2) in time complexity with O(n) space complexity.

Please share if you find anything wrong or missing, we would love to hear from you. If you want to be part of editors and contributors team at Algorithms and Me, please drop us an email at [email protected]

Longest palindrome substring

Given a string, find the longest palindrome substring in that string. For example, in the string given below, the longest palindrome substring is DCBABCD

Brute force solution to the problem is pretty easy. Idea is to start from each character in the string and go on checking on the left and right side of that character until they are same. Once the left and right side characters differ, we check if the number of characters in substring centered character is greater than earlier such substring for any other character processed earlier. If it is greater, then we update the length and look for subsequent characters till the end of the string.

Implementation

private String longestPalindrome(String s) {

int longest = Integer.MIN_VALUE;
int start = 0; int end = -1;

for(int i=0; i<s.length(); i++){
int currentLength = 1;
for(int j= i-1, k= i+1; j>=0 && k < s.length()
&& s.charAt(j) == s.charAt(k); j--, k++){
currentLength+= 2;
}
if(currentLength >= longest) {
longest = currentLength;
start = i - currentLength / 2;
end = i + currentLength / 2;
}
}

for(int i=0; i<s.length(); i++){
int currentLength = 0;
for(int j= i, k= i+1; j>=0 && k < s.length()
&& s.charAt(j) == s.charAt(k); j--, k++){
currentLength+=  2;
}

if(currentLength >= longest) {
start = (i - currentLength/2) + 1;
end = (i + currentLength /2 );
longest = currentLength;
}
}

return s.substring(start,end+1);
}

The complexity of the implementation is O(n2).

Longest palindrome string: a dynamic programming approach

To apply dynamic programming to any problem, there are two conditions which must satisfy: First, the optimal subproblem property meaning that solution to smaller subproblems leads to the solution to the larger problem. Second, overlapping subproblem substructure meaning there are subproblems which are solved again and again in a recursive way.

The first property gives us the recurrence relationship, whereas the second property points us towards memoization.

Let’s take an example and see if we can come up with an algorithm. We have to find if a string of length 4 is a palindrome or not. To string to be a palindrome, it’s first and the last character should be the same. There are two paths here, either the first and last characters are same or they are not.

If the first and last characters are not the same, we can safely say that string is not a palindrome. If first and last characters are the same, can we say the string is a palindrome? No, not yet. We have to check if the substring from the second character to the second last character is a palindrome or not.

This is optimum subproblems property here. If substring s[i+1..j-1] is a palindrome, then string s[i..j] is palindrome if and only if s[i] = s[j].

Now, substring s[i+1..j-1] will be calculated for all the indices k where 0<=k<=i and for the indices k where j <=k<=length. This hints at overlapping substructure property. Can we precalculate this stuff before and save it in memory?
We can store this information in a matrix M, where M[i][j] stores if a substring starting at i and length j is palindrome or not. Notice that meaning of j is length and not an index in original string.

M[i][1] will be always true as a character is a palindrome in itself. M[i][2] is true only if s[i]=S[i+1]

For M[i][3], the first character of the substring will be s[i], the last character will be s[i+3-1], if s[i] = s[i+2] and if M[i+1][2] is true (substring starting at i+1 and length 2 is palindrome or not), then only M[i][3] can be true.

We fill the table bottom-up, starting with substrings with length 1 and 2 from each character in the string.

M[i][j] = (s[i] == [j+i-1] && M[i+1][j-2]) for i = 3 to n-j+1 and j = 1 to n

DP implementation

class Solution {
public String longestPalindromeSubstring(String s) {
if(s.length() == 0 ) return "";

int longest = 0;
int start = 0;
int end = 0;
boolean[][] table = new boolean[s.length()][s.length()];
table[0][0] = true;

for (int i = 1; i < s.length(); i++) {
//All single characters are palindrome in itself
table[i][i] = true;

//All two characters are palindrome if two characters are equal
table[i - 1][i] = s.charAt(i - 1) == s.charAt(i);
if(table[i-1][i] && longest <= 2){
longest = 2;
start = i-1;
end = i;
}

}

for (int L = 3; L <= s.length(); L++) {
for (int i = 0; i <= s.length()-L; i++) {
int j = i + L - 1;
table[i][j] = table[i + 1][j - 1]
&& (s.charAt(i) == s.charAt(j));

if(table[i][j] && longest <= L){
longest = L;
start = i;
end = j;
}
}
}

return s.substring(start, end+1);

}
}

Test cases

package test;

import com.company.LongestPalindromeSubstring;
import org.junit.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;
/**
* Created by sangar on 2.1.19.
*/
public class LongestPalindromeSubstringTest {

LongestPalindromeSubstring tester = new LongestPalindromeSubstring();

@Test
public void longestPalindromeSubstringTest() {
assertEquals(4, tester.longestPalindromeSubstring("ABBA"));
}

@Test
public void longestPalindromeSubstringOneCharTest() {
assertEquals(1, tester.longestPalindromeSubstring("ACDB"));
}

@Test
public void longestPalindromeSubstringInMidTest() {