Range minimum query (RMQ)

Range minimum query RMQ

Given an array A[0..n], find the index of the element with the minimum value in a given range. This problem is known as Range Minimum Query or RMQ.
For example, if given array below, find the index of minimum value between index 2 and 7, RMQ answer would be 5, which is the index of element 1.

 RMQ range minimum query

Going by the brute force, every time a query is fired, we scan the range and find the minimum in a given range in the same way as we do for an entire array. The complexity of each query being answered is O(n) wherein the worst-case range is the entire array.

Can we preprocess our data, so that our query operations are less costly? If we do so, there are two parts to the solution now, first preprocessing and the second query. Let’s assume complexity of each step is f(n) and g(n) respectively, then the complexity of solution can be denoted as (f(n), g(n)).

What kind of preprocessing can be done? Basic idea is to calculate the minimum index of all the ranges possible in the array. How many ranges are possible for an array with n elements? It’s n2 ranges. Why?

So, to store the index of minimum value element of each range, O(n2) order space is required and time complexity goes to O(n3). However, complexity of query is O(1). So overall complexity of solution is ( O(n3), O(1) ).

#include <stdio.h>

int M[100][100];

int findMinimum(int a[], int start, int end, int size){
	if(start >= size || end >= size) return -1;
	int min = start;
	for(int i=start; i<=end; i++){
		if( a[i] < a[min] ){
			min = i;
		}
	}
	return min;
	
}
void preprocess(int a[], int size ){
    for(int i=0; i<size; i++){
        for(int j=0; j<size; j++){
            for(int k=i; k<=j; k++){
                M[i][j] = findMinimum(a,i,j,size); 
            }
        }
    }
}

int rmq(int start, int end){
	return M[start][end];
}

int main(void) {
	
	int a[] = { 2,3,1,5,9,7,10,5,6,3 };
	int size = sizeof(a)/sizeof(a[0]);
	
	//Preprocessing step
	preprocess(a, size);
	printf("\n Minimum index in range is : %d ", rmq(3,9) );
	printf("\n Minimum index in range is : %d ", rmq(2,7) );
	
	return 0;
}

With application of dynamic programming, the complexity of the preprocessing step can be reduced to O(n2).

#include <stdio.h>

int M[100][100];

void preprocess(int a[], int size)
{
	int i,j;
	for (i=0; i<size; i++)
		M[i][i] = i;
	
	for (i=0; i<size; i++){
		for (j=i+1; j<size; j++){
			if (a[M[i][j - 1]] < a[j])
				M[i][j] = M[i][j - 1];
			else
				M[i][j] = j;
		}
	}
}

int rmq(int start, int end){
	return M[start][end];
}

int main(void) {
	
	int a[] = { 2,3,1,5,9,7,10,5,6,3 };
	int size = sizeof(a)/sizeof(a[0]);
	
	//Preprocessing step
	preprocess(a, size);
	printf("\n Minimum index in range is : %d ", rmq(3,9) );
	printf("\n Minimum index in range is : %d ", rmq(2,7) );
	
	return 0;
}

Range minimum query with O(n), O(√n) complexity solution

Can we do better for preprocessing step while trading off query step? If we divide the array into smaller chunks and store index of minimum value element in those chunks, will it help? And what should be the size of chunks? How about we divide the array in √n parts, where √n is size of part.

RMQ or range minimum query based on square root partitioning

Now, find minimum element index in each of this chunk, and store it. Extra space required is (√n). Finding minimum for each chunk has a complexity of (√n * √n) as O(n).

To find minimum element index in the given range, follow three steps:
1. Find the index of the element with the minimum value in all chunks lying between the start and end of the given range. (Max √n operations if all chunks fall in the range)
2. Find minimum index in chunk where the start of the range lies. ( Max √n comparisons from start of the range to end of the chunk).
3. Find minimum index in chuck where end of the range lies from the start of chunk to end of the range.
4. Compare all these values and return the index of the minimum of them.

No matter, how big or small range is to find the index of an element with the minimum value, the worst case will be O(√n) as there are only 3*√n operations.

Let’s take an example and see how it works. Find minimum in range (2,7)

range minimum query or RMQ example

To get RMQ(2,7), what are the chunks with are lying within range? There is only one: chunk 1. Minimum index of chunk 1 is M[1] which is 5, so, minimum element in those chunks is A[5].

Find the index of the minimum value in chunk 0 where start of the range lies (starting from start of the range which 2). There is only one element, which is index 2, so element to compare is A[2].

Find minimum from the start of chunk where the end of the range lies. So, we will be comparing A[6] and A[7].

At the end, compare A[5] (minimum of all chunks between start and end of range ), A[2] (minimum in chunk where the start of the range lies) and A[6], A[7] (minimum in chunk where end of the range lies) and we have the answer as 5 as A[5] is the minimum of all these values.

Aggregating all things, we found a way to optimize solution of range minimum query with complexity as ((o(n), O(√n)).

RMQ using sparse table

Method 3 uses only O(√n) space, however, query time complexity is also O(√n). To reduce query time at the expense of space, there is another method called as sparse table method. This method uses features of method 2 (dynamic programming) and features of method 3 (find minimums of chunks).

In this approach, split input array into chunks of size 2j where j varies from 0 to log n and n is number of elements in array. There will be n log n such chunks and hence the space complexity becomes O(n log n).

After splitting, find the index of the minimum element in each chunk and store it in a lookup table. 

M[i][j] stores minimum in range from i with size 2j.

RMQ using sparse matrix table

For example, M[0][3] stores index of the minimum value between 0 and 7 (23 = 8 elements).

Now problem is how to create this lookup table? This table can be created using dynamic programming from bottom up. Specifically, we find index of the minimum value in a block of size 2j by comparing the two minima of its two constituent blocks of size 2j-1. More formally,

M[i,j] = M[i, j-1] if A[M[i, j-1]] >= A[M[i+2^j-1, j-1]] 
M[i,j] = M[i+2^j-1, j-1] otherwise.

How to find the index of the minimum value in a given range? Idea is to find two subranges which cover the entire range and then find the minimum of minimums of these two ranges.
For example, find RMQ(i,j). If 2k be size of largest block that fits into the range from i to j, then k = log(j – i + 1)

Now, we have two parts to look in from i to i+2k + 1 (already computed as M[i,k] ) and from j-2k+1 (already computed as M[j-2k+1, k]).

Formally,

    RMQ(i,j) =  M[i][k] if A[ M[i][k] ] >= A[M[j-2^k+1, k]]
    RMQ(i,j) =  M[j-2^k+1, k]

RMQ implementatio using sparse table

#include <stdio.h>
#include <math.h>

int M[100][100];

void preprocess(int a[], int size)
{
    int i, j;
	
    for (i = 0; i < size; i++)
        M[i][0] = i;
		
    for (j = 1; 1 << j <size ; j++){
        for (i = 0; i + (1 << j) - 1 < size; i++){
            if (a[M[i][j - 1]] < a[M[i + (1 << (j - 1))][j - 1]])
                M[i][j] = M[i][j - 1];
            else
                M[i][j] = M[i + (1 << (j - 1))][j - 1];
        }
    }
}  
  
int rmq(int a[], int start, int end){
    int j = floor(log(start-end+1));

    if ( a[M[start][j]] <= a[M[end-(1<<j)+1][j]] )
        return M[start][j];
    else 
        return M[end-(1<<j)+1][j];
}

int main(void) {
	
	int a[] = { 2,3,1,5,9,7,10,5,6,3 };
	int size = sizeof(a)/sizeof(a[0]);
	
	//Preprocessing step
	preprocess(a, size);
	printf("\n Minimum index in range is : %d ", rmq(a,3,9) );
	printf("\n Minimum index in range is : %d ", rmq(a,2,7) );
	
	return 0;
}

These two blocks entirely cover the range and since only once comparison required, the complexity of lookup will be O(1).

In this post, we discussed various ways to implement range minimum query based on space and time complexity tradeoff. In future posts, we will discuss applications of RMQ such as segmented trees and least common ancestor problem.

Please share if something is wrong or missing, we would love to hear from you.

Prune nodes not on paths with given sum

Prune nodes not on paths with given sum

Prune nodes not on paths with given sum is a very commonly asked question in Amazon interviews. It involves two concepts in one problem. First, how to find a path with a given sum and then second, how to prune nodes from binary tree. The problem statement is:

Given a binary tree, prune nodes which are not paths with a given sum.

For example, given the below binary tree and given sum as 43, red nodes will be pruned as they are not the paths with sum 43.

Prune nodes not on path with given sum
Prune nodes not on path with given sum

Prune nodes in a binary tree: thoughts

To solve this problem, first, understand how to find paths with a given sum in a binary tree.  To prune all nodes which are not on these paths,  get all the nodes which are not part of any path and then delete those nodes one by one. It requires two traversals of the binary tree.
Is it possible to delete a node while calculating the path with a given sum? At what point we find that this is not the path with given sum? At the leaf node.
Once we know that this leaf node is not part of the path with given sum, we can safely delete it.  What happens to this leaf node? We directly cannot delete the parent node as there may be another subtree which leads to a path with the given sum. Hence for every node, the pruning is dependent on what comes up from its subtrees processing.

At the leaf node, we return to parent false if this leaf node cannot be part of the path and delete the leaf node. At parent node, we look for return values from both the subtrees. If both subtrees return false, it means this node is not part of the path with the given sum. If one of the subtrees returns true, it means the current node is part of a path with the given sum. It should not be deleted and should return true to its parent.

Prune nodes from a binary tree: implementation

#include<stdio.h>
#include<stdlib.h>
#include<math.h>
 
struct node{
	int value;
	struct node *left;
	struct node *right;
};
typedef struct node Node;

#define true 1
#define false 0

int prunePath(Node *node, int sum ){
	
	if( !node ) return true;
	
	int subSum =  sum - node->value;
	/* To check if left tree or right sub tree 
	contributes to total sum  */
	
	int leftVal = false, rightVal = false;
	
	/*Check if node is leaf node */
	int isLeaf = !( node->left || node->right );
	
	/* If node is leaf node and it is part of path with sum
	= given sum return true to parent node so tha parent node is
	not deleted */
	if(isLeaf && !subSum )
		return true;
		
	/* If node is leaf and it not part of path with sum 
	equals to given sum
    Return false to parent node */
    else if(isLeaf && subSum ){
    	free(node);
    	return false;
    }
    /* If node is not leaf and there is left child 
	Traverse to left subtree*/
    leftVal = prunePath(node->left, subSum);
    
    /* If node is not leaf and there is right child
	 Traverse to right subtree*/
    rightVal = prunePath(node->right, subSum);
    
    /* This is crux of algo.
    1. If both left sub tree and right sub tree cannot lead to
	path with given sum,Delete the node 
    2. If any one sub tree can lead to path with sum equal
	to given sum, do not delete the node */ 
    if(!(leftVal || rightVal) ){
    	free(node);
    	return false;
    }
    if(leftVal || rightVal ){
    	if(leftVal)
    		node->right = NULL;
    	if(rightVal)
    		node->left = NULL;
    	return true;
    }
    return true ;
}

void inoderTraversal(Node * root){
	if(!root)
		return;
	
	inoderTraversal(root->left);
	printf("%d ", root->value);
	inoderTraversal(root->right);
}
Node *createNode(int value){
	Node * newNode =  (Node *)malloc(sizeof(Node));
	newNode->value = value;
	newNode->right= NULL;
	newNode->left = NULL;
	
	return newNode;
}
Node *addNode(Node *node, int value){
	if(node == NULL){
		return createNode(value);
	}
	else{
		if (node->value > value){
			node->left = addNode(node->left, value);
		}
		else{
			node->right = addNode(node->right, value);
		}
	}
	return node;
}

/* Driver program for the function written above */
int main(){
	Node *root = NULL;
	//Creating a binary tree
	root = addNode(root,30);
	root = addNode(root,20);
	root = addNode(root,15);
	root = addNode(root,25);
	root = addNode(root,40);
	root = addNode(root,37);
	root = addNode(root,45);
	
	inoderTraversal(root);	
	prunePath(root, 65);
	
	printf( "\n");
	if( root ){
		inoderTraversal(root);	
	}
	return 0;
}

The complexity of this algorithm to prune all nodes which are not on the path with a given sum is O(n).

Please share if there is something wrong or missing. If you are preparing for interviews, please signup for free interview material.

Interval partitioning problem

Interval partitioning problem

In continuation of greedy algorithm problem, (earlier we discussed : even scheduling and coin change problems) we will discuss another problem today. Problem is known as interval partitioning problem and it goes like : There are n lectures to be schedules and there are certain number of classrooms. Each lecture has a start time si and finish time fi. Task is to schedule all lectures in minimum number of classes and there cannot be more than one lecture in a classroom at a given point of time. For example, minimum number of classrooms required to schedule these nine lectures is 4 as shown below.

interval partition

However,  we can do some tweaks and manage to schedule same nine lectures in three classrooms as shown below.

So, second solution optimizes the output.

Another variant of this problem is :  You want to schedule jobs on a computer. Requests take the form (si , fi) meaning a job that runs from time si to time fi. You get many such requests, and you want to process as many as possible, but the computer can only work on one job at a time.

Interval partitioning : Line of thought

First thing to note about interval partitioning problem is that we have to minimize something, in this case, number of classrooms. What template this problem fits into? Greedy may be? Yes it fits into greedy algorithm template. In greedy algorithm we take decision on local optimum.

Before discussing the solution, be clear that what is resource and what needs to be minimized? In this problem, resource is classroom and total number of classroom needs to be minimized by arranging lectures in certain order.

There are few natural orders in which we can arrange all lectures or for sake of generality, tasks. First is to arrange them in order of finish time,  second is to arrange in order of start time, third is to order them by smallest duration of task, fourth is by minimum number of conflicting jobs. Which one to chose?
You can come up with counter example when if lectures are arranged in classrooms by order of their end time, or smallest duration or minimum number of conflicting jobs, it does not end to optimal solution  So, let’s pick lectures based on earliest start time. At any given pint of time, pick lecture with least start time and yet not scheduled and then assign it to first available class. Will it work? Sure it does.  When you have assigned all lectures, total number of classrooms will be minimum number of classrooms required.

Interval partitioning algorithm

1. Sort all lectures based on start time in ascending order.
2. Number of initial classrooms = 0
3. While lecture to be scheduled:
   3.1 Take first lecture yet not scheduled,
   3.2 If there a already a class available for lecture's start time
       Assign lecture to the class.
   3.3 If not, then allocate a new classroom
       number of classroom = number of classroom + 1
4. Return number of classrooms.

Before jumping into the code, let’s discuss some data structures which we can use to implement this algorithm.

Understand that we have to find a compatible classroom for a lecture. There are many classrooms, we need to check if the finish time of lecture in that classroom is less than start time of new lecture. If yes , then classroom is compatible, if there is no such class, allocate a new class. If we store our allocated classrooms in such a way that it always gives classroom with least finish time of last lecture scheduled there, we can safely say that if this classroom is not compatible, none of the others will be.(Why?) Every time we assign a lecture to a classroom, sort the list of classroom, so that first classroom is with least finish time.  Sort has complexity of O(n log n) and if we do it for all n intervals, overall complexity of algorithm will be O(n2 log n).

We are sorting just to find minimum end time across all classrooms. This can easily be achieved by min heap or priority queue keyed on finish time of last lecture of class. Every time finish time of last lecture changes for a classroom, heap is readjusted and root gives us classroom with min finish time.

  • To determine whether lecture j is compatible with some classroom, compare sj to key of min classroom k in priority queue.
  • When a lecture is added to a classroom,  increase key of classroom k to fj.

Well know we have algorithm and data structure to implement in, so let’s code it.

PrioritityQueue implementation is given below:

import heapq
# This is our priority queue implementation
class PriorityQueue:
    def __init__(self):
        self._queue = []
        self._index = 0
 
    def push(self, item, priority):
        heapq.heappush(self._queue, (priority, self._index, item))
        self._index += 1
 
    def pop(self):
        if(self._index == 0):
                return None
        return heapq.heappop(self._queue)[-1];

Classroom class implementation

class Classroom:
	def __init__(self, number, finish_time):
		self.class_num = number
		self.finish_time = finish_time
	def __repr__(self):
		return 'Classroom({!r})'.format(self.class_num)

Interval partitioning problem : Implementation

from PriorityQueue import PriorityQueue
from Classroom import Classroom

jobs = [(1, 930, 1100),
        (2, 930, 1300),
        (3, 930, 1100),
        (5, 1100, 1400),
        (4, 1130, 1300),
        (6, 1330, 1500),
        (7, 1330, 1500),
        (8,1430,1700),
        (9, 1530, 1700),
        (10, 1530, 1700)
]

def find_num_classrooms():
	num_classrooms = 0;
	priority_queue = PriorityQueue()

	for job in jobs:
		# we have job here, now pop the classroom with least finishing time
		classroom = priority_queue.pop();
		if(classroom == None) :
			#allocate a new class
			num_classrooms+= 1;
			priority_queue.push(Classroom(num_classrooms,job[2]),job[2]);
		else:
			#check if finish time of current classroom is
			#less than start time of this lecture
			if(classroom.finish_time  <= job[1]):
				classroom.finish_time = job[2]
				priority_queue.push(classroom,job[2])
			else:
				num_classrooms+= 1;
				#Since last classroom needs to be compared again, push it back
				priority_queue.push(classroom,job[2])
				#Push the new classroom in list
				priority_queue.push(Classroom(num_classrooms,job[2]),job[2])

    return  num_classrooms
	
print "Number of classrooms required: " +  find_num_classrooms();

Java Implementation

package com.company;

import java.util.*;

/**
 * Created by sangar on 24.4.18.
 */
public class IntervalPartition {

    public static int findIntervalPartitions(ArrayList<Interval> intervals){
        PriorityQueue<Interval> queue =
                new PriorityQueue<Interval>(intervals.size(), Comparator.comparing(p -> p.getEndTime()));

        for(Interval currentInterval : intervals) {
            if (queue.isEmpty()) queue.add(currentInterval);
            else {
                if (queue.peek().getEndTime() > currentInterval.getStartTime()) {
                    queue.add(currentInterval);
                } else {
                    queue.remove();
                    queue.add(currentInterval);
                }
            }
        }
        return queue.size();
    }

    public static void main(String args[] ) throws Exception {
        ArrayList<Interval> intervals = new ArrayList<>();

        intervals.add(new Interval(930,1100));
        intervals.add(new Interval(930,1300));
        intervals.add(new Interval(930,1100));
        intervals.add(new Interval(1130,1300));
        intervals.add(new Interval(1100,1400));
        intervals.add(new Interval(1330,1500));
        intervals.add(new Interval(1330,1500));
        intervals.add(new Interval(1430,1700));
        intervals.add(new Interval(1530,1700));

        Collections.sort(intervals, Comparator.comparing(p -> p.getStartTime()));

        int minimumClassRooms = findIntervalPartitions(intervals);
        System.out.println(minimumClassRooms);
    }
}

This algorithm takes overall time of O(n log n) dominated by the sorting of jobs on start time. Total number of priority queue operations is O(n) as we have only n lectures to schedule and for each lecture we have push and pop operation.

Reference :

There is another method using binary search algorithm which can be used to solve this problem. As per problem statement, we have to find minimum number of classrooms to schedule n lectures. What are the maximum number of classrooms required? It will be number of lectures when all lectures conflict with each other.
Minimum number of classrooms will be 0 when there is no lecture to be scheduled. Now, we know the range of values of classrooms. How can we find minimum?

Basic idea is that if we can schedule all n lectures in m rooms, then we can definitely schedule them in m+1 and more rooms. So minimum number of rooms required will be either m or less than it. In this case, we can safely discard all candidate solution from m to n (remember n is the maximum number of classrooms).
Again what if we can not schedule lectures in m rooms, then there is no way we can schedule them in less than m rooms. Hence we can discard all candidate solutions less than m.

How can we select m? We can select is as mid of range which is (0,n). And try to fit all lectures on those m rooms based on condition that none of lecture conflicts. Keep track of end time of last lecture of each classroom. If none of the classroom has end time less than start time of new lecture, allocate new class. If total number of classrooms is less than or equal to m, discard m+1 to n. If it is more than m, then discard 0 to m and search for m+1 to n.

package com.company;

import java.util.*;

/**
 * Created by sangar on 24.4.18.
 */
public class IntervalPartition {

    public static boolean predicate(ArrayList<Interval> intervals, long candidateClassRooms){

        int i = 0;

        PriorityQueue<Interval> queue =
                new PriorityQueue<Interval>(intervals.size(), Comparator.comparing(p -> p.getEndTime()));

        for(Interval currentInterval : intervals){
            if(queue.isEmpty()) queue.add(currentInterval);
            else{
                if(queue.peek().getEndTime() > currentInterval.getStartTime()){
                    queue.add(currentInterval);
                }
                else{
                    queue.remove();
                    queue.add(currentInterval);
                }
            }
        }

        return queue.size() <= candidateClassRooms;
    }

    public static void main(String args[] ) throws Exception {
        ArrayList<Interval> intervals = new ArrayList<>();

        intervals.add(new Interval(930,1100));
        intervals.add(new Interval(930,1300));
        intervals.add(new Interval(930,1100));
        intervals.add(new Interval(1130,1300));
        intervals.add(new Interval(1100,1400));
        intervals.add(new Interval(1330,1500));
        intervals.add(new Interval(1330,1500));
        intervals.add(new Interval(1430,1700));
        intervals.add(new Interval(1530,1700));

        long low = 0;
        long high = intervals.size();

        Collections.sort(intervals, Comparator.comparing(p -> p.getStartTime()));

        while(low < high){
            long mid  = low + ( (high - low) >> 1);

            if(predicate(intervals, mid)){
                high = mid;
            }else{
                low = mid+1;
            }
        }
        System.out.println(low);
    }
}

Complexity of algorithm is dependent on number of lectures to be scheduled which is O(n log n ) with additional space complexity of O(c) where c is number of classrooms required.

Please share your views and suggestions in comments and feel free to share and spread the word. If you are interested to share your knowledge to learners across the world, please write to us on communicati[email protected]

Median of two sorted arrays

Before we find the median of two sorted arrays, let’s understand what is the median?

Median is the middle value in a list of numbers.

For example,

Input:
A = [2,4,5,6,7,8,9].
Output:
6

To find the median, the input should be sorted. If it is not sorted, then first sort it and return the middle of that list. The question arises is what if the number of elements in the list is even? In that case, the median is the average of two middle elements.

Median of two sorted arrays

There are two sorted arrays nums1 and nums2 of size m and n respectively.
Find the median of the two sorted arrays. The overall run time complexity should be O(log (m+n)).

median of two sorted arrays

Before going into the post, find a pen and paper and try to work out an example. And as I tell in our posts, come up with a method to solve this considering, you have all the time and resources to solve this problem. I mean think of most brute force solutions.

Let’s simplify the question first and then work it upwards. If question was to find the median of one sorted array, how would you solve it?
If array has odd number of elements in it, return A[mid], where mid = (start + end)/2; if array has even number of elements, return average of A[mid] + A[mid+1].
For example for array A = [1,5,9,12,15], median is 9.
Complexity of this operation is O(1).

Focus back on 2 sorted arrays. To find a median of 2 sorted arrays in no more simple and definitely not O(1) operation. For example,

A = [ 1,5,9,12,15] and B = [ 3,5,7,10,17], median is 8.

How about merging these two sorted arrays into one, the problem is reduced to find the median of one array.

Although to find median in a sorted array is O(1), merge step takes O(n) operations. Hence, overall complexity would be O(n).
Reuse the merge part of Merge sort algorithm to merge two sorted arrays.

Start from the beginning of two arrays and advance the pointer of the array whose current element is smaller than the current element of the other. This smaller element is put on to output array which is sorted merged array. Merge will use an additional space to store N elements (Note that N is here sum of the size of both sorted arrays). The best part of this method is that it does not consider if the size of the two arrays is the same or different.

This can be optimized, by counting number of elements n, in two arrays in advance. Then we need to merge only n/2 + 1 elements if n is even and n/2 if n is odd. This saves us O(n/2) space.

There is another optimization: do not store all n/2 or n/2 + 1 elements while merging, keep track of last two elements in sorted array, and count how many elements are sorted. When n/2 + 1 elements are sorted return average of last two elements if n is even, else return n/2 element as the median. With these optimizations, time complexity remains O(n), however, space complexity reduces to O(1).

Impementation with merge function

package com.company;

/**
 * Created by sangar on 18.4.18.
 */
public class Median {

    public static double findMedian(int[] A, int[] B){
        int[] temp = new int[A.length + B.length];

        int i = 0;
        int j = 0;
        int k = 0;
        int lenA = A.length;
        int lenB = B.length;

        while(i<lenA && j<lenB){
            if(A[i] <= B[j]){
                temp[k++] = A[i++];
            }else{
                temp[k++] = B[j++];
            }
        }
        while(i<lenA){
            temp[k++] = A[i++];
        }
        while(j<lenB){
            temp[k++] = B[j++];
        }

        int lenTemp = temp.length;

        if((lenTemp)%2 == 0){
            return ( temp[lenTemp/2-1] + temp[lenTemp/2] )/2.0;
        }
        return temp[lenTemp/2];
    }

    public static void main(String[] args){
        int[] a = {1,3,5,6,7,8,9,11};
        int[] b = {1,4,6,8,12,14,15,17};

        double median = findMedian(a,b);
        System.out.println("Median is " + median);
    }
}

Optimized version to median of 2 sorted arrays

package com.company;

/**
 * Created by sangar on 18.4.18.
 */
public class Median {

    public  static int findMedianOptimized(int[] A, int[] B){
        int i = 0;
        int j = 0;
        int k = 0;
        int lenA = A.length;
        int lenB = B.length;

        int mid = (lenA + lenB)/2;
        int midElement = -1;
        int midMinusOneElement = -1;

        while(i<lenA && j<lenB){
            if(A[i] <= B[j]){
                if(k == mid-1){
                    midMinusOneElement = A[i];
                }
                if(k == mid){
                    midElement = A[i];
                    break;
                }
                k++;
                i++;
            }else{
                if(k == mid-1){
                    midMinusOneElement = B[j];
                }
                if(k == mid){
                    midElement = B[j];
                    break;
                }
                k++;
                j++;
            }
        }
        while(i<lenA){
            if(k == mid-1){
                midMinusOneElement = A[i];
            }
            if(k == mid){
                midElement = A[i];
                break;
            }
            k++;
            i++;
        }
        while(j<lenB){
            if(k == mid-1){
                midMinusOneElement = B[j];
            }
            if(k == mid){
                midElement = B[j];
                break;
            }
            k++;
            j++;
        }

        if((lenA+lenB)%2 == 0){
            return (midElement + midMinusOneElement)/2;
        }
        return midElement;
    }

    public static void main(String[] args){
        int[] a = {1,3,5,6,7,8,9,11};
        int[] b = {1,4,6,8,12,14,15,17};

        double median = findMedianOptimized(a,b);
        System.out.println("Median is " + median);
    }
}

Binary search approach

One of the properties which lead us to think about binary search is that two arrays are sorted. Before going deep into how binary search algorithm can solve this problem, first find out mathematical conditions which should hold true for a median of two sorted arrays.

As explained above, median divides input into two equal parts, so first condition median index m satisfy is a[start..m] and a[m+1..end] are equal size. We have two arrays A and B, let’s split them into two. The first array A is of size m, and it can be split into m+1 ways at 0 to m.

If we split at i, len(Aleft) – iand len(Aright) = m-i.
When i=0, len(Aleft) = 0 and when i=m, len(Aright) = 0.

median of two sorted arrays

Similarly, for array B, we can split it into n+1 way, j being from 0 to n.

find median of sorted arrays

After splitting at specific indices i and j, how can we derive the condition for the median: left part of the array should be equal to the right part of the array?

If len(Aleft) + len(Bleft) == len(Aright) + len(Bleft) , it satisfies our condition. As we already know these values for split at i and j, equation becomes

i+j = m-i + n-j

median of two sorted array

But is this the only condition to satisfy for the median? As we know, the median is middle of the sorted list, we have to guarantee that all elements on the left array should be less than elements in the right array.

It is must that max of left part is less than min of right part. What is max of left part? It can be either A[i-1] or B[j-1]. What can be min of right part? It can be either A[i] or B[j].

We already know that, A[i-1] < A[i] and B[j-1] < B[j] as arrays A and B are sorted. All we need to check if A[i-1] <= B[j] and B[j-1] <= A[i], if index i and j satisfy this conditions, then median will be average of max of left part and min of right part if n+m is even and max(A[i-1], B[j-1]) if n+m is odd.

Let’s make an assumption that n>=m, then j = (n+m+1)/2 -i, it will always lead to j as a positive integer for possible values of i (o ~m) and avoid array out of bound errors and automatically makes the first condition true.

Now, problem reduces to find index i such that A[i-1] <= B[j] and B[j-1]<=A[i] is true.

This is where binary search comes into the picture. We can start as mid of array A, j = (n+m+1)/2-i, and see if this i satisfies the condition. There can be three possible outcomes for the condition.
1. A[i-1] <= B[j] and B[j-1]<=A[i] is true, we return the index i.
2. If B[j-1] > A[i], in this case, A[i] is too small. How can we increase it? by moving towards right. If i is increased, value A[i] is bound to increase, and also it will decrease j. In this case, B[j-1] will decrease and A[i] will increase which will make B[j-1]<=A[i] true. So, limit search space for i to mid+1 to mand go to step 1.
3. A[i-1] > B[j], means A[i-1] is too big. And we must decrease i to get A[i-1]<=B[j]. Limit search space for i to 0 mid-1 and go to step 1

Let’s take an example and see how this works. Out initial two arrays as follows.

median of two sorted arrays leetcode

The index i is mid of array A and corresponding j will as shown

median of two sorted arrays leetcode solution

Since condition B[j-1] <= A[i] is not met, we discard left of A and right of B and find new i and j based on remaining array elements.

get median of two sorted arrays

Finally, our condition that A[i-1]<= B[j] and B[j-1] <=A[i] is satisfied, find the max of left and min of right and based on even or odd length of two arrays, return average of the max of left and min of right or return a max of left.

This algorithm has dangerous implementation caveat, what if i or j is 0, in that case, i-1 and j-1 will be invalid indices. When can j be zero, when i==m. Till i<m, no need to worry about j being zero. So be sure to check i<m and i>0, when we are checking j-1 and i-1 respectively.

Implementation

package com.company;

/**
 * Created by sangar on 18.4.18.
 */
public class Median {

    public static double findMedianWithBinarySearch(int[] A, int[] B){

        int[] temp;

        int lenA = A.length;
        int lenB = B.length;

        /*We want array A to be always smaller than B
          so that j is always greater than zero
         */
        if(lenA > lenB){
            temp = A;
            A = B;
            B = temp;
        }

        int iMin = 0;
        int iMax = A.length;
        int midLength =  ( A.length + B.length + 1 )/2;

        int i = 0;
        int j = 0;

        while (iMin <= iMax) {
            i = (iMin + iMax) / 2;
            j = midLength - i;
            if (i < A.length && B[j - 1] > A[i]) {
                // i is too small, must increase it
                iMin = i + 1;
            } else if (i > 0 && A[i - 1] > B[j]) {
                // i is too big, must decrease it
                iMax = i - 1;
            } else {
                // i is perfect
                int maxLeft = 0;
                //If there we are at the first element on array A
                if (i == 0) maxLeft = B[j - 1];
                //If we are at te first element of array B
                else if (j == 0) maxLeft = A[i - 1];
                //We are in middle somewhere, we have to find max
                else maxLeft = Integer.max(A[i - 1], B[j - 1]);

                //If length of two arrays is odd, return max of left
                if ((A.length + B.length) % 2 == 1)
                    return maxLeft;

                int minRight = 0;
                if (i == A.length) minRight = B[j];
                else if (j == B.length) minRight = A[i];
                else minRight = Integer.min(A[i], B[j]);

                return (maxLeft + minRight) / 2.0;
            }
        }
        return -1;
    }

    public static void main(String[] args){
        int[] a = {1,3,5,6,7,8,9,11};
        int[] b = {1,4,6,8,12,14,15,17};

        double median = findMedian(a,b);
        System.out.println("Median is " + median);
    }
}

The complexity of this algorithm to find the median of two sorted arrays is log(max(m,n)) where m and n are the size of two arrays.

Please share your views and suggestions.

Number of occurrences of element

Number of occurrences of element

Given a sorted array and a key, find the number of occurrences of a key in that array. For example, in the below array, the number of occurrences of 3 is 3.

number of occurrences of element

Brute force method will be to scan through the array, find the first instance of an element and then find the last instance, then do the math. The complexity of that method is O(N). Can we do better than that?

Did you get some hint when brute force method was described? Yes,we have already cracked the problem to find first occurrence and last occurrence in O(log n) complexity earlier. We will be using those two methods, all we need to do know is math.

occurrences = lastInstance - firstInstance + 1

Number of occurrences of element : Implementation.

package com.company;

/**
 * Created by sangar on 25.3.18.
 */
public class BinarySearcchAlgorithm {

    private static boolean isGreaterThanEqualTo(int[] a, int index, int key){
        if(a[index] >= key) return true;

        return false;
    }

    private static boolean isLessThanEqualTo(int[] a, int index, int key){
        if(a[index] <= key) return true;

        return false;
    }

    private int findFirstOccurance(int[] nums, int target){
        int start = 0;
        int end = nums.length-1;
        
        while(start<end){
            int mid =  start + (end-start)/2;
            
            if(if(isGreaterThanEqualTo(nums, mid, target)){){
                end = mid;
            }
            else{
                start = mid+1;
            }
        }
        return start < nums.length && nums[start] == target ? start : -1;
    }
    
    private int findLastOccurance(int[] nums, int target){
        int start = 0;
        int end = nums.length-1;
        
        while(start<=end){
            int mid =  start + (end-start)/2;
        
            if(isLessThanEqualTo(nums, mid, target)){
                start = mid+1;
            }
            else if(nums[mid] > target){
                end = mid-1;
            }
        }
        return end >= 0 && nums[end] == target ? end : -1;
    }

    public  static  int numberOfOccurrences(int[] a, int key){
        int firstInstance = findFirstOccurance(a, key);
        int lastInstance = findLastOccurance(a, key);

        return (firstInstance != -1) ? lastInstance-firstInstance + 1 : 0;
    }

    public static void main(String[] args) {
        int[] input = {3,10,11,15,17,17,17,20};

        int index = numberOfOccurrences(input,3);
        System.out.print(index == -1 ? "Element not found" : "Element found at : " + index);

    }
}

The worst case time complexity of the algorithm to find the number of occurrences of an element in a sorted array is O(log n). We are using the iterative method to find the first and last instances, therefore, there is no hidden space complexity of the algorithm.

You can test the code at leetcode
Please share if there is something wrong or missing. Also if you want to contribute to algorithms and me, please drop an email at [email protected]

Find element in sorted rotated array

Find element in sorted rotated array

To understand how to find element in sorted rotated array, we must understand first, what is a sorted rotated array? An array is called sorted where for all i and j such that i < j, A[i] <= A[j]. A rotation happens when last element of array is push at the start and all elements on array move right by one position. This is called as rotation by 1. If new last element is also pushed to start again all elements are moved to right again, it’s rotation by 2 and so on.

Find element in sorted rotated array
Sorted array

Sorted rotated array

Question which is very commonly asked in Amazon and Microsoft initial hacker round interviews or telephonic interviews : Given a sorted rotated array, find position of an element in that array. For example:

A = [2,3,4,1] Key = 4, Returns 2 which is position of 4 in array

A = [4,5,6,1,2,3] Key = 4 returns 0

Find element in sorted rotated array : Thought process

Before starting with any solution, it’s good to ask some standard questions about an array problem, for example, if duplicate elements are allowed or if negative numbers are allowed in array? It may or may not change the solution, however, it gives an impression that you are concerned about input range and type.

First thing to do in interview is come up with brute force solution, why? There are two reasons : first, it gives you confidence that you have something solved, it may not be optimal way but still you have something. Second, now that you have something written, you can start looking where it takes most of time or space and attack the problem there. It also, helps to identify what properties you are not using which are part of the problem and help your solution.

First thing first, what will be the brute force solution? Simple solution will be to scan through the array and find the key. This algorithm will have O(N) time complexity.

There is no fun in finding an element in sorted array in O(N) 🙂 It would have been the same even if array was not sorted. However, we already know that our array is sorted. It’s also rotated, but let’s forget about that for now. What do we do when we have to find an element in sorted array? Correct, we use binary search.

We split the array in middle and check if element at middle index is the key we are looking for? If yes, bingo! we are done.

If not, if A[mid] is less that or greater than key. If it is less, search in right subarray, and it is greater, search in left subarray. Any which way, our input array is reduced to half. Complexity of binary search algorithm is log (N). We are getting somewhere 🙂

Sorted rotated array

However, our input array in not a plain sorted array, it is rotated too. How does things change with that. First, comparing just middle index and discarding one half of array will not work. Still let’s split the array at middle and see what extra conditions come up?
If A[mid] is equal to key, return middle index.
There are two broad possibilities of rotation of array, either it is rotated more than half of elements or less than half of elements. Can you come up with examples and see how array looks like in both the cases?

If array is rotated by more than half of elements of array, elements from start to mid index will be a sorted.

If array is rotated by less than half of elements of array, elements from mid to end will be sorted.

Next question, how do you identify the case, where array is rotated by more or less than half of elements? Look at examples you come up with and see if there is some condition?

Yes, the condition is that if A[start] < A[mid], array is rotated more than half and if A[start] > A[mid], it is rotated by less than half elements.

Now, that we know, which part of array is sorted and which is not. Can we use that to our advantage?

Case 1 : When array from start to mid is sorted. We will check if key > A[start] and key < A[mid]. If that’s the case, search for key in A[start..mid]. Since, A[start..mid] is sorted, problem reduces to plain binary search. What if key is outside of start and middle bounds, then discard A[start..mid] and look for element in right subarray. Since, A[mid+1..right] is still a sorted rotated array, we follow the same process as we did for the original array.

Case 2 : When array from mid to end is sorted. We will check if key >A[mid] and key < A[end]. If that’s the case, search for key in A[mid+1..end]. Since, A[mid+1..end] is sorted, problem reduces to plain binary search. What if key is outside of mid and end bounds, then discard A[mid..end] and search for element in left subarray. Since, A[start..mid-1] is still a sorted rotated array, we follow the same process as we did for the original array.

Let’s take an example and go through the entire flow and then write concrete algorithm to find element in sorted rotated array.

Below is sorted rotated array given and key to be searched is 6.

We know, A[start] > A[mid], hence check if searched key fall under range A[mid+1..end]. In this case, it does. Hence, we discard A[start..mid].

At this point, we have to options:  either fallback to traditional binary search algorithm or continue with same approach of discarding based on whether key falls in range of sorted array. Both methods work. Let’s continue with same method.

Again find middle of array from middle +1 to end.

A[mid] is still not equal to key. However, A[start] < A[mid]; hence, array from A[start] to A[middle] is sorted. See if our key falls between A[start] and A[mid]? Yes, hence, we discard the right sub array A[mid..End]

Find the middle of remaining array, which is from start to old middle – 1.

Is A[mid] equal to key? No. Since, A[start] is not less than A[mid], see if key falls under A[mid+1..end], it does, hence discard the left subarray.

Now, new middle is equal to key are searching for. Hence return the index.

Similarly, we can find 11 in this array. Can you draw the execution flow that search?

Algorithm to find element in sorted rotated array

  1. Find mid =  (start + end)/ 2
  2. If A[mid] == key; return mid
  3. Else, if A[start] < A[end]
    • We know, left subarray is already sorted.
    • If A[start] < key and A[mid] > key :
      • discard A[mid..end]
      • Continue with new subarray with start and end = mid – 1
    • Else:
      • discard A[start..mid]
      • Continue with new subarray with start = mid + 1 and end
  4. Else
    • We know, right subarray is sorted.
    • If A[mid] < key and A[end] > key :
      • discard A[start..mid]
      • Continue with new subarray with start  = mid + 1 and end
    • Else:
      • discard A[start..mid]
      • Continue with new subarray with start and end = mid – 1

Find element in sorted rotated array : Implementation

package com.company;

/**
 * Created by sangar on 22.3.18.
 */
public class SortedRotatedArray {

    public static int findElementRecursive(int[] input, int start, int end, int key){

        if(start <= end){
            int mid = start + (end - start) / 2;

            if(input[mid] == key) return mid;

            else if(input[start] <= input[mid]){
                /*Left sub array is sorted, check if
                key is with A[start] and A[mid] */
                if(input[start] <= key && input[mid] > key){
                    /*
                      Key lies with left sorted part of array
                     */
                    return findElementRecursive(input, start, mid - 1, key);
                }else{
                    /*
                    Key lies in right subarray
                     */
                    return findElementRecursive(input, mid + 1, end, key);
                }
            }else {
                /*
                 In this case, right subarray is already sorted and
                 check if key falls in range A[mid+1] and A[end]
                 */
                if(input[mid+1] <= key && input[end] > key){
                    /*
                      Key lies with right sorted part of array
                     */
                    return findElementRecursive(input, mid + 1 , end, key);
                }else{
                    /*
                    Key lies in left subarray
                     */
                    return findElementRecursive(input, start, mid - 1, key);
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] input = {10,11,15,17,3,5,6,7,8,9};

        int index = findElementRecursive(input,0, input.length-1, 6);
        System.out.print(index == -1 ? 
              "Element not found" : "Element found at : " + index);

    }
}

Iterative implementation

package com.company;

/**
 * Created by sangar on 22.3.18.
 */
public class SortedRotatedArray {

    public static int findElementIteratve(int[] input, int start, int end, int key) {

        while (start <= end) {
            int mid = start + (end - start) / 2;

            if (input[mid] == key) return mid;

            else if (input[start] <= input[mid]) {
                /*Left sub array is sorted, check if
                key is with A[start] and A[mid] */
                if (input[start] <= key && input[mid] > key) {
                    /*
                      Key lies with left sorted part of array
                     */
                    end = mid - 1;
                } else {
                    /*
                    Key lies in right subarray
                     */
                    start  = mid + 1;
                }
            } else {
                /*
                 In this case, right subarray is already sorted and
                 check if key falls in range A[mid+1] and A[end]
                 */
                if (input[mid + 1] <= key && input[end] > key) {
                    /*
                      Key lies with right sorted part of array
                     */
                    start = mid + 1;
                } else {
                    /*
                    Key lies in left subarray
                     */
                    end  = mid - 1;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] input = {10,11,15,17,3,5,6,7,8,9};

        int index = findElementIteratve(input,0, input.length-1, 6);
        System.out.print(index == -1 ? "Element not found" : "Element found at : " + index);

    }
}

Complexity of above recursive and iterative algorithm to find an element in a rotated sorted array is O(log n). Recursive implementation has implicit space complexity of O(log n)

What did we learn today? We learned that it’s always better to come up with non-optimized solution first and then try to improve it. Also helps to correlate problem with similar and simpler problem like we understood first what is best way to find an element in sorted array and then extrapolated the solution with additional conditions for our problem.

I hope that this post helped you with this problem and many more similar problems you will see in interviews.

Please share if you have some questions or suggestions. Also, if you want to contribute on this learning process of others, please contact us.

Longest alternating Subsequence

In this post, we will discuss another dynamic programming problem called the longest zigzag subsequence which can be solved using dynamic programming.

A sequence of numbers is called a alternating sequence if differences between successive numbers strictly alternate between positive and negative value. In other words, alternate subsequence is where elements of subsequence are alternate increasing and decreasing order, means, they satisfy below conditions:

x1 < x2 > x3 < x4 > x5 < ….  x2 < x3 > x4 < x5 > …. xn

A sequence with fewer than two elements is trivially a zigzag subsequence.

For example, 1,9,3,9,1,6 is a zigzag sequence because the differences (8,-6,6,-8,5) are alternately positive and negative. In contrast, 1,6,7,4,5 and 1,9,4,4,5 are not zigzag sequences, first sequence is not because its first two differences are positive and second because its last difference is zero.
Coming to the problem of the day: Given an array of integers, find longest alternating subsequence.

We have already seen a similar problem longest increasing subsequence in an array. That problem is solved using a dynamic programming approach. To apply dynamic programming, we need to properties: first, Optimal subproblem structure, that is the solution of the original problem depends on the optimal solution of subproblem; and second, overlapping subproblems, so that we can save computation by memoization.

Do these two properties exist in this problem? Does the longest zigzag subsequence till length i has anything to do with the longest zigzag subsequence till j where j is less than i? Also, it is already clear that alternating subsequence can start with decreasing first and then increasing or increasing first and then decreasing.

To add ith as next element in subsequence, consider two cases. First, ith element can be greater than previous element in longest zigzag subsequence till j where j < i. In this case, we are looking for all such j where A[j] < A[i]. Another criterion for j should be that A[j] less than the previous element in the sequence, that means, at j, we are looking exactly opposite condition than that i.

Second, ith element can be less than previous element in longest zigzag subsequence till j where j < i. In this case, we are looking for all such j where A[j] > A[i]. Another criterion for j should be that A[j] is greater than the previous element in the sequence, that means, at j again, we are looking exactly opposite condition than that at i.
For each i we will store these two.

Let’s say increase[i] describes LZS, for the first case and decrease[i] describes it for the second case.

  increase[i] = max(decrease[j] + 1) for all j< i && A[j] < A[i]
  decrease[i] = max(increase[j] + 1) for all j< i && A[j] > A[i]

Longest alternating subsequence dynamic programming approach

Before going through the implementation, it will be great if you can go through Longest increasing subsequence using dynamic programming
Implementation wise, both increase and decrease array can be one two dimensional array Table[][]. Table[i][0] represents length of longest zigzag subsequence ending at i with A[i] being greater than A[j] for all j in earlier subsequences.

Similarly, Table[i][1] represents length of subsequence ending at i with A[i] being less than A[j] for all j in earlier subsequences.

Table(i,0) = max(Table(j,1) + 1); 
             for all j < i and A[j] < A[i] 
Table(i,1) = max(Table(j,0) + 1); 
             for all j < i and A[j] > A[i]

What will be length of longest zigzag subsequence for index i?

Result =  max (Table(i,0), Table(i,1))

Click here to see longest alternating subsequence implementation

#include <stdio.h>
#include <stdlib.h>
 
int max(int a, int b) {  return (a > b) ? a : b; }
 
int longestZigzagSubsequence(int A[], int n)
{
    int Table[n][2];
 
    for (int i=0; i<n; i++){
    	Table[i][0] = 1; 
    	Table[i][1] = 1;
    }
 
    int result = 1;
 
    for (int i=1; i<n; i++) {
        for (int j=0; j<i; j++){
        	// If A[i] is greater than last element in subsequence, 
        	//then check with Table[j][1]
        	if (A[j] < A[i] && Table[i][0] < Table[j][1] + 1)
                    Table[i][0] = Table[j][1] + 1;
                /* If A[i] is smaller than last element in subsequence,
                then check with Table[j][0] */
                if( A[j] > A[i] && Table[i][1] < Table[j][0] + 1)
                   Table[i][1] = Table[j][0] + 1;
        }
 
        /* Pick maximum of both values at index i  */
        if (result < max(Table[i][0], Table[i][1]))
            result = max(Table[i][0], Table[i][1]);
        printf("\n %d", result);
    }
 
    return result;
}


Complexity of dynamic programming approach to find longest alternate subsequence is O(n2) using O(n) extra space.

Please share if there is something wrong or missing. If you want to contribute to website, please contact us.

Longest Substring Without Repeating Characters

Given a string, find the longest substring without repeating characters in it. For example,

Input:
S = "abcaabaca" 
Output:
3
Explanation:
The longest substring without repeating characters will be "abc"

Input: 
"bbbbb"
Output:
1
Explanation:
The answer is "b", with a length of 1.

A brute force solution will be to scan all substrings of the given string and check which one has the longest length and no repeating characters. For a string with size n, there will be n * (n-1) substrings, and to check it each for unique characters, it will take n comparison in the worst case. So, the worst-case complexity of this algorithm is O(n3) with additional space of O(n). The code is simple enough.

package com.company;

import java.util.HashMap;

/**
 * Created by sangar on 1.1.18.
 */
public class NonRepeatingCharacters {

     boolean allUniqueCharacters(String s, int start, int end) {

        HashMap<Character, Boolean> characters = new HashMap<>();

        for (char c : s.substring(start, end).toCharArray()) {
            if(characters.containsKey(c)) return false;
            characters.put(c, Boolean.TRUE);
        }
        return true;
    }

    int longestSubstringWithoutRepeatingCharacters(String s) {
        int len = s.length();
        int maxLength = 0;
          
        for (int i =0; i < len; i++){
            for (int j=i+1; j<len; j++){
                int length = j-i;
                if (allUniqueCharacters(s, i, j)){
                    maxLength = Integer.max(maxLength, length);
                }
            }
        }
        return maxLength;
    }

    public static void main(String[] args) {
        String s = "abcdabcbb";
        System.out.println(longestSubstringWithoutRepeatingCharacters(s));
    }
}

Sliding window approach

A sliding window is an abstract concept commonly used in array/string problems. A window is a range of elements in array/string which defined by start and end indices. A sliding window is a window which “slides” its two boundaries in a certain direction.
Read fundamentals and template for a sliding window to understand more about it and how it is applied to problems.

In the brute force approach, we repeatedly checked each substring for unique characters. Do we need to check each substring? If a substring s[i,j-1] contains non repeating characters, while adding jth character, check if that character is already present in substring s[i,j-1]. Since we scan substring to ascertain the uniqueness of new characters, the complexity of this algorithm is O(n2).

How about optimizing the scanning part? What if a hash is used to store characters which are already seen in substring s[i,j-1]. In that case, checking the uniqueness of a new character is done in O(1) and overall algorithm complexity becomes linear.

 public  static int longestSubstringWithoutRepeatingCharacters(String s) {
        int len = s.length();
        HashMap<Character, Boolean> characters = new HashMap<>();

        int maxLength = 0;
        int start = 0;
        int  end = 0;
        while (start < len && end < len) {
            //Check only the last character.
            if(!characters.containsKey(s.charAt(end))){
                characters.put(s.charAt(end), Boolean.TRUE);
                end++;
            }
            else {
                int currentLength = end-start;
                maxLength = Integer.max(maxLength, currentLength);
                //Move start of window one position ahead.
                characters.remove(s.charAt(start));
                start++;
            }
        }
        return maxLength;
    }

If a character already present in substring s[i,j-1], that means, it cannot be added to the longest substring. Find the length of substring (j-i) and compare it with the current maximum length. if it is greater, the max length of the longest substring without repeating characters is (j-i).
At last move the window to the position of duplicate.

Below is an example execution of the above code.
longest substring without repeating characters

Longest substring without repeating characters : 3

There is a small optimization that helps us to skip more characters when repeating character is found instead of skipping one at a time. Store the index of each character seen in substring [i,j-1].  While processing jth character, if it is already in the hash, we know the index k where that character is in the string. There is no way that any substring can contain unique characters till k and j are in it. So, we skip all indices from i to k and start from k+1 instead of i+1 as in the above method.

Show me the optimized code

  public static int longestSubstringWithoutRepeatingCharacters3(String s) {
        int len = s.length();
        HashMap<Character, Integer> characters = new HashMap<>();

        int maxLength = 0;

        for (int start=0, end = 0; end <len; end++) {
            if (characters.containsKey(s.charAt(end))) {
                //find the index of duplicate character.
                int currentIndex = characters.get(s.charAt(end));
                start = Integer.max(currentIndex, start) + 1;
            }
            int currentLength = end - start;
            maxLength = Integer.max(maxLength, currentLength);
            //Update new location of duplicate character
            characters.put(s.charAt(end), end );
        }
        return maxLength;
    }

Complexity of find longest substring without repeating characters is hence O(n) with additional space complexity of O(n).
Please share if something is wrong or missing. We would love to hear from you.

Longest common substring

Longest Common Substring

Given two string A and B, find longest common substring in them. For example, A = “DataStructureandAlgorithms” and B=“Algorithmsandme”, then longest common substring in A and B is “Algorithms”. Below figure shows longest common substring.

longest common substring

Brute force solution is to find all substrings of one string and check any of these substring are substring of second string, while comparing, keep track of the longest one we found. There can be n2substring for a string with length n and to find if a string is substring of another, it takes another m operations, where m is length of second string. Hence, overall complexity of this method is O(n2m).

Can we do better than that?

Longest common substring : Line of thoughts

We have to find longest common substring in strings of length M and length N. Can we find longest common substring till length M-1 and N-1 and then derive longest common substring for M and N?  Yes, we can find. The length either grows by one if last characters are equal or reset to zero if last characters are not equal. Why so?

First see why we need to reset to zero when characters are different. This because we are looking for common substring which means characters should be consecutive, any different character restart the the entire search because with those two  different characters, there can’t be any common substring.

What if characters are same? In that case we increment by one, because, longest common substring in N-1 and M-1 would be either 0 or some number based on how any consecutive common characters were till N-1 and M-1.

What will be longest common substring when one of the strings is empty? It will be zero.

So, do you see recursion here? So, let’s write recursion relation and then implement it.

LCS(i,j) = 1+LCS(i-1, j-1) if S[i] = T[j] 
         =  0 otherwise

This recursion relation has optimal subproblem property that solution to the problem actually depends on solutions to subproblems. Also, there are subproblems which will be calculated again and again, which is called overlapping subproblems. These two properties are required for dynamic programming. To not to calculate subproblems, we will use memoization, for that  create a two dimensional array called LCS with dimensions as n and m. LCS[i][j] represents the length of longest common substring in A[0..i] and B[0..j]. And since solution for i-1 and and j-1 is required before solution of i and j, this matrix will be filled bottom up.

Longest common substring using dynamic programming

How to fill LCS[i][j]?

1. Check if A[i] is equal to B[j] 
   1.1 If yes, LCS[i][j] = 1 + LCS[i-1][j-1]
( Because new character is added to already common substring, 
     if any, till A[0...i-1] and B[0,,j-1])
   1.2 if both characters are not same, LCS[i][j] = 0,
       ( Because if characters are not same, there cannot be any
         common substring including A[i] and B[j].

Implementation

#include <stdio.h>
#include <string.h>

int max(int a, int b){
	return a>b ? a:b;
}
 int longestCommonSubstring(char * A, char * B){
     int lenA = strlen(A);
     int lenB = strlen(B);
     int LCS[lenA+1][lenB+1];

     for (int i=0; i<= lenA; i++){
         LCS[i][0] = 0;
     }

     for (int j=0; j <= lenB; j++){
         LCS[0][j] = 0;
     }
	
     int maxLength = 0;
     for (int i=1; i<= lenA; i++){
        for (int j=1; j <= lenB; j++){
            if (A[i] == B[j]){
                LCS[i][j] = 1 + LCS[i-1][j-1];		
                maxLength = max( maxLength, LCS[i][j] );
            } 
            else {
               LCS[i][j] = 0;
            }
         }
     }
     return maxLength;
}

int main(void) {
    char *a = "ABCDEFGSE";
    char *b = "EBCDEFGV";
	
    printf("\n Longest common substring : %d",
			longestCommonSubstring(a,b));
    return 0;
}
package com.company;

/**
 * Created by sangar on 5.1.18.
 */
public class LCS {

    public  static int longestCommonSubstring(String A, String B){
        int lenA = A.length();
        int lenB = B.length();

        int [][] LCS = new int[lenA][lenB];

        for (int i=0; i<lenA; i++){
            LCS[i][0] = 0;
        }

        for (int j=0; j<lenB; j++){
            LCS[0][j] = 0;
        }

        int maxLength = 0;
        for (int i=1; i<lenA; i++){
            for (int j=1; j<lenB; j++){
                if (A.charAt(i) == B.charAt(j)){
                    LCS[i][j] = 1 + LCS[i-1][j-1];
                    maxLength = Integer.max(maxLength, LCS[i][j]);
                }
                else {
                    LCS[i][j] = 0;
                }
            }
        }

        for (int i=0; i<lenA; i++){
            System.out.println();
            for (int j=0; j<lenB; j++){
                System.out.print(" " + LCS[i][j]);
            }
        }
        return maxLength;
    }

    public static void main(String[] args) {
	    String a = "ABCDEFGS";
	    String b = "EBCDEFG";

        System.out.println("Longest common substring :" +
                longestCommonSubstring(a,b));
    }
}

Time complexity of dynamic programming approach to find length of longest common substring in two string is O(n*m) and space complexity is O(n*m) where n and m are lengths of two given strings.

longest common substring dynamic programming

In next post, we will discuss suffix tree method to find LCS which is more optimized than DP solution and can be easily be generalized for multiple strings.

This solution is very similar to Longest common subsequence. Difference between two problems is that a subsequence is collection of characters, which may or may not be contiguous in string, where for a substring, characters must be contiguous. Based on this difference, out solution will vary a bit.

Please share if you find something wrong or missing. If you want to contribute to site, please refer contact us. We would be happy to publish your work and in turn will pay you too.

Find bridges in graph

Given a direct graph, detect bridges in the graph.

An edge is called as bridge edge if and only if on removal of that edge will increases number of components increase by one.

For example, in the below graphs, bridges are shown in green
bridges in graph

The concept of detecting bridges in a graph will be useful in solving the Euler path or tour problem.

Depth First Search of graph can be used to see if graph is connected or not. We can use the same concept, one by one remove each edge and see if the graph is still connected using DFS. If yes, then the edge is not bridge edge, if not, then edge is bridge edge.

However, this method entails quite a complexity of O(E * (V+E)) where E is number of edges and V is number of vertices.

Let’s think something better. Consider that we are looking at the edge (u,v) in a graph. In what condition, we can say that it is a bridge edge?
If we can somehow reach node u or any ancestor of u from any node which is a decedent of v, that means the graph is still connected and (u,v) is not a bridge edge. If the above condition is not possible, then (u,v) is the bridge.

How can we determine that there is no edge from decedent of v to u or its ancestor? For that we need to maintain time when a node was discovered during the depth-first search, call it tin[].

tin[u] is time when node u was discovered using DFS. If d[u] < d[v], means u was discovered before v.

Below is a graph with tin[u] filled for each node.
find bridges in a graph

Now, figure out the lowest tin[x] which can be reached from each node. Reason to find that is to see if there is a node x which is reachable from children of v and has tin[x] less than tin[u], i.e. x is ancestor of u reachable from children of v.

Store lowest DFS ancestor reachable from a node i in an array low[u].
low[u] = min(low[u], low[v])  for edge (u,v)

Idea here is that if (u,v) is an edge, then either there is no back edge from subtree of v to u and ancestor of u.
If there is a back edge to x from subtree of v, then minimum tin[x] reached by node in subtree will be assigned to the low[u].

The diagram shows the calculation of low[] in a graph.
bridge edges in a graph
Finally, if low[v] > tin[u] that means if discovery time of u is less than least ancestor that can be reached from subtree of v, we have a bridge, because there is no way we can reach to an ancestor of u once we disconnect edge (u,v).

Lots of theory, let’s code it. We will be modifying Depth First Search implementation to keep track of tin[] and low[].

Bridges in a graph implementation

package AlgorithmsAndMe;

import java.util.*;

public class Bridges {

    Set<Integer> visited = new HashSet<>();
    /* This map stores the time when the
    current node is visited
     */
    Map<Integer, Integer> tin = new HashMap<>();

    /*
      low will store minimum on 
       tin[v]
       tin[p] for all p for which (v,p) is a back edge
       low[to] for all to for which (v,to) is a tree edge
     */
    Map<Integer, Integer> low = new HashMap<>();
    
    //To maintain monotonic increasing order.
    int timer;

    void dfs(Graph g, int u, int parent) {
        visited.add(u);

        //Put the current timer.
        tin.put(u, timer);
        //Low is the time of entry to start with
        low.put(u,timer);
        
        timer++;
        
        /*
            Go through all the neighbors
         */
        for (int to : g.getNeighbors(u)) {
            //If it is parent, nothing to be done
            if (to == parent) continue;
            
            /* If the neighbor was already visited
                get the minimum of the neighbor entry time
                or the current low of the node.
             */
            if (visited.contains(to)) {
                low.put(u, Math.min(low.getOrDefault(u, Integer.MAX_VALUE),
                        tin.getOrDefault(to, Integer.MAX_VALUE)));
            } else {
                //Else do the DFS
                dfs(g, to, u);
                /*
                 Normal edge scenario,
                 take the minimum of low of the parent and the child. 
                 */
                low.put(u, Math.min(low.getOrDefault(u, Integer.MAX_VALUE),
                        low.getOrDefault(to, Integer.MAX_VALUE)));
                
                /* If low of the child node is less than
                   time of entry of current node, then
                   there is a bridge.
                 */
                if (low.get(to) > tin.get(u))
                    System.out.println(u + "->" + to);
            }
        }
    }

    public void findBridges(Graph g) {
        timer = 0;
        Iterator it = g.getNodes().iterator();
        while(it.hasNext()){
            int i = (int) it.next();
            if (!visited.contains(i))
                dfs(g, i, -1);
        }
    }
}

The complexity of finding bridges in a graph is O(V+E) where V is number of vertices and E is number of edges in graph.

Problems you can solve using this concept:
796 – Critical Links