Trapping rain water

Given n non-negative integers representing an elevation map where the width of each bar is 1, compute how much water it is able to trap after rain. For example,

Input: [0,1,0,2,1,0,1,3,2,1,2,1] 
Output: 6

trapping rain water

Basic thought

Water settles at the highest level it can and always maintains a flat surface. Using this, we can infer that we need to find holes in the elevation map where water can rest. To calculate how much water such a hole stores, we need elevations on both sides of it, and we need to track how much space the hole offers for trapping water. The figure below shows an example of a hole which holds 4 units of water.

trap water

Breaking these holes down further, we notice that we do not need to track an entire hole to find its capacity; we can process each unit of the hole individually. The amount of water above each unit is

min(leftHeight, rightHeight) - currentUnitHeight.

What remains is to calculate leftHeight and rightHeight. We could scan for them individually, but there is a general pattern: the highest elevation to the left of the current unit (inclusive) becomes leftHeight, and the highest elevation to the right of the current unit (inclusive) becomes rightHeight. The problem is thus reduced to keeping track of the highest elevation on both sides of every unit.
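For instance, in the sample input [0,1,0,2,1,0,1,3,2,1,2,1], the unit at index 5 has height 0, the highest bar to its left is 2 (index 3) and the highest bar to its right is 3 (index 7), so that unit holds min(2, 3) - 0 = 2 units of water.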

Brute force solution

From our observations in the previous section, the simplest brute force approach is to calculate the highest elevation on both sides of every unit and sum up the water trapped above each unit. Each unit takes O(n) time with this approach, and there are n units in total, so this approach takes O(n^2) time.

Show me the brute force implementation

from typing import List

class Solution:
    def trap(self, height: List[int]) -> int:
        if len(height) < 3:
            return 0
        ans = 0
        for i in range(len(height)):
            left_max = 0
            right_max = 0
            for j in range(i + 1):
                left_max = max(left_max, height[j])
            for j in range(i, len(height)):
                right_max = max(right_max, height[j])
            ans += min(left_max, right_max) - height[i]
        return ans

Dynamic Programming approach

As we can see from the brute force solution, we calculate leftHeight and rightHeight multiple times for the same unit, i.e. the problem has overlapping subproblems. A dynamic programming approach should therefore optimize the brute force further. We can precompute the leftHeight and rightHeight up to every index, so the water calculation for each unit takes O(1) time. Overall, this approach passes over the array three times, giving a runtime of O(n) with a space complexity of O(n).

Show me the dynamic programming implementation

class Solution:
    def trap(self, height: List[int]) -> int:
        if len(height) < 3:
            return 0
        ans = 0
        left_max = [0] * len(height)
        right_max = [0] * len(height)
        left_max[0] = height[0]
        right_max[-1] = height[-1]
        for i in range(1, len(height)):
            left_max[i] = max(height[i], left_max[i - 1])
        for i in reversed(range(len(height) - 1)):
            right_max[i] = max(height[i], right_max[i + 1])
        for i in range(1, len(height) - 1):
            ans += min(left_max[i], right_max[i]) - height[i]
        return ans

Stacks

Since we need to keep track of the highest elevations up to a point, a stack is a good way to do this in one pass of the array. The basic idea is to keep the indices of decreasing elevations on the stack as we iterate. Whenever the current height is higher than the bar on top of the stack, we have found a hole: we compute the water stored between the two higher bounds and the popped bar, and continue. Each element is pushed and popped at most once, so the time complexity is O(n). The space complexity is O(n), in case the entire array ends up on the stack.

Show me the stack implementation

class Solution:
    def trap(self, height: List[int]) -> int:
        if len(height) < 3:
            return 0
        ans = 0
        stack = []
        for i in range(len(height)):
            while len(stack) > 0 and height[i] > height[stack[-1]]:
                top = stack.pop()
                if len(stack) == 0:
                    break
                # Width of the hole: between the bar still on the stack
                # (left bound) and the current bar (right bound)
                distance = i - stack[-1] - 1
                # Height of trapped water: the smaller of the two bounding
                # bars minus the height of the popped bar
                curr_height = min(height[i], height[stack[-1]]) - height[top]
                ans += distance * curr_height
            stack.append(i)
        return ans

Simple Optimization

Another way to approach the problem: since we need the maximum elevation on either side of a unit to calculate the water above it, we can find the global maximum in one pass. Once we have its index, we can iterate towards it from the left end and from the right end, knowing that one of the two bounds is always the global maximum. The time complexity is O(n) with two passes over the array (one to find the global maximum and one to compute the water), and the space complexity is O(1).

Show me the implementation

class Solution:
    def trap(self, height: List[int]) -> int:
        if len(height) < 3:
            return 0
        ans = 0
        gMax = 0 # Global Max
        for i in range(len(height)):
            gMax = i if height[gMax] < height[i] else gMax
        lMax = 0 # Left max yet
        for i in range(1, gMax):
            lMax = i if height[lMax] <= height[i] else lMax
            ans += max(0, height[lMax] - height[i])
        lMax = len(height) - 1 # Right max yet
        for i in reversed(range(gMax, len(height) - 1)):
            lMax = i if height[lMax] <= height[i] else lMax
            ans += max(0, height[lMax] - height[i])
        return ans

This article is contributed by Khushman Patel

Number of islands

Given a 2d grid map of ‘1’s (land) and ‘0’s (water), count the number of islands. An island is surrounded by water and is formed by connecting adjacent lands horizontally or vertically. You may assume all four edges of the grid are all surrounded by water.

Input:
11110
11010
11000
00000

Output: 1

The first challenge of this problem is to recognize that it is a graph problem. To do that, we check whether cells of the matrix can represent nodes, and whether there is any relationship between cells that can act as the edges of a graph.
One hint is the relationship between the cells of the matrix. For example, in the island problem, 0s are water and 1s are land. A piece of land connects only to adjacent land, and it is isolated if it is surrounded by water on all four sides.

To find the number of islands in the matrix, we have to find the number of connected components in this graph. Which traversal can we use to count the connected components of a graph? Depth-first search works well here.

In this case, the nodes are the cells with value 1 (land). Most of the time in these kinds of problems, each cell of the matrix with a certain value is a node. If we have an n x m matrix, we are looking at n x m possible nodes. Also, be aware that we are not interested in all the cells; based on the problem statement, we can discard some of them. For example, in the island problem, we are only interested in the cells which represent land.

Now, what will be the edges? In these kinds of problems, a statement is given like an island is surrounded by water and is formed by connecting adjacent lands horizontally or vertically. It means that a cell with value 1 is connected to neighbor cells only if they also contain 1.

Now we know that cells containing 1 are the nodes of the graph, connected to their neighbors if those neighbors are also 1. To count the number of islands, all we have to do is count the connected components of this graph; we have reduced a matrix problem to a graph problem. Should we build the graph explicitly? No, at any point in time we do not need the entire graph.

Number of islands implementation

public int numIslands(char[][] grid) {
    int count = 0;
    Set<Integer> visited = new HashSet<>();
    int n = grid.length;

    if (n <= 0) return 0;
    int m = grid[0].length;

    for (int i = 0; i < n; i++) {
        for (int j = 0; j < m; j++) {
            int cell = i * m + j;
            if (!visited.contains(cell) && grid[i][j] == '1') {
                count++;
                dfs(i, j, visited, grid);
            }
        }
    }
    return count;
}

void dfs(int i, int j, Set<Integer> visited, char[][] a) {
    // Check if the node we are traversing is actually a valid node.
    if (i < 0 || i >= a.length
          || j < 0 || j >= a[0].length
          || a[i][j] == '0') return;

    int cell = i * a[0].length + j;
    if (visited.contains(cell)) return;
    visited.add(cell);

    // Visit neighbors
    dfs(i + 1, j, visited, a);
    dfs(i - 1, j, visited, a);
    dfs(i, j + 1, visited, a);
    dfs(i, j - 1, visited, a);
}

The complexity of the above code is O(n * m), where n and m are the number of rows and columns of the matrix.
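As a quick sanity check, here is a minimal driver for the example grid above (a sketch; it assumes the two methods are placed inside a class, hypothetically named Solution, with java.util imported):

public static void main(String[] args) {
    char[][] grid = {
        {'1','1','1','1','0'},
        {'1','1','0','1','0'},
        {'1','1','0','0','0'},
        {'0','0','0','0','0'}
    };
    // Expected output for the example above: 1
    System.out.println(new Solution().numIslands(grid));
}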

Critical Connections in a Network

There are n servers numbered from 0 to n-1 connected by undirected server-to-server connections forming a network where connections[i] = [a, b] represents a connection between servers a and b. Any server can reach any other server directly or indirectly through the network.
A critical connection is a connection that, if removed, makes some server unable to reach some other server.
For example,
Input: n = 4, connections = [[0,1],[1,2],[2,0],[1,3]]
Output: [[1,3]]
Explanation: [[3,1]] is also accepted.
critical connections in a network

Before going through this solution, I would strongly recommend reading about Tarjan’s algorithm and the find-bridges-in-a-graph problem.

class Solution {
    int timer;

    List<List<Integer>> g;
      
    boolean [] visited;
    /* tin[u] stores the time when
    node u is first visited
     */
    int [] tin;
    int [] low;
    
    void dfs(int u, int parent, 
                    List<List<Integer>> res ) {
        visited[u] = true;

        //Put the current timer.
        tin[u] = timer;
        //Low is the time of entry to start with
        low[u] = timer;

        timer++;

        /*
            Go through all the neighbors
         */
        for (int to : g.get(u)) {
            //If it is parent, nothing to be done
            if (to == parent) continue;

            /* If the neighbor was already visited
                get the minimum of the neighbor entry time
                or the current low of the node.
             */
            if (visited[to]) {
                low[u] = Math.min(low[u], tin[to]);
            } else {
                //Else do the DFS
                dfs(to, u, res);
                /*
                 Normal edge scenario,
                 take the minimum of low of the parent 
                  and the child.
                 */
                low[u] = Math.min(low[u], low[to]);

                /* If the lowest entry time reachable from
                   the child is greater than the entry time
                   of the current node, then the edge
                   (u, to) is a bridge.
                 */
                if (low[to] > tin[u]){
                    List<Integer> l = 
                           Arrays.asList(new Integer[]{u, to});
                    res.add(l);
                }
            }
        }


    }

    public void findBridges(List<List<Integer>> res) {
        timer = 0;
        for(int i=0; i<g.size(); i++){
            if (!visited[i])
                dfs(i, -1, res);
        }
    }

  public List<List<Integer>> criticalConnections(int n, 
                             List<List<Integer>> connections) {
      

    g = new ArrayList<>(); 
      
    visited = new boolean[n];
    /* tin[u] stores the time when
    node u is first visited
     */
    tin = new int[n];
    low = new int[n];
    
    Arrays.fill(visited, false);
    Arrays.fill(tin, n);
    Arrays.fill(low, n);
    
    for(int i=0; i<n; i++){
        g.add(new ArrayList<>());
    }
    for(List<Integer> l : connections){
          g.get(l.get(0)).add(l.get(1));  
          g.get(l.get(1)).add(l.get(0));   
      }
      
      List<List<Integer>> res = new ArrayList<>();
      findBridges(res);
    
      return res;
        
    }
}

The complexity of this code is O(V + E), where V is the number of vertices and E is the number of edges in the graph.
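To exercise the class with the example from the problem statement, a minimal driver could look like this (a sketch; it assumes java.util.Arrays and java.util.List are imported):

public static void main(String[] args) {
    List<List<Integer>> connections = Arrays.asList(
            Arrays.asList(0, 1),
            Arrays.asList(1, 2),
            Arrays.asList(2, 0),
            Arrays.asList(1, 3));
    // Prints [[1, 3]] for the example above
    System.out.println(new Solution().criticalConnections(4, connections));
}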


Super Egg drop problem

The egg dropping problem goes like this: given n eggs and f floors, find the minimum number of trials required, in the worst case, to determine the highest floor from which it is safe to drop an egg (i.e. the egg doesn’t break), under a few conditions:

  • An egg can be reused only if it survives the fall
  • If an egg breaks from the xth floor, it will also break from all floors above it (x+1th floor to fth floor)
  • If an egg survives a fall from the xth floor, it will also survive the falling from all floors below it (1st floor to x-1th floor)

A couple of variations of this problem where the number of eggs and the number of floors is fixed are famous puzzles, like the 2 eggs and 100 floors puzzle or the 2 eggs and 36 floors puzzle. Finding solutions to such puzzles takes a more mathematical approach.

Solution
Let’s say we have 1 egg and 4 floors and we need to find the minimum number of trials required to determine the highest floor from which it is safe to drop eggs. In this case, if we try to drop the egg from a random floor, it might break and we would not be able to conduct further trials to find the required floor.
Hence in such a case, we have to be careful and start from the 1st floor and keep checking on each floor above, as soon as the egg breaks on a floor, we would have found the required floor.

In this example, let’s say that starting from the 1st floor and reaching the 3rd floor (the (f-1)th floor) we still couldn’t determine the required floor; only after the trial on the 4th floor would we find the answer. Hence in the worst case, we require 4 trials.

Now instead of 1 egg if we have 2 eggs and 4 floors, will it reduce the minimum number of trials required? As we have an extra egg we can afford to start from any floor.
If we start from the 4th floor, the egg might break or it might not –

  • If it doesn’t break we can say that it won’t break on any of the lower floors and hence we were lucky enough to have found the answer in the first trial itself.
  • If the egg breaks, we now have 1 egg fewer, but we also know that the 4th floor is not the required floor, and hence we would perform further trials on the floors below it only.
  • But now since we have just 1 egg, we again have to be careful and fall back to the strategy of starting trials on the 1st floor and checking each floor one by one until the 3rd floor. Again in the worst case, we will check each floor from 1st to 3rd and in this way, we will need 3 trials for that.

So in total, we need (1 trial on 4th floor) + (3 trials from 1st to 3rd floor) = 4 trials if we start trials from the 4th floor. Also note in this case the worse of the two possibilities is 4 and we will consider the worst-case scenario always.

But what if we choose some other floor?

If we start from the 3rd floor, the egg might break or it might not –

  • If it doesn’t break, we now have to perform trials on floors above the 3rd floor. In this case, there’s only one floor above: 4th floor and we have two eggs. When we have just one floor to check we just have to drop an egg from it and we can determine our answer from just that one trial.
  • So in total we will have to do 2 trials = (1 trial on 3rd floor) + (1 trial on floors above it i.e. 4th floor).

  • If the egg breaks, we now have 1 egg fewer, but we also know that the 3rd floor is not the required floor and the floors above it would give the same outcome, hence we would perform further trials on the floors below it only.
  • Now since we have just 1 egg, we again have to be careful and fall back to the strategy of starting trials on the 1st floor and checking each floor one by one until the 2nd floor. Again in the worst case, we will check each floor from 1st to 2nd and in this way, we will need 2 trials for that.

So in total, we need (1 trial on 3rd floor) + (2 trials from 1st to 2nd floor) = 3 trials if we start trials from the 3rd floor. Also note in this case the worse of the two possibilities is 3 and we will consider the worst-case scenario always.

Similarly, we will start trials from 2nd floor –

  • If the egg breaks on the 2nd floor, we just have to check the 1st floor. Hence total trials required are: 2
  • If the egg doesn’t break, then we have to check for the floors above that are 3rd and 4th floor, i.e. we have 2 eggs and 2 floors for trials.

Note that we do not care about which particular floors we end up doing trials on; after the current trial we just consider in which direction the next trials go and how many floors remain in that direction, so that we can calculate the number of required trials accordingly.

Using this similar approach we can find that with 2 eggs and 2 floors we’ll need 2 trials in the worst case. Hence total trials required are: (1 trial on 2nd floor) + (2 trials on 2 floors above 2nd floor) = 3 trials.

Also note in this case the worse of the two possibilities is 3 and we will consider the worst-case scenario always.

Similarly when we start trials from 1st floor-

  • If the egg breaks on the 1st floor, we don’t need more trials as we know there’s no safe floor. Hence total trials required are 1.
  • If the egg doesn’t break, then we have to check for the floors above that are 2nd, 3rd and 4th floor, i.e. we have 2 eggs and 3 floors for trials.

Using this similar approach we can find that with 2 eggs and 3 floors we’ll need 2 trials in the worst case. Hence total trials required are: (1 trial on 1st floor) + (2 trials on 3 floors above 1st floor) = 3 trials. The worse of the two possibilities is 3.

So we tried on all 4 floors: except for starting from the 4th floor all others require 3 trials to find the required answer. So our answer for min_total_trials(2 eggs, 4 floors): 3.

Considering the above simulation and our approach we can, in general, say that when we conduct a trial at xth floor, there are two possible outcomes:

The egg breaks or it does not break

In both cases, we have to conduct more trials, but in case if egg breaks at xth floor, we need to conduct trials on floors below x and in case if the egg does not break at xth floor we need to conduct trials on floors above x. Only exceptions to “conducting more trials” statement are:

  • When we are doing a trial on the 1st floor and the egg breaks, we don’t need more trials as we now know that there are no safe floors.
  • When we are doing a trial at the fth floor (i.e. highest floor) and the egg doesn’t break, we don’t need more trials as we now know that all floors are safe.

And obviously, if we run out of eggs, no more trials can be conducted.

Note that while conducting a trial at any floor we need to consider the worst possible scenario meaning that we cannot make assumptions about the outcomes of the trial conducted and we have to consider the worse of the two.
Also from our approach in the above example, one can observe that we have to think less about finding the highest safe floor and more about minimizing the number of trials. We actually don’t have to find the highest safe floor but have to focus on minimizing the number of trials by simulating drops from different floors.

We’ll consider doing trials on all the floors and take the worst case for every floor and find our final answer by getting the minimum of all the trials from all floors we just calculated. We aim to reduce our current problem to sub-problems. This shows this problem has optimal substructure (i.e. an optimal solution can be constructed from optimal solutions of sub-problems).

Also, we can derive a formula for our finalized approach that will give us the minimum number of trials required to find the highest safe floor given n eggs and f floors:

min_total_trials(n, f) = 1 + MIN( MAX( min_total_trials(n-1, x-1), min_total_trials(n, f-x) ) for each x in {1..f} )

We conduct trials on each floor from 1 to f; for each candidate floor we take the worse (MAX) of the two possible outcomes, and among all candidate floors we take the one that is minimum. You may have noticed in our simulation that we always added 1 for the trial done on the current floor; that is the same 1 that appears in the formula here.
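Plugging the simulation above into this formula for 2 eggs and 4 floors: x = 1 gives 1 + MAX(0, 2) = 3, x = 2 gives 1 + MAX(1, 2) = 3, x = 3 gives 1 + MAX(2, 1) = 3 and x = 4 gives 1 + MAX(3, 0) = 4, so the minimum over all x is 3, which matches the answer we found by hand for min_total_trials(2 eggs, 4 floors).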

Using this formula and the exceptions for “conducting more trials” described above as base cases we write a recursive implementation:

#include <stdio.h>
#include <limits.h>

int min_total_trials(int e, int f)
{
     /* for 0 floors no trials are required
         for 1 floor only 1 trial is required
        if only 1 egg is available then we have to be 
        careful and perform trials on each of the f floors, 
        hence f trials
     */
   if (f == 0 || f == 1 || e == 1) return f;
   int min_trials = INT_MAX, current_floor_min_trials;
   for (int current_floor = 1 ; current_floor <= f ; ++current_floor)
   {	
      current_floor_min_trials = 1 + 
                max( min_total_trials(e-1, current_floor-1), 
                min_total_trials(e, f - current_floor) );
       min_trials = min(min_trials, current_floor_min_trials); 
   }
   return min_trials;
}
 
int main(void) 
{
    int e = 2, f = 4;
    int min_trials = min_total_trials(e, f);
    printf("%d", min_trials);
    return 0;
}
Java implementation

package com.company;
import java.io.*;
 
class MinTrials
{
    static int max(int a, int b) { return (a > b)? a: b; } 
    static int min(int a, int b) { return (a < b)? a: b; } 
    
    static int min_total_trials(int e, int f)
    {
         /* for 0 floors no trials are required
            for 1 floor only 1 trial is required
            if only 1 egg is available then we have to 
            be careful and perform trials on each of
            the f floors, hence f trials
         */
        if (f == 0 || f == 1 || e == 1) return f;
       int min_trials = Integer.MAX_VALUE, current_floor_min_trials;
       for (int current_floor = 1 ; current_floor <= f ; ++current_floor)
       {
          current_floor_min_trials = 1 
                 + max( min_total_trials(e-1, current_floor-1), 
                        min_total_trials(e, f - current_floor) );
           min_trials = min(min_trials, current_floor_min_trials); 
       }
       return min_trials;
    }
 
	public static void main (String[] args) 
	{
		int e = 2, f = 4;
        System.out.println(min_total_trials(e, f));
    }
}

This implementation takes more than 3 secs for e = 2 and f = 28, and the execution time increases exponentially with increasing inputs.
To find the reason behind such high runtime let’s have a look at the recursive function call tree:
egg drop problem dynamic programming

As we can see, while calculating min_total_trials(2,4), sub-problems such as min_total_trials(2,2) are calculated repeatedly.
This means the recursive solution solves the same sub-problems over and over rather than always generating new ones. These are called overlapping sub-problems.
Since both properties required for dynamic programming, optimal substructure and overlapping sub-problems, hold, we can use dynamic programming for this problem.
Let’s look at the DP solution. We’ll use a 2D array (DP table) to store solutions to the sub-problems and reuse them when required. Moving bottom-up, we find a solution for each of the sub-problems:

e = 1 and f = 1, e = 1 and f = 2, e = 1 and f = 3,….. e = 1 and f = F
e = 2 and f = 1, e = 2 and f = 2, e = 2 and f = 3,….. e = 2 and f = F
.
.
.
e = E and f = 1, e = E and f = 2, e = E and f = 3,…… e = E and f = F

DP_table[E][F] will give the final answer for ‘E’ eggs and ‘F’ floors.

#include <stdio.h>
#include <limits.h>

int min_total_trials(int e, int f)
{
    /* A variable-length array cannot take an initializer;
       every cell used below is filled in explicitly. */
    int DP_table[e+1][f+1];
     for(int i=1; i<=e; ++i)
    {
         /* for 0 floors no trials are required */
         DP_table[i][0] = 0; 
         /* for 1 floor only 1 trial is required */    
         DP_table[i][1] = 1;    
    }
     for(int i=1; i <= f; ++i)
    {
          /* if only 1 egg is available then we have to 
          be careful and perform trials on each of 
         the f floors, hence f trials */
         DP_table[1][i] = i;
    }
 
     for(int i=2; i <= e; ++i)
    {
          for(int j=2; j <= f; ++j)
         {
             int min_trials = INT_MAX, current_floor_min_trials;
             for (int current_floor=1 ; current_floor<=j; ++current_floor)
             {
                  current_floor_min_trials = 1 
                        + max( DP_table[i-1][current_floor-1], 
                               DP_table[i][j - current_floor] );
                 min_trials = min(min_trials, current_floor_min_trials); 
              }
             DP_table[i][j] = min_trials;
         }
    }
    return DP_table[e][f];
}

int main(void) 
{
    int e = 2, f = 100;
    int min_trials = min_total_trials(e, f);
    printf("%d", min_trials);
    return 0;
}

Java implementation

package com.company;	
import java.io.*;
 
class MinTrials
{
    static int max(int a, int b) { return (a > b)? a: b; } 
    static int min(int a, int b) { return (a < b)? a: b; } 
    
    static int min_total_trials(int e, int f)
    {
        int DP_table[][] = new int[e+1][f+1];
         for(int i=1; i<=e; ++i)
        {
             DP_table[i][0] = 0;
             DP_table[i][1] = 1;
        }
         for(int i=1; i <= f; ++i)
        {
             DP_table[1][i] = i;
        }
    
         for(int i=2; i <= e; ++i)
        {
              for(int j=2; j <= f; ++j)
             {
                 int min_trials = Integer.MAX_VALUE, current_floor_min_trials;
                 for (int current_floor=1; current_floor<=j; ++current_floor)
                 {
                      current_floor_min_trials = 1 
                         + max( DP_table[i-1][current_floor-1], 
                                DP_table[i][j - current_floor] );
                     min_trials = min(min_trials, current_floor_min_trials); 
                  }
                 DP_table[i][j] = min_trials;
             }
        }
        return DP_table[e][f];
    }
 
      public static void main (String[] args) 
      {
            int n = 2, f = 100;
            System.out.println(min_total_trials(n, f));
    }
}

The time complexity of the DP solution is O(n * f * f) and the space complexity is O(n * f), where n is the number of eggs and f is the number of floors.

Even though this works for small inputs, the above implementation times out when you run it on LeetCode. We need to change our thinking. What if we do not think in terms of floors and eggs, but instead in terms of moves and eggs: given m moves and k eggs, how many floors can we test?

If we have 1 egg and 1 move only, then we can test only 1 floor. Now imagine we have k eggs and m moves and we drop an egg from some floor x. There are two possibilities: the egg breaks or it survives. If the egg breaks, we are left with k-1 eggs and m-1 moves, so below x we can only distinguish dp[m-1][k-1] floors with certainty. If it does not break, we still have k eggs but m-1 moves left, so above x we can distinguish dp[m-1][k] more floors.
To see the same thing for 2 eggs concretely: if the first egg survives at floor x, its next drop can be made x-1 floors higher, because if it breaks there the second egg still has enough moves left to scan the floors in between one by one; if it survives again, the next try can be another x-2 floors higher, and so on. Counting the current floor x itself, the DP recurrence comes out as

dp[m][k] = dp[m-1][k-1] + 1 + dp[m-1][k]
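For example, with k = 2 eggs: dp[1][2] = 1, dp[2][2] = dp[1][1] + 1 + dp[1][2] = 1 + 1 + 1 = 3, dp[3][2] = dp[2][1] + 1 + dp[2][2] = 2 + 1 + 3 = 6, and in general dp[m][2] = m(m+1)/2. The answer to the original question is then the smallest m for which dp[m][k] covers all f floors.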

Below is the implementation.

    public int superEggDrop(int K, int N) {
        // dp[m][k]: the number of floors we can certainly test
        // with m moves and k eggs
        int[][] dp = new int[N + 1][K + 1];
        int m = 0;
        while (dp[m][K] < N) {
            ++m;
            for (int k = 1; k <= K; ++k)
                // floors below (egg breaks) + current floor + floors above (egg survives)
                dp[m][k] = dp[m - 1][k - 1] + dp[m - 1][k] + 1;
        }
        return m;
    }
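As a quick check, superEggDrop(2, 100) returns 14 for the classic 2 eggs and 100 floors puzzle mentioned earlier: with 2 eggs, 14 moves cover dp[14][2] = 14 * 15 / 2 = 105 >= 100 floors, while 13 moves cover only 91.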

This article is contributed by Bhushan Aggrawal. If you would like to contribute, please reach out to us on [email protected]

Arrays With Elements In The Range 1 to N

When an array has all its elements in the range of 1 to N ( where N is the length of the array ) we can use the indices to store the ordered state of the elements in the array. This ordered-state can in-turn be used to solve a variety of problems which we’ll explore soon. First, a very simple demonstration of this property. 

Here is an array which has unique elements in the range of 1 to N.
Given array (A) : 5,3,1,4,2
Indices:                0,1,2,3,4

Sort in Linear Time

The first use-case of this unique property is being able to sort in O(N) time i.e. a special-case(all unique elements) of the Counting Sort. The crux of this sort is to check whether an element is at its corresponding index and swap it to its correct index if it’s not. Following is a demonstration of this logic:

Given array (A) : 5,3,1,4,2
Indices:                0,1,2,3,4

For each A[i] check if A[A[i] – 1] equals A[i] or not. If they are not equal then swap element at A[A[i] – 1] with A[i]. Basically the correct value for any index i is when A[i] contains i+1. 

In the above case, let’s start with i = 0.

A[A[0] - 1] or A[5-1] or A[4] is 2, while A[0] = 5. This means that A[A[i] - 1] is not equal to A[i], so 5 is not in its correct position. We swap to put A[0] -> 5 into its correct position, which is index 4, and A[0] will hold 2 after the swap. Similarly, we repeat this check and swap for all the elements.

What if we cancel-out the common terms and modify the check from  A[i] != A[A[i] - 1] to i != A[i]-1 ?

Find The Missing Integer

A similar approach can help us find the smallest missing positive-integer in a given array. By smallest missing positive-integer, we just mean the smallest positive integer that does not exist in the given list of numbers. For example: 

Given Array: -2, 3, 0, 1, 3
In the above case, the smallest missing positive integer is 2.

If we were to apply the usual sorting techniques and then scan the array for the smallest absent positive integer, it would imply a time complexity of O(N log N) + O(N). We can definitely do better than this! At first glance, it seems this problem does not fit the property of elements being in the range 1 to N, since the numbers in the given array can be well outside that range, but to solve it we still only need to figure out which of the values 1 to N are present in the given array.

How do we know whether the given array has elements from 1 to N? We can use the counting-sort discussed earlier to put each element in its “correct position”, i.e index 0 should hold 1, index 1 should hold 2 and so on. The smallest index that does not hold its correct element is the missing integer.

If we sort the given array using counting sort described above, we will get: 1, 0, 3, -2, 3. And the smallest index i to not hold its correct value i.e. i+1 will give us the answer to the smallest missing positive integer. In this case, that index is 1 since it does not hold 2, thus the smallest positive missing integer is 2.
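Here is a minimal sketch of that idea in Java (the method name is ours, not from any library): it runs the same swap-based placement described above, leaving values outside 1 to N where they are, and then scans for the first index that does not hold its correct value.

static int smallestMissingPositive(int[] nums) {
    int n = nums.length;
    for (int i = 0; i < n; ) {
        int v = nums[i];
        // Place v at index v - 1 if it is in range and not already there
        if (v > 0 && v <= n && nums[v - 1] != v) {
            int tmp = nums[v - 1];
            nums[v - 1] = v;
            nums[i] = tmp;
        } else {
            i++;
        }
    }
    // The first index not holding i + 1 reveals the missing integer
    for (int i = 0; i < n; i++) {
        if (nums[i] != i + 1) return i + 1;
    }
    return n + 1; // all of 1..n are present
}

For the example above, the array first becomes 1, 0, 3, -2, 3 and the scan returns 2.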

Find The Duplicate Element

The third use-case of this property is to figure out the duplicate elements without using any extra space. We can iterate over the array A and mark the corresponding index of each encountered element as negative, unless it has already been marked negative. For example: if A[1] = 3 (or -3), then mark A[abs(3) - 1] as negative; this way, whenever we encounter 3 (or -3) again at any A[i], we will know that the value 3 has been visited before, since A[3-1] will be negative.

Given array (A) : 5,3,1,4,3
Indices:                0,1,2,3,4

When we encounter A[0] i.e. 5, we make A[5-1] i.e. A[4] negative, so the array becomes: 
5,3,1,4,-3
Next, we encounter A[1] i.e. 3, we make A[3-1] i.e. A[2] negative, so the array becomes: 
5,3,-1,4,-3
Next, we encounter A[2] i.e. -1, we make A[1-1] i.e. A[0] negative, so the array becomes: 
-5,3,-1,4,-3
Next, we encounter A[3] i.e. 4, we make A[4-1] i.e. A[3] negative, so the array becomes: 
-5,3,-1,-4,-3
Next, we encounter A[4] i.e. -3, we want to make A[3-1] i.e. A[2] negative, but in this case, A[2] is already negative thus we know that A[2] has been visited before! Which means Abs(A[4]) i.e 3 is the duplicate element.


Here is a snippet to demonstrate the code for sorting an array in linear time as per the above approach. The exact same approach can be used to solve the other two applications i.e. Finding the Duplicate and Finding The Missing Integer.

        int swap = 0;

        for(int i = 0; i < nums.length;){

            // Only values in the range 1..N can be placed at their index
            if(nums[i] > 0 && nums[i] <= nums.length) {

                // If the value is not already sitting at its correct
                // index (value - 1), swap it there; otherwise move on
                if(nums[nums[i]-1] != nums[i]){
                    swap = nums[i];
                    nums[i] = nums[nums[i] - 1];
                    nums[swap - 1] = swap;
                }else{
                    i++;
                }

            }else{
                i++;
            }
        }
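In the same spirit, here is a small sketch of the sign-marking approach for finding a duplicate, following the walkthrough above (the method name is ours; it assumes all values lie between 1 and N):

static int findDuplicate(int[] a) {
    for (int i = 0; i < a.length; i++) {
        int v = Math.abs(a[i]);
        // If the slot for value v is already negative, v was seen before
        if (a[v - 1] < 0) {
            return v;
        }
        // Otherwise mark the slot for value v as visited
        a[v - 1] = -a[v - 1];
    }
    return -1; // no duplicate found
}

Running it on 5,3,1,4,3 returns 3, exactly as traced step by step above.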

 


Maximum area rectangle in a histogram

A histogram is a diagram consisting of rectangles whose area is proportional to the frequency of a variable and whose width is equal to the class interval. Below is an example of a histogram.

maximum area rectangle in histogram

Given a histogram whose class interval is 1, find the maximum area rectangle in it. Let me explain the problem in more detail.

In the histogram above, there are at least 6 rectangles, with areas 2, 1, 5, 6, 2, and 3. Are there more rectangles? Yes, we can make more by combining some of these rectangles. A few are shown below.

The largest rectangle in the example histogram has area 2 x 5 = 10. The task is to find the rectangle with the maximum area in a given histogram. The histogram will be given as an array of the heights of the blocks; in the example, the input is [2,1,5,6,2,3].

Maximum area rectangle: thoughts

The first insight from the rectangles above is: a block can only be part of a rectangle whose height is less than or equal to its own height. For each block of height h[i], check which blocks on the left can be part of a rectangle with this block: all blocks on the left with a height greater than or equal to the current block's height qualify. Similarly, all blocks on the right with a height greater than or equal to the current block's height can be part of such a rectangle.
The idea is to calculate leftLimit (the index of the first smaller block on the left) and rightLimit (the index of the first smaller block on the right) and compute the area as (rightLimit - leftLimit - 1) * h[i].
If this area is greater than the previously known maximum, update the maximum; otherwise continue to the next block.

class Solution {
    public int largestRectangleArea(int[] heights) {
        
        if(heights.length == 0) return 0;
        int maxArea = Integer.MIN_VALUE;

        for(int i=0; i<heights.length; i++){
            //Find the left limit for current block
            int leftLimit = findLeftLimit(heights, i);

            //Find the right limit for current block
            int rightLimit = findRightLimit(heights, i);

            int currentArea = (rightLimit - leftLimit-1) * heights[i];
            maxArea = Integer.max(maxArea, currentArea);
        }

        return maxArea;
    }

    private int findLeftLimit(int [] heights, int index){
        int j = index-1;
        while (j >= 0 && heights[j] >= heights[index]) j--;

        return j;
    }

    private int findRightLimit(int [] heights, int index){
        int j = index+1;
        while (j < heights.length && heights[j] >= heights[index])
            j++;

        return j;
    }
}

The time complexity of the implementation is O(n^2): we scan to the left and right of each block, which takes O(n) operations per block, and we do it for n blocks, hence the quadratic complexity. Can we optimize it?

If heights[j] >= heights[i] and the leftLimit of index j is already known, can we safely say that it will also lead us to the leftLimit of index i? Can we say the same thing for rightLimit as well? The answer to both questions is yes. If we store the left and right limits for all indices already seen, we can avoid recalculating them.

class Solution {
    public int largestRectangleArea(int[] heights) {
        
        if(heights.length == 0) return 0;

        int maxArea = Integer.MIN_VALUE;

        //Finds left limit for each index, complexity O(n)
        int [] leftLimit = getLeftLimits(heights);
        //Find right limit for each index, complexity O(n)
        int [] rightLimit = getRightLimits(heights);

        for(int i=0; i<heights.length; i++){
            int currentArea = 
                (rightLimit[i] - leftLimit[i] -1) * heights[i];
            maxArea = Integer.max(maxArea, currentArea);
        }

        return maxArea;
    }

    private int[] getLeftLimits(int [] heights){

        int [] leftLimit = new int[heights.length];
        leftLimit[heights.length-1] = -1;

        for(int i=0; i<heights.length; i++) {
            int j = i - 1;
            while (j >= 0 && heights[j] >= heights[i]) {
                j = leftLimit[j];
            }
            leftLimit[i] = j;
        }
        return leftLimit;
    }

    private int[] getRightLimits (int [] heights){

        int [] rightLimit = new int[heights.length];
        rightLimit[heights.length-1] = heights.length;

        for(int i=heights.length-2; i>=0; i--){
            int j = i+1;
            while(j<heights.length 
                      && heights[j] > heights[i]){
                j = rightLimit[j];
            }
            rightLimit[i] = j;
        }
        return rightLimit;
    }
}

The array leftLimit contains, at index i, the closest index j to the left of i such that heights[j] < heights[i]. You can think of each value of the array as a pointer (or an arrow) pointing to such a j for every i. How do we calculate leftLimit[i]? Point the arrow one step to the left and, if necessary, follow the arrows from there until you reach a proper j. The key to seeing why this algorithm runs in O(n) is to observe that each arrow is followed at most once.
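For the example histogram [2,1,5,6,2,3], this produces leftLimit = [-1, -1, 1, 2, 1, 4] and rightLimit = [1, 6, 4, 4, 6, 6]; the best block is index 2 with area (4 - 1 - 1) * 5 = 10, matching the answer found earlier.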

Largest area rectangle: stack-based solution

There is a classic method to solve this problem using a stack as well. Let's see if we can build a stack-based solution using the information we already have. Suppose we do not calculate the area of the rectangle that includes a bar while we are processing it. When should we process it then? If we want a rectangle with the height of this bar, we need its left and right boundaries, so we put the bar on a stack and wait.
Now, when processing bar j, if heights[j] is less than the bar on top of the stack, we pop the top bar. Why? Because bar j is the first bar to the right with a height less than the height of the bar at the top of the stack, so if we make a rectangle with the height of the popped bar, index j is its right boundary. This also implies that the bars on the stack are always in increasing order of height, as we never push a bar shorter than the bar currently on top. Therefore the next bar on the stack is the first bar to the left with a height lower than the popped bar, i.e. the left boundary. To calculate the area of the rectangle with height h[top], we take the width as j - stack.peek() - 1.

So the idea is that:

  1. For each bar, take its height as the rectangle’s height. Then find the left and right boundaries of this rectangle.
  2. The second top bar in the stack is always the first bar lower than the top bar on the stack on the left.
  3. The bar that j points to is always the first bar lower than the top bar in the stack on the right.
  4. After step 2 and 3, we know the left and right boundaries, then know the width, then know the area.
private int maxAreaUsingStack(int[] heights){

        Stack<Integer> s = new Stack<>();

        int maxArea = 0;
        for(int i=0; i<=heights.length; i++){
            //Handling the last case
            int h = i == heights.length ? 0 : heights[i];
            while(!s.empty() && h < heights[s.peek()]){
                int top = s.pop();
                int leftLimit = s.isEmpty() ? -1 : s.peek();
                int width = i-leftLimit-1;

                int area = width * heights[top];
                maxArea = Integer.max(area, maxArea);
            }
            s.push(i);
        }
        return maxArea;
    }
The time complexity of the code is O(n), with an additional space complexity of O(n).
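A quick check with the example input (a sketch, assuming the class above is compiled as Solution):

int[] heights = {2, 1, 5, 6, 2, 3};
// Prints 10 for the example histogram, as worked out above
System.out.println(new Solution().largestRectangleArea(heights));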

Minimizing maximum lateness

Minimizing maximum lateness : Greedy algorithm

Since we have chosen greed, let's continue with it for at least one more post. Today's problem is to minimize the maximum lateness of a set of tasks. Let me clarify the problem: given a processor which processes one process at a time and, as always, a list of processes to be scheduled on it, schedule them so that the maximum lateness is minimized. Contrary to previous problems, this time we are not given start and end times; instead, each process i has a length t(i) for which it runs and a deadline d(i) it has to meet, and f(i) denotes its actual finish time.

The lateness of a process is defined as
l(i) = max{0, f(i) - d(i)}, i.e. the length of time past its deadline by which it finishes.
The goal is to schedule all tasks so as to minimize the maximum lateness L = max over i of l(i). For example:

minimize maximum lateness

Minimizing maximum lateness : algorithm

Let’s decide our optimization strategy. There is some order in which jobs can be decided: shortest job first, earliest deadline first, least slack time first.

Let's see if any of the above strategies gives an optimal solution. For shortest processing time first, consider the example P1 = (t=1, d=100) and P2 = (t=10, d=10). If we schedule the shortest job first, in the order (P1, P2), the maximum lateness will be 1, but if we take them as (P2, P1), the lateness will be 0. So, clearly, taking the shortest process first does not give us an optimal solution.

Check for the smallest slack time approach. See if you can come up with some counterexample that it does not work.

That leaves us with only one option, take the process which has the most pressing deadline, that is the one with the smallest deadline and yet not scheduled. If you have noticed, the example given for the problem statement is solved using this method. So, we know it works.

  1. Sort all job in ascending order of deadlines
  2. Start with time t = 0
  3. For each job in the list
    1. Schedule the job at time t
    2. Finish time = t + processing time of job
    3. t = finish time
  4. Return (start time, finish time) for each job

Minimizing maximum lateness : implementation

from operator import itemgetter

jobs = [(1, 3, 6), (2, 2, 9), (3, 1, 8), (4, 4, 9), 
        (5, 3, 14), (6, 2, 15)] 

def get_minimum_lateness():
	schedule =[];
	max_lateness = 0
	t = 0;
	
	sorted_jobs = sorted(jobs,key=itemgetter(2))
	
	for job in sorted_jobs:
		job_start_time = t
		job_finish_time = t + job[1]

		t = job_finish_time
		if(job_finish_time > job[2]):
			max_lateness =  max (max_lateness, (job_finish_time - job[2]))
		schedule.append((job_start_time, job_finish_time))

	return max_lateness, schedule

max_lateness, sc = get_minimum_lateness()
print("Maximum lateness will be: " + str(max_lateness))
for t in sc:
	print(t[0], t[1])

The complexity of the implementation is dominated by the sort, which is O(n log n); the rest of the processing takes O(n).

Please share your suggestions or if you find something is wrong in comments. We would love to hear what you have to say. If you find this post interesting, please feel free to share or like.

Coin change problem : Greedy algorithm


Today, we will look at a very common problem that can be solved using a greedy algorithm. If you are not familiar with greedy algorithms, here is the gist: at every step of the algorithm, you take the best available option and hope that everything turns out optimal at the end, which it usually does. The problem at hand is the coin change problem: given coins of denominations 1, 5, 10, 25, and 100, find a way to give a customer an amount using the fewest number of coins. For example, if I ask you to return change for 30, there is more than one way to do so:

 
Amount: 30
Solutions : 3 X 10  ( 3 coins ) 
            6 X 5   ( 6 coins ) 
            1 X 25 + 5 X 1 ( 6 coins )
            1 X 25 + 1 X 5 ( 2 coins )

The last solution is the optimal one, as it gives the change using only 2 coins, whereas all the other solutions use more than two coins.

The greedy solution for the coin change problem is very intuitive and is called the cashier's algorithm. The basic principle is: at every iteration, take the largest coin which fits into the remaining amount we need change for. At the end you will have an optimal solution (for this set of denominations).

Coin change problem : Algorithm

1. Sort the n coin denominations in increasing order of value.
2. Initialize the set of coins as empty: S = {}
3. While the amount is not zero:
3.1 Let Ck be the largest coin such that Ck <= amount
3.1.1 If there is no such coin, return “no viable solution”
3.1.2 Else include the coin in the solution S
3.1.3 Decrease the remaining amount: amount = amount - Ck

Coin change problem : implementation

#include <stdio.h>
 
int coins[] = { 1,5,10,25,100 };
 
int findMaxCoin(int amount, int size){
	for(int i=0; i<size; i++){
	    if(amount < coins[i]) return i-1;
	}
	/* amount is at least as large as the largest coin */
	return size-1;
}

int findMinimumCoinsForAmount(int amount, int change[]){
 
	int numOfCoins = sizeof(coins)/sizeof(coins[0]);
	int count = 0;
	while(amount){
	    int k = findMaxCoin(amount, numOfCoins);
	    if(k == -1){
	        printf("No viable solution");
	        return -1;
	    }
	    else{
	        amount -= coins[k];
	        change[count++] = coins[k];
	    }
	}
	return count;
}
 
int main(void) {
	int change[10]; // This needs to be dynamic
	int amount = 34;
	int count = findMinimumCoinsForAmount(amount, change);
 
	printf("\n Number of coins for change of %d : %d", amount, count);
	printf("\n Coins : ");
	for(int i=0; i<count; i++){
		printf("%d ", change[i]);
	}
	return 0;
}

What is the time complexity of the implementation? First, sorting the array of n coins costs O(n log n). The while loop, in the worst case, runs O(amount) times, for instance when the only coin available is the 1-denomination coin. The overall complexity of the coin change problem therefore becomes O(n log n) + O(amount).

Will this algorithm work for every set of denominations? The answer is no. It will not find any solution if there is no coin with denomination 1, and even when a solution exists, the greedy choice can be suboptimal for arbitrary denominations: with coins {1, 3, 4} and amount 6, greedy picks 4 + 1 + 1 (3 coins) while the optimal is 3 + 3 (2 coins). So be careful while applying this algorithm.

Please share if you have any suggestion or if you want me to write on a specific topic. If you liked the post, share it!

Range minimum query (RMQ)


Given an array A[0..n], find the index of the element with the minimum value in a given range. This problem is known as Range Minimum Query or RMQ.
For example, for the array shown below, if we have to find the index of the minimum value between index 2 and 7, the RMQ answer would be 5, which is the index of the element 1.

 RMQ range minimum query

Going by brute force, every time a query is fired we scan the range and find the minimum in the same way as we would for the entire array. The complexity of answering each query is O(n), since in the worst case the range is the entire array.

Can we preprocess our data, so that our query operations are less costly? If we do so, there are two parts to the solution now, first preprocessing and the second query. Let’s assume complexity of each step is f(n) and g(n) respectively, then the complexity of solution can be denoted as (f(n), g(n)).

What kind of preprocessing can be done? The basic idea is to precompute the index of the minimum for every possible range in the array. How many ranges are possible for an array with n elements? On the order of n^2 (n(n+1)/2 to be exact), because every pair of indices (i, j) with i <= j defines a range.

So, to store the index of the minimum element of each range, O(n^2) space is required, and the preprocessing time is O(n^3). However, the complexity of a query is O(1). The overall complexity of this solution is therefore (O(n^3), O(1)).

#include <stdio.h>

int M[100][100];

int findMinimum(int a[], int start, int end, int size){
	if(start >= size || end >= size) return -1;
	int min = start;
	for(int i=start; i<=end; i++){
		if( a[i] < a[min] ){
			min = i;
		}
	}
	return min;
	
}
void preprocess(int a[], int size ){
    for(int i=0; i<size; i++){
        for(int j=i; j<size; j++){
            M[i][j] = findMinimum(a,i,j,size);
        }
    }
}

int rmq(int start, int end){
	return M[start][end];
}

int main(void) {
	
	int a[] = { 2,3,1,5,9,7,10,5,6,3 };
	int size = sizeof(a)/sizeof(a[0]);
	
	//Preprocessing step
	preprocess(a, size);
	printf("\n Minimum index in range is : %d ", rmq(3,9) );
	printf("\n Minimum index in range is : %d ", rmq(2,7) );
	
	return 0;
}

With the application of dynamic programming, the complexity of the preprocessing step can be reduced to O(n^2).

#include <stdio.h>

int M[100][100];

void preprocess(int a[], int size)
{
	int i,j;
	for (i=0; i<size; i++)
		M[i][i] = i;
	
	for (i=0; i<size; i++){
		for (j=i+1; j<size; j++){
			if (a[M[i][j - 1]] < a[j])
				M[i][j] = M[i][j - 1];
			else
				M[i][j] = j;
		}
	}
}

int rmq(int start, int end){
	return M[start][end];
}

int main(void) {
	
	int a[] = { 2,3,1,5,9,7,10,5,6,3 };
	int size = sizeof(a)/sizeof(a[0]);
	
	//Preprocessing step
	preprocess(a, size);
	printf("\n Minimum index in range is : %d ", rmq(3,9) );
	printf("\n Minimum index in range is : %d ", rmq(2,7) );
	
	return 0;
}

Range minimum query with O(n), O(√n) complexity solution

Can we do better in the preprocessing step by trading off some query time? If we divide the array into smaller chunks and store the index of the minimum element of each chunk, will it help? And what should the size of the chunks be? How about dividing the array into √n chunks, each of size √n?

RMQ or range minimum query based on square root partitioning

Now, find the index of the minimum element in each of these chunks and store it. The extra space required is O(√n). Finding the minimum of one chunk takes O(√n), and with √n chunks the preprocessing takes O(√n * √n) = O(n).

To find the index of the minimum element in a given range, follow these steps:
1. Find the index of the minimum value in each chunk lying entirely between the start and end of the given range (at most √n operations if all chunks fall in the range).
2. Find the minimum index in the chunk where the start of the range lies (at most √n comparisons, from the start of the range to the end of that chunk).
3. Find the minimum index in the chunk where the end of the range lies, from the start of that chunk to the end of the range.
4. Compare all these values and return the index of the minimum among them.

No matter how big or small the range is, the worst case per query is O(√n), as there are at most about 3√n operations.

Let’s take an example and see how it works. Find minimum in range (2,7)

range minimum query or RMQ example

To get RMQ(2,7), which chunks lie entirely within the range? There is only one: chunk 1. The minimum index of chunk 1 is M[1], which is 5, so the minimum element among the full chunks is A[5].

Find the index of the minimum value in chunk 0, where the start of the range lies (starting from the start of the range, which is 2). There is only one such element, at index 2, so the element to compare is A[2].

Find minimum from the start of chunk where the end of the range lies. So, we will be comparing A[6] and A[7].

At the end, compare A[5] (minimum of all chunks between start and end of range ), A[2] (minimum in chunk where the start of the range lies) and A[6], A[7] (minimum in chunk where end of the range lies) and we have the answer as 5 as A[5] is the minimum of all these values.

Putting it all together, we have a range minimum query solution with complexity (O(n), O(√n)).
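Here is a compact sketch of this square root decomposition in Java (the class and method names are ours): preprocessing stores the index of the minimum of each chunk, and each query combines the partial chunks at the two ends with the full chunks in between.

class SqrtRMQ {
    private final int[] a;
    private final int[] chunkMin; // index of the minimum element of each chunk
    private final int chunkSize;

    SqrtRMQ(int[] a) {
        this.a = a;
        this.chunkSize = (int) Math.ceil(Math.sqrt(a.length));
        this.chunkMin = new int[(a.length + chunkSize - 1) / chunkSize];
        java.util.Arrays.fill(chunkMin, -1);
        for (int i = 0; i < a.length; i++) {
            int c = i / chunkSize;
            if (chunkMin[c] == -1 || a[i] < a[chunkMin[c]]) chunkMin[c] = i;
        }
    }

    // Returns the index of the minimum value in a[start..end], both inclusive
    int rmq(int start, int end) {
        int min = start;
        int i = start;
        while (i <= end) {
            if (i % chunkSize == 0 && i + chunkSize - 1 <= end) {
                // A whole chunk lies inside the range: use its precomputed minimum
                int c = i / chunkSize;
                if (a[chunkMin[c]] < a[min]) min = chunkMin[c];
                i += chunkSize;
            } else {
                // Partial chunk at either end of the range: compare element by element
                if (a[i] < a[min]) min = i;
                i++;
            }
        }
        return min;
    }
}

Preprocessing is O(n) and each query performs the three comparisons described above, so it stays within O(√n).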

RMQ using sparse table

The square root decomposition uses only O(√n) extra space; however, the query time is also O(√n). To reduce query time at the expense of space, there is another method called the sparse table method. It combines features of the dynamic programming approach (precomputed answers) and of the decomposition approach (minimums of blocks).

In this approach, consider blocks of size 2^j starting at every index, where j varies from 0 to log n and n is the number of elements in the array. There are about n log n such blocks, hence the space complexity becomes O(n log n).

For each block, find the index of its minimum element and store it in a lookup table: M[i][j] stores the index of the minimum in the range starting at i of size 2^j.

RMQ using sparse matrix table

For example, M[0][3] stores the index of the minimum value between indices 0 and 7 (2^3 = 8 elements).

Now, how do we create this lookup table? It can be built bottom-up using dynamic programming. Specifically, we find the index of the minimum value in a block of size 2^j by comparing the minima of its two constituent blocks of size 2^(j-1). More formally,

M[i][j] = M[i][j-1] if A[M[i][j-1]] <= A[M[i + 2^(j-1)][j-1]]
M[i][j] = M[i + 2^(j-1)][j-1] otherwise.
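For example, with the array { 2,3,1,5,9,7,10,5,6,3 } used in the code below, M[0][2] = 2 (the minimum of A[0..3] is A[2] = 1) and M[4][2] = 7 (the minimum of A[4..7] is A[7] = 5); since A[2] <= A[7], M[0][3] = 2.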

How do we find the index of the minimum value in a given range? The idea is to pick two (possibly overlapping) blocks which together cover the entire range and then take the minimum of their two minima.
For example, to find RMQ(i,j), let 2^k be the size of the largest block that fits into the range from i to j, i.e. k = floor(log2(j - i + 1)).

Now we have two blocks to look at: from i to i + 2^k - 1 (already computed as M[i][k]) and from j - 2^k + 1 to j (already computed as M[j - 2^k + 1][k]).

Formally,

    RMQ(i,j) = M[i][k] if A[M[i][k]] <= A[M[j - 2^k + 1][k]]
    RMQ(i,j) = M[j - 2^k + 1][k] otherwise

RMQ implementation using sparse table

#include <stdio.h>
#include <math.h>

int M[100][100];

void preprocess(int a[], int size)
{
    int i, j;
	
    for (i = 0; i < size; i++)
        M[i][0] = i;
		
    for (j = 1; (1 << j) <= size; j++){
        for (i = 0; i + (1 << j) - 1 < size; i++){
            if (a[M[i][j - 1]] < a[M[i + (1 << (j - 1))][j - 1]])
                M[i][j] = M[i][j - 1];
            else
                M[i][j] = M[i + (1 << (j - 1))][j - 1];
        }
    }
}  
  
int rmq(int a[], int start, int end){
    int j = (int)floor(log2(end - start + 1));

    if ( a[M[start][j]] <= a[M[end-(1<<j)+1][j]] )
        return M[start][j];
    else 
        return M[end-(1<<j)+1][j];
}

int main(void) {
	
	int a[] = { 2,3,1,5,9,7,10,5,6,3 };
	int size = sizeof(a)/sizeof(a[0]);
	
	//Preprocessing step
	preprocess(a, size);
	printf("\n Minimum index in range is : %d ", rmq(a,3,9) );
	printf("\n Minimum index in range is : %d ", rmq(a,2,7) );
	
	return 0;
}

These two blocks entirely cover the range, and since only one comparison is required, the complexity of a lookup is O(1).

In this post, we discussed various ways to implement range minimum query based on the space and time complexity tradeoff. In future posts, we will discuss related topics such as segment trees and the least common ancestor problem.

Please share if something is wrong or missing, we would love to hear from you.

Prune nodes not on paths with given sum


Prune nodes not on paths with a given sum is a very commonly asked question in Amazon interviews. It involves two concepts in one problem: first, how to find a path with a given sum, and second, how to prune nodes from a binary tree. The problem statement is:

Given a binary tree, prune nodes which are not paths with a given sum.

For example, given the binary tree below and the sum 43, the red nodes will be pruned as they are not on any path with sum 43.

Prune nodes not on path with given sum

Prune nodes in a binary tree: thoughts

To solve this problem, first understand how to find paths with a given sum in a binary tree. To prune all nodes which are not on these paths, one could collect all the nodes which are not part of any such path and then delete them one by one. That requires two traversals of the binary tree.
Is it possible to delete a node while calculating the paths with the given sum? At what point do we find out that a node is not on such a path? At the leaf node.
Once we know that a leaf node is not part of any path with the given sum, we can safely delete it. What about its parent? We cannot directly delete the parent node, as there may be another subtree of it which leads to a path with the given sum. Hence, for every node, the pruning decision depends on what comes back from processing its subtrees.

At a leaf node, if the leaf cannot be part of such a path, we delete it and return false to the parent. At an internal node, we look at the return values from both subtrees. If both subtrees return false, this node is not part of any path with the given sum, so it is deleted and false is returned. If at least one subtree returns true, the current node is part of a path with the given sum; it should not be deleted and should return true to its parent.

Prune nodes from a binary tree: implementation

#include<stdio.h>
#include<stdlib.h>
#include<math.h>
 
struct node{
	int value;
	struct node *left;
	struct node *right;
};
typedef struct node Node;

#define true 1
#define false 0

int prunePath(Node *node, int sum ){
	
	if( !node ) return false; /* an empty subtree cannot contribute a path */
	
	int subSum =  sum - node->value;
	/* To check if left tree or right sub tree 
	contributes to total sum  */
	
	int leftVal = false, rightVal = false;
	
	/*Check if node is leaf node */
	int isLeaf = !( node->left || node->right );
	
	/* If node is a leaf node and it is part of a path with sum
	equal to the given sum, return true to the parent node so that
	the parent node is not deleted */
	if(isLeaf && !subSum )
		return true;
		
	/* If node is leaf and it not part of path with sum 
	equals to given sum
    Return false to parent node */
    else if(isLeaf && subSum ){
    	free(node);
    	return false;
    }
    /* If node is not leaf and there is left child 
	Traverse to left subtree*/
    leftVal = prunePath(node->left, subSum);
    
    /* If node is not leaf and there is right child
	 Traverse to right subtree*/
    rightVal = prunePath(node->right, subSum);
    
    /* This is the crux of the algorithm.
    1. If neither the left subtree nor the right subtree can lead
	to a path with the given sum, delete this node.
    2. Otherwise keep the node, but detach any subtree that
	returned false; its nodes have already been freed. */
    if(!(leftVal || rightVal) ){
    	free(node);
    	return false;
    }
    if(!leftVal)
    	node->left = NULL;
    if(!rightVal)
    	node->right = NULL;
    return true;
}

void inoderTraversal(Node * root){
	if(!root)
		return;
	
	inoderTraversal(root->left);
	printf("%d ", root->value);
	inoderTraversal(root->right);
}
Node *createNode(int value){
	Node * newNode =  (Node *)malloc(sizeof(Node));
	newNode->value = value;
	newNode->right= NULL;
	newNode->left = NULL;
	
	return newNode;
}
Node *addNode(Node *node, int value){
	if(node == NULL){
		return createNode(value);
	}
	else{
		if (node->value > value){
			node->left = addNode(node->left, value);
		}
		else{
			node->right = addNode(node->right, value);
		}
	}
	return node;
}

/* Driver program for the function written above */
int main(){
	Node *root = NULL;
	//Creating a binary tree
	root = addNode(root,30);
	root = addNode(root,20);
	root = addNode(root,15);
	root = addNode(root,25);
	root = addNode(root,40);
	root = addNode(root,37);
	root = addNode(root,45);
	
	inoderTraversal(root);	
	prunePath(root, 65);
	
	printf( "\n");
	if( root ){
		inoderTraversal(root);	
	}
	return 0;
}

The complexity of this algorithm to prune all nodes which are not on the path with a given sum is O(n).

Please share if there is something wrong or missing.