Design a data structure with insert delete and getRandom in O(1)


The problem statement is to design a data structure which performs the following operations in O(1) time complexity:
1. Insert an element, insert(int value)
2. Remove an element, remove(int value)
3. Get random element, getRandom()

For example, insert 1 into the data structure insert(1): [1]
insert 2 into the data structure insert(2): [1,2]
insert 3 into the data structure insert(3): [1,2,3]

Remove 2 from it, remove(2). [1,3]
getRandom() should return 1 and 3 with equal probabilities.

These kinds of problems are easy and hard at the same time. The idea is to go step by step and solve each part. The first step is to define an interface for this data structure, which is easy given the definition of the problem.

public interface IRandomNumberGenerator {
    public boolean insert(int value);
    public boolean remove (int value);
    public int getRandom();
}

Now that the interface is ready, it is time to implement a class for it. First of all, we have to find a container to store all the elements. If we take an ArrayList, insert() is O(1) (amortized), as we always add the new element at the end of the ArrayList. getRandom() is also O(1), since we can pick a random index and read it directly. However, there is a problem with remove(). To remove an element from an ArrayList, we have to scan the whole list to find the element, remove it, and then move all the elements on its right one index to the left. This is an O(n) operation.

Insert delete and getRandom in O(1): selection of data structures

The problem with storing elements in an ArrayList is that during removal we have to scan the list to find the location of the element to be removed. What if we already knew the location of the element? We can store the position of each element in a HashMap that maps the value to its index in the ArrayList.

Now, insert() has to write to two data structures: first it appends the value to the ArrayList, and then it stores the location of the value in the HashMap. The remove operation can go straight to the location in the ArrayList and delete the element. But wait, we still have to move all the elements on the right one position to the left, which means the worst case complexity of remove() is still O(n).

We know one thing: if we remove the last element from the ArrayList, no shifting is required. What if we copy the last value to the index of the element to be removed and then just remove the last element? Be careful: we have to update the HashMap with the new index of the value that used to be at the last position of the ArrayList. In this way, remove() is also O(1).
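The swap-with-last idea can be looked at in isolation before reading the full class. The sketch below is only an illustration under assumptions: the class name SwapRemoveSketch and the helper swapRemove are made up for this example, and it assumes values are unique.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, not part of the original post.
class SwapRemoveSketch {

    // Removes value from list without shifting elements.
    // index maps each (unique) value to its current position in list.
    static boolean swapRemove(List<Integer> list,
                              Map<Integer, Integer> index,
                              int value) {
        Integer location = index.remove(value);
        if (location == null) return false;    // value was never stored

        int lastIndex = list.size() - 1;
        if (location != lastIndex) {
            int lastValue = list.get(lastIndex);
            list.set(location, lastValue);      // fill the hole with the last element
            index.put(lastValue, location);     // record where it lives now
        }
        list.remove(lastIndex);                 // dropping the tail needs no shifting
        return true;
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3, 4));
        Map<Integer, Integer> index = new HashMap<>();
        for (int i = 0; i < list.size(); i++) index.put(list.get(i), i);

        swapRemove(list, index, 2);
        System.out.println(list);   // [1, 4, 3]
    }
}
```

Note that the order of the remaining elements changes, which is acceptable here because getRandom() does not depend on order.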

Insert, delete and getRandom in O(1): implementation

package AlgorithmsAndMe;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

public class RandomNumberGenerator implements IRandomNumberGenerator {

    private ArrayList<Integer> list;
    private Map<Integer, Integer> loc;
    private Random random;

    //Initializing the class
    public RandomNumberGenerator(){
        list = new ArrayList<>();
        loc = new HashMap<>();
        random = new Random();
    }

    @Override
    public boolean insert(int value) {
        /*If hash already contains key then it is a duplicate key.
          So, we just return false.
         */
        if(loc.containsKey(value)) return false;

        //Insert into list
        list.add(value);

        //Save the location on hash map
        loc.put(value, list.size()-1);
        return true;
    }

    @Override
    public boolean remove(int value) {
        /* If there is no entry in hash, that means
        there is no element in ArrayList */
        if(!loc.containsKey(value)) return false;

        int location = loc.get(value);
        //Remove from hash
        loc.remove(value);

        if(location != list.size()-1){
            /*Copy the last value in the array
            list to the current location*/
            list.set(location, list.get(list.size()-1));

            //Update the location of last element in hash
            loc.put(list.get(location), location);
        }

        //remove the last location from ArrayList
        list.remove(list.size()-1);
 
        return true;
    }

    @Override
    public int getRandom() {
        return list.get(random.nextInt(list.size()));
    }
}

package AlgorithmsAndMe;

import static org.junit.Assert.*;

public class RandomNumberGeneratorTest {

    RandomNumberGenerator randomNumberGenerator =
           new RandomNumberGenerator();

    @org.junit.Test
    public void testInterface() {
        assertEquals(true, randomNumberGenerator.insert(4));
        assertEquals(true, randomNumberGenerator.insert(5));
        assertEquals(true, randomNumberGenerator.insert(3));
        assertEquals(true, randomNumberGenerator.insert(2));

        assertEquals(true, randomNumberGenerator.remove(4));

        int random = randomNumberGenerator.getRandom();
        System.out.println(random);
    }
}

The complexity of each of insert, delete and getRandom on this data structure is O(1): amortized for the ArrayList append, and average case for the HashMap operations.

Insert, delete and get random when duplicates are allowed

Let’s make this problem a bit more complex by allowing duplicate elements in the list. The first problem with the existing implementation is that it stores a single location per element in the HashMap. If the same element can appear multiple times in the list, which location should we store? We should store all of them. That changes the definition of our HashMap to

Map<Integer, HashSet<Integer>> 

HashSet implements the Set interface and is backed by a HashMap instance. It makes no guarantee about iteration order, which is fine here because we never rely on the order of the stored locations. What we do require is that adding and removing a location takes constant time on average, and HashSet provides that.
To know more about the complexity of various data structures in Java, see Runtime Complexity of Java Collections, which explains why HashSet provides constant-time insert and remove operations.
Everything else follows the same process. In insert(), we add the new index to the HashSet stored for that value. In remove(), we pick any stored location of the value, copy the last element of the ArrayList into that location, update the HashSet of that last element (remove its old index and add the new one), and then remove the last element from the ArrayList. Each of these steps is an O(1) operation.

getRandom() implementation remains same.

package AlgorithmsAndMe;

import java.util.*;

public class RandomNumberGenerator implements IRandomNumberGenerator {

    private ArrayList<Integer> list;
    private Map<Integer, HashSet<Integer>> loc;
    private Random random;

    //Initializing the class
    public RandomNumberGenerator(){
        list = new ArrayList<>();
        loc = new HashMap<>();
        random = new Random();
    }

    @Override
    public boolean insert(int value) {

        if(!loc.containsKey(value)){
            loc.put(value, new HashSet<>());
        }

        //Insert into list
        list.add(value);

        //Save the location on hash map
        loc.get(value).add(list.size()-1);
        return true;
    }

    @Override
    public boolean remove(int value) {
        /* If there is no entry in hash, that means
        there is no element in ArrayList */
        if(!loc.containsKey(value)) return false;

        //Get any one location of the element in ArrayList
        HashSet<Integer> listLocations = loc.get(value);
        int location = listLocations.iterator().next();
        listLocations.remove(location);

        int lastIndex = list.size()-1;
        int lastElement = list.get(lastIndex);
        /* Compare positions, not values: the last element may be
        another copy of value sitting at a different index */
        if(location != lastIndex) {
            //Copy the last value in the array list to the current location
            list.set(location, lastElement);
            //Update the location of last element in hash
            loc.get(lastElement).remove(lastIndex);
            loc.get(lastElement).add(location);
        }
        //remove the last location from ArrayList
        list.remove(lastIndex);

        if(listLocations.isEmpty()) loc.remove(value);
        return true;
    }

    @Override
    public int getRandom() {
        return list.get(random.nextInt(list.size()));
    }
}

Other problems which are very similar to this concept are: design an LRU cache, first non-repeated character in stream etc.

Please share if there is anything wrong or missing. If you are preparing for an interview and need one to one personalized coaching, please reach out to us on communications@algorithmsandme.com

Merge overlapping intervals


We are given N intervals S = {E(1), E(2), ..., E(N)}, where each E(i) has a start time s(i) and an end time e(i). Some of these intervals can overlap, and the task is to merge them. Just to clarify, with E(i) starting no later than E(j), E(i) and E(j) overlap when the start time of E(j), s(j), is less than the end time of E(i), e(i). For example, [(1,3),(2,4),(5,8),(6,9)] should transform into [(1,4),(5,9)], as intervals (1,3) and (2,4) overlap and intervals (5,8) and (6,9) also overlap.


Merge overlapping intervals  : Thought process

As we always do, first try to come up with a brute force solution: given enough time, space and money, how would you solve this?
The natural course is to take the ith interval and compare its end time with the start time of every other jth interval. If the start time of the jth interval is less than the end time of the ith interval (and the jth interval does not end before the ith one starts), the two intervals can be merged. What should the end time of the merged interval be? It should be the maximum of the end times of the two merged intervals.

What is the time complexity of this approach? Apart from the output we are not using any additional space; however, the worst case time complexity is O(n^2). Can we do better?
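The brute force idea can be sketched as follows. This is an illustration under assumptions, not code from the original post: intervals are represented as int[]{start, end} pairs, the class name BruteForceMerge is made up, and the input is not required to be sorted. Each new interval is compared against every interval kept so far, which gives the quadratic number of comparisons.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical brute-force sketch; intervals are {start, end} pairs.
class BruteForceMerge {

    static List<int[]> merge(List<int[]> input) {
        // Invariant: intervals in result never overlap each other.
        List<int[]> result = new ArrayList<>();

        for (int[] in : input) {
            int start = in[0];
            int end = in[1];

            // Absorb every kept interval that overlaps the current one.
            Iterator<int[]> it = result.iterator();
            while (it.hasNext()) {
                int[] kept = it.next();
                boolean overlaps = kept[0] < end && start < kept[1];
                if (overlaps) {
                    start = Math.min(start, kept[0]);
                    end = Math.max(end, kept[1]);
                    it.remove();
                }
            }
            result.add(new int[]{start, end});
        }
        return result;
    }

    public static void main(String[] args) {
        List<int[]> intervals = new ArrayList<>();
        intervals.add(new int[]{1, 3});
        intervals.add(new int[]{2, 4});
        intervals.add(new int[]{5, 8});
        intervals.add(new int[]{6, 9});

        for (int[] merged : merge(intervals)) {
            System.out.print("(" + merged[0] + "," + merged[1] + ")");
        }
        // prints (1,4)(5,9)
    }
}
```

For unsorted input the merged intervals come out in no particular order, so a caller that needs them ordered would have to sort the result.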

What are the two times we compare in the brute force solution? The start time of one interval and the end time of another. If we arrange the input in a specific order, can we avoid processing some entries?

Suppose we sort all intervals by start time, so that s(i) <= s(i+1) <= s(i+2). Also, every interval is forward looking: e(i) > s(i), e(i+1) > s(i+1) and so on.

If s(i) is greater than e(i-1), then s(i+1) is also greater than e(i-1), so there is no need to compare s(i+1) with e(i-1). In other words, for any interval E(i) we never need to look beyond the immediately previous interval. If s(i) is less than e(i-1), update e(i-1) to the maximum of e(i-1) and e(i) and move on to E(i+1).
Notice that we need the last interval E(i-1) to decide whether to merge the new interval into it or keep the new one standalone. A stack is a convenient data structure for this. The algorithm looks like:

  1. Consider interval E(i).
  2. If the stack is empty, push E(i) on to the stack.
  3. If the stack is not empty, pop the interval at the top of the stack and call it E(i-1).
  4. Compare s(i), the start time of E(i), with e(i-1), the end time of E(i-1).
  5. If s(i) is less than e(i-1), update e(i-1) to max(e(i-1), e(i)), the maximum of the two end times, and push E(i-1) back on to the stack.
  6. Else push E(i-1) back on to the stack and then push E(i).
  7. Continue till all intervals are considered.
  8. At the end of processing, the stack contains all merged intervals.

Let’s take an example and see how this algorithm works. We take the intervals from the problem statement, [(1,3),(2,4),(5,8),(6,9)], and merge the overlapping ones.

First of all, sort all intervals based on their start time. This input is already in that order.

Create a stack and start with the first interval. Since the stack is empty, we push the first interval on to the stack. The stack is now [(1,3)].

Take the second interval, (2,4). Its start time (2) is less than the end time (3) of the interval on top of the stack, so we take the maximum of the two end times (4), update the popped interval with that end time and push it back on to the stack. The stack is now [(1,4)].

Look at the third interval, (5,8). Its start time (5) is greater than the end time (4) of the interval on top of the stack, so we just push it on to the stack. The stack is now [(1,4),(5,8)].

For the last interval, (6,9), the start time (6) is less than the end time (8) of the interval on top of the stack. We take the maximum of the two end times (9), update the previous interval with that end time and push it back on to the stack. The stack is now [(1,4),(5,9)].

At this point there are no more intervals remaining, and the stack contains all the merged intervals.

Merge overlapping intervals : Implementation

package com.company;


import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.Stack;

/**
 * Created by sangar on 8.4.18.
 */
public class OverlappingIntervals {
    public  static ArrayList<Interval>
        mergeOverlappingIntervals(ArrayList<Interval> intervals){

        ArrayList<Interval> mergedIntervals = new ArrayList<>();
        Stack<Interval> s = new Stack<>();

        //Sort the ArrayList of interval based on start time.
        Collections.sort(intervals, Comparator.comparing(p -> p.getStartTime()));
        for(Interval currentInterval : intervals){
            if(s.empty())s.push(currentInterval);
            else {
                Interval previousInterval = s.pop();
                if(previousInterval.getEndTime() > 
                     currentInterval.getStartTime()){
                    /*
                    If current interval's start time is less than end time of
                    previous interval, find max of end times of two intervals
                    and push new interval on to stack.
                     */
                    int endTime = Integer.max(previousInterval.getEndTime(),
                                              currentInterval.getEndTime());
                    /* Notice that we have created new interval and 
                       did not update the old one
                       This concept is called as immutability of class
                     */
                    s.push(new Interval(previousInterval.getStartTime(),
                                        endTime));
                }
                else{
                    s.push(previousInterval);
                    s.push(currentInterval);
                }
            }
        }
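        while(!s.empty()){
            mergedIntervals.add(s.pop());
        }
        //Popping yields intervals in reverse; restore ascending start order
        Collections.reverse(mergedIntervals);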

        return mergedIntervals;
    }

    public static void main(String[] args) {
        ArrayList<Interval> intervals = new ArrayList<>();

        intervals.add(new Interval(1,3));
        intervals.add(new Interval(2,4));
        intervals.add(new Interval(5,8));
        intervals.add(new Interval(6,9));
        ArrayList<Interval> mergedIntervals = mergeOverlappingIntervals(intervals);
        for (Interval interval : mergedIntervals){
            System.out.print("(" + interval.getStartTime() +"," + interval.getEndTime() + ")");
        }
    }
}

The complexity of the algorithm to merge overlapping intervals is O(n log n) due to sorting. It uses O(n) extra space for the stack, and copying into the list to return takes another O(n) space.
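The listing above relies on an Interval class whose definition is not shown in this post. A minimal sketch of what it might look like is given here, purely as an assumption: only the constructor and the accessor names (getStartTime, getEndTime, setEndTime) are taken from the calls that appear in the code on this page, everything else is a guess.

```java
// Hypothetical definition of the Interval type used by the listings.
// The real class may differ; only the method names come from the post.
class Interval {
    private final int startTime;
    private int endTime;   // mutable because one variant on this page calls setEndTime

    public Interval(int startTime, int endTime) {
        this.startTime = startTime;
        this.endTime = endTime;
    }

    public int getStartTime() {
        return startTime;
    }

    public int getEndTime() {
        return endTime;
    }

    public void setEndTime(int endTime) {
        this.endTime = endTime;
    }

    @Override
    public String toString() {
        return "(" + startTime + "," + endTime + ")";
    }
}
```

If only the stack-based version is used, setEndTime can be dropped and endTime made final, which gives the immutable class that the comment in that listing refers to.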

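There is another way to implement the same function without a stack. Here we use the fact that ArrayList in Java is backed by an array, so getting an element at a particular index is an O(1) operation. The code looks more or less the same; however, there is no traversal of the stack at the end to create the list to return. Note that this version updates the end time of an existing interval through setEndTime instead of creating a new one, and that it also merges intervals which merely touch (end time equal to the next start time), which the stack version does not.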

public List<Interval> mergeOptimized(List<Interval> intervals) {

        if(intervals.size() == 0) return intervals;

        // Integer.compare avoids the overflow that subtraction can cause
        Collections.sort(intervals,
           (Interval a, Interval b) -> Integer.compare(a.getStartTime(), b.getStartTime()));

        List<Interval> mergedIntervals = new ArrayList<Interval>();
        for(Interval interval : intervals){

            /*If the merged list is empty, add the interval to it;
              otherwise check whether the last interval in the merged
              list overlaps the current one.

              Remember the get function on ArrayList is an O(1) operation
              because ArrayLists in Java are backed by arrays */
            if(mergedIntervals.isEmpty()
                    || mergedIntervals.get(mergedIntervals.size()-1).getEndTime() < 
                       interval.getStartTime() ){
                mergedIntervals.add(interval);
            }
            else {
                int lastEndTime = Math.max(
                        mergedIntervals.get(mergedIntervals.size()-1).getEndTime(),
                        interval.getEndTime()
                );
                mergedIntervals.get(mergedIntervals.size()-1).setEndTime(lastEndTime);
            }
        }

        return mergedIntervals;
    }

You can use the above snippet of code to submit for this leetcode problem and it should be accepted.


Word break problem


This problem is commonly asked in Google and Amazon interviews. We all know that if the string typed into the Google search box does not make sense, Google breaks it into meaningful words and asks whether we meant those words instead. This post discusses how to find out whether a given string can be broken into meaningful dictionary words. For example, if I type algorithmsandme and the given dictionary is [“algorithms”, “and”, “me”], the string can be broken into meaningful words, but the string algorithmsorme cannot. You can find this problem for practice at leetcode.

Word break problem : thoughts

We start with the first character of the string and check whether the character by itself is a word in the dictionary. If yes, our problem reduces to a smaller problem: checking whether the substring from index 1 to s.length is breakable.
If that does not lead to a solution, we check the first two characters, then three characters and so on, until we have checked the whole string. With every prefix we accept, the problem shrinks in size but stays the same in nature, which makes it an ideal case for a recursive implementation.

package AlgorithmsAndMe;

import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class WordBreak {

    public boolean wordBreak(String s, List<String> wordDict) {
        return wordBreakUtil(s, wordDict, 0);
    }

    private boolean wordBreakUtil(String s, 
                                   List<String> wordDict, 
                                   int index) {

        if (index == s.length()) return true;

        boolean isBreakable = false;
        for(int i=index; i<s.length(); i++) {
            isBreakable = isBreakable 
                   || wordDict.contains(s.substring(index, i+1))
                    && wordBreakUtil(s, wordDict, i + 1);
        }

        return isBreakable;
    }
}

If you look closely, the recursive function wordBreakUtil solves the same subproblems again and again. How can we avoid those repeated calculations? The best way is to save already solved subproblems in a cache and consult it before solving a subproblem: if the answer is there, use the cached value instead of recomputing it. This is called the top-down approach, and it uses memoization to avoid repeated subproblems.

package AlgorithmsAndMe;

import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class WordBreak {

    public boolean wordBreak(String s, List<String> wordDict) {
        int [] table =  new int[s.length()];
        for(int i=0; i<s.length(); i++){
            table[i] = -1;
        }
        return wordBreakUtilTopDown(s, wordDict, 0, table);
    }

    private boolean wordBreakUtilTopDown(String s, 
                            List<String> wordDict,
                            int index,
                            int[] table) {

        if (index == s.length()) return true;

        if(table[index] < 0) {
            boolean isBreakable = false;
            for (int i = index; i < s.length(); i++) {
                isBreakable = isBreakable 
                        || wordDict.contains(s.substring(index, i + 1))
                        && wordBreakUtilTopDown(s, wordDict, i + 1, table);
            }
            table[index] = isBreakable ? 1 : 0;
        }
        return table[index] == 1;
    }
  }

If you run the first solution, it will exceed the time limit on leetcode; the second implementation, however, should be accepted, with a reported run time of around 4ms. That difference shows how much memoization helps.

Word break problem using dynamic programming

In the last two implementations, two things are evident: first, the optimal solution of a subproblem leads to the optimal solution of the original problem. Second, there are overlapping subproblems. These are the two must-have conditions for applying dynamic programming. We already saw memoization, the top-down approach of DP, to avoid solving subproblems repeatedly. How can we do it bottom up?

What if we store, for every index i, whether the prefix of the string ending at i is breakable? What is the base case? The empty prefix (everything before index 0) is always breakable, so table[0] is always true. To decide whether the prefix of length i is breakable, we look for an index j between 0 and i-1 such that the prefix of length j is breakable (table[j] is true) and the substring from index j to i is a dictionary word. If such a j exists, table[i] is true.

package AlgorithmsAndMe;

import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class WordBreak {

    public boolean wordBreak(String s, List<String> wordDict) {
        return wordBreakUtilBottomUp(s, wordDict);
    }

    private boolean wordBreakUtilBottomUp(String s, List<String> wordDict){

        if(s == null || s.length() == 0) return false;

        boolean[] table  = new boolean[s.length()+1];

        table[0] = true;
        for (int i = 1; i <= s.length(); i++) {
            for (int j = i - 1; j >= 0; j--) {
                if (table[j] && wordDict.contains(s.substring(j, i))) {
                    table[i] = true;
                }
            }
        }
        return table[s.length()];
    }
}

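The above implementation of the word break problem performs O(n^2) substring checks, where n is the length of the string. Each check also pays for building the substring and for wordDict.contains, which scans the whole list when wordDict is a List.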
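One way to make each dictionary lookup cheaper is to copy the word list into a HashSet first. The sketch below is a variation on the bottom-up code, offered as an assumption rather than as the original author's solution: the class name WordBreakWithSet is made up, and it also stops scanning j as soon as table[i] becomes true.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical variant of the bottom-up solution using a HashSet dictionary.
class WordBreakWithSet {

    static boolean wordBreak(String s, List<String> wordDict) {
        // Same convention as the listing above: empty input is not breakable.
        if (s == null || s.isEmpty()) return false;

        // Average O(1) membership checks instead of a linear list scan.
        Set<String> words = new HashSet<>(wordDict);

        // table[i] is true when the prefix of length i can be broken into words.
        boolean[] table = new boolean[s.length() + 1];
        table[0] = true;   // the empty prefix is always breakable

        for (int i = 1; i <= s.length(); i++) {
            // Stop early once some split point j works for this i.
            for (int j = i - 1; j >= 0 && !table[i]; j--) {
                table[i] = table[j] && words.contains(s.substring(j, i));
            }
        }
        return table[s.length()];
    }

    public static void main(String[] args) {
        List<String> dict = Arrays.asList("algorithms", "and", "me");
        System.out.println(wordBreak("algorithmsandme", dict)); // true
        System.out.println(wordBreak("algorithmsorme", dict));  // false
    }
}
```

The number of (j, i) pairs examined is still quadratic in the worst case; the change only removes the list scan hidden inside each contains call.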

If you want to store all the strings which can be generated by breaking a particular word, below is the code.

package AlgorithmsAndMe;

import java.util.*;

public class WordBreak2 {

    public List<String> wordBreak(String s, List<String> wordDict) {
        Map<String, List<String>> map = new HashMap<>();
        return wordBreakUtil2(s, wordDict, map);
    }

    private List<String> wordBreakUtil2(String s,
                                        List<String> wordDict,
                                        Map<String, List<String>> map) {

        if(map.containsKey(s)){
            return map.get(s);
        }

        List<String> result = new ArrayList<String>();
        if (wordDict.contains(s)){
            result.add(s);
        }

        for(int i=1; i<=s.length(); i++) {
            String prefix = s.substring(0, i);
            if(wordDict.contains(prefix)){
                List<String> returnStringsList = wordBreakUtil2(s.substring(i), wordDict, map);

                for(String returnString :returnStringsList ){
                    result.add(prefix + " " + returnString);
                }
            }
        }
        map.put(s,result);

        return result;
    }
}
