Merge k sorted arrays

Given k sorted arrays of varying lengths, merge these k sorted arrays into one sorted array.

For example, given 3 arrays:
[Figure: merge k sorted arrays]

The resulting array should be:

[Figure: merged k sorted arrays]

Companies this problem is asked in
Microsoft, Amazon, Facebook, Salesforce, Indeed

Merge k sorted arrays using a min-heap

Since all the input arrays are sorted, the first element of the output array must be one of the first elements of the input arrays. How can we find the minimum among the elements at the first index of each array? Take those k elements (there are k arrays, so k first elements) and build a min-heap. The root of the min-heap will be the least of the first elements of the given k sorted arrays, i.e.

result[0] = min(arr1[0], arr2[0], arr3[0]…arrK[0])

[Figure: merging k sorted arrays]

The initial root above will be the first element in the result array. The second element of the result array is the minimum of the first elements of all the other input arrays and the next element of the array from which the first result element was taken. For example, if arr1 had the least of all first elements while finding the initial root, then:

result[1] = min(arr1[1], arr2[0], arr3[0] … arrK[0])

In the next iteration, 5 will be picked and put into the result array, and the index of array 3 will be increased.

After putting 5 in the result array, we move the next element of array 3 into the min-heap.

To know which array supplied the minimum element at any time, we store additional information with each heap entry: the array number and the index of the element within that array.

If i is the array number and j is the index of the minimum element just removed from the heap, we add the (j+1)th element of the ith array to the min-heap next and re-heapify.
If all the elements of the ith array have already been put in the heap, the size of the min-heap reduces to k-1.

If each array has n elements, the heap is built from k elements and this procedure is repeated for the remaining (n-1)*k elements. When all array elements are processed, the result array holds all nk elements in sorted order.

Algorithm

  • Build a min-heap with the first element of each of the k arrays.
  • Remove the root (the minimum element) and put it in the result array.
  • If the array that element came from has remaining elements, put its next element at the root of the min-heap and heapify again.
  • If all elements of that array are already processed, reduce the size of the min-heap by 1.
  • Repeat steps 2, 3 and 4 till the min-heap is empty.

Show me the implementation

package com.company;
import java.util.PriorityQueue;

/**
 * Created by sangar on 2.12.18.
 */
public class MergeKSortedArrays {
    private class HeapNode {
        public int arrayNum;
        public int index;
        public int value;

        public HeapNode(int arrayNum, int index, int value) {
            this.arrayNum = arrayNum;
            this.index = index;
            this.value = value;
        }
    }

    public int[] mergeKSortedArrays(int[][] arrays) {

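        if (arrays == null) return null;

        // Integer.compare avoids the overflow that a.value - b.value can cause;
        // the initial capacity must be at least 1
        PriorityQueue<HeapNode> minHeap =
                new PriorityQueue<>(Math.max(1, arrays.length),
                        (HeapNode a, HeapNode b) -> Integer.compare(a.value, b.value));

        int size = 0;
        for (int i = 0; i < arrays.length; i++) {
            size += arrays[i].length;
        }
        int[] result = new int[size]; // total number of elements

        //add the first element of every non-empty array to this heap
        for (int i = 0; i < arrays.length; i++) {
            if (arrays[i].length > 0) {
                minHeap.add(new HeapNode(i, 0, arrays[i][0]));
            }
        }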

        //Complexity O(n * k * log k)
        for (int i = 0; i < size; i++) {
            //Take the minimum value and put into result
            HeapNode node = minHeap.poll();

            if (node != null) {
                result[i] = node.value;
                if (node.index + 1 < arrays[node.arrayNum].length) {
                    //Complexity of O(log k)
                    minHeap.add(new HeapNode(node.arrayNum,
                            node.index + 1,
                            arrays[node.arrayNum][node.index + 1]));
                }
            }
        }
        return result;
    }
}

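The time complexity of the code to merge k sorted arrays is O(nk log k), where n is the length of each array: each of the nk elements is added to and removed from a heap of at most k entries. The heap takes O(k) space, in addition to the O(nk) result array.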


Heaps fundamentals

In this post we will learn heap fundamentals, which are very important for solving priority queue problems. A heap is a data structure based on the complete binary tree. By definition, a complete binary tree of N levels has at most 2^N-1 nodes in it. A heap holds two properties:

1. Structural property: Every level of the heap is completely full except possibly the last level. The last level is filled in left to right order.
2. Heap property: Every parent node is either greater than or equal to its children (max heap) or less than or equal to its children (min heap). There are two kinds of heaps based on this second property:

Max Heap
Max heap maintains the property that every parent node is greater than its children. The root of the max heap is the greatest element.


Min Heap
Min heap maintains the property that every parent node is less than its children. The root of the min-heap is the least element.


Implementation Notes
Usually, heaps are implemented with an array and viewed as a tree, which makes it easy to move from child to parent and from parent to child. The height of a heap with n elements is O(log n).

With 1-based indexing, the children of a parent node i are at 2i and 2i+1.
The parent of a node i is at floor(i/2).

    private int parent(int pos) {
        return pos / 2;
    }
    
    private int leftChild(int pos){
        return (2 * pos);
    }

    private int rightChild(int pos){
        return (2 * pos) + 1;
    }
    

Given that multiply and divide by 2 can be very efficiently implemented by left and right shifting, these operations are very efficient.

The maximum number of elements in a heap of height h is 2^h - 1, while the minimum number of elements in a heap of the same height is 2^(h-1) (one more than the 2^(h-1) - 1 nodes of a full heap of height h-1).

Heap operations

1. Insert an element in heap
To insert a new element in a heap, place it at the leftmost position available on the last level. This new element may violate the heap property; for example, in a min-heap, the newly added element may be less than its parent node. We then have to move the newly added node up to its correct position; this process is called heapification.
Let’s take an example and see how it works.
[Figure: insert in heap]

    public void insert(int element) {
        if (size >= maxsize) {
            return;
        }

        Heap[++size] = element;
        int current = size;

        // Stop at the root (index 1); parent(1) would be index 0, outside the heap
        while (current > 1 && Heap[current] < Heap[parent(current)]) {
            swap(current, parent(current));
            current = parent(current);
        }
    }

The complexity of this procedure is O(log n).

2. Pop from heap
The deletion of a node is usually done at the root. To delete the root, replace the root node with the last node in the heap and then sift it down. As the new node replacing the root may violate the heap property, we need to compare it with its children. In a max heap, check if the new node is greater than both its children. If not, swap it with the larger child and repeat until the node replacing the root finds its valid place.

Algorithm (This is for max heapify)

  1. Let i be the index that might violate the heap property.
  2. left = 2i, right = 2i+1, largest = i
  3. Check if the left child is within the array limits; if yes, check if a[left] is greater than a[i].
  4. If a[left] is greater than a[i], then the largest so far is left: largest = left.
  5. Check if the right child is within the array limits; if yes, check if a[right] is greater than a[largest]; if so, largest = right.
  6. If largest is not i, swap a[largest] and a[i].
  7. Repeat steps 1 to 6 with i = largest, until we reach an element that does not violate the heap property.

Let’s take an example. The code that follows applies the same idea to a min-heap.
[Figure: pop in heap]

   public int pop() {
        int popped = Heap[1];
        Heap[1] = Heap[size--];
        minHeapify(1);
        return popped;
    }
    private void swap(int i, int j) {
        int tmp = Heap[i];
        Heap[i] = Heap[j];
        Heap[j] = tmp;
    }

    // Sift the node at pos down until the min-heap property holds
    private void minHeapify(int pos) {
        int smallest = pos;
        int left = leftChild(pos);
        int right = rightChild(pos);

        // Only compare against children that lie inside the heap
        if (left <= size && Heap[left] < Heap[smallest]) {
            smallest = left;
        }
        if (right <= size && Heap[right] < Heap[smallest]) {
            smallest = right;
        }

        if (smallest != pos) {
            swap(pos, smallest);
            minHeapify(smallest);
        }
    }

3. Building a heap from a given array of elements.
This operation can easily be done using the heapify method explained above.

  1. Start from the middle element of the array, say at index i.
  2. Heapify at the given index.
  3. Decrease the index by one. Repeat step 2 till we reach the first element.

    public void minHeap() {
        for (int pos = (size / 2); pos >= 1; pos--) {
            minHeapify(pos);
        }
    }

The complexity of this procedure is O(n).

How does this complexity become O(n), when heapifying each node can take O(log n) time and there are n nodes, which suggests O(n log n)? The cost of the heapify method for a particular element depends on its position in the heap. It takes O(1) time when the node is a leaf (leaves make up at least half of the nodes) and O(log n) time only when it is at the root.
If the height of the heap is h, the number of nodes is about 2^h. So 2^h/2 = 2^(h-1) nodes never move, 2^(h-2) nodes move by at most one level, and so on.
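One way to make this argument precise is sketched below. It assumes the usual textbook facts that a heap of n nodes has at most ⌈n/2^(j+1)⌉ nodes at height j (leaves counted as height 0), that heapifying a node at height j costs O(j), and that the series of j/2^j over all j ≥ 0 sums to 2:

```latex
\begin{align*}
T(n) &= \sum_{j=0}^{\lfloor \log_2 n \rfloor}
        \left\lceil \frac{n}{2^{j+1}} \right\rceil \cdot O(j) \\
     &= O\!\left( n \sum_{j=0}^{\infty} \frac{j}{2^{j}} \right) \\
     &= O(2n) = O(n)
\end{align*}
```

Most of the nodes sit near the bottom of the heap where heapify is cheap, which is why the total stays linear.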

4. Heapsort
Heapsort combines properties of both insertion sort (no extra space required) and merge sort (time complexity of O(n log n)). It can be easily achieved by using the above two procedures.

  1. Build a max heap from the given array, with complexity of O(n).
  2. Swap the first and last elements of the array. The last element is now at its proper position.
  3. Decrease the size of the heap by 1, so the last element is excluded from further heapify calls.
  4. Heapify with the first element of the array.
  5. Repeat steps 2, 3 and 4 while there are elements left to be sorted.

The complexity of the above procedure is O(nlogn).
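The steps above can be sketched in Java as follows. This is a minimal illustration rather than a reference implementation: the class name `HeapSortSketch` and its methods are hypothetical, and it uses 0-based indices (children at 2i+1 and 2i+2) instead of the 1-based layout used earlier in this post.

```java
import java.util.Arrays;

final class HeapSortSketch {

    // Sift a[i] down inside the max heap a[0..size-1] (0-based indices).
    private static void maxHeapify(int[] a, int size, int i) {
        while (true) {
            int left = 2 * i + 1;
            int right = 2 * i + 2;
            int largest = i;
            if (left < size && a[left] > a[largest]) largest = left;
            if (right < size && a[right] > a[largest]) largest = right;
            if (largest == i) return;          // heap property holds here
            int tmp = a[i];
            a[i] = a[largest];
            a[largest] = tmp;
            i = largest;                       // continue from the swapped child
        }
    }

    static void sort(int[] a) {
        int n = a.length;
        // Step 1: build a max heap bottom-up, starting at the last parent.
        for (int i = n / 2 - 1; i >= 0; i--) {
            maxHeapify(a, n, i);
        }
        // Steps 2 to 5: move the current maximum to the end, shrink the heap,
        // then restore the heap property from the root.
        for (int end = n - 1; end > 0; end--) {
            int tmp = a[0];
            a[0] = a[end];
            a[end] = tmp;
            maxHeapify(a, end, 0);
        }
    }

    public static void main(String[] args) {
        int[] data = {9, 4, 7, 1, 8, 2};
        sort(data);
        System.out.println(Arrays.toString(data));
    }
}
```

The sort happens inside the input array, so no extra space is needed beyond a few local variables.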

Problems based on heaps
Reference : http://www.amazon.com/Introduction-Algorithms-Thomas-H-Cormen/dp/0262033844

Check if tree is BST or not

This is one of the most asked programming interview questions: how do we check whether a given binary tree is a binary search tree (BST)? For example, the first and second binary trees are BSTs, but the third one is not.

[Figure: binary tree is BST]
[Figure: binary tree is BST]
[Figure: binary tree is not BST]

In the binary tree introduction we touched upon the recursive structure of a binary search tree. The first property a tree must satisfy to qualify as a BST is: the values of all nodes in the left subtree of the root node are smaller than the root node, and the values of all nodes in the right subtree are greater than the root node. This property should hold at all nodes.

Check if binary tree is (BST) or not: Recursive implementation

So, for the binary tree rooted at a particular node to be a BST, the root must be greater than all nodes in the left subtree and less than all nodes in the right subtree. However, is that a sufficient condition? Consider a counterexample: even when the root is greater than all nodes in the left subtree and smaller than all nodes in the right subtree, a binary tree may not be a binary search tree, because the same property can fail at a node deeper in the tree.

In such a tree the above condition is satisfied at the root, but we cannot call this binary tree a BST.

This is where the recursive structure of a binary search tree plays an important role. For a binary tree rooted at a node to be a BST, its left subtree and right subtree should also be BSTs. So, there are three conditions which should be satisfied:

  1. Left subtree is BST
  2. Right subtree is BST
  3. Value of root node is greater than the max in the left subtree and less than minimum in right subtree

Check if binary tree is (BST) or not  : Recursive implementation

#include<stdio.h>
#include<stdlib.h>

#define true 1
#define false 0

struct node{
	int value;
	struct node *left;
	struct node *right;
};

typedef struct node Node;

Node * findMaximum(Node * root){
	if( !root ) return root;
	while( root->right ){
		root = root->right;
	}
	return root;
}

Node * findMinimum(Node * root){
	if( !root ) return root;
	while( root->left ){
		root = root->left;
	}
	return root;
}

int isBST(Node * node){

  	if(!node)
  		return true;
    
    if( ! ( node->left || node->right ) ) return true;   
  	int isLeft  = isBST(node->left);
  	int isRight = isBST(node->right);

  	if(isLeft && isRight){
  		/* Since we already know that left sub tree and
 		right sub tree are Binary search tree, finding min and max in them would be easy */
   	
   		Node *max  =  NULL;
   		Node *min  =  NULL;
   		if( node->left )
   			max = findMaximum(node->left);
   		if( node->right )
   			min = findMinimum(node->right);

   		//Case 1 : only left sub tree is there
    	if(max && !min)
        	return node->value > max->value;
   		//Case 2 : Only right sub tree is there
    	if(!max && min)
       		return node->value < min->value;
   		//Case 3 : Both left and right sub tree are there
    	return (node->value > max->value && node->value < min->value);
   }
   return false;
}

Node * createNode(int value){
  Node *newNode =  (Node *)malloc(sizeof(Node));
  
  newNode->value = value;
  newNode->right= NULL;
  newNode->left = NULL;
  
  return newNode;
}

Node * addNode(Node *node, int value){
  if(node == NULL){
      return createNode(value);
  }
  else{
     if (node->value > value){ /* smaller values go to the left subtree */
        node->left = addNode(node->left, value);
      }
      else{
        node->right = addNode(node->right, value);
      }
  }
  return node;
}
 
/* Driver program for the function written above */
int main(){
  Node *root = NULL;
  //Creating a binary tree
  root = addNode(root,30);
  root = addNode(root,20);
  root = addNode(root,15);
  root = addNode(root,25);
  root = addNode(root,40);
  root = addNode(root,37);
  root = addNode(root,45);
  
  printf("%s", isBST(root ) ? "Yes" : "No" );
  
  return 0;
}

Check if binary tree is (BST) or not  : Optimized implementation

The above implementation to check if a binary tree is a binary search tree is correct but inefficient because, for every node, its left and right subtrees are scanned again to find the max and min. Each scan costs time proportional to the height of the tree, so the total work is O(n * h), which is O(n^2) for a skewed tree.

How can we avoid re-scanning the left and right subtrees? We can pass down the allowed range of values while checking those subtrees for the BST property, instead of computing the max of the left subtree and the min of the right subtree afterwards.

Start with max = INT_MAX and min = INT_MIN, and check that the root node is less than max and greater than min. If yes, go down the left subtree with max changed to the root value, and go down the right subtree with min changed to the root value. It is similar to the implementation above, except that no node is revisited.

#include<stdio.h>
#include<stdlib.h>
#include<limits.h> /* provides INT_MIN and INT_MAX */

#define true 1
#define false 0

struct node{
	int value;
	struct node *left;
	struct node *right;
};

typedef struct node Node;

int isBSTHelper(Node *root, int max, int min){
    if(!root) return true;

    if(root->value < min || root->value > max){
        return false;
    }

    return isBSTHelper(root->left, root->value, min) &&
           isBSTHelper(root->right, max, root->value);
}

int isBST(Node *root){
    return isBSTHelper(root, INT_MAX, INT_MIN);
}


Node * createNode(int value){
  Node *newNode =  (Node *)malloc(sizeof(Node));
  
  newNode->value = value;
  newNode->right= NULL;
  newNode->left = NULL;
  
  return newNode;
}

Node * addNode(Node *node, int value){
  if(node == NULL){
      return createNode(value);
  }
  else{
     if (node->value > value){ /* smaller values go to the left subtree */
        node->left = addNode(node->left, value);
      }
      else{
        node->right = addNode(node->right, value);
      }
  }
  return node;
}
 
/* Driver program for the function written above */
int main(){
  Node *root = NULL;
  //Creating a binary tree
  root = addNode(root,30);
  root = addNode(root,20);
  root = addNode(root,15);
  root = addNode(root,25);
  root = addNode(root,40);
  root = addNode(root,37);
  root = addNode(root,45);
  
  printf("%s", isBST(root ) ? "Yes" : "No" );
  
  return 0;
}

The complexity of the above implementation is O(n) as we are traversing each node only once.

Another method to check if a binary tree is a BST is to do an in-order traversal of the binary tree and keep track of the previous node. As the in-order traversal of a binary search tree visits nodes in sorted order, the previously visited node should always be smaller than the current node. If all nodes satisfy this property, the binary tree is a binary search tree. If this property is violated at any node, the tree is not a binary search tree.

The complexity of this implementation is also O(n), as we traverse each node only once.
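The in-order approach described above can be sketched in Java as follows. The class `InorderBstCheck` and its nested `Node` type are hypothetical names used only for this illustration, and the sketch assumes a strict BST in which duplicate values are not allowed.

```java
final class InorderBstCheck {

    static final class Node {
        final int value;
        Node left, right;
        Node(int value) { this.value = value; }
    }

    // Last node visited by the in-order traversal.
    private Node previous;

    boolean isBST(Node root) {
        previous = null;
        return inorder(root);
    }

    private boolean inorder(Node node) {
        if (node == null) return true;
        if (!inorder(node.left)) return false;
        // In sorted order, the current value must exceed the previous one.
        if (previous != null && previous.value >= node.value) return false;
        previous = node;
        return inorder(node.right);
    }

    public static void main(String[] args) {
        Node root = new Node(30);
        root.left = new Node(20);
        root.right = new Node(40);
        root.left.left = new Node(15);
        root.left.right = new Node(25);
        System.out.println(new InorderBstCheck().isBST(root)); // true

        Node bad = new Node(30);
        bad.left = new Node(20);
        bad.left.right = new Node(35); // 35 sits in the left subtree of 30
        System.out.println(new InorderBstCheck().isBST(bad)); // false
    }
}
```

The second tree satisfies the local parent-child ordering at every node, but the traversal order 20, 35, 30 is not sorted, so the check reports that it is not a BST.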


Lowest common ancestor in binary tree

Lowest common ancestor (LCA) in BST

Given a binary search tree and two nodes, find the lowest node which is an ancestor of both given nodes, that is, the lowest common ancestor (LCA). For example, in the following tree, the LCA of 6 and 1 is node(5), whereas the lowest common ancestor of nodes 17 and 6 is node(10).

[Figure: lowest common ancestor (LCA)]

What is the condition for a node to be the LCA of two nodes? If the paths to the given nodes diverge at a node, then that node is the lowest common ancestor. While the path is common to both nodes, the nodes on it are common ancestors, but they are not the lowest. How can we find where the paths diverge?

The paths diverge when one node is in the left subtree and the other node is in the right subtree of a node. The brute force solution would be to find one node and then go up the tree and see at which parent node the other given node falls in the opposite subtree.

Implementation-wise, traverse to node 1 and node 2, and store both paths on stacks. Then pop from the two stacks till you find the same node on both paths; that node is the lowest common ancestor. There will be two scans of the tree, plus additional space to store the paths, which in the worst case is O(n).
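A minimal Java sketch of this brute force idea is shown below. The names `LcaByPaths`, `Node`, `findPath` and `lca` are hypothetical, and the sketch makes one simplification: instead of popping from two stacks, it stores each root-to-node path in a list and walks both lists from the root, keeping the last node they share.

```java
import java.util.ArrayList;
import java.util.List;

final class LcaByPaths {

    static final class Node {
        final int value;
        Node left, right;
        Node(int value) { this.value = value; }
    }

    // Fills 'path' with the root-to-target path; returns false if target is absent.
    private static boolean findPath(Node node, int target, List<Node> path) {
        if (node == null) return false;
        path.add(node);
        if (node.value == target
                || findPath(node.left, target, path)
                || findPath(node.right, target, path)) {
            return true;
        }
        path.remove(path.size() - 1);   // backtrack: target is not below this node
        return false;
    }

    // Returns the lowest common ancestor, or null if either value is missing.
    static Node lca(Node root, int val1, int val2) {
        List<Node> p1 = new ArrayList<>();
        List<Node> p2 = new ArrayList<>();
        if (!findPath(root, val1, p1) || !findPath(root, val2, p2)) return null;

        Node answer = null;
        int shared = Math.min(p1.size(), p2.size());
        // The paths agree from the root down to the LCA and diverge after it.
        for (int i = 0; i < shared; i++) {
            if (p1.get(i) != p2.get(i)) break;
            answer = p1.get(i);
        }
        return answer;
    }
}
```

Both searches visit the whole tree in the worst case, and each stored path can hold up to n nodes for a skewed tree, which matches the O(n) extra space noted above.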

However, the brute force solution does not use the property of a binary search tree. The property is that all the nodes on the left side of a node are smaller and all the nodes on the right side of a node are greater than the node. Can we use that property to solve this problem? Let's first look at an approach that works for any binary tree.

The basic idea is to return a node if it is found in any of the subtrees. At any node, search for both given nodes in the left subtree. If we get a non-empty node returned from the left subtree, at least one of the two nodes is in the left subtree.

Again, search for these two nodes in the right subtree; if a non-empty node is returned from the right subtree, at least one of the nodes is in the right subtree.

What does it mean if we have a non-empty node from both the left and right subtrees? It means the two nodes are in the left and right subtrees, one on each side. It means the current root node is the lowest common ancestor.

What if one of the returned nodes is empty? It means both nodes are on one side of the root node, and we should return the non-empty node upwards.

Let’s take an example and see how it works. Given the tree below, find the lowest common ancestor of node(1) and node(9).

[Figure: lowest common ancestor in binary tree]

Start with node(10) and look in the left subtree for both node(1) and node(9). Go all the way down to node(1); at that point, we return node(1), as its value is equal to one of the given values.

At node(5), we have got node(1) returned from the left subtree. We then search for node(1) and node(9) in the right subtree. We go all the way to node(6), which is a leaf node.

At node(8), the left subtree returns nothing, as neither of the nodes is in the left subtree of node(8). However, the right subtree returns node(9).

As per our algorithm, if either of the subtrees returns a non-empty node, we return the node returned from that subtree.

At node(5), we get a non-empty node from the right subtree, and we already know that we got node(1) from the left subtree. At this point, both the left and right subtrees of node(5) have returned a non-empty node, hence we return node(5).

The two nodes will also be searched for in the right subtree of node(10), which will return nothing; hence, the final lowest common ancestor is node(5).

Implementation

#include<stdio.h>
#include<stdlib.h>
 
struct node{
	int value;
	struct node *left;
	struct node *right;
};

typedef struct node Node;

Node * findLCA(Node *root, int val1, int val2)
{
    // Base case
    if (root == NULL) return NULL;
 
    /* If either val1 or val2 matches the root's value,
       report the presence by returning the root.
       (Note that if one value is the ancestor of the other,
       then the ancestor becomes the LCA.)
   */
    if (root->value == val1 || root->value == val2)
        return root;
 
    // Look for keys in left and right subtrees
    Node *left  = findLCA(root->left, val1, val2);
    Node *right = findLCA(root->right, val1, val2);
 
    /* If both of the above calls return non-NULL,
       then one value is present in one subtree
       and the other is present in the other subtree,
       so this node is the LCA */
    if (left && right)  return root;
 
    // Otherwise check if left subtree or right subtree is LCA
    return (left != NULL)? left : right;
}

Node * createNode(int value){
  Node *newNode =  (Node *)malloc(sizeof(Node));
  
  newNode->value = value;
  newNode->right= NULL;
  newNode->left = NULL;
  
  return newNode;
}

Node * addNode(Node *node, int value){
  if(node == NULL){
      return createNode(value);
  }
  else{
     if (node->value > value){
        node->left = addNode(node->left, value);
      }
      else{
        node->right = addNode(node->right, value);
      }
  }
  return node;
}
 
/* Driver program for the function written above */
int main(){
  Node *root = NULL;
  //Creating a binary tree
  root = addNode(root,30);
  root = addNode(root,20);
  root = addNode(root,15);
  root = addNode(root,25);
  root = addNode(root,40);
  root = addNode(root,37);
  root = addNode(root,45);
  
  /* findLCA returns a node pointer, or NULL when no value is found */
  Node *lca = findLCA(root, 15, 25);
  printf("\n least common ancestor: %d ", lca ? lca->value : -1);
  
  return 0;
}

The implementation below works only for a binary search tree, unlike the method above, which works for any binary tree.

#include<stdio.h>
#include<stdlib.h>
 
struct node{
	int value;
	struct node *left;
	struct node *right;
};

typedef struct node Node;

int leastCommonAncestor(Node *root, int val1, int val2){

 	if(!root)
       return -1;

 	if(root->value == val1 || root->value == val2)
    	return root->value;

 	/* Case 1: If both values are less than the current node,
           look in the left subtree */
 	if(root->value > val1 && root->value > val2){
        return leastCommonAncestor(root->left, val1, val2);
 	}
 	/* Case 2: If both values are greater than the current node,
           look in the right subtree */
 	if(root->value < val1 && root->value < val2){
        return leastCommonAncestor(root->right, val1, val2);
 	}
 	/* Case 3: One value is less and the other greater than the
           current node, so the paths diverge here.
           Found the LCA, return it */
 	return root->value;
}

Node * createNode(int value){
  Node *newNode =  (Node *)malloc(sizeof(Node));
  
  newNode->value = value;
  newNode->right= NULL;
  newNode->left = NULL;
  
  return newNode;
  
}

Node * addNode(Node *node, int value){
  if(node == NULL){
      return createNode(value);
  }
  else{
     if (node->value > value){
        node->left = addNode(node->left, value);
      }
      else{
        node->right = addNode(node->right, value);
      }
  }
  return node;
}
 
/* Driver program for the function written above */
int main(){
  Node *root = NULL;
  //Creating a binary tree
  root = addNode(root,30);
  root = addNode(root,20);
  root = addNode(root,15);
  root = addNode(root,25);
  root = addNode(root,40);
  root = addNode(root,37);
  root = addNode(root,45);
  
  printf("\n least common ancestor: %d ",
      leastCommonAncestor(root, 15, 25));
  
  return 0;
}

The worst-case complexity of the algorithm to find the lowest common ancestor in a binary tree is O(n). Also, keep in mind that recursion is involved: the more skewed the tree, the more stack frames are pushed and the higher the chance that the stack will overflow.

This problem is solved using one traversal of the tree and managing state when returning from recursive calls.


Find Kth smallest element in array

Given an unsorted array of integers, find the kth smallest element in that array. For example, if the input array is A = [3,5,1,2,6,9,7], the 4th smallest element in array A is 5, because if you sort the array A, it looks like A = [1,2,3,5,6,7,9] and now you can easily see that the 4th element is 5.

Companies asked in

This problem is commonly asked in Microsoft and Amazon interviews, as it has multiple layers and there are so many things that can be tested with this one problem.

Kth smallest element : Line of thought

First of all, in any interview, try to come up with a brute force solution. The brute force solution to find the Kth smallest element in an array of integers would be to sort the array and return the A[k-1] element (k-1 as the array uses zero-based indexing).

What is the complexity of the brute force solution? It depends on the sort: simple sorting algorithms are O(n^2), but we have algorithms like merge sort and heap sort which work in O(n log n).

The problem with merge sort is that it uses O(n) additional space. Quicksort is another sorting algorithm. Its problem is that its worst-case complexity is O(n^2), which happens, for example, when the first element is chosen as the pivot and the input is already sorted.
In our case, the input is given as unsorted, so we can expect quicksort to run in O(n log n), which is its average-case complexity. The advantage of using quicksort is that it sorts in place, without an additional array.

Optimising quick sort

Let’s see how quicksort works and whether we can optimize the solution further.
The idea behind quicksort is to find the correct place for the selected pivot. Once the pivot is at the correct position, all the elements on the left side of the pivot are smaller and all the elements on the right side of the pivot are greater than the pivot. This step is partitioning.

If, after partitioning, the pivot is at position j, can we say that the pivot is actually the jth smallest element of the array? What if j is equal to k? Well, problem solved: we found the kth smallest element.

If j is less than k, the left subarray holds fewer than k elements, so we need to include more elements from the right subarray; therefore the kth smallest element is somewhere in the right subarray. We have already found the j smallest elements, so all we need is the (k-j)th smallest element of the right subarray.

What if j is greater than k? In this case, we have to drop some elements from the left subarray, so our search space would be the left subarray after partition.

In the worst case, this algorithm is still O(n^2), like quicksort, but on average it runs in O(n), because only one side of each partition is processed and you do not need to sort the entire array to find the kth smallest element.


Algorithm to find Kth smallest element in array

  1. Select a pivot and partition the array with pivot at correct position j
  2. If position of pivot, j, is equal to k, return A[j].
  3. If j is less than k, discard array from start to j, and look for (k-j)th smallest element in right sub array, go to step 1.
  4. If j is greater than k, discard array from j to end and look for kth element in left subarray, go to step 1

Let’s take an example and see if this algorithm works? A =  [4, 2, 1, 7, 5, 3, 8, 10, 9, 6 ], and we have to find fifth smallest element in array A.

[Figure: Kth smallest element in array]

Start with the pivot at the first index of the array, so pivot = 0. Partition the array into two parts around the pivot such that all elements on the left side of the pivot element, i.e. A[pivot], are smaller and all elements on the right side are greater than A[pivot].

In our example, array A will look like below after the pivot has found its correct position.

[Figure: after partition, the correct position of the pivot is index 3]

If pivot == k-1 (the array uses zero-based indexing), then A[pivot] is the kth smallest element. Since pivot (3) is less than k-1 (4), look for the kth smallest element on the right side of the pivot.

k remains as it is, as opposed to the k-j mentioned in the algorithm, because the pivot index is given with respect to the entire array and not the subarray.

In the second iteration, pivot = 4 (index and not element). After the second partition, array A will look like this:

[Figure: after partition of the right subarray, the correct position of the pivot is index 4]

The pivot index (4) is equal to k-1 (5-1), so the 5th smallest element in array A is A[4] = 5.

Implementation

package com.company;

/**
	* Created by sangar on 30.9.18.
*/
public class KthSmallest {
	private void swap(int[] a, int i, int j){
		int temp = a[i];
		a[i] = a[j];
		a[j] = temp;
	}
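	private int partition(int[] a, int start, int end){
		int pivot = a[start];
		int i  = start+1;
		int j  = end;

		while(i <= j){
			// Bounds checks keep both scans inside the subarray;
			// <= lets the left scan move past duplicates of the pivot
			while(i <= j && a[i] <= pivot) i++;
			while(i <= j && a[j] > pivot) j--;

			if(i < j) {
				swap(a, i, j);
			}
		}
		// j is now the final position of the pivot
		swap(a, start, j);
		return j;
	}

	public int findKthSmallestElement(int a[], int start, 
				int end, int k){
		// Reject null or empty input and out-of-range k up front
		if(a == null || a.length == 0 || k < 1 || k > a.length) return -1;
		if(start <= end){
			int p = partition(a, start, end);
			if(p == k-1){
				return a[p];
			}
			// The pivot is already placed, so exclude index p from both sides
			if(p > k-1)
				return findKthSmallestElement(a, start, p-1, k);
			return findKthSmallestElement(a, p+1, end, k);
		}
		return -1;
	}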
}
Test cases
package test;

import com.company.KthSmallest;
import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertEquals;

/**
 * Created by sangar on 28.8.18.
 */
public class KthSmallestTest {

	KthSmallest tester = new KthSmallest();
	private int[] a = {4, 2, 1, 7, 5, 3, 8, 10, 9};
	@Test
	public void kthSmallest() {
		assertEquals(7, tester.findKthSmallestElement(a,0,8,6));
	}

	@Test
	public void firstSmallest() {
		assertEquals(1, tester.findKthSmallestElement(a,0,8,1));
	}

	@Test
	public void lastSmallest() {
		assertEquals(10, tester.findKthSmallestElement(a,0,8,9));
	}

	@Test
	public void kGreaterThanSize() {
		assertEquals(-1, tester.findKthSmallestElement(a,0,8,15));
	}
	@Test
	public void emptyArray() {
		int[] a = {};
		assertEquals(-1, tester.findKthSmallestElement(a,0,0,1));
	}

	@Test
	public void nullArray() {
		assertEquals(-1, tester.findKthSmallestElement(null,0,0,1));
	}
}

The complexity of using the quicksort partition step to find the kth smallest element in the array of integers is O(n) on average and O(n^2) in the worst case.

Kth smallest element using heaps

Before going into details of this problem, I strongly recommend reading heap fundamentals.

Imagine a case where there are a billion integers in the array and you have to find the 5 smallest elements in that array. Sorting the entire array in O(n log n) is too costly for that use case, and the partition-based algorithm above does not take the disparity between k and n into account either.

We want the k smallest elements. How about we choose any k elements, call them set A, and then go through the other n-k elements, call them set B, checking whether an element from set B can displace an element in set A?

What is the condition for an element from set B to replace an element in set A? If the new element is less than the maximum in set A, then that maximum cannot be among the k smallest elements, so it is replaced by the new element from set B.

Now the problem is how to quickly find the maximum of set A. A heap is the best data structure for that. Which kind of heap: min heap or max heap? A max heap, as it stores the maximum of the set at its root.

Let’s define concrete steps to find the k smallest elements using a max heap.

  1. Create a max heap of size k from the first k elements of the array.
  2. Scan the remaining n-k elements of the array one by one.
    1. If the current element is less than the max on the heap, remove the root, add the current element to the heap and heapify.
    2. If not, then go to the next element.
  3. At the end, the max heap will contain the k smallest elements of the array and the root will be the kth smallest element.

Let’s take an example and see if this algorithm works. The input array is shown below and we have to find the 6th smallest element in this array.

input array

Step 1 : Create a max heap with first 6 elements of array.

Create a max heap with set A

Step 2: Take the next element from set B and check if it is less than the root of max heap. In this case, yes it is. Remove the root and insert the new element into max heap.

The element from set B displaces the root of the max heap and is added to the max heap

Step 3: The scan continues to 10; nothing happens as the new element is greater than the root of the max heap. Same for 9. At 6, the root of the max heap is again greater than the new element. Remove the root and add 6 to the max heap.

Again, new element from set B is less than root of max heap. Root is removed and new element is added.

The array scan is finished, so just return the root of the max heap, 6, which is the sixth smallest element in the given array.
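
The walkthrough above can be replayed in code. This is only an illustrative sketch: the original input figure is not reproduced here, so the array `{4, 2, 1, 7, 5, 8, 3, 10, 9, 6}` is an assumed input chosen to match the steps described, and the class name `MaxHeapTrace` and the method `trace` are hypothetical. It also assumes k is between 1 and the array length. The method records the root of the max heap after the initial build and after each remaining element is examined.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

class MaxHeapTrace {
    // Returns the root of the max heap after the initial build and
    // after each of the remaining n-k elements has been examined.
    static List<Integer> trace(int[] a, int k) {
        PriorityQueue<Integer> maxHeap =
                new PriorityQueue<>(k, Collections.reverseOrder());
        List<Integer> roots = new ArrayList<>();

        // Set A: the first k elements
        for (int i = 0; i < k; i++) {
            maxHeap.add(a[i]);
        }
        roots.add(maxHeap.peek());

        // Set B: the remaining n-k elements
        for (int i = k; i < a.length; i++) {
            if (a[i] < maxHeap.peek()) {
                maxHeap.poll();      // drop the current maximum
                maxHeap.add(a[i]);   // admit the smaller element
            }
            roots.add(maxHeap.peek());
        }
        return roots;
    }

    public static void main(String[] args) {
        // Assumed input, not the array from the original figure
        int[] a = {4, 2, 1, 7, 5, 8, 3, 10, 9, 6};
        System.out.println(trace(a, 6)); // prints [8, 7, 7, 7, 6]
    }
}
```

The root starts at 8, drops to 7 when 3 arrives, stays at 7 for 10 and 9, and ends at 6 once 6 displaces 7.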

Implementation

	// Requires java.util.Collections and java.util.PriorityQueue
	public int findKthSmallestElementUsingHeap(int a[], int k){
		// Validate input before sizing the heap; PriorityQueue rejects a capacity below 1
		if(a == null || k < 1 || k > a.length) return -1;

		//https://stackoverflow.com/questions/11003155/change-priorityqueue-to-max-priorityqueue
		PriorityQueue<Integer> maxHeap =
			new PriorityQueue<>(k, Collections.reverseOrder());

		//Create max heap with first k elements
		for(int i=0; i<k; i++){
			maxHeap.add(a[i]);
		}

		/*Keep updating max heap based on a new element
		If new element is less than root,
		remove root and add new element
		*/
		for(int i=k; i<a.length; i++){
			if(maxHeap.peek() > a[i]){
				maxHeap.remove();
				maxHeap.add(a[i]);
			}
		}
		return maxHeap.peek();
	}

Can you calculate the complexity of the above algorithm? Each heapify() costs O(log k) with k elements on the heap. In the worst case, we have to do heapify() for all n elements in the array, so the overall complexity of the algorithm becomes O(n log k). There is also additional space complexity of O(k) to store the heap.
When k is very small compared to n, the running time of this algorithm is still driven by the size of the array.

We want the k smallest elements. If we pick the first k elements from a min heap, will it solve the problem? I think so. Create a min heap of n elements in place from the given array, and then remove the root k times.
Creation of the heap has complexity O(n) when it is built bottom-up; it is worth reading more on this. All we need to do then is delete from this heap k times, and each deletion triggers a heapify() that costs O(log n) for an n-element heap. So the overall complexity would be O(n + k log n).
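
The min heap idea can be sketched as follows. This is a minimal illustration rather than a tuned implementation: the class name `KthSmallestMinHeap` and the method `findKthSmallest` are hypothetical, and the -1 return for invalid input simply mirrors the convention used earlier. Unlike the in-place heap described above, this sketch copies the array into a `java.util.PriorityQueue`, so it uses O(n) extra space, and whether that constructor builds the heap in linear time is an implementation detail of the JDK, so treat the O(n) build as an assumption here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class KthSmallestMinHeap {
    // Returns the kth smallest element, or -1 for invalid input.
    static int findKthSmallest(int[] a, int k) {
        if (a == null || k < 1 || k > a.length) return -1;

        // Copy into a list so the heap can be built from a collection
        List<Integer> values = new ArrayList<>(a.length);
        for (int x : a) {
            values.add(x);
        }
        PriorityQueue<Integer> minHeap = new PriorityQueue<>(values);

        // Remove the k-1 smallest elements; the root is then the kth smallest
        for (int i = 0; i < k - 1; i++) {
            minHeap.poll();
        }
        return minHeap.peek();
    }

    public static void main(String[] args) {
        int[] a = {4, 2, 1, 7, 5, 3, 8, 10, 9};
        System.out.println(findKthSmallest(a, 6)); // prints 7
    }
}
```

This variant pays for the heap over all n elements up front, so it is attractive mainly when the array can be heapified anyway and k is small.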

Depending on what you want to optimize, select the appropriate method to find the kth smallest element in the array.

Please share if there is something wrong or missing.