Merge sort algorithm

Merge Sort Algorithm

We can classify sorting algorithms based on their complexity, selection sort, bubble and insertion sort have complexity of O(n2) while heap sort, merge sort and quick sort (with some assumptions) have complexity of  O(nlogn) and count and Radix sorts are linear O(n) algorithms.In this post, we will concentrate on merge sort algorithm.

Key points about merge sort algorithm

  • class of algorithm : divide and conquer
  • Complexity : Time O(n2)
  • Space : O(n)
  • Stable : Yes
  • In place : NO

Merge sort algorithm falls in divide and conquer class of algorithms where input space is divided into subproblems and these subproblems are then solved individually and combined together to solve original problem. Below figure explains divide and conquer paradigm.

Merge sort algorithm
Divide and conquer strategy

In merge sort algorithm, input space is divided into two subproblems till the time we achieve smallest subproblem and then smallest sub-problem is sorted and then combined up till original input is sorted. Question arises is what is the smallest subproblem? Smallest subproblem is where input cannot be further divided, a subproblem with one item to sorted.

Once, split is done till smallest size, start merging. Going from bottom, start with two one-item subproblems, combine that to create a two item subproblem, then combine two two-items subproblem to create a four item subproblem. Go on till you get problem with original size.

merge sort algorithm

Implementation of Merge sort algorithm

As is the case with all divide and conquer algorithm problems, same processing is applied to subproblem which was applied to original problem, a case for recursive solution. Recursion needs a termination condition. Base condition to stop recursion in merge sort algorithm is when subproblem has only one element to be sorted. How can you sort single element? You don’t do, just return the element.

Now, how do we combine two subproblems solutions? This step is conquer part. Sort smaller parts and combine them together with merge operation. In merge operation, scan both sorted arrays one by one element and based on output of comparison, put one of the two items into output array, till both arrays are scanned. If one array is finished before other, just copy all elements from other array to output array. Copy this output array back to original input array so that it can be combined with bigger sub problems till solution to original problem is derived.

int merge(int a[], int start, int mid, int end){
 
    int i,j,k;
    int temp[end-start+1];
 
    i = start; j = mid+1; k =0;
    /* Compare all elements of two sorted arrays and store
      them in sorted array. */
    while(i <= mid && j <= end){
        if(a[i]< a[j]){
            temp[k++]= a[i++];
        }
        else {
            temp[k++]  = a[j++];
        }
    }
    while(i <= mid){
        temp[k++] = a[i++]; 
    }
    while(j <= end){
        temp[k++] = a[j++]; 
    }
    //Copy the temporary array into original array
    for(i=0; i<k; i++){
        a[i+start] = temp[i];
    }
}
int mergeSort(int a[], int start, int end ){
 
    int mid  = (start + end)/2;
    if(start<end){
        //sort the left part
        mergeSort(a, start, mid);
        //sort right part
        mergeSort(a, mid+1, end);
 
        //merge them together
        merge(a, start, mid, end);
    }
}
int main(){
	int a[] = {2,3,4,1,8,7,6,9};
	int size = sizeof(a)/sizeof(a[0]);
 
	mergeSort(a, 0, size-1);
	for(int i=0; i<size; i++){
		printf("%d ", a[i]);
	}
}

Let’s analyse complexity of merge sort algorithm, it can be divided into 2 parts :
1. Split step, which takes the constant amount of time, all it does is create two array half the size of original array, irrespective of size of array. So, complexity of this step is O(1).

2. Combine step complexity is O(n) where n is number of elements.
Complexity of each divide and conquer step is O(n). So, how many such divide and conquer steps do we have?
For n elements in array, there will be O(logn) such steps. Hence, complexity of merge sort algorithm is O(nlogn)

However, there is one caveat, every time we merge, two sub-arrays an auxiliary array is needed to temporarily store array elements and that incurs O(n) space complexity on merge sort algorithm.

There are some improvements which can be done on this algorithm.
1. When number of elements are less than some threshold, one can use insertion or selection sort to sort those numbers.  Why? Because when n is small, O(n2) is as good as O(nlogn) and it saves extra overhead of split and merging. All in all, using insertion sort in input array with small size, can give better performance.

2. Before calling merging, check if all the elements in right array are greater then left array, if yes, no need of merging. This can be easily checked by comparing a[mid] with a[mid+1]. If a[mid] is less than a[mid+1],  two sub arrays are already sorted and we don’t need to perform merge.

Please let us what do you think of this article and if there is something wrong or missing.