Merge sort algorithm

Tags: , , , ,

Merge Sort Algorithm

We can classify sorting algorithms based on their complexity, selection sort, bubble, and insertion sort have the complexity of O(n2) while heap sort, merge sort and quick sort (with some assumptions) have a complexity of O(nlogn) and count and Radix sorts are linear O(n) algorithms. In this post, we will concentrate on the merge sort algorithm.

Key points about merge sort algorithm

  • class of algorithm : divide and conquer
  • Complexity : Time O(n2)
  • Space : O(n)
  • Stable : Yes
  • In place : NO

Merge sort algorithm falls in divide and conquer class of algorithms where input space is divided into subproblems and these subproblems are then solved individually and combined to solve the original problem. Below figure explains the divide and conquer paradigm.

Merge sort algorithm
Divide and conquer strategy

In the merge sort algorithm, input space is divided into two subproblems till the time we achieve the smallest subproblem and then the smallest sub-problem is sorted and then combined up till the original input is sorted. The question arises is what is the smallest subproblem? The smallest subproblem is where input cannot be further divided, a subproblem with one item to sorted.

Once, the split is done to the smallest size, start merging. Going from the bottom, start with two one-item subproblems, combine that to create a two-item subproblem, then combine two two-items subproblems to create a four-item subproblem. Go on till you get a problem with the original size.

merge sort algorithm

Implementation of Merge sort algorithm

As is the case with all divide and conquer algorithm problems, same processing is applied to subproblem which was applied to original problem, a case for recursive solution. Recursion needs a termination condition. Base condition to stop recursion in merge sort algorithm is when subproblem has only one element to be sorted. How can you sort single element? You don’t do, just return the element.

Now, how do we combine two subproblems solutions? This step is to conquer part. Sort smaller parts and combine them with a merge operation. In merge operation, scan both sorted arrays one by one element, and based on the output of comparison, add one of the two items into output array, till both arrays are scanned. If one array is finished before the other, just copy all elements from another array to the output array. Copy this output array back to the original input array so that it can be combined with bigger sub-problems till a solution to the original problem is derived.

int merge(int a[], int start, int mid, int end){
 
    int i,j,k;
    int temp[end-start+1];
 
    i = start; j = mid+1; k =0;
    /* Compare all elements of two sorted arrays and store
      them in sorted array. */
    while(i <= mid && j <= end){
        if(a[i]< a[j]){
            temp[k++]= a[i++];
        }
        else {
            temp[k++]  = a[j++];
        }
    }
    while(i <= mid){
        temp[k++] = a[i++]; 
    }
    while(j <= end){
        temp[k++] = a[j++]; 
    }
    //Copy the temporary array into original array
    for(i=0; i<k; i++){
        a[i+start] = temp[i];
    }
}
int mergeSort(int a[], int start, int end ){
 
    int mid  = (start + end)/2;
    if(start<end){
        //sort the left part
        mergeSort(a, start, mid);
        //sort right part
        mergeSort(a, mid+1, end);
 
        //merge them together
        merge(a, start, mid, end);
    }
}
int main(){
	int a[] = {2,3,4,1,8,7,6,9};
	int size = sizeof(a)/sizeof(a[0]);
 
	mergeSort(a, 0, size-1);
	for(int i=0; i<size; i++){
		printf("%d ", a[i]);
	}
}

Let’s analyze the complexity of merge sort algorithm, it can be divided into 2 parts :
1. Split step, which takes the constant amount of time, all it does is create two arrays half the size of the original array, irrespective of the size of the array. So, the complexity of this step is O(1).

2. Combine step complexity is O(n) where n is the number of elements.
The complexity of each divide and conquer step is O(n). So, how many such divides and conquer steps do we have?
For n elements in the array, there will be O(logn) such steps. Hence, the complexity of merge sort algorithm is O(nlogn)

However, there is one caveat, every time we merge, two sub-arrays an auxiliary array is needed to temporarily store array elements and that incurs O(n) space complexity on merge sort algorithm.

There are some improvements which can be done on this algorithm.
1. When the number of elements is less than some threshold, one can use insertion or selection sort to sort those numbers. Why? Because when n is small, O(n2) is as good as O(nlogn) and it saves the extra overhead of split and merging. All in all, using insertion sort in input array with small size, can give better performance.

2. Before calling merging, check if all the elements in right array are greater then left array, if yes, no need of merging. This can be easily checked by comparing a[mid] with a[mid+1]. If a[mid] is less than a[mid+1], two sub arrays are already sorted and we don’t need to perform merge.

Please let us what do you think of this article and if there is something wrong or missing.