Count sort : Sorting in linear time

Count sort : Sorting in linear time

Are there any sorting algorithm which has worst case complexity of O(n)? There are a few like count sort and decision tree algorithms. In this post we would discuss about count sort and couple of problems where this counting sort algorithm can be applied.

Counting sort was invented by Harold H. Seward
To apply counting sort, we need to keep in mind following assumptions:

  1. There should be duplicate values in the input
  2. There should be at most K different type of values in input.
  3. The input data ranges in 0 to K

Count sort goes for O(n2) if K is very close to n i.e. a very few elements are duplicate and rest all are unique. So above three conditions are necessary to have this algorithm work in linear time.

Count Sort -Approach
Let’s see how does it work? There are three steps involved in the algorithm:

  • Sampling the data, counting the frequencies of each element in the input set.
  • Accumulating frequencies to find out relative positions of each element.
  • Third step distributes each element to its appropriate place.
  • Now let’s take one example and go through the above three steps:
    Input is  an array A = [1,2,3,2,2,3,3,1,1,1,1,3,2,2]. Here we can see that K=3. Now let’s perform the first step of the method, i.e. count the frequency of each element. We keep a hash and keep track of each element’s count. Since values are upper bound by K, we need at most K sized hash. We initialize this hash as zero. A is input array and n is the size of the array.

    int char count [K];
    
    for(i=0; i<K;i++){
            count[i]=0;
    }
    
    for(i=0;i<n;i++){
            count[a[i]]++;
    }
    

    Count looks likes count =[5,5,4]. The complexity of this step is O(n). The second step involves the accumulation of frequencies where we add frequencies of previous elements to the current element.
    F(j) = summation of f[i] for all i=0

    F(j) = summation of f[i] for all i<=j and i>=0
    
    for(i=1;i<K;i++){
       Count[i] +=count[i-1];
    }
    

    The second step runs for K times hence the complexity of O(K). The third step involves distribution. Let’s create an output array called as B of size n For every element in A we check the corresponding count in count array and place the element at that position. So, first 1 will be at position 4 (there are total 5 and array is an index from 0), the first two will be placed at 9 and the first three will be at 13. Subsequently, lower positions will be filled as we encounter them in the input array.

    for(i=0; i<n;i++){
           B[--count[a[i]]] = a[i];
    }
    

    This step has complexity of O(n)

    #include<stdlib.h>
    #include<stdio.h>
    
    void count_sort(int a[], int n, int K){
    
            int count [K];
            int i,j;
            int b[n];
            for(i=0; i<=K;i++){
                    count[i]=0;
            }
    
            for(i=0;i<n;i++){
                    count[a[i]]++;
            }
    
            for(i=1;i<=K;i++){
                    count[i] +=count[i-1];
            }
            for(j=0;j<n;j++){
                    b[j]=0;
            }
    
            for(i=0; i<n;i++){
                    count[a[i]]--;
                    b[count[a[i]]] = a[i];
            }
    }
    int main(){
            int a[]= {0,1,1,1,0,0,0,3,3,3,4,4};
            int n  =  sizeof(a)/sizeof(a[0]);
            
            count_sort(a,n,4);
    }
    
    Iter 0 :0  0  0  0  1  0  0  0  0  0  0  0  0  0  
    Iter 1 :0  0  0  0  1  0  0  0  0  2  0  0  0  0  
    Iter 2 :0  0  0  0  1  0  0  0  0  2  0  0  0  3  
    Iter 3 :0  0  0  0  1  0  0  0  2  2  0  0  0  3  
    Iter 4 :0  0  0  0  1  0  0  2  2  2  0  0  0  3  
    Iter 5 :0  0  0  0  1  0  0  2  2  2  0  0  3  3  
    Iter 6 :0  0  0  0  1  0  0  2  2  2  0  3  3  3  
    Iter 7 :0  0  0  1  1  0  0  2  2  2  0  3  3  3  
    Iter 8 :0  0  1  1  1  0  0  2  2  2  0  3  3  3  
    Iter 9 :0  1  1  1  1  0  0  2  2  2  0  3  3  3  
    Iter 10 :1  1  1  1  1  0  0  2  2  2  0  3  3  3  
    Iter 11 :1  1  1  1  1  0  0  2  2  2  3  3  3  3  
    Iter 12 :1  1  1  1  1  0  2  2  2  2  3  3  3  3  
    Iter 13 :1  1  1  1  1  2  2  2  2  2  3  3  3  3  
    
    Final O/P :1  1  1  1  1  2  2  2  2  2  3  3  3  3
    

    Total complexity of the algorithm being O(K)+ O(n) + O(K) +O(n) = O(K+n). Please refer to the master theorem, how this is derived. Now if we see, K+n remains in order of n till the time K<n, if K goes on to n^2, the complexity of algorithm goes for polynomial time instead of linear time.

    There is immediate application of this algorithm in following problem:
    Let’s there is an array which contains Black, White and Red balls, we need to arrange these balls in such a way that all black balls come first, white second and Red at last. Hint: assign black 0, white 1 and Red 2 and see. 🙂

    In the next post, we would discuss how extra space can be saved, how initial positions of elements can be maintained. We would go through an interesting problem to discuss the above optimization.

    Why this filling from the end of the span? Because this step makes count sort stable sort. Here we have seen a sorting algorithm which can give us o/p linear time given some particular conditions met on the i/p data.