Median of two sorted arrays

Before finding the median of two sorted arrays, let’s understand what is the median?

Median is the middle value in a list of numbers.

For example, median of A = [2,4,5,6,7,8,9] will be 6.

To find the median, the input should be sorted. If it is not sorted, then first sort it and return the middle of that list. The question arises is what if the number of elements in the list is even? In that case, the median is the average of two middle elements.

Find the median of two sorted arrays

There are two sorted arrays nums1 and nums2 of size m and n respectively.
Find the median of the two sorted arrays. The overall run time complexity should be O(log (m+n)).

median of two sorted arrays

Before going into the post, find a pen and paper and try to work out an example. And as I tell in our posts, come up with a method to solve this considering, you have all the time and resources to solve this problem. I mean think of most brute force solutions.

Let’s simplify the question first and then work it upwards. If question was to find the median of one sorted array, how would you solve it?
If array has odd number of elements in it, return A[mid], where mid = (start + end)/2; if array has even number of elements, return average of A[mid] + A[mid+1].
For example for array A = [1,5,9,12,15], median is 9.
Complexity of this operation is O(1).

Focus back on 2 sorted arrays. To find a median of 2 sorted arrays in no more simple and definitely not O(1) operation. For example,

A = [ 1,5,9,12,15] and B = [ 3,5,7,10,17], median is 8.

How about merging these two sorted arrays into one, the problem is reduced to find the median of one array.

Although to find median in a sorted array is O(1), merge step takes O(n) operations. Hence, overall complexity would be O(n).
Reuse the merge part of Merge sort algorithm to merge two sorted arrays.

Start from the beginning of two arrays and advance the pointer of the array whose current element is smaller than the current element of the other. This smaller element is put on to output array which is sorted merged array. Merge will use an additional space to store N elements (Note that N is here sum of the size of both sorted arrays). The best part of this method is that it does not consider if the size of the two arrays is the same or different.

This can be optimized, by counting number of elements n, in two arrays in advance. Then we need to merge only n/2 + 1 elements if n is even and n/2 if n is odd. This saves us O(n/2) space.

There is another optimization: do not store all n/2 or n/2 + 1 elements while merging, keep track of last two elements in sorted array, and count how many elements are sorted. When n/2 + 1 elements are sorted return average of last two elements if n is even, else return n/2 element as the median. With these optimizations, time complexity remains O(n), however, space complexity reduces to O(1).

Impementation with merge function

package com.company;

/**
 * Created by sangar on 18.4.18.
 */
public class Median {

    public static double findMedian(int[] A, int[] B){
        int[] temp = new int[A.length + B.length];

        int i = 0;
        int j = 0;
        int k = 0;
        int lenA = A.length;
        int lenB = B.length;

        while(i<lenA && j<lenB){
            if(A[i] <= B[j]){
                temp[k++] = A[i++];
            }else{
                temp[k++] = B[j++];
            }
        }
        while(i<lenA){
            temp[k++] = A[i++];
        }
        while(j<lenB){
            temp[k++] = B[j++];
        }

        int lenTemp = temp.length;

        if((lenTemp)%2 == 0){
            return ( temp[lenTemp/2-1] + temp[lenTemp/2] )/2.0;
        }
        return temp[lenTemp/2];
    }

    public static void main(String[] args){
        int[] a = {1,3,5,6,7,8,9,11};
        int[] b = {1,4,6,8,12,14,15,17};

        double median = findMedian(a,b);
        System.out.println("Median is " + median);
    }
}

Optimized version to find median of 2 sorted arrays

package com.company;

/**
 * Created by sangar on 18.4.18.
 */
public class Median {

    public  static int findMedianOptimized(int[] A, int[] B){
        int i = 0;
        int j = 0;
        int k = 0;
        int lenA = A.length;
        int lenB = B.length;

        int mid = (lenA + lenB)/2;
        int midElement = -1;
        int midMinusOneElement = -1;

        while(i<lenA && j<lenB){
            if(A[i] <= B[j]){
                if(k == mid-1){
                    midMinusOneElement = A[i];
                }
                if(k == mid){
                    midElement = A[i];
                    break;
                }
                k++;
                i++;
            }else{
                if(k == mid-1){
                    midMinusOneElement = B[j];
                }
                if(k == mid){
                    midElement = B[j];
                    break;
                }
                k++;
                j++;
            }
        }
        while(i<lenA){
            if(k == mid-1){
                midMinusOneElement = A[i];
            }
            if(k == mid){
                midElement = A[i];
                break;
            }
            k++;
            i++;
        }
        while(j<lenB){
            if(k == mid-1){
                midMinusOneElement = B[j];
            }
            if(k == mid){
                midElement = B[j];
                break;
            }
            k++;
            j++;
        }

        if((lenA+lenB)%2 == 0){
            return (midElement + midMinusOneElement)/2;
        }
        return midElement;
    }

    public static void main(String[] args){
        int[] a = {1,3,5,6,7,8,9,11};
        int[] b = {1,4,6,8,12,14,15,17};

        double median = findMedianOptimized(a,b);
        System.out.println("Median is " + median);
    }
}

Binary search approach

One of the properties which lead us to think about binary search is that two arrays are sorted. Before going deep into how binary search algorithm can solve this problem, first find out mathematical conditions which should hold true for a median of two sorted arrays.

As explained above, median divides input into two equal parts, so first condition median index m satisfy is a[start..m] and a[m+1..end] are equal size. We have two arrays A and B, let’s split them into two. The first array A is of size m, and it can be split into m+1 ways at 0 to m.

If we split at i, len(Aleft) – iand len(Aright) = m-i.
When i=0, len(Aleft) = 0 and when i=m, len(Aright) = 0.

find median of two sorted arrays

Similarly, for array B, we can split it into n+1 way, j being from 0 to n.

median of sorted arrays

After splitting at specific indices i and j, how can we derive the condition for the median: left part of the array should be equal to the right part of the array?

If len(Aleft) + len(Bleft) == len(Aright) + len(Bleft) , it satisfies our condition. As we already know these values for split at i and j, equation becomes

i+j = m-i + n-j

median of two sorted array

But is this the only condition to satisfy for median? As we know, median is middle of the sorted list, we have to guarantee that all elements on the left array should be less than elements in the right array.

It is must that max of left part is less than min of right part. What is max of left part? It can be either A[i-1] or B[j-1]. What can be min of right part? It can be either A[i] or B[j].

We already know that, A[i-1] < A[i] and B[j-1] < B[j] as arrays A and B are sorted. All we need to check if A[i-1] <= B[j] and B[j-1] <= A[i], if index i and j satisfy this conditions, then median will be average of max of left part and min of right part if n+m is even and max(A[i-1], B[j-1]) if n+m is odd.

Let’s make an assumption that n>=m, then j = (n+m+1)/2 -i, it will always lead to j as a positive integer for possible values of i (o ~m) and avoid array out of bound errors and automatically makes the first condition true.

Now, problem reduces to find index i such that A[i-1] <= B[j] and B[j-1]<=A[i] is true.

This is where binary search comes into the picture. We can start as mid of array A, j = (n+m+1)/2-i, and see if this i satisfies the condition. There can be three possible outcomes for the condition.
1. A[i-1] <= B[j] and B[j-1]<=A[i] is true, we return the index i.
2. If B[j-1] > A[i], in this case, A[i] is too small. How can we increase it? by moving towards right. If i is increased, value A[i] is bound to increase, and also it will decrease j. In this case, B[j-1] will decrease and A[i] will increase which will make B[j-1]<=A[i] true. So, limit search space for i to mid+1 to mand go to step 1.
3. A[i-1] > B[j], means A[i-1] is too big. And we must decrease i to get A[i-1]<=B[j]. Limit search space for i to 0 mid-1 and go to step 1

Let’s take an example and see how this works. Out initial two arrays as follows.

The index i is mid of array A and corresponding j will as shown

Since condition B[j-1] <= A[i] is not met, we discard left of A and right of B and find new i and j based on remaining array elements.

Finally, our condition that A[i-1]<= B[j] and B[j-1] <=A[i] is satisfied, find the max of left and min of right and based on even or odd length of two arrays, return average of the max of left and min of right or return a max of left.

This algorithm has dangerous implementation caveat, what if i or j is 0, in that case, i-1 and j-1 will be invalid indices. When can j be zero, when i==m. Till i<m, no need to worry about j being zero. So be sure to check i<m and i>0, when we are checking j-1 and i-1 respectively.

Implementation

package com.company;

/**
 * Created by sangar on 18.4.18.
 */
public class Median {

    public static double findMedianWithBinarySearch(int[] A, int[] B){

        int[] temp;

        int lenA = A.length;
        int lenB = B.length;

        /*We want array A to be always smaller than B
          so that j is always greater than zero
         */
        if(lenA > lenB){
            temp = A;
            A = B;
            B = temp;
        }

        int iMin = 0;
        int iMax = A.length;
        int midLength =  ( A.length + B.length + 1 )/2;

        int i = 0;
        int j = 0;

        while (iMin <= iMax) {
            i = (iMin + iMax) / 2;
            j = midLength - i;
            if (i < A.length && B[j - 1] > A[i]) {
                // i is too small, must increase it
                iMin = i + 1;
            } else if (i > 0 && A[i - 1] > B[j]) {
                // i is too big, must decrease it
                iMax = i - 1;
            } else {
                // i is perfect
                int maxLeft = 0;
                //If there we are at the first element on array A
                if (i == 0) maxLeft = B[j - 1];
                //If we are at te first element of array B
                else if (j == 0) maxLeft = A[i - 1];
                //We are in middle somewhere, we have to find max
                else maxLeft = Integer.max(A[i - 1], B[j - 1]);

                //If length of two arrays is odd, return max of left
                if ((A.length + B.length) % 2 == 1)
                    return maxLeft;

                int minRight = 0;
                if (i == A.length) minRight = B[j];
                else if (j == B.length) minRight = A[i];
                else minRight = Integer.min(A[i], B[j]);

                return (maxLeft + minRight) / 2.0;
            }
        }
        return -1;
    }

    public static void main(String[] args){
        int[] a = {1,3,5,6,7,8,9,11};
        int[] b = {1,4,6,8,12,14,15,17};

        double median = findMedian(a,b);
        System.out.println("Median is " + median);
    }
}

The complexity of this algorithm to find the median of two sorted arrays is log(max(m,n)) where m and n are the size of two arrays.

Please share your views and suggestions.