GuilinDev

Lc0004

05 August 2008

4 Median of Two Sorted Arrays

原题概述

There are two sorted arrays nums1 and nums2 of size m and n respectively.

Find the median of the two sorted arrays. The overall run time complexity should be O(log (m+n)).

You may assume nums1 and nums2 cannot be both empty.

Example 1:

nums1 = [1, 3]
nums2 = [2]

The median is 2.0

Example 2:

nums1 = [1, 2]
nums2 = [3, 4]

The median is (2 + 3)/2 = 2.5

题意和分析

求两个有序数组的中位数，要求复杂度O(log (m+n))，看到这个只能是二分法了；如果将两个数组合并后再查找中位数的话为O(m+n)，不满足条件（不过这本身是Merge Sort的一个步骤），做法参考了下面链接的第三个解法

假设我们要找第 k 小数（从1开始数），我们可以每次循环排除掉 k / 2 个数。看下边一个例子。

假设我们要找第 7 小的数字。

我们比较两个数组的第 k / 2 个数字(如果 k 是奇数，向下取整，方便排除)，也就是比较第 3 个数字，上边数组中的 4 和下边数组中的 3 ，如果哪个小，就表明该数组的前 k / 2 个数字都不是第 k 小数字（两个数组一个出一半，较小数和其左边的数肯定不会是二者共同考虑的中位数），所以较小数和其在数组中左边的数都“太小了”，可以去掉。也就是 1，2，3 这三个数字不可能是第 7 小的数字，我们可以把它排除掉。将 1349 和 45678910 两个数组作为新的数组进行比较。

总结， A [ 1 ]，A [ 2 ]，A [ 3 ]，A [ k / 2] … ，B[ 1 ]，B [ 2 ]，B [ 3 ]，B[ k / 2] … ，如果 A [ k / 2 ] < B [ k / 2 ] ，那么 A [ 1 ]，A [ 2 ]，A [ 3 ]，A [ k / 2] 都不可能是第 k 小的数字。

证明，现在A 数组中比 A [ k / 2 ] 小的数有 k / 2 - 1 个，而在B 数组中，如果B [ k / 2 ]这个数字比 A [ k / 2 ] 大，那么就算B [ k / 2 ] 前边的所有数字都比 A [ k / 2 ] 小，也只有 k / 2 - 1 个，所以比 A [ k / 2 ] 小的数字最多有 k / 2 - 1 + k / 2 - 1 = k - 2 个（索引从0开始），所以 A [ k / 2 ] 最多是第 k - 1 小的数，而左边的比 A [ k / 2 ] 小的数那就更不可能是第 k 小的数了，所以可以把从A[k / 2]开始向左边的所有数排除。

下图橙色的部分表示已经可以去掉的数字，

由于我们已经排除掉了 3 个数字，就是这 3 个数字一定在最前边，所以在两个新数组中，我们只需要找第 7 - 3 = 4 小的数字就可以了，也就是 k = 4 。此时两个数组，比较第k / 2 = 2 个数字，3 < 5，所以我们可以把小的那个数组中的 1 ，3 排除掉了。

我们又排除掉 2 个数字，所以现在找第 (4 - 2) / 2 = 1 小的数字就可以了。此时比较两个数组中的第 k / 2 = 1 个数，4 = 4 ，怎么办呢？由于两个数相等，所以我们无论去掉哪个数组中的都行，因为去掉 1 个总会保留 1 个的，所以没有影响。为了统一，我们就假设 4 > 4 吧，所以此时将下边的 4 去掉。

由于又去掉 1 个数字，1 / 2 = 0，此时我们要找第 0 小的数字，就是找到的比较小的数本身，所以只需判断两个数组中第一个数字哪个小就可以了，这里也就是 4 。

所以第 7 小的数字是 4 。

我们每次都是取 k / 2 的数进行比较，有时候可能会遇到数组长度小于 k / 2 的时候。

此时 k / 2 等于 3 ，而上边的数组长度是 2 ，我们此时将箭头指向它的末尾就可以了。这样的话，由于 2 < 3 ，所以就会导致上边的数组 1，2 都被排除。造成下边的情况。

由于 2 个元素被排除，所以此时 k = 5 ，又由于上边的数组已经空了，我们只需要返回下边的数组的第 5 个数字就可以了，如果上面的数组的数字较大也是同样的道理，总之第k大的数字不会出现在较短的数组里面。

从上边可以看到，无论是找第奇数个还是第偶数个数字，对我们的算法并没有影响，而且在算法进行中，k 的值都有可能从奇数变为偶数，最终都会变为 1 或者由于一个数组空了，直接返回结果。

所以我们采用递归的思路，为了防止数组长度小于 k / 2 ，所以每次比较 min ( k / 2，len ( 数组 ) ) 对应的数字，把小的那个对应的数组的数字排除，将两个新数组进入递归，并且 k 要减去排除的数字的个数。递归出口就是当 k = 1 或者其中一个数字长度是 0 了。

代码

如同上述解释的做法

class Solution {
    public double findMedianSortedArrays(int[] nums1, int[] nums2) {
        int n = nums1.length;
        int m = nums2.length;
        int left = (n + m + 1) / 2;
        int right = (n + m + 2) / 2;

        return (getKth(nums1, 0, n - 1, nums2, 0, m - 1, left) + getKth(nums1, 0, n - 1, nums2, 0, m - 1, right)) * 0.5;
    }

    //虽然用到了递归，但是可以看到这个递归属于尾递归，所以编译器不需要不停地堆栈，所以空间复杂度为 O（1）
    private int getKth(int[] nums1, int start1, int end1, int[] nums2, int start2, int end2, int k) {
        int len1 = end1 - start1 + 1;
        int len2 = end2 - start2 + 1;

        //确保len1 的长度小于 len2，这样就能保证如果有数组空了，一定是 len1，方便控制流程
        if (len1 > len2) return getKth(nums2, start2, end2, nums1, start1, end1, k);

        if (len1 == 0) return nums2[start2 + k - 1];//如果有数组空了就直接返回nums2剩下元素中的中位数

        if (k == 1) return Math.min(nums1[start1], nums2[start2]);

        int i = start1 + Math.min(len1, k/2) - 1;
        int j = start2 + Math.min(len2, k/2) - 1;

        if (nums1[i] > nums2[j]) {
            return getKth(nums1, start1, end1, nums2, j + 1, end2, k - (j - start2 + 1));
        } else {
            return getKth(nums1, i + 1, end1, nums2, start2, end2, k - (i - start1 + 1));
        }
    }
}

同样的思路，另一种写法，更清楚一点

class Solution {
    /**
    * 寻找一个unioned sorted array中的第k大（从1开始数）的数。因而等价于寻找并判断两个sorted array中第k/2（从1开始数）大的数。
    * 特殊化到求median，那么对于奇数来说，就是求第(m + n) / 2 + 1（地板除法，从1开始数）大的数。
    * 而对于偶数来说，就是求第(m + n) / 2大（地板除法，从1开始数）和第(m + n) / 2 + 1大（地板除法，从1开始数）的数的算术平均值。
    */
    public double findMedianSortedArrays(int[] nums1, int[] nums2) { 
        /**
        * 那么如何判断两个有序数组A,B中第k大的数呢？

        * 我们需要判断A[k/2-1]和B[k/2-1]的大小。

        * 如果A[k/2-1]==B[k/2-1]，那么这个数就是两个数组中第k大的数。

        * 如果A[k/2-1] < B[k/2-1], 那么说明A[0]到A[k/2-1]都不可能是第k大的数，所以需要舍弃这一半，继续从A[k/2]到A[A.length-1]继续找。当然，因为这里舍弃了A[0]到A[k/2-1]这k/2个数，那么第k大也就变成了，第k-k/2个大的数了。

        如果 A[k/2-1]>B[k/2-1]，就做之前对称的操作就好。
        */
        
        int m = nums1.length;
        int n = nums2.length;
        int total = m + n;
        
        if (total % 2 != 0) { // exact one or algrebic divied by two 
            return (double) findKth(nums1, 0, m - 1, nums2, 0, n - 1, total / 2 + 1);// kth, but the index is k-1
        } else {
            double x = (double) findKth(nums1, 0, m - 1, nums2, 0, n - 1, total / 2);
            double y = (double) findKth(nums1, 0, m - 1, nums2, 0, n - 1, total / 2 + 1);
            return (double)(x + y) / 2;
        }
    }
    public static int findKth(int[] nums1, int aStart, int aEnd, int[] nums2, int bStart, int bEnd, int k) {
            
            int m = aEnd - aStart + 1;
            int n = bEnd - bStart + 1;
            
            if (m > n) {
                return findKth(nums2, bStart, bEnd, nums1, aStart, aEnd, k);
            }
            
            if (m == 0) {
                return nums2[k - 1];
            }
        
            if (k == 1) {
                return Math.min(nums1[aStart], nums2[bStart]);
            }
            
            //----------------------------------------
            
            int partA = Math.min(k/2,m); // half length of total length
            int partB = k - partA; // 
            
            // Recursion to 
            if (nums1[aStart + partA - 1] < nums2[bStart + partB - 1]) {
                return findKth(nums1, aStart+partA, aEnd, nums2, bStart, bEnd, k - partA);
            } else if (nums1[aStart + partA - 1] > nums2[bStart + partB - 1]) {
                return findKth(nums1, aStart, aEnd, nums2, bStart + partB, bEnd, k - partB);
            } else {
                return nums1[aStart + partA - 1];
            }
        }
}

使用二分搜索的办法，推荐这个办法，寻找nums1 nums2两个数组的划分，同时满足nums1中左半边的最右边的数 <= nums2中右半边的最左边的数 + nums2中左半边的最右边的数 <= nums1中右半边的最左边的数。

class Solution {
    /**
    * 二分搜索，log(min(m, n))
    */
    public double findMedianSortedArrays(int[] nums1, int[] nums2) {
        //if nums1 length is greater than switch them so that 
        // nums1 is always smaller than nums2, 方便复用代码
        if (nums1.length > nums2.length) {
            return findMedianSortedArrays(nums2, nums1);
        }
        int x = nums1.length;
        int y = nums2.length;

        int low = 0;
        int high = x;
        while (low <= high) {
            // 把两个数组分别划分
            int partitionX = (low + high) / 2; // nums1的中间
            int partitionY = (x + y + 1) / 2 - partitionX; // nums2的中间

            //if partitionX is 0 it means nothing is there on left side. Use -INF for maxLeftX
            //if partitionX is length of input then there is nothing on right side. Use +INF for minRightX
            int maxLeftX = (partitionX == 0) ? Integer.MIN_VALUE : nums1[partitionX - 1];
            int minRightX = (partitionX == x) ? Integer.MAX_VALUE : nums1[partitionX];

            int maxLeftY = (partitionY == 0) ? Integer.MIN_VALUE : nums2[partitionY - 1];
            int minRightY = (partitionY == y) ? Integer.MAX_VALUE : nums2[partitionY];

            if (maxLeftX <= minRightY && maxLeftY <= minRightX) {
                //We have partitioned array at correct place
                // Now get max of left elements and min of right elements to get the median in case of even length combined array size
                // or get max of left for odd length combined array size.
                if ((x + y) % 2 == 0) {
                    return ((double)Math.max(maxLeftX, maxLeftY) + Math.min(minRightX, minRightY))/2;
                } else {
                    return (double)Math.max(maxLeftX, maxLeftY);
                }
            } else if (maxLeftX > minRightY) { //we are too far on right side for partitionX. Go on left side.
                high = partitionX - 1;
            } else { //we are too far on left side for partitionX. Go on right side.
                low = partitionX + 1;
            }
        }

        //Only we we can come here is if input arrays were not sorted. Throw in that scenario.
        throw new IllegalArgumentException();
    }
}

O（m + n）解法，合并两个链表

class Solution {
    public double findMedianSortedArrays(int[] nums1, int[] nums2) {
        /**
        * 找到两个数组中的中位数，然后
        */
        int index1 = 0; // nums1中的索引
        int index2 = 0; // nums2中的索引
        int med1 = 0;
        int med2 = 0;
        for (int i = 0; i <= (nums1.length+nums2.length) / 2; i++) {
            med1 = med2;
            if (index1 == nums1.length) { // nums1中元素遍历完
                med2 = nums2[index2];
                index2++;
            } else if (index2 == nums2.length) { // nums2中元素遍历完
                med2 = nums1[index1];
                index1++;
            } else if (nums1[index1] < nums2[index2] ) {
                med2 = nums1[index1];
                index1++;
            }  else {
                med2 = nums2[index2];
                index2++;
            }
        }

        // the median is the average of two numbers，判断一下奇偶
        if ((nums1.length + nums2.length) % 2 == 0) {
            return (double)(med1 + med2) / 2;
        }

        return (double)med2;
    }
}

This site is open source. Improve this page.