Median of Two Sorted Arrays [Hard] — Binary Search on Partition (The Definitive Guide)
Advertisement
Problem Statement
Given two sorted arrays
nums1andnums2of sizemandnrespectively, return the median of the two sorted arrays.The overall run time complexity must be O(log(m + n)).
Example 1 — Odd total length:
Input: nums1 = [1, 3], nums2 = [2]
Output: 2.0
Reason: Merged array = [1, 2, 3]. Median is the middle element: 2.
Example 2 — Even total length:
Input: nums1 = [1, 2], nums2 = [3, 4]
Output: 2.5
Reason: Merged array = [1, 2, 3, 4]. Median = (2 + 3) / 2 = 2.5
Constraints:
0 <= m, n <= 10001 <= m + n <= 2000-10^6 <= nums1[i], nums2[i] <= 10^6- The time complexity requirement is O(log(m + n)), which effectively means O(log(min(m, n))).
- Why This Problem Matters
- The Naive Approach — Merge and Find (O(m+n))
- The Partition Insight — What Is a Median, Really?
- The Binary Search on Partition
- Edge Cases — The Sentinel Values
- The Median Formula
- Visual Dry Run
- Example 1: nums1 = [1, 3], nums2 = [2]
- Example 2: nums1 = [1, 2], nums2 = [3, 4]
- Common Mistakes
- Solutions
- Python — Naive Merge Approach (O(m+n))
- Python — Optimal Binary Search on Partition (O(log(min(m,n))))
- JavaScript — Naive Merge Approach (O(m+n))
- JavaScript — Optimal Binary Search on Partition (O(log(min(m,n))))
- Complexity Analysis
- Follow-up Questions
- This Pattern Solves
- Key Takeaways
Why This Problem Matters
LeetCode 4 is the only Hard problem that appears on virtually every "must-know" FAANG list — not because the code is long, but because the insight it requires is genuinely non-obvious. It is a top-asked problem at Google, Amazon, Facebook, and Microsoft, frequently used as the "separating question" that filters candidates who truly understand binary search from those who just pattern-match.
Here is what makes it special: almost every binary search problem you have ever seen searches on array values — you are looking for a specific number inside the array. This problem is different. You are performing binary search on an abstract index space — the partition point — and the correctness condition is a relational check between four values, not a simple arr[mid] == target comparison.
That abstraction jump is exactly what interviewers are testing. Can you recognise that binary search is applicable even when there is no "target value" to look for? Can you translate a complex constraint (median of a merged array) into a binary search condition? If you can, you have demonstrated a level of algorithmic thinking that separates senior engineers from the rest.
Why O(log(m+n)) and not O(m+n)? The naive approach — merge both arrays and return the middle element — is O(m+n) in both time and space. The problem explicitly bans this. The only way to achieve logarithmic time on sorted data is binary search. But binary search on what? That is the puzzle.
The Naive Approach — Merge and Find (O(m+n))
Before building up to the optimal solution, it helps to see what you are trying to improve on.
The straightforward approach is the classic merge step from merge sort:
- Allocate a new array of size
m + n. - Merge
nums1andnums2into it using two pointers. - Return
merged[(m+n)//2]for odd total, or(merged[(m+n)//2 - 1] + merged[(m+n)//2]) / 2for even total.
# Naive — O(m+n) time, O(m+n) space — violates the constraint
def findMedianSortedArrays_naive(nums1, nums2):
merged = []
i, j = 0, 0
while i < len(nums1) and j < len(nums2):
if nums1[i] <= nums2[j]:
merged.append(nums1[i]); i += 1
else:
merged.append(nums2[j]); j += 1
merged.extend(nums1[i:]); merged.extend(nums2[j:])
total = len(merged)
if total % 2 == 1:
return float(merged[total // 2])
return (merged[total // 2 - 1] + merged[total // 2]) / 2.0
This is completely correct and easy to reason about. The problem? It is O(m+n). With m and n each up to 1000, you process up to 2000 elements. The interviewer wants O(log(min(m,n))). You need to cut the work down from 2000 operations to roughly 10.
The Partition Insight — What Is a Median, Really?
To get from O(m+n) to O(log(min(m,n))), you need to reframe what "finding the median" means at a conceptual level.
The median of a sorted array of length L is the value that splits the array into two equal halves:
- If
Lis odd: the median is the single middle element. Left half has(L-1)/2elements, right half has(L-1)/2elements. - If
Lis even: the median is the average of the two middle elements. Both halves haveL/2elements.
Now, instead of merging and splitting, ask: can we find a split in each array simultaneously such that the left portion of array1 + left portion of array2 forms exactly the correct left half of the merged sorted sequence?
Concretely, define:
i= number of elements taken fromnums1into the left halfj= number of elements taken fromnums2into the left half
For the partition to be valid, two conditions must both hold:
Condition 1: i + j == (m + n + 1) // 2
(the left halves together contain exactly half the elements)
Condition 2: maxLeft1 <= minRight2 AND maxLeft2 <= minRight1
(every element on the left is ≤ every element on the right)
Where:
maxLeft1 = nums1[i-1](the largest element taken from nums1's left)minRight1 = nums1[i](the smallest element left in nums1's right)maxLeft2 = nums2[j-1](the largest element taken from nums2's left)minRight2 = nums2[j](the smallest element left in nums2's right)
Condition 1 determines j once we pick i. If i + j = half, then j = half - i. So we only have one free variable: i. Binary search on i.
Condition 2 is the correctness check. Because both arrays are sorted internally, we already know nums1[i-1] <= nums1[i] and nums2[j-1] <= nums2[j]. The only cross-checks we need are maxLeft1 <= minRight2 and maxLeft2 <= minRight1.
This is the key insight: we do not need to see the entire merged array. We only need to check four boundary values.
The Binary Search on Partition
Since we must have 0 <= i <= m (we can take zero elements or all elements from nums1), and j is fully determined by i, we binary search i in the range [0, m].
Always binary search the smaller array. If m > n, swap the arrays. This keeps the search space minimal (O(log(min(m,n)))) and guarantees j is never negative.
At each midpoint i, compute j = half - i and then evaluate Condition 2:
| Situation | What it means | Action |
|---|---|---|
maxLeft1 <= minRight2 AND maxLeft2 <= minRight1 | Valid partition found | Compute and return median |
maxLeft1 > minRight2 | i took too many elements from nums1; left side of nums1 is too large | Move i left: hi = i - 1 |
maxLeft2 > minRight1 | i took too few elements from nums1; left side of nums2 is too large | Move i right: lo = i + 1 |
The binary search terminates because:
- We always move
loorhistrictly toward each other. - A valid partition is guaranteed to exist (the problem guarantees at least one element total).
Edge Cases — The Sentinel Values
When i = 0, there are no elements taken from nums1's left. nums1[i-1] would be an out-of-bounds access. Similarly, when i = m, there are no elements in nums1's right.
The clean fix: use sentinel values (conceptual negative and positive infinity):
maxLeft1 = nums1[i-1] if i > 0 else -infinity
minRight1 = nums1[i] if i < m else +infinity
maxLeft2 = nums2[j-1] if j > 0 else -infinity
minRight2 = nums2[j] if j < n else +infinity
Sentinel logic:
-infinityfor a missing left element: it will never violatemaxLeft <= minRightsince no real value is smaller.+infinityfor a missing right element: it will never violatemaxLeft <= minRightsince no real value is larger.
This eliminates four separate if checks inside the validity test, keeping the core logic clean.
The Median Formula
Once you have found the valid partition:
If total length is odd ((m + n) % 2 == 1):
median = max(maxLeft1, maxLeft2)
The left half has one more element than the right. The median is the largest element of the combined left portion.
If total length is even ((m + n) % 2 == 0):
median = (max(maxLeft1, maxLeft2) + min(minRight1, minRight2)) / 2.0
Both halves are equal in size. The median is the average of the largest-left and smallest-right.
The formula half = (m + n + 1) // 2 — with the +1 — ensures that for odd totals, the left half always has the extra element. This is intentional: max(maxLeft1, maxLeft2) then directly gives the median for the odd case without any additional logic.
Visual Dry Run
Example 1: nums1 = [1, 3], nums2 = [2]
m = 2, n = 1. Sincem > n, swap:A = [2](length 1),B = [1, 3](length 2).half = (1 + 2 + 1) // 2 = 2- Binary search:
lo = 0, hi = 1
Iteration 1: i = (0+1)//2 = 0, j = 2 - 0 = 2
maxLeft1 = A[-1] → -inf (i=0, no left in A)
minRight1 = A[0] = 2
maxLeft2 = B[1] = 3
minRight2 = B[2] → +inf (j=2=n, no right in B)
Check: maxLeft1(-inf) <= minRight2(+inf) ✓, maxLeft2(3) <= minRight1(2)? 3 > 2, FAIL. maxLeft2 > minRight1 → need more from A → lo = i + 1 = 1
Iteration 2: i = (1+1)//2 = 1, j = 2 - 1 = 1
maxLeft1 = A[0] = 2
minRight1 = A[1] → +inf (i=1=m, no right in A)
maxLeft2 = B[0] = 1
minRight2 = B[1] = 3
Check: maxLeft1(2) <= minRight2(3) ✓, maxLeft2(1) <= minRight1(+inf) ✓ — Valid partition!
Total is odd (3): median = max(2, 1) = 2.0 ✓
Example 2: nums1 = [1, 2], nums2 = [3, 4]
m = 2, n = 2. Arrays already same length. KeepA = [1,2],B = [3,4].half = (2 + 2 + 1) // 2 = 2- Binary search:
lo = 0, hi = 2
Iteration 1: i = (0+2)//2 = 1, j = 2 - 1 = 1
maxLeft1 = A[0] = 1
minRight1 = A[1] = 2
maxLeft2 = B[0] = 3
minRight2 = B[1] = 4
Check: maxLeft1(1) <= minRight2(4) ✓, maxLeft2(3) <= minRight1(2)? 3 > 2, FAIL. maxLeft2 > minRight1 → need more from A → lo = i + 1 = 2
Iteration 2: i = (2+2)//2 = 2, j = 2 - 2 = 0
maxLeft1 = A[1] = 2
minRight1 = A[2] → +inf (i=2=m)
maxLeft2 = B[-1] → -inf (j=0)
minRight2 = B[0] = 3
Check: maxLeft1(2) <= minRight2(3) ✓, maxLeft2(-inf) <= minRight1(+inf) ✓ — Valid partition!
Total is even (4): median = (max(2, -inf) + min(+inf, 3)) / 2 = (2 + 3) / 2 = 2.5 ✓
Common Mistakes
These five bugs appear in the majority of failed submissions. Each causes either a wrong answer or a runtime error — neither is obvious at first glance.
Mistake 1 — Binary searching the larger array. Always swap so you binary search the smaller array. If you binary search nums1 when m > n, then j = half - i can go negative (since half < m is possible), causing out-of-bounds access. The swap costs O(1) and prevents the entire class of bugs.
Mistake 2 — Using wrong sentinel values. In Python, float('-inf') and float('inf') are the correct sentinels. In JavaScript, use -Infinity and Infinity — not Number.MIN_SAFE_INTEGER / Number.MAX_SAFE_INTEGER. Using safe integer boundaries fails on edge cases where actual array values are close to those bounds (e.g., [-10^6] and [10^6]).
Mistake 3 — Wrong median formula for the odd case. A frequent mistake is returning min(minRight1, minRight2) for the odd case (taking from the right half) instead of max(maxLeft1, maxLeft2) (taking from the left). The formula half = (m + n + 1) // 2 biases the left half to be larger, so the median for odd totals always lives at the top of the left half. If you use half = (m + n) // 2, you get the opposite bias and the formula breaks.
Mistake 4 — Off-by-one in binary search bounds. The search range must be lo = 0, hi = m (inclusive on both ends), not lo = 0, hi = m - 1. Why? i = m is a valid partition — it means all elements of nums1 go to the left half, and none go to the right. Capping hi at m - 1 silently misses this case, producing wrong answers specifically when nums1's largest element is less than everything in nums2.
Mistake 5 — Confusing which condition to check for binary search adjustment. When maxLeft1 > minRight2: you took too many elements from nums1 — move i left (hi = i - 1). When maxLeft2 > minRight1: you took too few elements from nums1 — move i right (lo = i + 1). Swapping these two adjustments produces a solution that binary searches in the wrong direction and loops forever or produces wrong results.
Solutions
Python — Naive Merge Approach (O(m+n))
def findMedianSortedArrays_naive(nums1: list[int], nums2: list[int]) -> float:
"""
Merge both sorted arrays, then pick the middle element(s).
Time: O(m+n) Space: O(m+n)
Correct, but violates the O(log(m+n)) constraint — shown for comparison only.
"""
merged = []
i, j = 0, 0
# Standard merge-sort merge step
while i < len(nums1) and j < len(nums2):
if nums1[i] <= nums2[j]:
merged.append(nums1[i]); i += 1
else:
merged.append(nums2[j]); j += 1
# Append any remaining elements
merged.extend(nums1[i:])
merged.extend(nums2[j:])
total = len(merged)
mid = total // 2
if total % 2 == 1:
return float(merged[mid]) # odd: single middle element
return (merged[mid - 1] + merged[mid]) / 2.0 # even: average of two middles
Python — Optimal Binary Search on Partition (O(log(min(m,n))))
def findMedianSortedArrays(nums1: list[int], nums2: list[int]) -> float:
"""
Binary search on the partition point of the smaller array.
Time: O(log(min(m, n))) Space: O(1)
"""
A, B = nums1, nums2
# Always binary search the smaller array so j = half - i is never negative
if len(A) > len(B):
A, B = B, A
m, n = len(A), len(B)
# half = size of the combined left half (left biased for odd totals)
half = (m + n + 1) // 2
lo, hi = 0, m # i can be anywhere from 0 (take nothing from A) to m (take all of A)
while lo <= hi:
i = (lo + hi) // 2 # elements taken from A into the left half
j = half - i # elements taken from B into the left half (auto-determined)
# Boundary sentinels: -inf means "no left element exists", +inf means "no right element exists"
maxLeft1 = A[i - 1] if i > 0 else float('-inf')
minRight1 = A[i] if i < m else float('inf')
maxLeft2 = B[j - 1] if j > 0 else float('-inf')
minRight2 = B[j] if j < n else float('inf')
if maxLeft1 <= minRight2 and maxLeft2 <= minRight1:
# Valid partition found — compute median
if (m + n) % 2 == 1:
# Odd total: median is the largest element of the combined left half
return float(max(maxLeft1, maxLeft2))
else:
# Even total: average of the boundary elements across the partition
return (max(maxLeft1, maxLeft2) + min(minRight1, minRight2)) / 2.0
elif maxLeft1 > minRight2:
# A's left is too large — move partition left in A
hi = i - 1
else:
# B's left is too large — move partition right in A (takes more from A)
lo = i + 1
# Should never reach here given valid inputs
return 0.0
JavaScript — Naive Merge Approach (O(m+n))
/**
* Naive: merge both arrays and pick the middle element(s).
* Time: O(m+n) Space: O(m+n)
* Shown for comparison — violates the problem's time constraint.
*/
function findMedianSortedArrays_naive(nums1, nums2) {
const merged = [];
let i = 0, j = 0;
while (i < nums1.length && j < nums2.length) {
if (nums1[i] <= nums2[j]) merged.push(nums1[i++]);
else merged.push(nums2[j++]);
}
while (i < nums1.length) merged.push(nums1[i++]);
while (j < nums2.length) merged.push(nums2[j++]);
const total = merged.length;
const mid = Math.floor(total / 2);
if (total % 2 === 1) return merged[mid];
return (merged[mid - 1] + merged[mid]) / 2;
}
JavaScript — Optimal Binary Search on Partition (O(log(min(m,n))))
/**
* Binary search on partition point of the smaller array.
* Time: O(log(min(m, n))) Space: O(1)
*
* @param {number[]} nums1
* @param {number[]} nums2
* @return {number}
*/
function findMedianSortedArrays(nums1, nums2) {
let A = nums1, B = nums2;
// Ensure we binary search the smaller array
if (A.length > B.length) [A, B] = [B, A];
const m = A.length, n = B.length;
// Left half size — biased up by 1 for odd totals so median lives at left's max
const half = Math.floor((m + n + 1) / 2);
let lo = 0, hi = m;
while (lo <= hi) {
const i = Math.floor((lo + hi) / 2); // elements from A in the left half
const j = half - i; // elements from B in the left half
// Sentinel values handle partitions at array boundaries cleanly
const maxLeft1 = i > 0 ? A[i - 1] : -Infinity;
const minRight1 = i < m ? A[i] : Infinity;
const maxLeft2 = j > 0 ? B[j - 1] : -Infinity;
const minRight2 = j < n ? B[j] : Infinity;
if (maxLeft1 <= minRight2 && maxLeft2 <= minRight1) {
// Found the correct partition — compute the median
if ((m + n) % 2 === 1) {
// Odd total: largest element of the combined left half
return Math.max(maxLeft1, maxLeft2);
} else {
// Even total: average of the two boundary elements
return (Math.max(maxLeft1, maxLeft2) + Math.min(minRight1, minRight2)) / 2;
}
} else if (maxLeft1 > minRight2) {
// A's left portion is too large — move partition left in A
hi = i - 1;
} else {
// B's left portion is too large — move partition right in A
lo = i + 1;
}
}
return 0; // Unreachable for valid inputs
}
Complexity Analysis
| Approach | Time | Space | Notes |
|---|---|---|---|
| Naive merge | O(m + n) | O(m + n) | Violates the constraint |
| Binary search on partition | O(log(min(m, n))) | O(1) | Always binary search smaller array |
Why O(log(min(m,n))) and not O(log(m+n))?
The binary search range is [0, m] where m = min(len(nums1), len(nums2)). Each iteration halves this range. So the number of iterations is log2(m + 1), which is O(log(min(m,n))). The problem states O(log(m+n)) in the constraint, which is a looser bound — our solution is strictly better.
Space: The algorithm only stores a constant number of variables (lo, hi, i, j, four boundary values). No auxiliary arrays are allocated. O(1) space.
Follow-up Questions
These are the most common extensions asked in follow-up rounds at Google, Amazon, and Facebook after you solve the main problem:
1. Kth smallest element in two sorted arrays. This is a generalisation of the exact same problem. The median is just the (m+n)//2th (or (m+n+1)//2th) smallest element. You can solve the general Kth-smallest problem using a similar binary search approach: at each step, eliminate k//2 elements from consideration by comparing the k//2th element of each array. Time: O(log k).
2. Kth smallest element in a sorted matrix. Binary search on the value space (not index space): for a given candidate value mid, count how many elements in the matrix are <= mid using binary search on each row. Use this count to converge on the Kth smallest value. Time: O(n log(max - min)).
3. Weighted median. Each element has a weight. The weighted median is the value where the cumulative weight crosses 0.5. Binary search approach still applies but you sum weights during the partition check instead of counting elements.
4. Median of a data stream (LeetCode 295). If elements arrive one at a time, maintain a max-heap for the left half and a min-heap for the right half. Each insertion is O(log n). The median is always at the top of one or both heaps.
5. Find the kth smallest pair distance (LeetCode 719). Binary search on the answer (the distance value), then count pairs with distance <= mid. A direct application of the "binary search on abstract space" pattern introduced here.
This Pattern Solves
The "binary search on partition / binary search on answer" pattern generalises far beyond this problem. Once you recognise it here, you will start seeing it everywhere:
- Split Array Largest Sum (LeetCode 410): Binary search on the maximum sum of a subarray.
- Capacity to Ship Packages (LeetCode 1011): Binary search on the ship capacity value.
- Koko Eating Bananas (LeetCode 875): Binary search on the eating speed.
- Find Peak Element (LeetCode 162): Binary search on index where the "peak" condition flips.
- H-Index II (LeetCode 275): Binary search on the h value.
In all of these, the key move is the same: reformulate the problem as "find the smallest/largest value of X such that a condition C(X) holds", then binary search on X.
For Median of Two Sorted Arrays, X is the partition index i, and C(i) is maxLeft1 <= minRight2 AND maxLeft2 <= minRight1.
Key Takeaways
The core insight of this problem is that the median does not require you to see the merged array — it only requires you to find a partition boundary where all left elements are at most all right elements. Because one partition index fully determines the other (their sum must equal half the total), you have a single free variable to binary search over.
The sentinel values (-inf for missing left elements, +inf for missing right elements) are not a hack — they are the mathematically correct way to express "no constraint exists from this side," and they eliminate four separate boundary checks that would otherwise obscure the algorithm's logic.
If you take one thing from this problem: binary search is not just for finding a value in an array. It is for any problem where you can reformulate the answer as a threshold on a monotone function and check each candidate in sub-linear time.
Advertisement