We have k sorted arrays of different lengths, and need to find the m largest elements among all of them.
Let’s say we have arrays A₁, A₂, …, Aₖ, each sorted in ascending order. Together, they contain n elements in total. Our goal is to find the m largest elements across all these arrays efficiently.
The Naive Approach: Merge and Sort
The simplest solution might be to merge all arrays into one and sort it:
def find_m_largest_naive(arrays, m):
# Merge all arrays into one
merged = []
for array in arrays:
merged.extend(array)
# Sort the merged array
MERGE-SORT(merged, 1, len(merged))
# Return last m elements
return merged[-m:]
This approach has a time complexity of O(n log n), where n is the total number of elements. While straightforward, it ignores a crucial piece of information: our input arrays are already sorted!
The Optimal Solution: Leveraging a Max-Heap
We can do much better by taking advantage of the sorted nature of our input arrays. Here’s the key insight: the largest element must be the last element of one of our k arrays. After we take that element, the next largest must be either:
- The last element of one of the other arrays, or
- The second-to-last element from the array we just took from
This suggests using a max-heap where each node represents the current largest unconsidered element from one of our arrays. Here’s how we can implement this:
def find_m_largest_heap(arrays, k, sizes, m):
# Initialize heap array and position trackers
H = new Array(k)
current_pos = new Array(k)
# Start with last element from each array
for i = 1 to k:
if sizes[i] > 0:
H[i] = arrays[i][sizes[i]]
current_pos[i] = sizes[i]
else:
H[i] = -∞
current_pos[i] = 0
# Build initial max heap
BUILD-MAX-HEAP(H, k)
result = new Array(m)
for i = 1 to m:
# Extract maximum element
result[i] = HEAP-EXTRACT-MAX(H, k)
# Find source array of maximum (j)
# Replace with next largest from array j
if current_pos[j] > 1:
current_pos[j] = current_pos[j] - 1
H[1] = arrays[j][current_pos[j]]
else:
H[1] = -∞
MAX-HEAPIFY(H, 1, k)
return result
Understanding the Time Complexity
Let’s break down why this algorithm achieves O(k + m log k) time complexity:
- Initial heap building with k elements: O(k)
- m iterations, each involving:
- Extracting maximum: O(log k)
- Inserting next element: O(log k) Total per iteration: O(log k)