Back to blog
Dec 29, 2024
3 min read

Finding Largest Elements from Multiple Sorted Arrays: An Elegant Heap Solution

Efficiently finding m largest elements across k sorted arrays using heap-based approach

We have k sorted arrays of different lengths, and need to find the m largest elements among all of them.

Let’s say we have arrays A₁, A₂, …, Aₖ, each sorted in ascending order. Together, they contain n elements in total. Our goal is to find the m largest elements across all these arrays efficiently.

The Naive Approach: Merge and Sort

The simplest solution might be to merge all arrays into one and sort it:

def find_m_largest_naive(arrays, m):
    # Merge all arrays into one
    merged = []
    for array in arrays:
        merged.extend(array)
    
    # Sort the merged array
    MERGE-SORT(merged, 1, len(merged))
    
    # Return last m elements
    return merged[-m:]

This approach has a time complexity of O(n log n), where n is the total number of elements. While straightforward, it ignores a crucial piece of information: our input arrays are already sorted!

The Optimal Solution: Leveraging a Max-Heap

We can do much better by taking advantage of the sorted nature of our input arrays. Here’s the key insight: the largest element must be the last element of one of our k arrays. After we take that element, the next largest must be either:

  1. The last element of one of the other arrays, or
  2. The second-to-last element from the array we just took from

This suggests using a max-heap where each node represents the current largest unconsidered element from one of our arrays. Here’s how we can implement this:

def find_m_largest_heap(arrays, k, sizes, m):
    # Initialize heap array and position trackers
    H = new Array(k)
    current_pos = new Array(k)
    
    # Start with last element from each array
    for i = 1 to k:
        if sizes[i] > 0:
            H[i] = arrays[i][sizes[i]]
            current_pos[i] = sizes[i]
        else:
            H[i] = -
            current_pos[i] = 0
    
    # Build initial max heap
    BUILD-MAX-HEAP(H, k)
    
    result = new Array(m)
    for i = 1 to m:
        # Extract maximum element
        result[i] = HEAP-EXTRACT-MAX(H, k)
        
        # Find source array of maximum (j)
        # Replace with next largest from array j
        if current_pos[j] > 1:
            current_pos[j] = current_pos[j] - 1
            H[1] = arrays[j][current_pos[j]]
        else:
            H[1] = -
        
        MAX-HEAPIFY(H, 1, k)
    
    return result

Understanding the Time Complexity

Let’s break down why this algorithm achieves O(k + m log k) time complexity:

  1. Initial heap building with k elements: O(k)
  2. m iterations, each involving:
    • Extracting maximum: O(log k)
    • Inserting next element: O(log k) Total per iteration: O(log k)