Back to blog
Dec 29, 2024
3 min read

Finding Top K: Efficient Algorithms for Partial Sorting

Finding k largest elements with O(n + k log k) complexity

Given a set of n numbers, we need to find and return the k largest elements in sorted order.

Approach 1: Sort Everything

The most intuitive approach might be to sort the entire array and take the k largest elements:

def find_k_largest_sort(A, n, k):
    # Sort the entire array
    MERGE-SORT(A, 1, n) # O(n * log n)
    
    # Return the k largest elements
    return A[n-k+1 ... n]

This approach has a time complexity of O(nlogn)O(n * log n), regardless of k. While simple to implement, it does more work than necessary when k is small compared to n.

Approach 2: The Heap Method

A more efficient approach uses a max-heap:

def find_k_largest_heap(A, n, k):
    # Build a max heap from the input array
    BUILD-MAX-HEAP(A, n) # O(n)
    
    result = new Array(k)
    for i = 1 to k: # O(k * log n)
        result[i] = HEAP-EXTRACT-MAX(A, n) # O(log n)
        n = n - 1
    
    return result

This gives us a time complexity of O(n+klogn)O(n + k * log n), which is better than the sorting approach when k is much smaller than n. However, it still has a log n factor that we can potentially avoid.

Approach 3: The Optimal Solution

We can do even better using a modified quickselect approach:

def find_k_largest_optimal(A, n, k):
    # Find the (n-k+1)th smallest element as partition point
    pivot_value = R-SELECT(A, 1, n, n-k+1) # O(n)
    
    # Partition around this element
    q = R-L-PARTITION(A, 1, n) # O(n)
    
    # Sort only the k largest elements
    MERGE-SORT(A, q+1, n) # O(k * log k)
    
    return A[n-k+1 ... n]

This algorithm achieves an expected time complexity of O(n+klogk)O(n + k * log k), which is optimal when k<n/lognk < n / log n. It combines the best of both worlds, avoiding the full sort while minimizing the heap overhead.

Understanding the Optimal Solution

  1. First, we use randomized selection to find the exact element that would be at position n-k+1 in the sorted array. This takes O(n) expected time.

    • Why n-k+1? Because we want to partition the array where the right side contains the k largest elements.
  2. Next, we partition the array around this element, ensuring all larger elements are to its right. This also takes O(n) time.

  3. Finally, we only sort the k largest elements, taking O(k log k) time.