Given a set of n numbers, we need to find and return the k largest elements in sorted order.
Approach 1: Sort Everything
The most intuitive approach might be to sort the entire array and take the k largest elements:
def find_k_largest_sort(A, n, k):
# Sort the entire array
MERGE-SORT(A, 1, n) # O(n * log n)
# Return the k largest elements
return A[n-k+1 ... n]
This approach has a time complexity of , regardless of k. While simple to implement, it does more work than necessary when k is small compared to n.
Approach 2: The Heap Method
A more efficient approach uses a max-heap:
def find_k_largest_heap(A, n, k):
# Build a max heap from the input array
BUILD-MAX-HEAP(A, n) # O(n)
result = new Array(k)
for i = 1 to k: # O(k * log n)
result[i] = HEAP-EXTRACT-MAX(A, n) # O(log n)
n = n - 1
return result
This gives us a time complexity of , which is better than the sorting approach when k is much smaller than n. However, it still has a log n factor that we can potentially avoid.
Approach 3: The Optimal Solution
We can do even better using a modified quickselect approach:
def find_k_largest_optimal(A, n, k):
# Find the (n-k+1)th smallest element as partition point
pivot_value = R-SELECT(A, 1, n, n-k+1) # O(n)
# Partition around this element
q = R-L-PARTITION(A, 1, n) # O(n)
# Sort only the k largest elements
MERGE-SORT(A, q+1, n) # O(k * log k)
return A[n-k+1 ... n]
This algorithm achieves an expected time complexity of , which is optimal when . It combines the best of both worlds, avoiding the full sort while minimizing the heap overhead.
Understanding the Optimal Solution
-
First, we use randomized selection to find the exact element that would be at position n-k+1 in the sorted array. This takes O(n) expected time.
- Why n-k+1? Because we want to partition the array where the right side contains the k largest elements.
-
Next, we partition the array around this element, ensuring all larger elements are to its right. This also takes O(n) time.
-
Finally, we only sort the k largest elements, taking O(k log k) time.