Top K Frequent Elements — Three Approaches from O(n log n) to O(n) [Google, Amazon, LinkedIn]

Sanjeev Sharma
23 min read

Problem Statement

Given an integer array nums and an integer k, return the k most frequent elements. You may return the answer in any order.

Follow-up: Your algorithm's time complexity must be better than O(n log n), where n is the array's size.

Example 1:

Input:  nums = [1, 1, 1, 2, 2, 3],  k = 2
Output: [1, 2]
Reason: 1 appears 3 times, 2 appears 2 times, 3 appears 1 time.
        The 2 most frequent elements are 1 and 2.

Example 2:

Input:  nums = [1],  k = 1
Output: [1]
Reason: Only one element, and we want the top 1.

Constraints:

  • 1 <= nums.length <= 10^5
  • -10^4 <= nums[i] <= 10^4
  • k is in the range [1, number of unique elements in nums]
  • It is guaranteed that the answer is unique

Why This Problem Matters

LeetCode 347 is one of the most instructive problems in the entire FAANG preparation curriculum — not because it is hard to solve, but because it teaches you to think about three fundamentally different algorithmic strategies for the same problem, each with a distinct time/space trade-off.

Where it appears: Google, Amazon, LinkedIn, Meta, and Microsoft all include this problem or direct variants in their coding rounds. LinkedIn is particularly fond of it because frequency ranking is central to their feed ranking and "People You May Know" features. Amazon asks it in L5/L6 system design interviews as a jumping-off point for streaming top-k problems (imagine: find the top k most searched products in a sliding 24-hour window). Google uses it to probe whether candidates understand when a general-purpose sort is overkill.

The hint hiding in plain sight: The problem explicitly says your algorithm must be "better than O(n log n)." This is not a polite suggestion — it is a direct clue that the naive approach (sort by frequency, take first k) is the wrong answer. The follow-up constraint is telling you: there exists a smarter structure you should find. That structure is either a heap or bucket sort, depending on what you optimize for.

The pattern it teaches: "Top K" is one of the most universal algorithmic patterns. Once you internalize it here, you can immediately recognize it in:

  • Top K frequent words (LC 692)
  • K closest points to origin (LC 973)
  • Kth largest element in a stream (LC 703)
  • Find K pairs with smallest sums (LC 373)
  • Merge K sorted lists (LC 23)
  • Any "find the best K out of N" problem

The frequency counting + selection pattern (count with a hash map, then select with a heap or sort) appears in recommendation engines, analytics dashboards, log aggregation pipelines, and search autocomplete systems. This is not a toy problem.


Three Approaches Built From Scratch

Let us build intuition for each approach before touching code, because understanding why each approach works is more valuable than the code itself.

Approach 1 — Sort by Frequency: O(n log n)

The first instinct is correct in logic, just not in complexity.

The idea: Count how many times each element appears (using a hash map), then sort those elements by their frequency in descending order, then return the first k elements.

Step 1: Count frequencies
  nums = [1,1,1,2,2,3]
  freq = {1: 3, 2: 2, 3: 1}

Step 2: Sort unique elements by frequency (descending)
  sorted: [(1, 3), (2, 2), (3, 1)]

Step 3: Return first k keys
  k=2 → [1, 2]

Why it works: Frequencies are correctly counted, and sorting puts the most frequent at the front. Straightforward and correct.

Why it fails the follow-up: Sorting n unique elements takes O(n log n). Even though we only want k results, we pay the full sorting cost. When n is large and k is small (say, k=3 but n=100,000), we are doing enormous unnecessary work. The problem explicitly forbids this: your algorithm must beat O(n log n).

When would you use it anyway? If k is very close to n (nearly all elements), or if the input is already partially sorted, or if you are in a time-pressured interview and want to establish correctness before optimizing.
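For reference, the sort-based baseline is a near one-liner in Python using Counter.most_common. (One hedged note: in CPython, most_common(k) actually delegates to heapq.nlargest when k is given, so the standard library already applies the bounded-heap optimization discussed below.)

```python
from collections import Counter

def top_k_sort(nums, k):
    # most_common(k) returns (element, count) pairs, highest count first
    return [elem for elem, _ in Counter(nums).most_common(k)]

print(top_k_sort([1, 1, 1, 2, 2, 3], 2))  # [1, 2]
```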


Approach 2 — Min-Heap of Size k: O(n log k)

This is the approach most interviewers consider the "standard" answer. It is better than sort, and it works elegantly regardless of the relationship between n and k.

The key insight: We do not need to sort all n unique elements. We only need to find the top k. A heap of size k lets us do exactly that: maintain a window of the k best candidates, evicting the worst one whenever a new challenger arrives.

Why a min-heap, not a max-heap? Counter-intuitive at first. Here is the reasoning:

  • We want to keep the k highest frequency elements.
  • A min-heap of size k keeps the k highest by always exposing the smallest of those k at the top.
  • When a new element comes in with frequency f:
    • If f > heap.top (the current minimum of our top-k), the newcomer beats our weakest candidate. Evict the minimum, push the newcomer.
    • If f <= heap.top, the newcomer cannot crack the top k. Ignore it.
  • After processing all unique elements, the heap contains exactly the top k.

A max-heap would not help here — it would expose the best element, but we cannot efficiently check whether a new arrival beats the worst of our current k candidates.

Heap invariant: At all times, the min-heap contains at most k elements, and every element in the heap has a frequency >= every element NOT in the heap. This invariant is maintained by the "evict minimum" step.

Complexity breakdown:

  • Building the frequency map: O(n)
  • Processing m unique elements through the heap, where each push/pop is O(log k): O(m log k) — since m <= n, this is O(n log k)
  • Total: O(n log k)

When k is small (k << n), log k is much smaller than log n, so this is significantly better than O(n log n). When k = n, this degrades to O(n log n) — but that edge case is where bucket sort shines.
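As a quick sanity check before the full solution, the entire min-heap strategy collapses into a few lines with heapq.nlargest, which CPython implements with exactly this bounded-heap-of-size-k technique:

```python
import heapq
from collections import Counter

def top_k_heap(nums, k):
    freq = Counter(nums)
    # nlargest maintains a min-heap of size k internally: O(m log k)
    return heapq.nlargest(k, freq.keys(), key=freq.get)

print(top_k_heap([1, 1, 1, 2, 2, 3], 2))  # [1, 2]
```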


Approach 3 — Bucket Sort: O(n)

This is the clever answer. It uses a domain-specific property of the problem that the heap approach ignores entirely.

The key insight: What is the maximum possible frequency of any element in nums? It is n — an element can appear at most n times (if all elements are the same). This means frequencies are bounded integers in the range [1, n].

When your keys are bounded integers, you do not need comparison-based sorting. You can use counting/bucket sort, which runs in O(range) time rather than O(n log n).

The construction: Create an array buckets of size n+1, where buckets[f] is a list of all elements that appear exactly f times. The array index is the frequency — no comparison needed.

nums = [1,1,1,2,2,3], n=6
freq = {1:3, 2:2, 3:1}

buckets (index = frequency):
  buckets[0] = []   (no element appears 0 times — placeholder)
  buckets[1] = [3]  (element 3 appears 1 time)
  buckets[2] = [2]  (element 2 appears 2 times)
  buckets[3] = [1]  (element 1 appears 3 times)
  buckets[4] = []
  buckets[5] = []
  buckets[6] = []

The retrieval: Walk the buckets array from right to left (high frequency to low). Collect elements until you have k of them.

Scan from index 6 down:
  i=6: [] — skip
  i=5: [] — skip
  i=4: [] — skip
  i=3: [1] — collect 1. result=[1], have 1 of 2 needed
  i=2: [2] — collect 2. result=[1,2], have 2 of 2 needed → stop

Why this is O(n):

  • Building freq map: O(n)
  • Placing elements into buckets: O(m) where m = number of unique elements <= n
  • Scanning buckets from right: O(n) in the worst case (scanning all n+1 buckets)
  • Total: O(n)

No comparisons between frequencies are ever made — we use frequencies as direct array indices. This is the definition of counting sort: it sidesteps the O(log n) comparison overhead by exploiting the integer nature of the keys.


The Bucket Sort Insight — Why It Is Brilliant

Let us dwell on why the bucket sort insight is so elegant, because it exemplifies a broader algorithmic principle.

The general principle: When you know the range of your keys in advance, and that range is polynomial in n, you can always use bucket/counting sort to reduce an O(n log n) sorting step to O(n). The "sorting" becomes trivial — you place items by address rather than by comparison.
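The principle in isolation, as a hypothetical counting_sort helper for non-negative integer keys bounded by max_key:

```python
def counting_sort(keys, max_key):
    # keys are non-negative ints in [0, max_key] — place by address, not comparison
    count = [0] * (max_key + 1)
    for x in keys:
        count[x] += 1                # O(n) counting pass
    out = []
    for value, c in enumerate(count):
        out.extend([value] * c)      # O(max_key + n) output pass
    return out

print(counting_sort([3, 1, 2, 1, 3, 3], 3))  # [1, 1, 2, 3, 3, 3]
```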

Why frequencies are the perfect key: In this problem, the thing we want to sort by (frequency) is naturally bounded by the size of the input (n). This is a tight, useful bound — not a loose one. It means our bucket array has exactly the right size: n+1 slots, each addressable in O(1).

The array index = frequency relationship: Once you recognize that frequency f maps to bucket index f, the algorithm almost writes itself:

  • For each element x with frequency f: buckets[f].append(x) — O(1)
  • The "sorted order" by frequency is already encoded in the array indices — higher index = higher frequency
  • No comparison-based sort ever runs

The boundary case: Why size n+1 and not n? Frequencies range from 1 to n inclusive, so we need indices 1 through n. An array of size n+1 gives us valid indices 0 through n, and we simply ignore index 0. This is the "off-by-one" that trips up many implementations.

An important nuance — ties: Multiple elements can have the same frequency. In bucket sort, they all land in the same bucket list. When we collect results, we collect entire bucket lists. The problem says any order is acceptable, so ties within a bucket do not matter for correctness. However, if the problem asked for lexicographic ordering among ties (like LC 692 — Top K Frequent Words), we would need to sort within each bucket, adding O(b log b) per bucket where b is the bucket size.


Visual Dry Run — All 3 Approaches on [1,1,1,2,2,3], k=2

Approach 1: Sort by Frequency

Input: [1,1,1,2,2,3], k=2

Step 1 — Build frequency map (one pass):
  Process 1: freq={1:1}
  Process 1: freq={1:2}
  Process 1: freq={1:3}
  Process 2: freq={1:3, 2:1}
  Process 2: freq={1:3, 2:2}
  Process 3: freq={1:3, 2:2, 3:1}

Step 2 — Sort unique elements by frequency descending:
  Before sort: [(1,3), (2,2), (3,1)]   ← already sorted here by luck
  After sort:  [(1,3), (2,2), (3,1)]

Step 3 — Take first k=2:
  Result: [1, 2]

Cost: O(n) for counting + O(m log m) for sorting unique elements = O(n log n) total.


Approach 2: Min-Heap of Size k

Input: [1,1,1,2,2,3], k=2
freq = {1:3, 2:2, 3:1}

Process unique elements through min-heap (heap stores [freq, element]):

  Process (freq=3, elem=1):
    heap = [(3, 1)]        size=1 <= k=2, no eviction

  Process (freq=2, elem=2):
    heap = [(2, 2), (3, 1)]  size=2 <= k=2, no eviction
    (min-heap: smallest freq at top → top is (2,2))

  Process (freq=1, elem=3):
    heap size = 2 = k, must check:
    freq=1 <= heap.top.freq=2 → 3 cannot beat our weakest candidate
    OR: push then pop-smallest:
      push (1,3): heap=[(1,3),(2,2),(3,1)] → pop min → pop (1,3)
      heap = [(2,2),(3,1)]

  Final heap: [(2,2), (3,1)] → elements are 2 and 1
  Result: [2, 1]   (equivalent to [1, 2]; any order is accepted)

Cost: O(n) for counting + O(m log k) for heap operations = O(n log k).


Approach 3: Bucket Sort

Input: [1,1,1,2,2,3], n=6, k=2
freq = {1:3, 2:2, 3:1}

Step 1 — Create buckets of size n+1 = 7:
  buckets = [[], [], [], [], [], [], []]
   index:    0   1   2   3   4   5   6

Step 2 — Place each element in its frequency bucket:
  elem=1, freq=3: buckets[3].append(1)  → buckets[3]=[1]
  elem=2, freq=2: buckets[2].append(2)  → buckets[2]=[2]
  elem=3, freq=1: buckets[1].append(3)  → buckets[1]=[3]

  buckets = [[], [3], [2], [1], [], [], []]
   index:    0    1    2    3   4   5   6

Step 3 — Scan from right, collect until we have k=2:
  i=6: buckets[6]=[] → skip
  i=5: buckets[5]=[] → skip
  i=4: buckets[4]=[] → skip
  i=3: buckets[3]=[1] → collect 1 → result=[1], need 1 more
  i=2: buckets[2]=[2] → collect 2 → result=[1,2], need 0 more → STOP

Result: [1, 2]

Cost: O(n) for counting + O(n) for bucket placement + O(n) for scan = O(n) total.


Common Mistakes

Mistake 1: Using a Max-Heap Instead of Min-Heap

This is the most common error. Candidates think "I want the maximum frequencies, so I use a max-heap." But pushing all m unique elements into a max-heap one by one costs O(m log m) — no better than sorting. (Heapify plus k pops improves this to O(m + k log m), but you still hold all m entries instead of a bounded window of k.) The reason we want a min-heap of size k is that we need to efficiently identify and evict the weakest element among our current top-k candidates. The min-heap exposes the minimum of the current k elements, so we can decide in O(1) whether a new arrival deserves a spot.
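For contrast, here is what the max-heap route looks like in Python (counts negated, since heapq is a min-heap only). With heapify the time is O(m + k log m), which is acceptable for small k; the real cost is storing all m entries rather than a bounded window of k:

```python
import heapq
from collections import Counter

def top_k_maxheap(nums, k):
    freq = Counter(nums)
    # heapq is a min-heap, so negate counts to pop the most frequent first
    heap = [(-count, elem) for elem, count in freq.items()]
    heapq.heapify(heap)                                  # O(m)
    return [heapq.heappop(heap)[1] for _ in range(k)]    # O(k log m)

print(top_k_maxheap([1, 1, 1, 2, 2, 3], 2))  # [1, 2]
```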

Mistake 2: Off-by-One in Bucket Size

Creating buckets of size n instead of n+1 causes an IndexError when an element appears n times (e.g., nums = [1, 1, 1], k = 1 — element 1 has frequency 3, and you need buckets[3], which requires size at least 4 = n+1). Always use n+1 to accommodate frequencies from 1 to n inclusive.

Mistake 3: Mishandling Ties in the Heap

In the heap approach, when two elements have the same frequency, the heap comparison falls through to comparing the elements themselves. In Python, this is fine for integers. But for strings or custom objects, you need an explicit comparison key or a tiebreaker. Also, some implementations forget that the heap might have extra elements — always extract exactly k elements, not "everything in the heap."
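The tuple fallback is easy to see with Python's heapq; on a count tie, the second slot of the tuple decides (for non-comparable payloads you would insert an explicit tiebreaker, e.g. an index, between the two):

```python
import heapq

heap = [(2, "banana"), (2, "apple"), (3, "cherry")]
heapq.heapify(heap)
# The counts tie at 2, so comparison falls through to the string slot
top = heapq.heappop(heap)
print(top)  # (2, 'apple')
```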

Mistake 4: Returning the Heap Instead of Its Contents

After building the heap of size k, some candidates return the heap data structure directly. In Python the heap is just a list, but it is in heap order, not sorted order, and its entries are (freq, elem) tuples. Extract the elements explicitly: return [elem for freq, elem in heap]. The elements need not be in any particular order, but you must unpack them from the tuples.

Mistake 5: Forgetting the Frequency Map

A surprising number of candidates jump straight to a heap or sort without building a frequency map first. You cannot find the top-k frequent elements without first knowing the frequency of each element. Always spend O(n) to build freq = Counter(nums) or equivalent before applying any selection strategy.
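As a minimal reminder of step zero, collections.Counter builds the map in one pass and behaves like a dict that defaults missing keys to 0:

```python
from collections import Counter

freq = Counter([1, 1, 1, 2, 2, 3])
print(freq)       # Counter({1: 3, 2: 2, 3: 1})
print(freq[1])    # 3
print(freq[99])   # 0 — missing keys default to 0 instead of raising
```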


Solutions

Approach 1: Sort by Frequency

Python:

from typing import List
from collections import Counter

class Solution:
    def topKFrequent(self, nums: List[int], k: int) -> List[int]:
        # Step 1: Count frequency of each element — O(n)
        freq = Counter(nums)

        # Step 2: Sort unique elements by frequency descending — O(m log m)
        # m = number of unique elements <= n, so this is O(n log n)
        sorted_elems = sorted(freq.keys(), key=lambda x: -freq[x])

        # Step 3: Return the top k elements
        return sorted_elems[:k]

JavaScript:

/**
 * @param {number[]} nums
 * @param {number} k
 * @return {number[]}
 */
function topKFrequent(nums, k) {
    // Step 1: Count frequency of each element — O(n)
    const freq = new Map();
    for (const n of nums) {
        freq.set(n, (freq.get(n) || 0) + 1);
    }

    // Step 2: Sort unique elements by frequency descending — O(m log m)
    const sortedElems = [...freq.keys()].sort((a, b) => freq.get(b) - freq.get(a));

    // Step 3: Return the top k elements
    return sortedElems.slice(0, k);
}

Complexity: O(n log n) time, O(n) space. Fails the follow-up constraint but useful as a baseline.


Approach 2: Min-Heap of Size k

Python:

from typing import List
from collections import Counter
import heapq

class Solution:
    def topKFrequent(self, nums: List[int], k: int) -> List[int]:
        # Step 1: Count frequency of each element — O(n)
        freq = Counter(nums)

        # Step 2: Use a min-heap of size k to track top-k frequent elements
        # Heap stores (frequency, element) tuples — Python heapq is a min-heap
        heap = []

        for elem, count in freq.items():
            # Push the new element onto the heap
            heapq.heappush(heap, (count, elem))

            # If heap exceeds size k, evict the element with lowest frequency
            # This maintains the invariant: heap always holds top-k candidates
            if len(heap) > k:
                heapq.heappop(heap)  # removes the minimum (least frequent)

        # Extract elements from the (frequency, element) tuples
        # Order is not guaranteed, but problem allows any order
        return [elem for count, elem in heap]

JavaScript:

/**
 * JavaScript has no built-in heap — we implement a simple min-heap class.
 * For interviews, it's acceptable to mention you'd use a library (e.g., 'heap' npm package).
 * @param {number[]} nums
 * @param {number} k
 * @return {number[]}
 */
function topKFrequent(nums, k) {
    // Step 1: Count frequencies — O(n)
    const freq = new Map();
    for (const n of nums) {
        freq.set(n, (freq.get(n) || 0) + 1);
    }

    // Step 2: Maintain a min-heap of size k with a hand-rolled array-backed binary heap
    // Each push/pop below is O(log k) — the same asymptotics as a library heap
    const heap = []; // stores [count, elem], minimum count at index 0

    function heapPush(item) {
        heap.push(item);
        // Bubble up to maintain min-heap property
        let i = heap.length - 1;
        while (i > 0) {
            const parent = Math.floor((i - 1) / 2);
            if (heap[parent][0] > heap[i][0]) {
                [heap[parent], heap[i]] = [heap[i], heap[parent]];
                i = parent;
            } else break;
        }
    }

    function heapPop() {
        // Move the last element to the root, then bubble down
        const last = heap.pop();
        if (heap.length === 0) return; // popping the only element leaves an empty heap
        heap[0] = last;
        let i = 0;
        while (true) {
            const left = 2 * i + 1, right = 2 * i + 2;
            let smallest = i;
            if (left < heap.length && heap[left][0] < heap[smallest][0]) smallest = left;
            if (right < heap.length && heap[right][0] < heap[smallest][0]) smallest = right;
            if (smallest === i) break;
            [heap[i], heap[smallest]] = [heap[smallest], heap[i]];
            i = smallest;
        }
    }

    for (const [elem, count] of freq) {
        heapPush([count, elem]);
        if (heap.length > k) heapPop(); // evict least frequent
    }

    // Return just the elements (discard frequency values)
    return heap.map(([count, elem]) => elem);
}

Complexity: O(n log k) time, O(n + k) space. Satisfies the follow-up constraint. Best when k << n.


Approach 3: Bucket Sort — True O(n)

Python:

from typing import List
from collections import Counter

class Solution:
    def topKFrequent(self, nums: List[int], k: int) -> List[int]:
        # Step 1: Count frequency of each element — O(n)
        freq = Counter(nums)

        # Step 2: Create buckets where index = frequency
        # Max possible frequency is n (all elements identical), so size = n+1
        # buckets[f] holds all elements that appear exactly f times
        n = len(nums)
        buckets: List[List[int]] = [[] for _ in range(n + 1)]

        for elem, count in freq.items():
            buckets[count].append(elem)
            # No sorting — we use the index itself as the frequency key

        # Step 3: Scan from highest frequency (index n) down to 1
        # Collect elements until we have k of them
        result = []
        for freq_idx in range(n, 0, -1):
            for elem in buckets[freq_idx]:
                result.append(elem)
                if len(result) == k:
                    return result  # early exit once we have k elements

        return result  # reached only if all buckets exhausted (edge case: k == total unique)

JavaScript:

/**
 * Bucket sort approach — O(n) time, O(n) space
 * @param {number[]} nums
 * @param {number} k
 * @return {number[]}
 */
function topKFrequent(nums, k) {
    // Step 1: Count frequencies — O(n)
    const freq = new Map();
    for (const n of nums) {
        freq.set(n, (freq.get(n) || 0) + 1);
    }

    // Step 2: Create n+1 buckets (index = frequency value)
    // Frequency can range from 1 to n, so we need indices 0..n
    const n = nums.length;
    const buckets = Array.from({ length: n + 1 }, () => []);

    for (const [elem, count] of freq) {
        buckets[count].push(elem);  // place element in its frequency bucket
    }

    // Step 3: Scan from right (highest frequency) and collect top k
    const result = [];
    for (let i = n; i >= 1 && result.length < k; i--) {
        for (const elem of buckets[i]) {
            result.push(elem);
            if (result.length === k) return result; // early exit
        }
    }

    return result;
}

Complexity: O(n) time, O(n) space. Optimal. Best when k is large or when you need guaranteed linear time.


Complexity Analysis

Approach           | Time       | Space    | When to Use
Sort by frequency  | O(n log n) | O(n)     | Never in interviews (violates follow-up) — only for baseline explanation
Min-heap of size k | O(n log k) | O(n + k) | When k << n; generalizes to streaming data
Bucket sort        | O(n)       | O(n)     | Best asymptotically; when frequencies are integers bounded by n

Detailed reasoning for min-heap: Building the frequency map is O(n). We then process m unique elements, where m <= n. Each heap push and pop is O(log k) since the heap size never exceeds k. Total: O(n + m log k) = O(n log k).

Detailed reasoning for bucket sort: Building the frequency map is O(n). Placing m unique elements into buckets is O(m) = O(n). Scanning n+1 buckets from right to left is O(n). Collecting k results during the scan is O(k). Total: O(n + n + n + k) = O(n). Space is O(n) for the frequency map plus O(n) for the bucket array plus O(k) for the result — all O(n).

The trade-off in practice: Bucket sort is asymptotically superior, but it has a larger constant factor (more memory allocations for the list-of-lists structure). For very small inputs or when k is tiny, the min-heap approach has better cache behavior. For production code at scale (n in the millions), bucket sort wins decisively.


Follow-up Questions

1. Top K Frequent Words (LC 692) — Medium

Same structure: count frequencies, then select top k. But with one crucial twist: if two words have the same frequency, the lexicographically smaller word ranks higher. This means:

  • Bucket sort breaks without modification: elements within the same frequency bucket are unordered. You must sort each bucket alphabetically, adding O(b log b) per bucket.
  • Min-heap needs a custom comparator: store (-freq, word) tuples so that higher frequency and lexicographically earlier words both "win" in a min-heap (less-than check).
  • Python's heapq handles string comparison natively: tuples compare lexicographically, so (-freq, word) works correctly out of the box.

The key lesson: bucket sort's O(n) advantage disappears when you need stable or lexicographic ordering within frequency groups.
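The (-freq, word) trick for LC 692 fits in a few lines of Python; a sketch (function name is illustrative):

```python
import heapq
from collections import Counter

def top_k_words(words, k):
    freq = Counter(words)
    # (-count, word): min-heap order gives highest count first, and on a
    # count tie the lexicographically smaller word compares smaller, so it wins
    pairs = heapq.nsmallest(k, ((-count, w) for w, count in freq.items()))
    return [w for _, w in pairs]

print(top_k_words(["i", "love", "leetcode", "i", "love", "coding"], 2))  # ['i', 'love']
```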

2. K Closest Points to Origin (LC 973) — Medium

Structurally identical to Top K Frequent, but the "key" is Euclidean distance instead of frequency. The min-heap approach translates directly: maintain a max-heap of size k (or equivalently, a min-heap over negated distances) to track the k nearest points.

One important difference: distances are continuous floats, not bounded integers. Bucket sort cannot be applied directly — the frequency-bounded-by-n trick does not work for real-valued keys. This makes LC 973 a problem where the heap approach is near-optimal, and QuickSelect (O(n) average) is the true champion.

QuickSelect is worth knowing for LC 973: partition the array around a pivot distance, and recursively narrow to the partition containing k elements. O(n) average, O(n^2) worst case, O(1) extra space.
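A QuickSelect sketch for LC 973, assuming squared distances (the square root is monotonic, so it can be skipped) and a randomized Lomuto partition to dodge adversarial inputs:

```python
import random

def k_closest(points, k):
    # QuickSelect: after the loop, points[:k] are the k nearest (in-place, any order)
    def dist(p):
        return p[0] * p[0] + p[1] * p[1]

    def partition(lo, hi):
        # Lomuto partition around a randomly chosen pivot distance
        r = random.randint(lo, hi)
        points[r], points[hi] = points[hi], points[r]
        pivot = dist(points[hi])
        i = lo
        for j in range(lo, hi):
            if dist(points[j]) < pivot:
                points[i], points[j] = points[j], points[i]
                i += 1
        points[i], points[hi] = points[hi], points[i]
        return i

    lo, hi = 0, len(points) - 1
    while lo < hi:
        p = partition(lo, hi)
        if p == k:
            break          # everything left of index k is among the k nearest
        elif p < k:
            lo = p + 1     # narrow to the right partition
        else:
            hi = p - 1     # narrow to the left partition
    return points[:k]

print(sorted(k_closest([[3, 3], [5, -1], [-2, 4]], 1)))  # [[3, 3]]
```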

3. Streaming Top K Frequent Elements — System Design Variant

Amazon specifically asks this at L5/L6: "What if elements arrive in a continuous stream and you need to report the top k at any moment?"

Bucket sort breaks — you cannot afford to rebuild the entire bucket array on every new element. The min-heap of size k also becomes tricky because frequencies change dynamically.

The real answer involves a Count-Min Sketch (for approximate frequencies in memory-bounded streams) or a Lossy Counting algorithm (for exact top-k with bounded error). At minimum, mention:

  • Maintain a frequency hash map (updated per element: O(1))
  • Maintain a heap of size k (update the element's position when its frequency changes: O(log k))
  • When an element's frequency changes, update its heap entry (or use lazy deletion: push a fresh (freq, element) entry and discard stale ones as they surface at the top), keeping per-stream-element work logarithmic

This demonstrates systems thinking beyond pure algorithm knowledge.
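A minimal exact-count sketch of that baseline (the class name StreamTopK is illustrative, not from any library; a memory-bounded stream would swap the Counter for a Count-Min Sketch):

```python
import heapq
from collections import Counter

class StreamTopK:
    """Exact streaming top-k: valid while the key space fits in memory."""

    def __init__(self, k):
        self.k = k
        self.freq = Counter()

    def add(self, x):
        self.freq[x] += 1                      # O(1) per stream element

    def top_k(self):
        # O(m log k) on demand via a bounded min-heap (heapq.nlargest)
        return heapq.nlargest(self.k, self.freq, key=self.freq.get)

s = StreamTopK(2)
for x in [1, 1, 1, 2, 2, 3]:
    s.add(x)
print(s.top_k())  # [1, 2]
```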

4. Kth Largest Element in an Array (LC 215) — Medium

Not about frequency — about value. But the same QuickSelect / heap pattern applies. If you understand LC 347, LC 215 is immediately recognizable as the same family.


This Pattern Solves

Problems that use the same frequency-count + top-k selection structure:

Problem                                  | Frequency Key          | Selection Method
Top K Frequent Elements (LC 347)         | Element count          | Min-heap or bucket sort
Top K Frequent Words (LC 692)            | Word count (with ties) | Min-heap with custom comparator
Sort Characters By Frequency (LC 451)    | Character count        | Bucket sort or max-heap
K Closest Points to Origin (LC 973)      | Distance               | Max-heap of size k
Kth Largest Element in Stream (LC 703)   | Element value          | Min-heap of size k
Find K Pairs with Smallest Sums (LC 373) | Sum of pair            | Min-heap
Reorganize String (LC 767)               | Character count        | Max-heap (greedy placement)
Task Scheduler (LC 621)                  | Task frequency         | Greedy with frequency counts

The meta-pattern: count or compute a key, then select the top/bottom k by that key using a heap of size k or a bounded bucket array.


Key Takeaway

LeetCode 347 is not about memorizing the bucket sort trick — it is about developing the habit of asking "what is special about the keys I am sorting by?" When the answer is "they are integers bounded by the input size," you have a bucket sort opportunity, and an O(n) solution where a naive sort would give O(n log n).

The min-heap of size k is the Swiss Army knife: it works for any comparable key (integers, floats, strings, custom objects), generalizes to streaming data, and gives O(n log k) — a meaningful improvement over O(n log n) whenever k << n. Know both, explain the trade-off clearly, and you have demonstrated exactly the algorithmic maturity FAANG interviewers look for.

Written by Sanjeev Sharma

Full Stack Engineer · E-mopro