Longest Consecutive Sequence [Medium] — HashSet O(n) [Google, Amazon, Facebook]

Sanjeev SharmaSanjeev Sharma
13 min read

Advertisement

Problem Statement

Given an unsorted array of integers nums, return the length of the longest consecutive elements sequence. You must write an algorithm that runs in O(n) time.

Example 1

Input:  nums = [100, 4, 200, 1, 3, 2]
Output: 4
Explanation: The longest consecutive sequence is [1, 2, 3, 4].

Example 2

Input:  nums = [0, 3, 7, 2, 5, 8, 4, 6, 0, 1]
Output: 9
Explanation: The longest consecutive sequence is [0, 1, 2, 3, 4, 5, 6, 7, 8].

Constraints: 0 <= nums.length <= 10^5, -10^9 <= nums[i] <= 10^9

Why This Problem Matters

This problem is a staple at Google, Amazon, and Meta because it tests something subtle: the ability to recognize that the "obvious" approach — sorting the array and then scanning — technically works but violates the constraint. The problem explicitly demands O(n) time, which rules out every comparison-based sorting algorithm.

What makes it genuinely interesting is that the optimal solution uses a nested loop structure that looks like O(n²) at first glance. Realizing why it is actually O(n) requires understanding amortized analysis — a concept that separates candidates who memorize solutions from those who truly understand algorithmic complexity.

The lesson generalizes: many hard-looking problems become linear once you identify the right "skip condition" that prevents redundant work. Longest Consecutive Sequence is the clearest example of this class of problem you will encounter in interviews.


The Trap: Sorting Approach

The instinctive approach is to sort the array first. Once sorted, you scan left to right, extending a running count whenever the next element equals the current element plus one, and resetting whenever there is a gap (handling duplicates by skipping equal neighbors). This produces the correct answer and takes O(n) time for the scan — but O(n log n) overall because of the sort.

Sort: [100, 4, 200, 1, 3, 2][1, 2, 3, 4, 100, 200]
Scan: 1→2 (extend), 2→3 (extend), 3→4 (extend), 4→100 (gap, reset), 100→200 (gap)
Result: 4... but sorting cost O(n log n)

The sorting approach is perfectly correct for interviews where the O(n) constraint is not enforced, but LeetCode 128 explicitly requires O(n). Proposing sorting and then admitting you cannot do better is a common way to miss the problem.

Why can we do better? Because sorting is doing unnecessary work — it is establishing a total order on all elements, but we only need to know whether specific individual numbers exist. A HashSet gives us that membership check in O(1).


The O(n) Insight

The key observation is this:

Only start counting a consecutive sequence when num - 1 is NOT in the set.

Think about why. If num - 1 exists, then num is not the beginning of a sequence — it is a middle or end element. If we started counting from num, we would be measuring a sequence that already started at num - 1 (or earlier). We would be doing redundant work and potentially double-counting.

By enforcing the start condition (num - 1 not in set), we guarantee:

  • Every sequence is counted exactly once, starting from its leftmost element.
  • Non-starting elements are encountered inside the inner while loop of their sequence's start — they are never the trigger for a new count.

This single condition is the entire algorithm. Everything else — building the set, the outer loop, the inner while — is mechanical.

The algorithm in plain English:

  1. Put all numbers in a HashSet for O(1) lookup.
  2. Iterate over every number in the set.
  3. If num - 1 is in the set, skip this number (it is not a sequence start).
  4. If num - 1 is not in the set, count forward: check num+1, num+2, ... until the chain breaks.
  5. Track the maximum length seen.

Why It's Still O(n) Despite the Inner While Loop

At first glance, the nested loop structure looks like it could be O(n²): outer loop over n elements, inner while loop that could run many times for each element. This is the same surface structure as a naive O(n²) algorithm.

The critical difference is amortized analysis.

Consider what the inner while loop actually does: it counts elements num+1, num+2, num+3, ... until the chain breaks. Each of those elements will be visited by the inner loop exactly once across the entire execution of the algorithm — because they all have a predecessor in the set, so the outer loop will skip them (they will never be sequence starts themselves).

Put more formally:

  • Each number is inserted into the set exactly once: O(n) total.
  • Each number is the starting element of at most one sequence start check.
  • Each number is visited by an inner while loop at most once (as part of exactly one sequence's expansion).
  • Therefore, the total number of inner while loop iterations across all outer loop iterations is bounded by n.
  • Total work: O(n) outer loop iterations + O(n) total inner loop iterations = O(n).

This is exactly amortized O(n). Each individual outer iteration might trigger many inner iterations (if there is a long sequence starting there), but the budget for those inner iterations is shared across the entire input. The total cost is linear because no single number can be "charged" more than a constant number of times.

Think of it like a bank account: every number deposited into the set earns exactly one credit that will be spent either by being skipped in the outer loop, or by being consumed in one inner loop iteration. The total credits spent equals the total deposited, which is n.


Visual Dry Run

Input: [100, 4, 200, 1, 3, 2]

Step 1 — Build the HashSet:

set = {100, 4, 200, 1, 3, 2}

Step 2 — Identify sequence starts (where num - 1 is NOT in the set):

numnum-1 in set?Is a sequence start?
10099 → NOYES
43 → YESno
200199 → NOYES
10 → NOYES
32 → YESno
21 → YESno

Step 3 — Count from each sequence start:

Start at 100:
  Check 101 → not in set. Length = 1.

Start at 200:
  Check 201 → not in set. Length = 1.

Start at 1:
  Check 2IN SET. cur = 2, length = 2.
  Check 3IN SET. cur = 3, length = 3.
  Check 4IN SET. cur = 4, length = 4.
  Check 5 → not in set. Done. Length = 4.

Result: max(1, 1, 4) = 4

Notice that elements 2, 3, and 4 were never outer-loop starts — they were consumed entirely inside the inner loop for the start at 1. This is exactly the amortized savings at work.


Common Mistakes

1. Sorting instead of using a HashSet Sorting is the most common first attempt. It produces correct output but runs in O(n log n), violating the problem constraint. Always ask yourself: "Do I need an ordered structure, or just membership testing?" Here, membership is enough — use a set.

2. Starting a count from every element If you skip the start condition check (num - 1 not in set) and count forward from every single element, you recount sequences from their midpoints. This degrades to O(n²) in the worst case (e.g., input [1, 2, 3, ..., n] would result in n + (n-1) + ... + 1 = O(n²) inner iterations).

3. Not building the full set before iterating Some implementations try to do everything in one pass, checking and inserting at the same time. This fails because when processing an early element, later elements that belong to the same sequence may not be in the set yet.

4. Iterating over nums instead of the set If there are duplicates in nums (e.g., [1, 1, 1, 2, 2]), iterating over the original array causes duplicate sequence starts. Iterating over the set (which deduplicates automatically) avoids this. Python's for n in num_set handles this correctly; for n in nums with a set check for n-1 is safer only if you handle duplicates explicitly.


Solutions

Python — Sorting Approach (O(n log n), shown for contrast)

def longestConsecutive(nums: list[int]) -> int:
    if not nums:
        return 0

    nums.sort()
    best = 1
    current_streak = 1

    for i in range(1, len(nums)):
        if nums[i] == nums[i - 1]:
            continue              # skip duplicates
        elif nums[i] == nums[i - 1] + 1:
            current_streak += 1  # extend the current consecutive run
            best = max(best, current_streak)
        else:
            current_streak = 1   # gap found, reset

    return best
# Time: O(n log n) — violates the O(n) requirement
# Space: O(1) or O(n) depending on sort implementation

Python — Optimal HashSet Approach (O(n))

def longestConsecutive(nums: list[int]) -> int:
    # Step 1: build the set for O(1) membership checks
    num_set = set(nums)
    best = 0

    # Step 2: only count from sequence starts
    for n in num_set:
        # num - 1 in set means n is NOT the start of a sequence
        if n - 1 not in num_set:
            current = n
            length = 1

            # Extend the sequence as far as it goes
            while current + 1 in num_set:
                current += 1
                length += 1

            best = max(best, length)

    return best
# Time: O(n) amortized — each element visited at most twice total
# Space: O(n) for the set

JavaScript — Sorting Approach (O(n log n), shown for contrast)

var longestConsecutive = function(nums) {
    if (nums.length === 0) return 0;

    nums.sort((a, b) => a - b);
    let best = 1, streak = 1;

    for (let i = 1; i < nums.length; i++) {
        if (nums[i] === nums[i - 1]) continue;       // skip duplicates
        if (nums[i] === nums[i - 1] + 1) {
            streak++;                                  // consecutive — extend
            best = Math.max(best, streak);
        } else {
            streak = 1;                               // gap — reset
        }
    }
    return best;
    // Time: O(n log n)  Space: O(1)
};

JavaScript — Optimal HashSet Approach (O(n))

var longestConsecutive = function(nums) {
    // Build a Set for O(1) average-case lookup
    const numSet = new Set(nums);
    let best = 0;

    for (const n of numSet) {
        // Only begin counting if n is the start of its sequence
        if (!numSet.has(n - 1)) {
            let current = n;
            let length = 1;

            // Walk forward through the consecutive chain
            while (numSet.has(current + 1)) {
                current++;
                length++;
            }

            best = Math.max(best, length);
        }
    }

    return best;
    // Time: O(n) amortized  Space: O(n)
};

Complexity Analysis

ApproachTimeSpaceNotes
Brute force (check all pairs)O(n³)O(1)Triple nested loops
SortingO(n log n)O(1)–O(n)Violates the O(n) constraint
HashSet (optimal)O(n) amortizedO(n)Each element visited at most twice

Amortized O(n) — full argument:

Let n = len(nums). The set construction is O(n). In the main loop:

  • The outer for loop runs exactly |set| times, which is at most n.
  • Each element can be the subject of the n-1 not in set check at most once.
  • Each element can be visited by an inner while loop at most once total across all outer iterations, because only one sequence start "owns" each element.
  • Total inner while iterations across all outer iterations ≤ n.

So the overall operation count is O(n) + O(n) = O(n).

Space: The set holds at most n distinct elements, so space is O(n). The sorting approach avoids this with O(1) extra space (in-place sort), but at the cost of O(n log n) time.


Follow-up Questions

These are the kinds of questions a strong interviewer will ask after you solve the base problem:

1. What if the input is a data stream? You cannot build a complete set upfront. One approach: maintain a HashMap where each key maps to the length of the consecutive sequence it belongs to. When a new number x arrives, look up x-1 and x+1 in the map. Merge the sequences on either side. This is the classic Union-Find / interval-merging approach and handles streaming in O(1) amortized per insertion.

2. What if the input is a 2D grid and you want the longest path of consecutive values? This becomes a graph problem. Treat each cell as a node; connect it to neighbors with value exactly +1. Run DFS or BFS from any cell where its value minus 1 does not appear among its neighbors (the 2D equivalent of "sequence start"). Track the longest path. The same "only start from the beginning" principle applies — just in 2D.

3. What if a "consecutive sequence" allows gaps of at most k? Modify the inner while loop: instead of checking for current + 1, check for any value in [current+1, current+k]. This changes the inner loop from O(1) per step to O(k) per step, making the total O(nk). Alternatively, sort and use a sliding window.

4. What about negative numbers or duplicates? The HashSet solution handles both naturally. Duplicates collapse in the set. Negative numbers are handled identically — num - 1 checks work regardless of sign.


This Pattern Solves

The "smart start condition + HashSet" pattern appears across many problems. Recognizing the pattern is the skill, not memorizing each solution individually:

  • Contains Duplicate (LC 217) — HashSet membership check in one pass.
  • First Missing Positive (LC 41) — Reframe: the answer must be in [1, n+1]. Visit only numbers in that range; skip others.
  • Number of Connected Components in an Undirected Graph (LC 323) — Only start a BFS/DFS from unvisited nodes (the "start condition" is "not yet visited").
  • Longest String Chain (LC 1048) — Only start DP from words with no valid predecessor in the chain.
  • Brick Wall (LC 554) — Count gaps at each position using a frequency map; the column with the most gaps is the "start" of the cut.

In all of these, the trick is identifying an invariant that lets you skip redundant work and bound total iterations across all inner operations to O(n).


Key Takeaway

The Longest Consecutive Sequence problem is deceptively simple once you see the key insight, but arriving at that insight requires rejecting the obvious sorting solution and asking: "What do I actually need to know?" You don't need order — you need membership. A HashSet gives you that in O(1), and the single condition num - 1 not in set turns an apparent O(n²) nested loop into a genuine O(n) algorithm by guaranteeing each element participates in exactly one sequence expansion. Whenever you see a problem that screams "sort me first," pause and ask whether a HashSet could give you everything you need without paying the O(n log n) price.

Advertisement

Sanjeev Sharma

Written by

Sanjeev Sharma

Full Stack Engineer · E-mopro