Find the Duplicate Number — Floyd's Cycle Detection Fully Explained [Google, Amazon Hard-Medium]

Sanjeev SharmaSanjeev Sharma
18 min read

Advertisement

Problem Statement

Given an array nums of n + 1 integers where each integer is in the range [1, n] inclusive, there is exactly one repeated number in nums. Return that repeated number.

You must solve the problem without modifying the array nums and using only constant extra space.

Examples:

Input:  [1, 3, 4, 2, 2]
Output: 2

Input:  [3, 1, 3, 4, 2]
Output: 3

Input:  [3, 3, 3, 3, 3]
Output: 3

Constraints (these are the puzzle — read carefully):

  • 1 <= n <= 10^5
  • nums.length == n + 1
  • 1 <= nums[i] <= n
  • There is only one repeated number in nums, but it could be repeated more than twice
  • You must not modify the array (no sorting, no swapping in place)
  • You must use only O(1) extra space (no hash sets, no frequency arrays)
  • The problem's follow-up asks for an O(n) time solution

Why This Problem Matters

LeetCode 287 is rated Medium, but a majority of experienced engineers call this rating misleading. The reason: the problem statement looks trivially easy — "find the duplicate number" — until you read the constraints. At that point, nearly every solution you would naturally reach for is eliminated before you write a single line of code.

The constraints are not just guardrails. They are the entire puzzle. Understanding why each approach fails is what interviewers at Google and Amazon actually want to see. Blurting out the Floyd's solution without being able to justify the constraint eliminations signals memorization, not understanding.

This problem also introduces a genuinely beautiful algorithmic idea: you can treat an array as an implicit linked list and run a graph algorithm on it without ever building a graph. That insight — reframing the data structure to unlock a different class of algorithms — is a transferable skill that unlocks problems like Linked List Cycle II (LC 142), Happy Number (LC 202), and more.

Interview frequency: This appears in Google and Amazon interviews with high frequency, often paired with the question: "Walk me through every approach and tell me why you ruled it out." The constraint-elimination walkthrough is the interview, not just the code.


The Constraint Analysis — Why Every Obvious Approach Fails

Let us go through every approach a competent engineer would reach for and identify exactly which constraint kills it.

Approach 1: Sorting

Sort the array, then scan for adjacent equal elements.

[1, 3, 4, 2, 2] → sorted → [1, 2, 2, 3, 4] → nums[1] == nums[2]2

Time: O(n log n). Space: O(1) (in-place sort).

Why it fails: Sorting modifies the array. The constraint says you must not modify nums. Eliminated.

Approach 2: Hash Set

Iterate through the array, adding each element to a set. Return the first element already in the set.

seen = set()
for num in nums:
    if num in seen:
        return num
    seen.add(num)

Time: O(n). Space: O(n).

Why it fails: The set can grow to hold up to n elements. This violates the O(1) extra space constraint. Eliminated.

Approach 3: Negative Marking (Index as Presence Flag)

For each value v, negate nums[|v|] as a visited marker. If it is already negative, v is the duplicate.

Time: O(n). Space: O(1).

Why it fails: Modifying array values is still modifying the array. Even though you could restore them afterward, this violates the constraint. Eliminated.

Approach 4: Binary Search on Value Range

This is the approach that survives the constraint check but is not fully optimal.

The key observation: for any midpoint mid in the range [1, n], count how many numbers in nums are less than or equal to mid. Call this count c.

  • If the duplicate is in [1, mid], then c > mid (there are more values in that half than there are slots for them — pigeonhole principle).
  • If the duplicate is in [mid+1, n], then c <= mid.

Binary search on the value range narrows down the duplicate in O(log n) rounds, each round scanning the array once.

Time: O(n log n). Space: O(1). Modifies array: No.

Why it is not optimal: It satisfies the no-modify and O(1) space constraints, but the problem's follow-up asks for O(n) time. O(n log n) is possible but not ideal. This is the "bonus" solution worth mentioning in an interview after presenting the optimal approach.

Approach 5: Floyd's Cycle Detection

Time: O(n). Space: O(1). Modifies array: No.

This is the only approach that satisfies every constraint simultaneously. The trick requires a non-obvious reframing of the problem.


The Floyd's Insight — Array as an Implicit Linked List

Here is the core idea: treat the array as a function that maps indices to values, where the value at each index is the "next" index to visit.

Formally, define: node at index i points to node at index nums[i].

So if nums = [1, 3, 4, 2, 2]:

  • Start at index 0 → value is 1 → move to index 1
  • At index 1 → value is 3 → move to index 3
  • At index 3 → value is 2 → move to index 2
  • At index 2 → value is 4 → move to index 4
  • At index 4 → value is 2 → move to index 2 ← we have been here before

The traversal looks like this as a linked list:

01324
                ↑   ↓
                └───┘

Why is there always a cycle? Every value in nums is in the range [1, n], so every "next pointer" is a valid index. With n + 1 elements and values only reaching n, by the pigeonhole principle at least two indices point to the same destination — that destination is the duplicate, and those two pointers create a cycle.

Why does the cycle entrance equal the duplicate? Because the only way two different indices can point to the same next index is if that next index appears as a value twice in the array. The "place where paths converge" is precisely the duplicate value.

This transforms "find the duplicate in an array" into "find the entrance of a cycle in a linked list" — a problem Floyd's algorithm solves in O(n) time and O(1) space.


Phase 1 — Finding Where Tortoise and Hare Meet

Run two pointers starting from index 0 (the entry point of the implicit linked list, since index 0's value is always in [1, n] and never equals 0 — so index 0 is never part of the cycle body):

  • Slow (tortoise): moves one step at a time → slow = nums[slow]
  • Fast (hare): moves two steps at a time → fast = nums[nums[fast]]

Because the hare moves twice as fast, it eventually laps the tortoise inside the cycle. They are guaranteed to meet inside the cycle, not at the entrance.

Important: Phase 1 only tells you there is a cycle and gives you a meeting point inside it. It does not directly give you the duplicate.


Phase 2 — Finding the Cycle Entrance

After Phase 1, reset the slow pointer to the start of the linked list (index 0), but keep the fast pointer at the meeting point from Phase 1.

Now advance both pointers one step at a time:

  • slow = nums[slow]
  • fast = nums[fast]

The position where they meet next is the cycle entrance — which equals the duplicate number.


Why the Entrance Equals the Duplicate — The Proof

Let us define:

  • x = distance from the start (index 0) to the cycle entrance
  • y = distance from the cycle entrance to the meeting point (inside the cycle)
  • d = total cycle length (number of nodes in the cycle)

When Phase 1 ends (slow and fast meet inside the cycle):

  • Slow has traveled: x + y steps
  • Fast has traveled: x + y + K*d steps (for some integer K — fast completed extra full loops)

Since fast travels twice as far as slow:

2(x + y) = x + y + K*d
x + y    = K*d
x        = K*d - y

This equation says: the distance from the start to the cycle entrance (x) equals K full cycles minus the distance from the cycle entrance to the meeting point (y).

In practical terms: if you walk x steps forward from the meeting point (still inside the cycle), you complete K full loops and land exactly at the cycle entrance. Meanwhile, if you walk x steps from the start of the list, you land exactly at the cycle entrance by definition.

So: two pointers walking one step at a time — one from the list start, one from the meeting point — will converge at the cycle entrance after exactly x steps.

The cycle entrance is the duplicate because it is the value that two different indices point to — the node with two incoming edges.


Visual Dry Run — Tracing [1, 3, 4, 2, 2]

First, let us draw the implicit linked list this array creates:

Index:  0   1   2   3   4
Value:  1   3   4   2   2

Traversal map:
  index 0 → nums[0] = 1  →  go to index 1
  index 1 → nums[1] = 3  →  go to index 3
  index 2 → nums[2] = 4  →  go to index 4
  index 3 → nums[3] = 2  →  go to index 2
  index 4 → nums[4] = 2  →  go to index 2  ← duplicate! two paths enter index 2

Linked list (starting from index 0):
  01324
                ↑   ↓
                └───┘
  Cycle entrance = index 2, cycle length = 2 (24)

Phase 1: Detect the cycle

Both pointers start at index 0.

StepSlow (index)Fast (index)slow == fast?
init00
1nums[0] = 1nums[nums[0]] = nums[1] = 3No
2nums[1] = 3nums[nums[3]] = nums[2] = 4No
3nums[3] = 2nums[nums[4]] = nums[2] = 4No
4nums[2] = 4nums[nums[4]] = nums[2] = 4Yes — meet at index 4

Meeting point: index 4 (inside the cycle).

Phase 2: Find the cycle entrance

Reset slow to index 0. Keep fast at index 4. Advance both one step at a time.

StepSlow (index)Fast (index)slow == fast?
init04No
1nums[0] = 1nums[4] = 2No
2nums[1] = 3nums[2] = 4No
3nums[3] = 2nums[4] = 2Yes — meet at index 2

The cycle entrance is index 2. The value at index 2 is... well, the index itself in this mapping — the entrance node number is the duplicate: 2.

The answer is 2. Correct.


Common Mistakes

Mistake 1: Resetting Fast to Start in Phase 2 Instead of Slow

The single most common error. Phase 2 requires resetting slow to the list start (index 0) while keeping fast at the Phase 1 meeting point. Many people accidentally reset fast and keep slow, which produces wrong results.

# WRONG — resets the wrong pointer
slow = nums[0]  # Phase 1 meeting point
fast = nums[0]  # WRONG — should stay at meeting point, not go back to start

# CORRECT
slow = nums[0]  # Reset THIS pointer to start
# fast stays where Phase 1 left it

Mistake 2: Confusing Index and Value

This problem requires careful mental tracking. The pointer's current position is an index. The pointer's next step is determined by the value at that index. The cycle entrance index is what equals the duplicate value. Getting these confused leads to off-by-one answers.

A concrete rule: slow = nums[slow] means "my new index is the value stored at my current index."

Mistake 3: Starting Phase 2 One Step Too Early or Late

Phase 2 must begin with both pointers at their respective starting positions, and they advance simultaneously from there. Do not take one extra step before starting Phase 2, and do not skip the first advancement.

Mistake 4: Assuming the Duplicate Must Appear Exactly Twice

The constraints say the duplicate can repeat more than twice. [3, 3, 3, 3, 3] is a valid input. Floyd's algorithm handles this correctly because the cycle structure still forms — multiple indices point into the same node — but it is worth testing your mental model against this case.


Solutions

Primary: Floyd's Cycle Detection — O(n) time, O(1) space

Python:

from typing import List

class Solution:
    def findDuplicate(self, nums: List[int]) -> int:
        # Phase 1: Find the intersection point inside the cycle.
        # Both pointers start at the "head" — index 0 (not nums[0])
        # because index 0 is always outside the cycle (values are in [1,n]).
        slow = nums[0]
        fast = nums[0]

        while True:
            slow = nums[slow]           # tortoise: one step
            fast = nums[nums[fast]]     # hare: two steps
            if slow == fast:
                break                   # they met inside the cycle

        # Phase 2: Find the cycle entrance.
        # Reset slow to the list head; keep fast at the meeting point.
        # Both now move one step at a time.
        slow = nums[0]
        while slow != fast:
            slow = nums[slow]
            fast = nums[fast]

        # They meet at the cycle entrance = the duplicate number
        return slow

JavaScript:

/**
 * @param {number[]} nums
 * @return {number}
 */
var findDuplicate = function(nums) {
    // Phase 1: Detect the cycle using tortoise and hare.
    // Treating the array as: index i → nums[i] (implicit linked list).
    let slow = nums[0];
    let fast = nums[0];

    do {
        slow = nums[slow];           // tortoise: one step
        fast = nums[nums[fast]];     // hare: two steps
    } while (slow !== fast);
    // slow === fast: they are inside the cycle

    // Phase 2: Find the cycle entrance (= the duplicate).
    // Reset slow to the head; fast stays at the meeting point.
    slow = nums[0];
    while (slow !== fast) {
        slow = nums[slow];           // both move one step
        fast = nums[fast];
    }

    return slow;  // cycle entrance = duplicate value
};

Bonus: Binary Search on Value Range — O(n log n) time, O(1) space

This approach satisfies the no-modify and O(1) space constraints, but runs in O(n log n). Worth knowing as a stepping stone — it is far more intuitive than Floyd's and is valid if the follow-up O(n) requirement is waived.

Core idea (Pigeonhole Principle): In a range [1, mid], there are exactly mid distinct values. If more than mid elements of nums fall in this range, the duplicate must be in [1, mid]. Otherwise, it is in [mid+1, n].

Python:

from typing import List

class Solution:
    def findDuplicate(self, nums: List[int]) -> int:
        lo, hi = 1, len(nums) - 1  # binary search over the VALUE range [1, n]

        while lo < hi:
            mid = (lo + hi) // 2

            # Count how many numbers in nums are <= mid
            count = sum(1 for x in nums if x <= mid)

            if count > mid:
                # Too many values in [1, mid] — duplicate is here
                hi = mid
            else:
                # Duplicate must be in [mid+1, hi]
                lo = mid + 1

        return lo  # lo == hi == the duplicate

JavaScript:

/**
 * Binary search on value range — O(n log n), O(1) space, no modification.
 * @param {number[]} nums
 * @return {number}
 */
var findDuplicateBinarySearch = function(nums) {
    let lo = 1;
    let hi = nums.length - 1;  // value range is [1, n]

    while (lo < hi) {
        const mid = Math.floor((lo + hi) / 2);

        // Count elements <= mid (O(n) scan per round)
        let count = 0;
        for (const x of nums) {
            if (x <= mid) count++;
        }

        if (count > mid) {
            // Pigeonhole: more values than slots in [1, mid] → duplicate here
            hi = mid;
        } else {
            lo = mid + 1;
        }
    }

    return lo;
};

Complexity Analysis

ApproachTimeSpaceModifies ArraySatisfies All Constraints
SortingO(n log n)O(1)YesNo
Hash SetO(n)O(n)NoNo
Negative MarkingO(n)O(1)YesNo
Binary Search (value range)O(n log n)O(1)NoPartial — not O(n)
Floyd's Cycle DetectionO(n)O(1)NoYes — all constraints

Floyd's analysis in detail:

  • Phase 1: Each pointer traverses at most the non-cycle part plus one cycle. The hare catches the tortoise in at most n steps total → O(n).
  • Phase 2: Slow pointer travels at most x steps (distance from head to cycle entrance). Since x <= n → O(n).
  • Overall: O(n) + O(n) = O(n).
  • Space: Only two integer variables (slow, fast) regardless of input size → O(1).

Follow-up Questions

Missing Number (LeetCode 268) — Easy

Given n distinct numbers in range [0, n], find the missing one. Use the Gauss formula: expected = n*(n+1)/2, return expected - sum(nums). Or use XOR: XOR all indices with all values; paired elements cancel; the unpaired result is the missing number. O(n) time, O(1) space.

The connection: Missing Number and Find Duplicate are two sides of the same coin. Missing Number has n values in n+1 slots (one missing). Find Duplicate has n+1 values in n slots (one repeated). The Gauss/XOR trick works for the former because you know exactly what the complete set should be.

Find All Duplicates in an Array (LeetCode 442) — Medium

Given an array where each element appears once or twice, find all duplicates. Now the "no modify" constraint is relaxed — you can use the negative marking trick: for each value v, negate nums[|v| - 1]; if it is already negative, v is a duplicate. O(n) time, O(1) space (excluding output list). This is a good demonstration of why the no-modify constraint in LC 287 is so significant — removing it opens up a whole new solution class.

First Missing Positive (LeetCode 41) — Hard

Given an unsorted array of integers, find the smallest missing positive integer. This is the hardest of the three: use the array itself as a hash table by placing each value v in the position v - 1 (if in range), then scan for the first mismatch. O(n) time, O(1) space — but requires modification.

The connection: All three problems exploit the constraint that values are bounded by n and the array has n+1 or n slots, allowing index-as-value tricks.


This Pattern Solves

Floyd's cycle detection — and the broader idea of treating an array as an implicit graph — appears across several LeetCode problems:

ProblemHow the Pattern Applies
Linked List Cycle II (LC 142)Direct application — find cycle entrance in an explicit linked list
Happy Number (LC 202)Detect if digit-squaring iteration cycles (cycle = not happy)
Find the Smallest Letter Greater Than Target (LC 744)Binary search on a circular value space
Circular Array Loop (LC 457)Detect cycles in arrays where index wraps
Repeated DNA Sequences (LC 187)Sliding window + hash, related pigeonhole reasoning

The meta-pattern: whenever a function maps a finite set to itself (integers in range, indices, etc.), you can model it as a functional graph and use Floyd's algorithm to detect repeated visits.


Key Takeaway

LeetCode 287 is fundamentally a constraint-satisfaction problem disguised as a search problem. The value of this problem is not memorizing Floyd's algorithm — it is understanding why three layers of constraints (no modification, O(1) space, O(n) time) form a logical funnel that leaves exactly one viable path through the solution space.

The Floyd's insight — that an array with bounded values defines an implicit linked list, and a duplicate value creates a cycle in that list — is a concrete example of a broader engineering skill: reframing the data structure to unlock a different class of algorithms. That skill transfers across dozens of harder problems and is what separates candidates who think from candidates who recall.

If you can walk through the constraint-elimination table in an interview without prompting, implement Floyd's with correct Phase 1 and Phase 2 pointer resets, and explain the mathematical proof intuitively, you have demonstrated genuine mastery of this problem — regardless of whether you had seen it before.

Advertisement

Sanjeev Sharma

Written by

Sanjeev Sharma

Full Stack Engineer · E-mopro