Intersection of Two Arrays (LC 349 + LC 350) — HashSet, HashMap, and Scalability Follow-ups [Amazon / Google]

Sanjeev Sharma

Problem Statement

LeetCode 349 — Intersection of Two Arrays

Given two integer arrays nums1 and nums2, return an array of their intersection. Each element in the result must be unique, and you may return the result in any order.

Example 1:

Input:  nums1 = [1, 2, 2, 1],  nums2 = [2, 2]
Output: [2]

Example 2:

Input:  nums1 = [4, 9, 5],  nums2 = [9, 4, 9, 8, 4]
Output: [9, 4]   (order does not matter)

LeetCode 350 — Intersection of Two Arrays II

Given two integer arrays nums1 and nums2, return an array of their intersection. Each element in the result must appear as many times as it shows in both arrays, and you may return the result in any order.

Example 1:

Input:  nums1 = [1, 2, 2, 1],  nums2 = [2, 2]
Output: [2, 2]          ← both 2s are included

Example 2:

Input:  nums1 = [4, 9, 5],  nums2 = [9, 4, 9, 8, 4]
Output: [4, 9]          ← each appears once in nums1, so each appears once in result

The critical distinction: LC 349 gives [2] from the first example. LC 350 gives [2, 2]. That single difference — unique vs. frequency-aware — completely changes which data structure you reach for. Using a HashSet for LC 350 is one of the most common interview mistakes on this problem pair.

Why This Problem Matters

These two problems are among the most frequently screened at Amazon and Google for entry-level and new-grad roles. But their real value in an interview is not the basic solution — it is the follow-up questions that come after.

When an interviewer at Google asks LC 350, they typically spend two minutes on the solution and then pivot to: "What if both arrays are already sorted?" Then: "What if one array is much smaller — say ten elements vs. ten million?" Then: "What if neither array fits in RAM?" This escalating sequence is specifically designed to test whether you understand the trade-offs between time complexity, space complexity, and I/O costs — the very trade-offs that matter in distributed systems at scale.

A candidate who only knows the HashMap solution but cannot discuss the sorted two-pointer variant, binary search optimization, or external merge sort will leave the interview without an offer even if their code is correct. This post covers all of it.

Beyond the interview, the underlying pattern — store frequencies in a map, consume them as you iterate the second array — appears in deduplication pipelines, log analysis, inventory reconciliation, and any system that needs to compute set or multiset operations on large datasets. The concepts here compose.

LC 349 — Three Approaches

Approach 1: HashSet (O(n + m) time, O(min(n, m)) space) — Optimal for Most Cases

The core insight: you need fast membership testing. A HashSet provides O(1) average-case lookup. The algorithm is two steps:

  1. Insert all elements of nums1 into a set called seen. Duplicates in nums1 are automatically deduplicated — the set handles this.
  2. Iterate through nums2. For each element, check if it exists in seen. If yes, add it to the result set (which deduplicates on the nums2 side), then mark it consumed by removing it from seen.

Using a second set for the result — rather than a list — guarantees uniqueness even when nums2 contains repeated values.

seen = {1, 2}              ← built from nums1 = [1, 2, 2, 1], deduped

Walk nums2 = [2, 2]:
  num = 2 → in seen → add to result, remove from seen
  num = 2 → NOT in seen (was removed) → skip

result = {2}

Why remove from seen rather than from result? Because we want the result to contain unique values. Removing from seen is equivalent to "consuming" the match so we do not count it again. For LC 349 this matters less (result is a set anyway), but the habit of consuming matches becomes critical in LC 350.

Time:  O(n) to build the set + O(m) to walk nums2, with O(1) per lookup = O(n + m)
Space: O(min(n, m)) if you build the set from the smaller array

Optimization: Always build the HashSet from the smaller array. This minimizes memory usage. If len(nums2) < len(nums1), swap them before building.

Approach 2: Sort + Two Pointers (O(n log n + m log m) time, O(1) space)

If you cannot use extra memory — or if the interviewer explicitly asks for O(1) space — sorting is the answer.

  1. Sort both arrays.
  2. Place two pointers i = 0, j = 0 at the start of nums1 and nums2 respectively.
  3. Compare nums1[i] and nums2[j]:
    • If equal: record the match, advance both pointers. Use a prev variable or a result set to skip duplicates.
    • If nums1[i] < nums2[j]: advance i to catch up.
    • If nums1[i] > nums2[j]: advance j to catch up.
  4. Continue until either pointer goes out of bounds.
nums1 sorted: [1, 2, 2, 4]
nums2 sorted: [2, 4, 9, 9]

i=0 j=0:  1 < 2  → advance i
i=1 j=0:  2 == 2 → match! add 2, advance both
i=2 j=1:  2 < 4  → advance i  (skip duplicate 2 in nums1)
i=3 j=1:  4 == 4 → match! add 4, advance both
i=4: out of bounds → stop

Result: [2, 4]

How to skip duplicates: After recording a match, keep advancing the pointer while nums1[i] == nums1[i-1] (or use a result set). This is the detail candidates most often forget under pressure.

Time:  O(n log n + m log m), dominated by sorting
Space: O(1) extra (plus O(log n) stack if the language sorts in place)

When to choose this over HashSet: When memory is severely constrained, or when the interviewer mentions that the data is already sorted (which drops the time cost to O(n + m) — better than the HashSet approach if you count the set's memory).

Approach 3: Binary Search (O((n + m) log m) time, O(n) space)

This approach shines when one array is much smaller than the other — a scenario the interviewer is likely to raise as a follow-up.

  1. Sort the larger array (nums2, size m).
  2. For each element in the smaller array (nums1, size n), run binary search in the sorted larger array.
  3. If found, add it to the result set; the set guarantees uniqueness, so repeated matches of the same value are harmless.
nums1 = [4, 9, 5]         ← small array, iterate this
nums2 sorted = [4, 8, 9, 9]   ← large array, binary search here

For 4: binary search finds index 0 → match
For 9: binary search finds index 2 → match
For 5: binary search finds nothing → skip

Result: [4, 9]

Time: O(m log m) to sort + O(n log m) for n binary searches = O((n + m) log m), dominated by the sort. For a one-off query on unsorted data, the HashSet's O(n + m) is therefore still asymptotically better. Binary search wins when the larger array is already sorted, or when the sort cost is amortized across many queries: with n = 10 and m = 10,000,000, the searches themselves cost only about 10 × 23 ≈ 230 comparisons, whereas the HashSet approach must touch all 10,000,000 elements of the large array.

Space: O(n) for the result set + O(log m) sort stack


LC 350 — HashMap Frequency Counting

LC 350 changes the rules: duplicates count. If 2 appears twice in both arrays, the result must contain 2 twice. A HashSet cannot track this — sets deduplicate by definition. You need a HashMap mapping each value to its count.

The Algorithm

  1. Build a frequency map count from the smaller array (nums1). For each element, count[num] += 1.
  2. Walk the larger array (nums2). For each element:
    • If count[num] > 0: this element appears in nums1 with remaining capacity. Add it to the result. Decrement count[num].
    • Otherwise: skip it.

Step 2 is the key mechanism: count[num] > 0 is the "still have budget" check. Decrementing after adding is "consuming" that budget. This ensures we add each shared element exactly min(count_in_nums1, count_in_nums2) times.

Why min? If 2 appears 3 times in nums1 (so count[2] = 3) but only 1 time in nums2, the loop in step 2 will encounter 2 exactly once — producing one match. If 2 appears 1 time in nums1 (count[2] = 1) but 5 times in nums2, after the first match count[2] drops to 0, and the remaining 4 encounters are skipped. In both cases the result count is min(freq_in_nums1, freq_in_nums2).
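This min behavior is exactly what Python's `collections.Counter` intersection operator computes, which makes it a handy one-line cross-check for the manual loop (a sketch for verification, not a submission-ready solution):

```python
from collections import Counter

nums1 = [1, 2, 2, 1]
nums2 = [2, 2]

# Counter & Counter keeps each key with the minimum of its two counts,
# i.e. min(freq_in_nums1, freq_in_nums2) -- the LC 350 semantics.
intersection = list((Counter(nums1) & Counter(nums2)).elements())
print(intersection)  # [2, 2]
```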

Time: O(n + m) Space: O(min(n, m)) — build the map from the smaller array

Always build the map from the smaller array. This is worth stating explicitly in an interview. If nums1 has 10 elements and nums2 has 10 million, building the map from nums1 uses O(10) space. Building from nums2 uses O(10,000,000) space. The algorithm is correct either way; the space usage is not.

Visual Dry Run

LC 349 — HashSet Approach

nums1 = [4, 9, 5]
nums2 = [9, 4, 9, 8, 4]

Step 1: Build seen from nums1
  seen = {4, 9, 5}

Step 2: Walk nums2
  num=9 → in seen → add to result, remove from seen
    result={9},  seen={4, 5}

  num=4 → in seen → add to result, remove from seen
    result={9, 4},  seen={5}

  num=9 → NOT in seen → skip

  num=8 → NOT in seen → skip

  num=4 → NOT in seen → skip

Output: [9, 4]     (unique, order arbitrary)

LC 350 — HashMap Approach

nums1 = [1, 2, 2, 1]
nums2 = [2, 2]

Step 1: Build count from nums1
  count = {1: 2,  2: 2}

Step 2: Walk nums2
  num=2 → count[2]=2 > 0 → add 2 to result, count[2] becomes 1
    result=[2],  count={1:2, 2:1}

  num=2 → count[2]=1 > 0 → add 2 to result, count[2] becomes 0
    result=[2, 2],  count={1:2, 2:0}

Output: [2, 2]     (both 2s included — frequency-aware)

Notice what would have gone wrong with a HashSet for LC 350: after the first 2 match, the set approach would have removed 2 from seen, so the second 2 in nums2 would be skipped. The result would be [2] — correct for LC 349, wrong for LC 350. That is the exact mistake the interviewer is watching for.

Common Mistakes

Mistake 1: Using HashSet for LC 350

This is the single most common error on this problem pair. A HashSet is the right tool for LC 349 (unique results required). For LC 350, you need a HashMap with frequency counts. The two problems look similar; the difference is whether duplicates are preserved in the output. If you are unsure which problem you are solving, re-read whether the result should contain unique elements or frequency-matched elements.

Mistake 2: Forgetting to Remove / Decrement After a Match

In the HashSet approach for LC 349, failing to remove an element from seen after it is matched allows the same element to be matched multiple times when nums2 contains duplicates. Example: nums1 = [2], nums2 = [2, 2]. If the result is collected in a list, the output would incorrectly be [2, 2]; removing the matched element from seen (or collecting results in a set) yields the correct [2].

In the HashMap approach for LC 350, forgetting to decrement count[num] means the budget is never consumed. The result would include num every time it appears in nums2, regardless of how many times it appears in nums1.

Mistake 3: Building the Map from the Wrong Array

Always build the frequency map from the smaller array, then iterate the larger. The algorithm is correct either way, but building from the larger array wastes memory. In an interview setting, mentioning this optimization explicitly signals strong engineering instincts — it is the same optimization that matters in production systems where one dataset is tiny (a lookup table) and another is enormous (a data stream).

Mistake 4: Skipping the Duplicate-Skip Logic in Two Pointers

When using the sort + two pointers approach for LC 349, candidates frequently forget to skip over duplicate elements after recording a match. If nums1 = [2, 2, 3] and nums2 = [2, 3], after matching the first 2 and advancing both pointers, the next nums1[i] is also 2. Without a skip, you would record 2 again and produce [2, 2] instead of the correct [2, 3]. The fix: after a match, advance the pointer past all consecutive duplicates, or collect results in a set.

Solutions

Python

from typing import List
from collections import Counter

# ──────────────────────────────────────────────────
# LC 349 — Intersection of Two Arrays (unique result)
# ──────────────────────────────────────────────────

class Solution349:

    # Approach 1: HashSet — O(n+m) time, O(min(n,m)) space
    def intersection_hashset(self, nums1: List[int], nums2: List[int]) -> List[int]:
        # Always build the set from the smaller array to save memory
        if len(nums1) > len(nums2):
            nums1, nums2 = nums2, nums1

        seen = set(nums1)        # O(n) — deduplication is free with sets
        result = set()

        for num in nums2:        # O(m)
            if num in seen:      # O(1) average lookup
                result.add(num)  # adding to a set ensures uniqueness

        return list(result)

    # Approach 2: Sort + Two Pointers — O(n log n + m log m) time, O(1) extra space
    def intersection_two_pointers(self, nums1: List[int], nums2: List[int]) -> List[int]:
        nums1.sort()
        nums2.sort()
        result = []
        i, j = 0, 0

        while i < len(nums1) and j < len(nums2):
            if nums1[i] == nums2[j]:
                # Only add if it is a new value (skip consecutive duplicates)
                if not result or result[-1] != nums1[i]:
                    result.append(nums1[i])
                i += 1
                j += 1
            elif nums1[i] < nums2[j]:
                i += 1
            else:
                j += 1

        return result

    # Approach 3: Binary Search — O(n log m) when n << m
    def intersection_binary_search(self, nums1: List[int], nums2: List[int]) -> List[int]:
        import bisect
        # Sort the larger array; iterate the smaller
        if len(nums1) > len(nums2):
            nums1, nums2 = nums2, nums1

        nums2.sort()
        result = set()

        for num in nums1:
            # bisect_left finds the insertion point for num in nums2
            idx = bisect.bisect_left(nums2, num)
            if idx < len(nums2) and nums2[idx] == num:
                result.add(num)   # set handles uniqueness automatically

        return list(result)


# ──────────────────────────────────────────────────
# LC 350 — Intersection of Two Arrays II (frequency-aware result)
# ──────────────────────────────────────────────────

class Solution350:

    # HashMap frequency count — O(n+m) time, O(min(n,m)) space
    def intersect(self, nums1: List[int], nums2: List[int]) -> List[int]:
        # Build the frequency map from the smaller array
        if len(nums1) > len(nums2):
            nums1, nums2 = nums2, nums1

        # Counter is a dict subclass that counts hashable objects
        count = Counter(nums1)   # e.g. [1,2,2,1] → {1:2, 2:2}

        result = []
        for num in nums2:
            if count[num] > 0:       # still have budget for this element
                result.append(num)
                count[num] -= 1      # consume one unit of budget

        return result

    # Follow-up: both arrays already sorted → two pointers, O(n+m), O(1) space
    def intersect_sorted(self, nums1: List[int], nums2: List[int]) -> List[int]:
        # This assumes the caller guarantees both inputs are sorted
        result = []
        i, j = 0, 0

        while i < len(nums1) and j < len(nums2):
            if nums1[i] == nums2[j]:
                result.append(nums1[i])  # include duplicates — no dedup here
                i += 1
                j += 1
            elif nums1[i] < nums2[j]:
                i += 1
            else:
                j += 1

        return result

JavaScript

// ──────────────────────────────────────────────────
// LC 349 — Intersection of Two Arrays (unique result)
// ──────────────────────────────────────────────────

/**
 * Approach 1: HashSet — O(n+m) time, O(min(n,m)) space
 * @param {number[]} nums1
 * @param {number[]} nums2
 * @return {number[]}
 */
function intersection_hashset(nums1, nums2) {
    // Build the set from the smaller array to minimise memory
    if (nums1.length > nums2.length) [nums1, nums2] = [nums2, nums1];

    const seen = new Set(nums1);   // deduplication is free
    const result = new Set();

    for (const num of nums2) {
        if (seen.has(num)) {
            result.add(num);       // Set.add handles uniqueness
        }
    }

    return [...result];
}

/**
 * Approach 2: Sort + Two Pointers — O(n log n + m log m) time, O(1) extra
 * @param {number[]} nums1
 * @param {number[]} nums2
 * @return {number[]}
 */
function intersection_twoPointers(nums1, nums2) {
    nums1.sort((a, b) => a - b);   // numeric sort — critical, default JS sort is lexicographic
    nums2.sort((a, b) => a - b);

    const result = [];
    let i = 0, j = 0;

    while (i < nums1.length && j < nums2.length) {
        if (nums1[i] === nums2[j]) {
            // Add only if it is a new value (avoid duplicates in result)
            if (result.length === 0 || result[result.length - 1] !== nums1[i]) {
                result.push(nums1[i]);
            }
            i++;
            j++;
        } else if (nums1[i] < nums2[j]) {
            i++;
        } else {
            j++;
        }
    }

    return result;
}

/**
 * Approach 3: Binary Search — O(n log m) when nums1 is much smaller
 * @param {number[]} nums1
 * @param {number[]} nums2
 * @return {number[]}
 */
function intersection_binarySearch(nums1, nums2) {
    // Sort the larger array; iterate the smaller
    if (nums1.length > nums2.length) [nums1, nums2] = [nums2, nums1];

    nums2.sort((a, b) => a - b);
    const result = new Set();

    const binarySearch = (arr, target) => {
        let lo = 0, hi = arr.length - 1;
        while (lo <= hi) {
            const mid = (lo + hi) >>> 1;   // unsigned right shift avoids overflow
            if (arr[mid] === target) return true;
            else if (arr[mid] < target) lo = mid + 1;
            else hi = mid - 1;
        }
        return false;
    };

    for (const num of nums1) {
        if (binarySearch(nums2, num)) {
            result.add(num);   // Set handles uniqueness
        }
    }

    return [...result];
}


// ──────────────────────────────────────────────────
// LC 350 — Intersection of Two Arrays II (frequency-aware)
// ──────────────────────────────────────────────────

/**
 * HashMap frequency count — O(n+m) time, O(min(n,m)) space
 * @param {number[]} nums1
 * @param {number[]} nums2
 * @return {number[]}
 */
function intersect(nums1, nums2) {
    // Build the frequency map from the smaller array
    if (nums1.length > nums2.length) [nums1, nums2] = [nums2, nums1];

    const count = new Map();
    for (const num of nums1) {
        count.set(num, (count.get(num) || 0) + 1);  // build frequency map
    }

    const result = [];
    for (const num of nums2) {
        const freq = count.get(num) || 0;
        if (freq > 0) {
            result.push(num);           // include this element in result
            count.set(num, freq - 1);   // consume one unit of budget
        }
    }

    return result;
}

/**
 * Follow-up: both arrays already sorted → two pointers, O(n+m) time, O(1) space
 * @param {number[]} nums1
 * @param {number[]} nums2
 * @return {number[]}
 */
function intersect_sorted(nums1, nums2) {
    const result = [];
    let i = 0, j = 0;

    while (i < nums1.length && j < nums2.length) {
        if (nums1[i] === nums2[j]) {
            result.push(nums1[i]);   // include duplicates — no dedup needed here
            i++;
            j++;
        } else if (nums1[i] < nums2[j]) {
            i++;
        } else {
            j++;
        }
    }

    return result;
}

Complexity Analysis

LC 349

Approach             | Time                 | Space        | Best When
---------------------|----------------------|--------------|-----------------------------
HashSet              | O(n + m)             | O(min(n, m)) | General case, unsorted input
Sort + Two Pointers  | O(n log n + m log m) | O(1) extra   | Memory is constrained
Binary Search        | O((n + m) log m)     | O(n) result  | n << m (one array tiny)

LC 350

Approach                  | Time                 | Space                       | Best When
--------------------------|----------------------|-----------------------------|----------------------------
HashMap (frequency)       | O(n + m)             | O(min(n, m))                | General case
Sort + Two Pointers       | O(n log n + m log m) | O(1) extra                  | Memory constrained
Two Pointers (pre-sorted) | O(n + m)             | O(1) extra                  | Both arrays already sorted
Binary Search (n << m)    | O((n + m) log m)     | O(n) result + O(log m) sort | One array much smaller

For the HashMap approach, note that if you can guarantee the elements fit within a bounded range (such as [0, 1000]), you can use an integer array instead of a hash map, reducing overhead from hash collisions to direct index access.
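For instance, under LC 350's stated value range of [0, 1000], the frequency map collapses to a plain array (a sketch; `intersect_bounded` and the `max_val` parameter are illustrative names):

```python
from typing import List

def intersect_bounded(nums1: List[int], nums2: List[int], max_val: int = 1000) -> List[int]:
    """LC 350 assuming all values lie in [0, max_val]."""
    if len(nums1) > len(nums2):
        nums1, nums2 = nums2, nums1

    count = [0] * (max_val + 1)    # direct-indexed frequency table, no hashing
    for num in nums1:
        count[num] += 1

    result = []
    for num in nums2:
        if count[num] > 0:         # still have budget for this value
            result.append(num)
            count[num] -= 1        # consume one unit of budget
    return result

print(sorted(intersect_bounded([1, 2, 2, 1], [2, 2])))  # [2, 2]
```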

The Follow-up Discussion — What Interviewers Are Really Testing

These three follow-up questions are the portion of the interview that separates a good candidate from a great one. They are not hypotheticals — they are real engineering scenarios. Interviewers at Google and Amazon use them to see if you can extend an algorithmic solution to a systems context.

Follow-up 1: What if both arrays are already sorted?

Answer: Use the two-pointer approach directly — no sorting step needed.

With pre-sorted input, the two-pointer solution for LC 350 runs in O(n + m) time and O(1) extra space (not counting output). This is strictly better than the HashMap approach, which needs O(min(n, m)) space for the frequency map. When the interviewer tells you the input is sorted, they are handing you this optimization on a silver platter.

For LC 349 (unique results), add a check after each match to skip past duplicates in both arrays before the next comparison.

nums1 sorted: [1, 1, 2, 3]
nums2 sorted: [1, 2, 2, 4]

i=0, j=0: 1 == 1 → match, add 1, i=1, j=1
i=1, j=1: 1 < 2 → advance i
i=2, j=1: 2 == 2 → match, add 2, i=3, j=2
i=3, j=2: 3 > 2 → advance j
i=3, j=3: 3 < 4 → advance i
i=4: done

LC 349 result (unique): [1, 2]
LC 350 result (with duplicates): [1, 2]   ← same here: each shared value's min frequency is 1

The key insight to articulate: Sorted order makes comparisons local and sequential — you never need random access. This is the property that enables O(1) space.

Follow-up 2: What if one array is much smaller than the other?

Answer: Use binary search on the larger array for each element of the smaller array.

Suppose nums1 has n = 100 elements and nums2 has m = 10,000,000 elements. Compare the options:

  • HashMap: O(m) = 10,000,000 operations just to read nums2, plus O(n) = 100 for the map. You are I/O bound by the large array.
  • Binary Search: Sort nums2 once in O(m log m). Then for each of the n = 100 elements in nums1, do a binary search in O(log m) = O(23) time. Total: O(m log m + n log m). The n log m part is just 2,300 operations — essentially free.

For LC 350 (with duplicates), binary search alone is not quite enough because you need to count how many times the target appears in the sorted nums2. You can handle this by finding the leftmost and rightmost positions of the target using two binary searches, then computing min(count_in_nums1, rightmost - leftmost + 1).
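A sketch of that counting idea using Python's bisect module (the helper name `intersect_skewed` is illustrative; it assumes the large array is already sorted):

```python
import bisect
from collections import Counter
from typing import List

def intersect_skewed(small: List[int], large_sorted: List[int]) -> List[int]:
    """LC 350 when len(small) << len(large_sorted) and the large array is sorted."""
    result = []
    for num, freq in Counter(small).items():
        lo = bisect.bisect_left(large_sorted, num)    # first index of num
        hi = bisect.bisect_right(large_sorted, num)   # one past the last index
        result.extend([num] * min(freq, hi - lo))     # hi - lo = count in large array
    return result

print(intersect_skewed([4, 9, 5], [4, 4, 8, 9, 9]))  # [4, 9]
```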

The key insight to articulate: When datasets are skewed in size, the asymmetry in the algorithm should mirror the asymmetry in the data. Always iterate the smaller collection and search or look up in the larger one.

Follow-up 3: What if the arrays do not fit in memory?

This is the scalability question. It signals that you are being asked to think about large-scale data engineering, not just in-memory algorithms.

Scenario A: Both arrays are too large to fit in RAM

Use external merge sort:

  1. Split each array into chunks that fit in RAM.
  2. Sort each chunk individually and write back to disk.
  3. Merge the sorted chunks using a k-way merge (like merge sort's merge step), keeping only one block of each chunk in RAM at a time.
  4. Once both arrays are externally sorted, stream them together using the two-pointer technique, reading one block from each at a time.

The final streaming merge uses O(B) RAM, where B is the block size you choose, and costs O(n/B + m/B) disk reads; the external sort itself requires additional merge passes over the data on top of that. For LC 350, the two-pointer merge computes the frequency-aware intersection with O(1) extra RAM beyond the I/O buffers.
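Step 4's streaming merge might look like the following sketch. The streams stand in for block-buffered file readers, and `None` is used as an end-of-stream sentinel, so it assumes the data itself never contains `None`:

```python
from typing import Iterable, List

def intersect_sorted_streams(stream1: Iterable[int], stream2: Iterable[int]) -> List[int]:
    """LC 350 over two externally sorted streams, one element in RAM per stream."""
    it1, it2 = iter(stream1), iter(stream2)
    a, b = next(it1, None), next(it2, None)
    result = []
    while a is not None and b is not None:
        if a == b:                                 # match: emit and advance both
            result.append(a)
            a, b = next(it1, None), next(it2, None)
        elif a < b:                                # stream 1 is behind: advance it
            a = next(it1, None)
        else:                                      # stream 2 is behind: advance it
            b = next(it2, None)
    return result

print(intersect_sorted_streams(iter([1, 2, 2, 4]), iter([2, 2, 9])))  # [2, 2]
```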

Scenario B: nums1 fits in RAM but nums2 does not

If nums1 is small enough to build a HashMap in memory, keep the HashMap resident and stream nums2 from disk in chunks. For each element read from disk, perform the O(1) HashMap lookup. This is essentially the original HashMap algorithm but with streaming I/O.
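A sketch of this scenario, where `chunks_of_nums2` stands in for blocks read from disk (an illustrative name, not a real I/O API):

```python
from collections import Counter
from typing import Iterable, List

def intersect_streaming(nums1_in_ram: List[int],
                        chunks_of_nums2: Iterable[List[int]]) -> List[int]:
    """LC 350 with nums1 resident in RAM and nums2 streamed chunk by chunk."""
    count = Counter(nums1_in_ram)          # small enough to keep in memory
    result = []
    for chunk in chunks_of_nums2:          # each chunk is one disk read
        for num in chunk:
            if count[num] > 0:             # O(1) lookup per streamed element
                result.append(num)
                count[num] -= 1            # consume one unit of budget
    return result

print(intersect_streaming([1, 2, 2, 1], [[2], [2, 3]]))  # [2, 2]
```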

Scenario C: Neither fits in RAM — use hash partitioning

Divide both arrays into K partitions by computing hash(element) % K. All elements with the same hash value land in the same partition. The intersection of the original arrays is the union of the intersections of corresponding partitions. Process one partition pair at a time; each pair fits in RAM.
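A sketch of this scheme in miniature — in a real system each bucket would be a file on disk, and `k` plus the helper names are illustrative:

```python
from collections import Counter
from typing import List

def intersect_partitioned(nums1: List[int], nums2: List[int], k: int = 4) -> List[int]:
    """LC 350 via hash partitioning: equal values always share a bucket."""
    def partition(nums: List[int]) -> List[List[int]]:
        buckets = [[] for _ in range(k)]
        for num in nums:
            buckets[hash(num) % k].append(num)   # route by hash(element) % k
        return buckets

    result = []
    # Process one bucket pair at a time; each pair is assumed to fit in RAM.
    for b1, b2 in zip(partition(nums1), partition(nums2)):
        count = Counter(b1)
        for num in b2:
            if count[num] > 0:
                result.append(num)
                count[num] -= 1
    return result

print(sorted(intersect_partitioned([4, 9, 5], [9, 4, 9, 8, 4])))  # [4, 9]
```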

The key insight to articulate: The two-pointer approach for pre-sorted data is the foundation of external sort merging — it is the same algorithm operating at disk I/O granularity instead of memory granularity. Understanding this connection demonstrates systems thinking beyond purely algorithmic reasoning.

This Pattern Solves

Problem                                | How This Pattern Applies
---------------------------------------|--------------------------------------------------------
LC 349 — Intersection of Two Arrays    | HashSet for unique intersection
LC 350 — Intersection of Two Arrays II | HashMap for frequency-aware intersection
LC 1 — Two Sum                         | HashMap for O(1) complement lookup
LC 242 — Valid Anagram                 | Frequency map from one string, consume with the other
LC 383 — Ransom Note                   | Frequency map from magazine, consume with ransom note
LC 1002 — Find Common Characters       | Frequency map + min-frequency across all strings
LC 347 — Top K Frequent Elements       | Frequency map + partial sort or bucket sort
LC 560 — Subarray Sum Equals K         | Prefix sum + frequency map for count of valid subarrays

The general pattern: build a frequency map from one collection, consume it with the other, and collect results where the remaining count is positive. This pattern appears any time you need to compare two multisets (bags of elements where duplicates matter).

Key Takeaway

LC 349 and LC 350 are deceptively similar — one asks for unique intersection, the other for frequency-aware intersection — and that distinction is exactly what the interviewer is probing. Use a HashSet for LC 349 and a HashMap for LC 350; confusing them is the canonical mistake. But the real depth of these problems is in the three follow-up questions: sorted input unlocks O(1) space via two pointers, a size-skewed input favors binary search over the smaller collection, and data that does not fit in RAM calls for external merge sort or hash partitioning. Knowing those answers transforms a routine Easy problem into a systems design conversation — which is precisely what top-tier interviewers are looking for.

Next: Problem 11 — Pascal's Triangle


Written by Sanjeev Sharma
Full Stack Engineer · E-mopro