Group Anagrams — Hashmap Key Design Mastery [Amazon, Google, Meta]

Problem Statement

Given an array of strings strs, group the anagrams together. You can return the answer in any order.
An anagram is a word or phrase formed by rearranging the letters of a different word or phrase, using all the original letters exactly once.

Example:

Input:  ["eat","tea","tan","ate","nat","bat"]
Output: [["bat"],["nat","tan"],["ate","eat","tea"]]

Constraints:

1 <= strs.length <= 10⁴
0 <= strs[i].length <= 100
strs[i] consists of lowercase English letters

Why This Problem Matters
The Core Insight
Two Approaches Built From Scratch
Approach 1: Sorted String as Key
Approach 2: Character Frequency Tuple as Key
When Does the Difference Actually Matter?
Visual Dry Run
Sort Approach Dry Run
Frequency Tuple Approach Dry Run
Common Mistakes
Solutions
Python — Approach 1: Sorted String Key
Python — Approach 2: Character Frequency Tuple Key
JavaScript — Approach 1: Sorted String Key
JavaScript — Approach 2: Character Frequency String Key
Complexity Analysis
Approach 1 — Sorted String Key
Approach 2 — Frequency Tuple Key
Follow-up Questions
This Pattern Solves
Key Takeaway

Why This Problem Matters

Group Anagrams is the #9 most-asked LeetCode problem globally. It has appeared 91 times across 38 companies, with Amazon, Google, Meta, Microsoft, and Bloomberg all using it regularly in phone screens and onsite rounds. That frequency is not a coincidence.

The reason interviewers keep returning to this problem is that it is deceptively simple on the surface — "just group the anagrams" — but it forces candidates to demonstrate a skill that separates good engineers from great ones: the ability to design a correct and efficient hashmap key from scratch.

Any grouping problem with a hashmap reduces to one core question: what property do all members of a group share, and how do I encode that property as a hashable key? Group Anagrams is a clean, isolated test of that exact skill. Get the key wrong, and every group is broken. Get the key right, and the rest of the solution writes itself in three lines.

Beyond key design, the problem also tests your knowledge of time-complexity trade-offs. There are two canonical approaches, and knowing when to prefer one over the other — and why — is what separates a candidate who memorized a solution from one who understood it.

The Core Insight

Two strings are anagrams if and only if they contain exactly the same characters with exactly the same frequencies. Order does not matter.

This means every anagram group shares a signature — some representation of its character inventory that is identical for all members and unique across groups. The entire problem is about finding a compact, hashable encoding of that signature.

Think of it like a fingerprint: "eat", "tea", and "ate" all have one a, one e, and one t. Their fingerprint is {a:1, e:1, t:1}. "tan" and "nat" have one a, one n, and one t — a different fingerprint. "bat" has one b, one a, one t — its own fingerprint.

If we can turn this fingerprint into a hashmap key, we can group every string in a single pass.

The question is: which encoding do we choose?

Two Approaches Built From Scratch

Approach 1: Sorted String as Key

The idea: When you sort the characters of any anagram, you get the same result. Sort "eat" → "aet". Sort "tea" → "aet". Sort "ate" → "aet". They all produce the same sorted string, so use that as the key.

How it works:

Create a hashmap where keys are sorted strings and values are lists of original strings.
For each string in the input, sort its characters and use that as the key.
Append the original string to the list at that key.
Return all values from the hashmap.

Why it is elegant: It requires no knowledge of the character set. It works for any alphabet, any language, any Unicode string — whatever Python or JavaScript's sort can handle.

The cost: Sorting a string of length k takes O(k log k). With n strings, the total time is O(n · k log k). When strings are very long — say, DNA sequences of 10,000 characters — that log factor adds up.

Approach 2: Character Frequency Tuple as Key

The idea: Instead of sorting, count how many times each of the 26 lowercase letters appears. Represent that as a fixed-length tuple of 26 integers. Two anagrams will always produce the same tuple, and two non-anagrams will always produce different tuples (assuming lowercase English only).

How it works:

Create a hashmap where keys are tuples of 26 integers and values are lists of original strings.
For each string, build a count array of length 26 (index 0 = 'a', index 25 = 'z').
Increment the count for each character encountered.
Convert the array to a tuple (so it is hashable) and use it as the key.
Return all values from the hashmap.

The cost: Counting characters in a string of length k takes O(k). Converting to a tuple of 26 elements takes O(26) = O(1). With n strings, the total time is O(n · k) — strictly better than the sort approach.

The trade-off: This approach only works with a fixed, known character set (26 lowercase English letters in LeetCode's constraints). Extend to Unicode and you need a different key strategy (more on that in Follow-up Questions).

When Does the Difference Actually Matter?

For LeetCode's constraints (k ≤ 100, n ≤ 10⁴), both approaches are fast enough. In practice, the difference becomes meaningful when:

Strings are very long (genomics, log analysis, document fingerprinting) — here O(k) vs O(k log k) per string compounds significantly.
You are in a hot path that processes millions of strings per second — every constant factor counts.
The interviewer asks for the optimal solution — knowing the frequency approach and its reasoning demonstrates depth.

In any Google or Amazon system design follow-up, knowing why the frequency approach is asymptotically better — and when that matters — is what earns extra credit.

Visual Dry Run

Let us trace ["eat","tea","tan","ate","nat","bat"] through both approaches side by side.

Sort Approach Dry Run

String	Sorted Key	Hashmap State After Insert
"eat"	"aet"	`{"aet": ["eat"]}`
"tea"	"aet"	`{"aet": ["eat","tea"]}`
"tan"	"ant"	`{"aet": ["eat","tea"], "ant": ["tan"]}`
"ate"	"aet"	`{"aet": ["eat","tea","ate"], "ant": ["tan"]}`
"nat"	"ant"	`{"aet": ["eat","tea","ate"], "ant": ["tan","nat"]}`
"bat"	"abt"	`{"aet": [...], "ant": [...], "abt": ["bat"]}`

Final output: [["eat","tea","ate"], ["tan","nat"], ["bat"]]

Frequency Tuple Approach Dry Run

For "eat": a=1, e=1, t=1 → (1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0)

For "tea": same counts → same tuple → same group.

For "tan": a=1, n=1, t=1 → different tuple → new group.

For "bat": a=1, b=1, t=1 → different tuple → new group.

The hashmap keys are 26-integer tuples rather than strings, but the grouping logic is identical. The only difference is how we arrive at the key — counting in O(k) instead of sorting in O(k log k).

Common Mistakes

Mistake 1 — Using a mutable list as a key. In Python, mp[[1,0,0,...]] will throw a TypeError because lists are not hashable. You must convert the frequency array to a tuple before using it as a dictionary key. This is one of the most common bugs candidates write in real interviews, especially under pressure.

Mistake 2 — Missing delimiters when building string keys from counts. If you manually serialize the frequency array as a string without separators — "123" instead of "1#2#3" — you create collisions. Frequencies [1,2,3] and [12,3] both produce "123" but represent different character distributions. Always include a separator character like # when building string keys from integer sequences.

Mistake 3 — Forgetting the empty string. "" (empty string) is a valid input. Its sorted form is "" and its frequency tuple is (0,0,...,0) — 26 zeros. Both approaches handle it correctly as long as you do not add any length checks that skip zero-length strings. Many candidates add an if not s: continue guard that incorrectly discards empty strings instead of grouping them together.

Mistake 4 — Assuming the sort approach works on all Unicode. The sort approach works on any character set natively. But the frequency-tuple approach assumes exactly 26 lowercase English letters. If an interviewer extends the problem to include uppercase, digits, or Unicode characters, your count array of size 26 will silently produce wrong answers. Always clarify the character set constraint before coding the frequency approach.

Solutions

Python — Approach 1: Sorted String Key

from collections import defaultdict
from typing import List

def groupAnagrams(strs: List[str]) -> List[List[str]]:
    # Use the sorted characters of each string as the hashmap key.
    # All anagrams produce the same sorted string.
    groups = defaultdict(list)

    for s in strs:
        # sorted(s) returns a list of chars; join it into a string key
        key = ''.join(sorted(s))
        groups[key].append(s)

    return list(groups.values())

Python — Approach 2: Character Frequency Tuple Key

from collections import defaultdict
from typing import List

def groupAnagrams(strs: List[str]) -> List[List[str]]:
    # Use a 26-integer tuple of character frequencies as the hashmap key.
    # This avoids sorting entirely — O(k) per string instead of O(k log k).
    groups = defaultdict(list)

    for s in strs:
        # Build a count array: index 0 = 'a', index 25 = 'z'
        count = [0] * 26
        for ch in s:
            count[ord(ch) - ord('a')] += 1

        # Lists are mutable and unhashable; convert to tuple first
        key = tuple(count)
        groups[key].append(s)

    return list(groups.values())

JavaScript — Approach 1: Sorted String Key

/**
 * Approach 1: Sort each string and use the result as the hashmap key.
 * Time: O(n * k log k)  Space: O(n * k)
 * @param {string[]} strs
 * @return {string[][]}
 */
function groupAnagrams(strs) {
    const groups = new Map();

    for (const s of strs) {
        // Sort characters alphabetically to get the canonical key
        const key = [...s].sort().join('');

        if (!groups.has(key)) {
            groups.set(key, []);
        }
        groups.get(key).push(s);
    }

    return [...groups.values()];
}

JavaScript — Approach 2: Character Frequency String Key

/**
 * Approach 2: Count character frequencies and encode as a delimited string key.
 * Using '#' as a separator prevents collisions between different frequency combos.
 * Time: O(n * k)  Space: O(n * k)
 * @param {string[]} strs
 * @return {string[][]}
 */
function groupAnagrams(strs) {
    const groups = new Map();

    for (const s of strs) {
        // Build a count array of length 26 (index 0 = 'a')
        const count = new Array(26).fill(0);
        for (const ch of s) {
            count[ch.charCodeAt(0) - 97]++;
        }

        // Serialize with '#' separator to avoid collision:
        // [1,2,3] → "1#2#3#" is unambiguous; "123" would collide with [12,3]
        const key = count.join('#');

        if (!groups.has(key)) {
            groups.set(key, []);
        }
        groups.get(key).push(s);
    }

    return [...groups.values()];
}

Complexity Analysis

Approach 1 — Sorted String Key

	Complexity	Reasoning
Time	O(n · k log k)	For each of the `n` strings, sorting takes O(k log k) where `k` is the string length
Space	O(n · k)	The hashmap stores every original string. Total characters stored is proportional to `n · k`

Approach 2 — Frequency Tuple Key

	Complexity	Reasoning
Time	O(n · k)	For each of the `n` strings, counting characters takes O(k). Building the tuple takes O(26) = O(1)
Space	O(n · k)	Same as above — we still store all strings. The keys themselves are O(26) = O(1) each, so they do not change the asymptotic space

The bottom line: Both approaches use O(n · k) space. Approach 2 is strictly better in time — it eliminates the log k factor per string. For LeetCode's constraints the difference is imperceptible, but at scale (long strings, many strings) Approach 2 is the right engineering choice.

Follow-up Questions

1. What if strings can contain any Unicode character, not just lowercase English?

The 26-slot frequency array breaks immediately for Unicode since there are over a million possible code points. For the sort approach, nothing changes — sorting works on any character set. For the frequency approach, replace the fixed array with a Counter (Python) or a Map<char, int> (JavaScript/Java), then serialize the sorted entries of that map as the key. Time stays O(k) per string but the constant is larger.

2. What if strings are extremely long (e.g., 10,000+ characters)?

This is where the O(n · k log k) vs O(n · k) trade-off materializes in real systems. For genomics (DNA strings), log analysis, or document deduplication pipelines, eliminating the log k factor by using the frequency approach is a genuine win. If even O(k) is too slow, you could consider rolling hash techniques, but at that point you also need to handle hash collisions explicitly.

3. How would you handle streaming input where you do not have all strings upfront?

Use the same hashmap approach, but maintain the hashmap as a persistent data structure. Each new string is hashed and inserted into the appropriate group in O(k) time (frequency approach). The groups can be returned incrementally by iterating the hashmap at any point. This is exactly the architecture used in streaming log aggregation systems.

4. What if you need to keep the output groups sorted?

Sort the output lists after grouping, or use a SortedDict / TreeMap for the values. This adds an O(m log m) pass per group (where m is group size) on top of the main algorithm. It does not change the dominant term.

5. Can you do this without extra space — in O(1) additional space?

Not practically. Any correct solution must store the groupings somewhere. You can sort the input array in-place and then do a linear scan to collect contiguous anagram groups (comparing adjacent sorted forms), which avoids the hashmap but still uses O(k) temporary space for sorting each string. The time complexity becomes O(n · k log k) for the sort plus O(n log n) for sorting the array by sorted-string keys — a reasonable trade if space is critical.

This Pattern Solves

The insight — reduce each element to its canonical form, use that form as a hashmap key — appears in many problems:

Valid Anagram (LeetCode 242): Canonical form is the sorted string or frequency tuple of two strings; check if they match.
Find All Anagrams in a String (LeetCode 438): Sliding window over the target, comparing frequency arrays of the window vs the pattern.
Isomorphic Strings (LeetCode 205): Canonical form is the pattern of first appearances of each character.
Word Pattern (LeetCode 290): Same canonical-form idea applied to word-to-pattern mapping.
Minimum Window Substring (LeetCode 76): Frequency counting as the core comparison mechanism in a sliding window.

Any time you need to group or compare things that are "the same up to reordering," think: what is the canonical form, and how do I hash it?

Key Takeaway

Group Anagrams is fundamentally a lesson in hashmap key design: find the canonical form that all equivalent inputs share, make it hashable, and the grouping follows automatically. The sorted-string approach is intuitive and universally applicable; the frequency-tuple approach eliminates the log k factor and is strictly better when the character set is fixed and strings are long. Know both, understand the trade-off, and you will have a framework that solves an entire category of grouping and equivalence problems far beyond this one LeetCode question.