Valid Anagram — Frequency Array, HashMap & Sort [LeetCode 242]

Sanjeev SharmaSanjeev Sharma
12 min read

Advertisement

Problem Statement

Given two strings s and t, return true if t is an anagram of s, and false otherwise.

An anagram is a word or phrase formed by rearranging the letters of another, using all the original letters exactly once.

Example 1 (true):

Input:  s = "anagram", t = "nagaram"
Output: true

Example 2 (false):

Input:  s = "rat", t = "car"
Output: false

Constraints: 1 <= s.length, t.length <= 5 * 10^4. Both strings consist of lowercase English letters.

Why This Problem Matters

Valid Anagram (LC 242) appears in almost every FAANG warm-up interview for a simple reason: it compresses a deep idea into a tiny problem. Sorting works. A 26-slot counter works. A HashMap works. Each reflects a different trade-off between simplicity, speed, and generality — and the interviewer wants to hear you articulate those trade-offs rather than just produce a working answer.

The deeper insight is that character frequency is a hashable signature. Two strings are anagrams if and only if they produce identical frequency distributions. That single observation unlocks a whole family of problems:

  • Group Anagrams (LC 49): use the frequency vector or sorted string as a dictionary key.
  • Find All Anagrams in a String (LC 438): slide a window and compare frequency snapshots.
  • Minimum Window Substring (LC 76): track how many characters are still needed.

If you understand why frequency counting works here, you have a template that transfers directly to all of these.

Three Approaches — Trade-offs at a Glance

Approach 1 — Sort Both Strings

The most intuitive idea: sort s and t and check if the results are equal. Anagrams contain the same characters, so their sorted forms are identical.

sorted("anagram") == sorted("nagaram")"aaagmnr" == "aaagmnr"True
sorted("rat")     == sorted("car")"art"     == "acr"False

Why it works: sorting normalises the character order, turning the anagram check into a plain equality test.

Trade-off: Sorting is O(n log n) and — depending on the language — allocates O(n) extra space for the sorted copy. It is never the optimal answer, but it is perfectly acceptable as an opening move while you think out loud. Interviewers appreciate that you can explain why you then improve it.

Approach 2 — Fixed 26-Element Frequency Array

Because the constraint says lowercase English only, the alphabet has exactly 26 characters. Allocate an integer array of size 26 initialised to zero. Index character c as c - 'a' (or ord(c) - ord('a') in Python).

  • Iterate over s: increment freq[c - 'a'].
  • Iterate over t: decrement freq[c - 'a'].
  • If every slot is zero, the strings are anagrams; otherwise they are not.

Why it works: incrementing for s and decrementing for t turns the problem into checking whether net character usage is zero everywhere.

Trade-off: O(n) time, O(1) space (the 26-slot array is constant-size). This is the textbook answer for ASCII / lowercase-only inputs. The limitation — discussed in detail below — is that it breaks for Unicode.

Approach 3 — General HashMap

Replace the fixed array with a hash map (dict in Python, Map in JavaScript). Increment for each character in s, decrement for each character in t. At the end, every key must map to zero.

Trade-off: O(n) time, O(k) space where k is the number of distinct characters. This is the right answer when the input character set is unbounded (Unicode, emoji, arbitrary symbols). The constant-factor overhead of hashing is slightly higher than direct array indexing, but it is asymptotically equivalent.

The Frequency Array Insight

The 26-element array is worth understanding deeply because it recurs throughout string problems. Here is the exact mechanism:

index = char - 'a'

Since 'a' has ASCII code 97, subtracting it maps 'a'→0, 'b'→1, ..., 'z'→25. Every lowercase character maps to a unique slot in [0, 25].

Single-pass optimisation: instead of two separate loops, you can handle the length check as an O(1) early exit and then do one loop over s (incrementing) and one over t (decrementing). If any slot goes negative during the t loop, you can return false immediately — this is slightly faster in practice because it avoids completing the full traversal when a mismatch is found early.

Why the length check matters: if len(s) != len(t), the strings cannot be anagrams. This is a free O(1) check that eliminates a huge class of inputs before any character work begins. Always do it first.

Visual Dry Run — s = "anagram", t = "nagaram"

We trace the 26-element array (showing only active slots for readability).

After processing s = "anagram":

angrm
31111

Steps: a→3, n→1, g→1, r→1, a→3 (already counted), m→1, a→3 (already counted). Final: freq['a'-'a']=3, freq['g'-'a']=1, freq['m'-'a']=1, freq['n'-'a']=1, freq['r'-'a']=1.

After processing t = "nagaram" (decrement):

angrm
00000

Steps: n→0, a→2→1→0, g→0, a→0 (via two decrements for the two as in t), r→0, a→0, m→0.

Every slot is zero — the strings are anagrams. Return true.

Counter-example: s = "rat", t = "car"

After s: freq['r'-'a']=1, freq['a'-'a']=1, freq['t'-'a']=1. After t decrement: freq['c'-'a']=-1 (never incremented), immediate signal that these are not anagrams. Return false.

The Unicode Follow-up

Interviewers almost always ask: "What if the strings can contain Unicode characters?"

The 26-element array breaks because Unicode has over 140,000 code points. You cannot pre-allocate a fixed array for that. The fix is to use a full hash map:

  • In Python: collections.Counter(s) builds the frequency map in one line. Compare Counter(s) == Counter(t).
  • In JavaScript: use a Map keyed by character string.
  • In Java: HashMap<Character, Integer>.

The algorithm is identical — increment for s, decrement for t, check all values are zero — but now the keys are arbitrary characters rather than indices into a fixed array.

What the follow-up actually tests: Can you recognise when a hard-coded constraint (26 letters) baked into your data structure makes the solution brittle? The correct response is to explain the trade-off: "For ASCII-only inputs the fixed array gives O(1) space with no hashing overhead. For Unicode we pay O(k) space and a small hashing constant, but we gain generality."

This kind of reasoning — knowing not just how but when to use each structure — is what separates a good interview answer from a great one.

Connection to Group Anagrams (LC 49)

Group Anagrams asks you to take a list of strings and bucket them by anagram membership. The core insight is exactly what Valid Anagram teaches: the frequency distribution (or equivalently the sorted form) is a canonical key that identifies the anagram class.

  • Sort-based key: sorted(word) → use as dict key. O(n log n) per word.
  • Frequency-based key: build the 26-element count, convert to a tuple, use as dict key. O(n) per word — same O(1) extra space amortized.

If you nail LC 242, LC 49 is a one-line extension: instead of asking "do these two specific strings match?", you ask "what group does each string belong to?" The mapping from character frequencies to a canonical key is the same operation.

Common Mistakes

1. Skipping the length check. Comparing frequency maps of strings with different lengths will always return false, but the length check is free (O(1)) and immediately eliminates millions of inputs in a real application. Always do it first and explicitly, so the reviewer sees you understand the optimisation.

2. Wrong character indexing. A frequent bug is using char - 'A' (uppercase offset) for a problem that specifies lowercase, or mixing ord(c) (which gives 97 for 'a') with a zero-based index. Pin the formula in your head: index = ord(c) - ord('a') for lowercase, and double-check by substituting 'a' and 'z' to confirm the range [0, 25].

3. Assuming the 26-element array handles all inputs. The moment an interviewer says "Unicode" or "arbitrary characters" your array-based solution is wrong. Switch immediately to a hash map. Failing to flag this distinction signals that you memorised the solution rather than understood it.

4. Forgetting that negative counts signal non-anagram. When decrementing for t, any slot going negative means t contains a character more times than s does — that is a non-anagram. You can catch this immediately during the second loop and return early, rather than waiting to scan the whole array at the end. This is a micro-optimisation worth mentioning in an interview.

Solutions

Python

# Approach 1 — Sort (O(n log n) time, O(n) space)
def isAnagram_sort(s: str, t: str) -> bool:
    # Different lengths → can never be anagrams
    if len(s) != len(t):
        return False
    return sorted(s) == sorted(t)


# Approach 2 — 26-element frequency array (O(n) time, O(1) space)
def isAnagram_array(s: str, t: str) -> bool:
    if len(s) != len(t):
        return False

    freq = [0] * 26  # index i maps to character chr(i + ord('a'))

    for c in s:
        freq[ord(c) - ord('a')] += 1  # increment for every char in s
    for c in t:
        freq[ord(c) - ord('a')] -= 1  # decrement for every char in t

    # Any non-zero slot means the distributions differ
    return all(f == 0 for f in freq)


# Approach 3 — HashMap / Counter (O(n) time, O(k) space, handles Unicode)
from collections import Counter

def isAnagram_hashmap(s: str, t: str) -> bool:
    if len(s) != len(t):
        return False
    # Counter builds a frequency dict; two anagrams produce equal Counters
    return Counter(s) == Counter(t)

JavaScript

// Approach 1 — Sort (O(n log n) time, O(n) space)
function isAnagram_sort(s, t) {
  // Different lengths → immediate rejection
  if (s.length !== t.length) return false;
  return [...s].sort().join('') === [...t].sort().join('');
}

// Approach 2 — 26-element frequency array (O(n) time, O(1) space)
function isAnagram_array(s, t) {
  if (s.length !== t.length) return false;

  const freq = new Array(26).fill(0); // index = charCode - 97

  for (const c of s) freq[c.charCodeAt(0) - 97]++;  // increment for s
  for (const c of t) freq[c.charCodeAt(0) - 97]--;  // decrement for t

  // All slots must be zero for an anagram
  return freq.every(f => f === 0);
}

// Approach 3 — HashMap (O(n) time, O(k) space, handles Unicode)
function isAnagram_hashmap(s, t) {
  if (s.length !== t.length) return false;

  const map = new Map();

  // Increment for each character in s
  for (const c of s) map.set(c, (map.get(c) ?? 0) + 1);

  // Decrement for each character in t
  for (const c of t) {
    if (!map.has(c)) return false;          // t has a char s never had
    const count = map.get(c) - 1;
    if (count < 0) return false;            // t over-uses this character
    if (count === 0) map.delete(c);         // clean up zero entries
    else map.set(c, count);
  }

  return map.size === 0; // any remaining entries mean s had extras
}

Complexity Analysis

ApproachTimeSpaceUnicode-safe?
SortO(n log n)O(n)Yes
26-element arrayO(n)O(1)No — lowercase only
HashMapO(n)O(k)Yes

n = length of the strings (equal after the length check). k = number of distinct characters (at most 26 for lowercase English, unbounded for Unicode).

The 26-element array's "O(1) space" is not a cheat — iterating over a fixed 26-element array is a constant-time operation regardless of input size, so space truly does not grow with n.

Follow-up Questions

Q: What if the strings contain Unicode characters? Switch from the 26-element array to a full hash map. The algorithm stays identical; only the data structure changes. See the Unicode section above.

Q: How does this generalise to Group Anagrams (LC 49)? Use the frequency tuple (or sorted string) as a dictionary key. Words with the same key belong to the same anagram group. LC 242 is the single-pair version of the same insight.

Q: How would you handle a stream of characters where you cannot store the whole string? Maintain a running frequency map. Each new character increments its count; if you need to check anagram status at any point, you compare the current map against the reference. This is the foundation of the sliding window in Find All Anagrams in a String (LC 438).

This Pattern Solves

The character-frequency signature pattern applies directly to:

  • Find All Anagrams in a String (LC 438) — sliding window with a frequency snapshot
  • Minimum Window Substring (LC 76) — track character deficits with a frequency map
  • Permutation in String (LC 567) — check if any permutation of s is a substring of t
  • Ransom Note (LC 383) — can you form one string's characters from another's?
  • Longest Substring Without Repeating Characters (LC 3) — frequency map to detect duplicates

Any time a problem asks about character membership or rearrangement, your first mental model should be a frequency map.

Key Takeaway

Two strings are anagrams if and only if they share the same character frequency distribution — and a 26-element integer array is all the space you need to represent that distribution for lowercase English. Always lead with the O(1) length check, follow with the O(n) frequency scan, and be ready to swap the array for a hash map the moment the interviewer introduces Unicode. The same insight that solves this easy problem in O(n) time is the exact kernel reused across Group Anagrams, Find All Anagrams in a String, and the broader family of sliding-window character-frequency problems.

Advertisement

Sanjeev Sharma

Written by

Sanjeev Sharma

Full Stack Engineer · E-mopro