Valid Anagram — Frequency Array, HashMap & Sort [LeetCode 242]
Advertisement
Problem Statement
Given two strings s and t, return true if t is an anagram of s, and false otherwise.
An anagram is a word or phrase formed by rearranging the letters of another, using all the original letters exactly once.
Example 1 (true):
Input: s = "anagram", t = "nagaram"
Output: true
Example 2 (false):
Input: s = "rat", t = "car"
Output: false
Constraints: 1 <= s.length, t.length <= 5 * 10^4. Both strings consist of lowercase English letters.
- Why This Problem Matters
- Three Approaches — Trade-offs at a Glance
- Approach 1 — Sort Both Strings
- Approach 2 — Fixed 26-Element Frequency Array
- Approach 3 — General HashMap
- The Frequency Array Insight
- Visual Dry Run — s = "anagram", t = "nagaram"
- The Unicode Follow-up
- Connection to Group Anagrams (LC 49)
- Common Mistakes
- Solutions
- Python
- JavaScript
- Complexity Analysis
- Follow-up Questions
- This Pattern Solves
- Key Takeaway
Why This Problem Matters
Valid Anagram (LC 242) appears in almost every FAANG warm-up interview for a simple reason: it compresses a deep idea into a tiny problem. Sorting works. A 26-slot counter works. A HashMap works. Each reflects a different trade-off between simplicity, speed, and generality — and the interviewer wants to hear you articulate those trade-offs rather than just produce a working answer.
The deeper insight is that character frequency is a hashable signature. Two strings are anagrams if and only if they produce identical frequency distributions. That single observation unlocks a whole family of problems:
- Group Anagrams (LC 49): use the frequency vector or sorted string as a dictionary key.
- Find All Anagrams in a String (LC 438): slide a window and compare frequency snapshots.
- Minimum Window Substring (LC 76): track how many characters are still needed.
If you understand why frequency counting works here, you have a template that transfers directly to all of these.
Three Approaches — Trade-offs at a Glance
Approach 1 — Sort Both Strings
The most intuitive idea: sort s and t and check if the results are equal. Anagrams contain the same characters, so their sorted forms are identical.
sorted("anagram") == sorted("nagaram") → "aaagmnr" == "aaagmnr" → True
sorted("rat") == sorted("car") → "art" == "acr" → False
Why it works: sorting normalises the character order, turning the anagram check into a plain equality test.
Trade-off: Sorting is O(n log n) and — depending on the language — allocates O(n) extra space for the sorted copy. It is never the optimal answer, but it is perfectly acceptable as an opening move while you think out loud. Interviewers appreciate that you can explain why you then improve it.
Approach 2 — Fixed 26-Element Frequency Array
Because the constraint says lowercase English only, the alphabet has exactly 26 characters. Allocate an integer array of size 26 initialised to zero. Index character c as c - 'a' (or ord(c) - ord('a') in Python).
- Iterate over
s: incrementfreq[c - 'a']. - Iterate over
t: decrementfreq[c - 'a']. - If every slot is zero, the strings are anagrams; otherwise they are not.
Why it works: incrementing for s and decrementing for t turns the problem into checking whether net character usage is zero everywhere.
Trade-off: O(n) time, O(1) space (the 26-slot array is constant-size). This is the textbook answer for ASCII / lowercase-only inputs. The limitation — discussed in detail below — is that it breaks for Unicode.
Approach 3 — General HashMap
Replace the fixed array with a hash map (dict in Python, Map in JavaScript). Increment for each character in s, decrement for each character in t. At the end, every key must map to zero.
Trade-off: O(n) time, O(k) space where k is the number of distinct characters. This is the right answer when the input character set is unbounded (Unicode, emoji, arbitrary symbols). The constant-factor overhead of hashing is slightly higher than direct array indexing, but it is asymptotically equivalent.
The Frequency Array Insight
The 26-element array is worth understanding deeply because it recurs throughout string problems. Here is the exact mechanism:
index = char - 'a'
Since 'a' has ASCII code 97, subtracting it maps 'a'→0, 'b'→1, ..., 'z'→25. Every lowercase character maps to a unique slot in [0, 25].
Single-pass optimisation: instead of two separate loops, you can handle the length check as an O(1) early exit and then do one loop over s (incrementing) and one over t (decrementing). If any slot goes negative during the t loop, you can return false immediately — this is slightly faster in practice because it avoids completing the full traversal when a mismatch is found early.
Why the length check matters: if len(s) != len(t), the strings cannot be anagrams. This is a free O(1) check that eliminates a huge class of inputs before any character work begins. Always do it first.
Visual Dry Run — s = "anagram", t = "nagaram"
We trace the 26-element array (showing only active slots for readability).
After processing s = "anagram":
| a | n | g | r | m |
|---|---|---|---|---|
| 3 | 1 | 1 | 1 | 1 |
Steps: a→3, n→1, g→1, r→1, a→3 (already counted), m→1, a→3 (already counted). Final: freq['a'-'a']=3, freq['g'-'a']=1, freq['m'-'a']=1, freq['n'-'a']=1, freq['r'-'a']=1.
After processing t = "nagaram" (decrement):
| a | n | g | r | m |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 |
Steps: n→0, a→2→1→0, g→0, a→0 (via two decrements for the two as in t), r→0, a→0, m→0.
Every slot is zero — the strings are anagrams. Return true.
Counter-example: s = "rat", t = "car"
After s: freq['r'-'a']=1, freq['a'-'a']=1, freq['t'-'a']=1. After t decrement: freq['c'-'a']=-1 (never incremented), immediate signal that these are not anagrams. Return false.
The Unicode Follow-up
Interviewers almost always ask: "What if the strings can contain Unicode characters?"
The 26-element array breaks because Unicode has over 140,000 code points. You cannot pre-allocate a fixed array for that. The fix is to use a full hash map:
- In Python:
collections.Counter(s)builds the frequency map in one line. CompareCounter(s) == Counter(t). - In JavaScript: use a
Mapkeyed by character string. - In Java:
HashMap<Character, Integer>.
The algorithm is identical — increment for s, decrement for t, check all values are zero — but now the keys are arbitrary characters rather than indices into a fixed array.
What the follow-up actually tests: Can you recognise when a hard-coded constraint (26 letters) baked into your data structure makes the solution brittle? The correct response is to explain the trade-off: "For ASCII-only inputs the fixed array gives O(1) space with no hashing overhead. For Unicode we pay O(k) space and a small hashing constant, but we gain generality."
This kind of reasoning — knowing not just how but when to use each structure — is what separates a good interview answer from a great one.
Connection to Group Anagrams (LC 49)
Group Anagrams asks you to take a list of strings and bucket them by anagram membership. The core insight is exactly what Valid Anagram teaches: the frequency distribution (or equivalently the sorted form) is a canonical key that identifies the anagram class.
- Sort-based key:
sorted(word)→ use as dict key. O(n log n) per word. - Frequency-based key: build the 26-element count, convert to a tuple, use as dict key. O(n) per word — same O(1) extra space amortized.
If you nail LC 242, LC 49 is a one-line extension: instead of asking "do these two specific strings match?", you ask "what group does each string belong to?" The mapping from character frequencies to a canonical key is the same operation.
Common Mistakes
1. Skipping the length check. Comparing frequency maps of strings with different lengths will always return false, but the length check is free (O(1)) and immediately eliminates millions of inputs in a real application. Always do it first and explicitly, so the reviewer sees you understand the optimisation.
2. Wrong character indexing. A frequent bug is using char - 'A' (uppercase offset) for a problem that specifies lowercase, or mixing ord(c) (which gives 97 for 'a') with a zero-based index. Pin the formula in your head: index = ord(c) - ord('a') for lowercase, and double-check by substituting 'a' and 'z' to confirm the range [0, 25].
3. Assuming the 26-element array handles all inputs. The moment an interviewer says "Unicode" or "arbitrary characters" your array-based solution is wrong. Switch immediately to a hash map. Failing to flag this distinction signals that you memorised the solution rather than understood it.
4. Forgetting that negative counts signal non-anagram. When decrementing for t, any slot going negative means t contains a character more times than s does — that is a non-anagram. You can catch this immediately during the second loop and return early, rather than waiting to scan the whole array at the end. This is a micro-optimisation worth mentioning in an interview.
Solutions
Python
# Approach 1 — Sort (O(n log n) time, O(n) space)
def isAnagram_sort(s: str, t: str) -> bool:
# Different lengths → can never be anagrams
if len(s) != len(t):
return False
return sorted(s) == sorted(t)
# Approach 2 — 26-element frequency array (O(n) time, O(1) space)
def isAnagram_array(s: str, t: str) -> bool:
if len(s) != len(t):
return False
freq = [0] * 26 # index i maps to character chr(i + ord('a'))
for c in s:
freq[ord(c) - ord('a')] += 1 # increment for every char in s
for c in t:
freq[ord(c) - ord('a')] -= 1 # decrement for every char in t
# Any non-zero slot means the distributions differ
return all(f == 0 for f in freq)
# Approach 3 — HashMap / Counter (O(n) time, O(k) space, handles Unicode)
from collections import Counter
def isAnagram_hashmap(s: str, t: str) -> bool:
if len(s) != len(t):
return False
# Counter builds a frequency dict; two anagrams produce equal Counters
return Counter(s) == Counter(t)
JavaScript
// Approach 1 — Sort (O(n log n) time, O(n) space)
function isAnagram_sort(s, t) {
// Different lengths → immediate rejection
if (s.length !== t.length) return false;
return [...s].sort().join('') === [...t].sort().join('');
}
// Approach 2 — 26-element frequency array (O(n) time, O(1) space)
function isAnagram_array(s, t) {
if (s.length !== t.length) return false;
const freq = new Array(26).fill(0); // index = charCode - 97
for (const c of s) freq[c.charCodeAt(0) - 97]++; // increment for s
for (const c of t) freq[c.charCodeAt(0) - 97]--; // decrement for t
// All slots must be zero for an anagram
return freq.every(f => f === 0);
}
// Approach 3 — HashMap (O(n) time, O(k) space, handles Unicode)
function isAnagram_hashmap(s, t) {
if (s.length !== t.length) return false;
const map = new Map();
// Increment for each character in s
for (const c of s) map.set(c, (map.get(c) ?? 0) + 1);
// Decrement for each character in t
for (const c of t) {
if (!map.has(c)) return false; // t has a char s never had
const count = map.get(c) - 1;
if (count < 0) return false; // t over-uses this character
if (count === 0) map.delete(c); // clean up zero entries
else map.set(c, count);
}
return map.size === 0; // any remaining entries mean s had extras
}
Complexity Analysis
| Approach | Time | Space | Unicode-safe? |
|---|---|---|---|
| Sort | O(n log n) | O(n) | Yes |
| 26-element array | O(n) | O(1) | No — lowercase only |
| HashMap | O(n) | O(k) | Yes |
n = length of the strings (equal after the length check). k = number of distinct characters (at most 26 for lowercase English, unbounded for Unicode).
The 26-element array's "O(1) space" is not a cheat — iterating over a fixed 26-element array is a constant-time operation regardless of input size, so space truly does not grow with n.
Follow-up Questions
Q: What if the strings contain Unicode characters? Switch from the 26-element array to a full hash map. The algorithm stays identical; only the data structure changes. See the Unicode section above.
Q: How does this generalise to Group Anagrams (LC 49)? Use the frequency tuple (or sorted string) as a dictionary key. Words with the same key belong to the same anagram group. LC 242 is the single-pair version of the same insight.
Q: How would you handle a stream of characters where you cannot store the whole string? Maintain a running frequency map. Each new character increments its count; if you need to check anagram status at any point, you compare the current map against the reference. This is the foundation of the sliding window in Find All Anagrams in a String (LC 438).
This Pattern Solves
The character-frequency signature pattern applies directly to:
- Find All Anagrams in a String (LC 438) — sliding window with a frequency snapshot
- Minimum Window Substring (LC 76) — track character deficits with a frequency map
- Permutation in String (LC 567) — check if any permutation of
sis a substring oft - Ransom Note (LC 383) — can you form one string's characters from another's?
- Longest Substring Without Repeating Characters (LC 3) — frequency map to detect duplicates
Any time a problem asks about character membership or rearrangement, your first mental model should be a frequency map.
Key Takeaway
Two strings are anagrams if and only if they share the same character frequency distribution — and a 26-element integer array is all the space you need to represent that distribution for lowercase English. Always lead with the O(1) length check, follow with the O(n) frequency scan, and be ready to swap the array for a hash map the moment the interviewer introduces Unicode. The same insight that solves this easy problem in O(n) time is the exact kernel reused across Group Anagrams, Find All Anagrams in a String, and the broader family of sliding-window character-frequency problems.
Advertisement