String Algorithms — Complete Guide (KMP, Z, Rolling Hash, Suffix Array)

Sanjeev SharmaSanjeev Sharma
2 min read

Advertisement

Why String Algorithms?

Naive pattern matching is O(nm). String algorithms exploit structure to achieve O(n+m) or O(n log n).


The 6 Core Patterns

Pattern 1 — KMP (Knuth-Morris-Pratt)

Build a failure function: for each position in the pattern, the length of the longest proper prefix that is also a suffix. Skip unnecessary comparisons during search.

Pattern 2 — Z-Algorithm

z[i] = length of the longest substring starting at s[i] that matches a prefix of s. Linear time, all prefix occurrences.

Pattern 3 — Rabin-Karp (Rolling Hash)

Hash each window. On mismatch, roll the hash in O(1). O(n+m) expected, useful for multiple pattern search.

Pattern 4 — Suffix Array

Sort all suffixes lexicographically. Enables O(m log n) substring search, LCP queries, and unique substring counting.

Pattern 5 — Manacher's Algorithm

Find the longest palindromic substring in O(n) by reusing previously computed palindrome centres.

Pattern 6 — String Hashing

Map strings to integers. Compare substrings in O(1) after O(n) preprocessing. Polynomial hash: h[i] = s[i] + p*h[i-1].


Complexity Reference

AlgorithmPre-processSearchSpace
NaiveO(1)O(nm)O(1)
KMPO(m)O(n)O(m)
Z-algorithmO(n+m)O(n+m)
Rabin-KarpO(m)O(n) avgO(1)
Suffix ArrayO(n log n)O(m log n)O(n)
Manacher'sO(n)O(n)

Rolling Hash Template

MOD = 10**9+7
BASE = 31

def build_hash(s):
    n = len(s)
    h = [0]*(n+1)
    p = [1]*(n+1)
    for i in range(n):
        h[i+1] = (h[i]*BASE + ord(s[i])-ord('a')+1) % MOD
        p[i+1] = p[i]*BASE % MOD
    def get(l, r):
        return (h[r+1] - h[l]*p[r-l+1]) % MOD
    return get

Advertisement

Sanjeev Sharma

Written by

Sanjeev Sharma

Full Stack Engineer · E-mopro