Aho-Corasick — Multi-Pattern String Matching

Sanjeev SharmaSanjeev Sharma
2 min read

Advertisement

Aho-Corasick Overview

Build a trie of all patterns. Add failure links (like KMP failure function but on the trie). Then stream the text through the automaton — each character transition is O(1) amortised.


Solution

Python

from collections import deque

class AhoCorasick:
    def __init__(self, patterns):
        self.goto = [{}]
        self.fail = [0]
        self.output = [[]]
        for i, p in enumerate(patterns):
            self._insert(p, i)
        self._build_fail()

    def _insert(self, pattern, idx):
        node = 0
        for c in pattern:
            if c not in self.goto[node]:
                self.goto[node][c] = len(self.goto)
                self.goto.append({})
                self.fail.append(0)
                self.output.append([])
            node = self.goto[node][c]
        self.output[node].append(idx)

    def _build_fail(self):
        q = deque()
        for c, s in self.goto[0].items():
            self.fail[s] = 0
            q.append(s)
        while q:
            r = q.popleft()
            for c, s in self.goto[r].items():
                q.append(s)
                state = self.fail[r]
                while state and c not in self.goto[state]:
                    state = self.fail[state]
                self.fail[s] = self.goto[state].get(c, 0)
                if self.fail[s] == s: self.fail[s] = 0
                self.output[s] += self.output[self.fail[s]]

    def search(self, text):
        state = 0
        results = []
        for i, c in enumerate(text):
            while state and c not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(c, 0)
            for pattern_idx in self.output[state]:
                results.append((i, pattern_idx))
        return results

JavaScript

/* Aho-Corasick: trie with BFS failure links, text streaming */

Java

/* Java: integer-indexed trie nodes, BFS failure link construction */

C++

/* C++: vector<array<int,26>> trie + fail[] + output lists */

C

/* C: static 2D trie array, BFS failure links, linear search pass */

Complexity

PhaseTime
Build trieO(sum of pattern lengths)
Build failure linksO(sum of pattern lengths)
Search textO(n + number of matches)
TotalO(n + m + z)

Where n=text length, m=total pattern length, z=matches found.

Key advantage over KMP: matches ALL patterns simultaneously.

Advertisement

Sanjeev Sharma

Written by

Sanjeev Sharma

Full Stack Engineer · E-mopro