Longest Duplicate Substring [Hard] — Binary Search + Rolling Hash
Advertisement
Problem Statement
Given a string s, return any longest substring that appears at least twice. Empty string if none.
Input: "banana" → Output: "ana"
Intuition
Binary search on the answer length L. For each L, use Rabin-Karp rolling hash to check if any substring of length L repeats.
Solutions
Python
def longestDupSubstring(s: str) -> str:
n = len(s)
nums = [ord(c) - ord('a') for c in s]
MOD = (1 << 61) - 1 # Mersenne prime
BASE = 31
def search(length):
h = 0
power = pow(BASE, length, MOD)
seen = set()
for i in range(length):
h = (h * BASE + nums[i]) % MOD
seen.add(h)
for i in range(1, n - length + 1):
h = (h * BASE - nums[i-1] * power + nums[i+length-1]) % MOD
if h in seen:
return i
seen.add(h)
return -1
lo, hi, start = 1, n-1, -1
while lo <= hi:
mid = (lo + hi) // 2
idx = search(mid)
if idx != -1:
start = idx
best_len = mid
lo = mid + 1
else:
hi = mid - 1
return '' if start == -1 else s[start:start+best_len]
Complexity
- Time: O(n log n) average
- Space: O(n)
Note
Suffix arrays give O(n log n) worst-case guaranteed.
Advertisement