Repeated DNA Sequences — Rolling Hash

Sanjeev SharmaSanjeev Sharma
1 min read

Advertisement

Problem

Find all 10-letter DNA substrings that appear more than once.


Approach — Rolling Hash with Bit Ops

Encode each nucleotide as 2 bits (A=0, C=1, G=2, T=3). Use 20-bit window, roll hash by shifting.

Time: O(n) | Space: O(n)


Solutions

Python — Set approach

class Solution:
    def findRepeatedDnaSequences(self, s: str) -> list[str]:
        seen=set(); result=set()
        for i in range(len(s)-9):
            sub=s[i:i+10]
            if sub in seen: result.add(sub)
            seen.add(sub)
        return list(result)

Python — Rolling hash

class Solution:
    def findRepeatedDnaSequences(self, s: str) -> list[str]:
        enc={'A':0,'C':1,'G':2,'T':3}
        seen=set(); result=set(); h=0; mask=(1<<20)-1
        for i,c in enumerate(s):
            h=((h<<2)|enc[c])&mask
            if i>=9:
                if h in seen: result.add(s[i-9:i+1])
                seen.add(h)
        return list(result)

Complexity

  • Time: O(n) | Space: O(n)

Advertisement

Sanjeev Sharma

Written by

Sanjeev Sharma

Full Stack Engineer · E-mopro