LeetCode: Repeated DNA Sequences

LiBlog, published 2016-08-31

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", return ["AAAAACCCCC", "CCCCCAAAAA"].

Analysis:

We want to encode each 10-letter substring as an integer, so that a hash set can be used to add codes and check for duplicates.

Since each letter has only 4 possible values (A, C, G, T), 2 bits are enough to represent it. A 10-letter substring therefore fits in a 20-bit integer.
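To make the encoding concrete, here is a minimal sketch that packs one 10-letter sequence into the low 20 bits of an int. The class name DnaEncodingSketch and the helpers bitsOf and encode are hypothetical names used only for illustration; the mapping A=0, C=1, G=2, T=3 matches the one used in the solution.

```java
public class DnaEncodingSketch {
    // Hypothetical helper: maps a nucleotide to its 2-bit value (A=0, C=1, G=2, T=3).
    static int bitsOf(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("Not a nucleotide: " + c);
        }
    }

    // Packs a 10-letter sequence into the low 20 bits of an int,
    // with the first letter ending up in the highest 2 of those bits.
    static int encode(String seq) {
        if (seq.length() != 10) {
            throw new IllegalArgumentException("Expected 10 letters, got " + seq.length());
        }
        int code = 0;
        for (int i = 0; i < seq.length(); i++) {
            code = (code << 2) | bitsOf(seq.charAt(i));
        }
        return code;
    }

    public static void main(String[] args) {
        System.out.println(encode("AAAAACCCCC")); // prints 341 (binary 0101010101)
        System.out.println(encode("CCCCCAAAAA")); // prints 349184 (341 << 10)
        System.out.println(encode("TTTTTTTTTT")); // prints 1048575 (all 20 bits set)
    }
}
```

Distinct 10-letter sequences always map to distinct integers, so comparing codes is equivalent to comparing the substrings themselves.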

Solution:

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

public class Solution {
    // Use mask to only maintain the last 20 bits.
    int mask = (1 << 20) - 1;

    public List<String> findRepeatedDnaSequences(String s) {
        List<String> resList = new ArrayList<String>();
        if (s.length() < 10)
            return resList;

        HashSet<Integer> codeSet = new HashSet<Integer>();
        HashSet<Integer> resSet = new HashSet<Integer>();
        char[] charArray = s.toCharArray();

        // Get code of the first 9 letters.
        int code = 0;
        for (int i = 0; i < 9; i++) {
            code = moveCode(code, charArray[i]);
        }

        for (int i = 9; i < s.length(); i++) {
            // Get code.
            code = moveCode(code, charArray[i]);
            // If this code has been seen before but not yet reported (i.e.,
            // not yet added to resList), add the substring to resList.
            if (!codeSet.add(code) && resSet.add(code)) {
                resList.add(s.substring(i - 9, i + 1));
            }
        }
        return resList;
    }

    public int moveCode(int value, char c) {
        value <<= 2;
        // if (c=='A') value += 0;
        if (c == 'C')  value += 1;
        if (c == 'G')  value += 2;
        if (c == 'T')  value += 3;
        value &= mask;
        return value;
    }

}
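The solution relies on the mask to drop the oldest letter each time a new one is shifted in. The sketch below checks that property on the example input: at every position, the rolling code should equal the code obtained by encoding the current 10-letter window from scratch. The class RollingCodeSketch and its methods are hypothetical names for illustration, and the input is assumed to contain only the letters A, C, G, and T.

```java
public class RollingCodeSketch {
    // Keeps only the low 20 bits, i.e. the most recent 10 letters.
    static final int MASK = (1 << 20) - 1;

    // Maps A, C, G, T to 0, 1, 2, 3 (assumes c is one of those four letters).
    static int bitsOf(char c) {
        return "ACGT".indexOf(c);
    }

    // Shifts in one letter and discards anything older than 10 letters.
    static int roll(int code, char c) {
        return ((code << 2) | bitsOf(c)) & MASK;
    }

    // Encodes the 10-letter window starting at 'start' from scratch.
    static int encodeWindow(String s, int start) {
        int code = 0;
        for (int i = start; i < start + 10; i++) {
            code = (code << 2) | bitsOf(s.charAt(i));
        }
        return code;
    }

    // Returns true if the rolling code matches the from-scratch code
    // for every 10-letter window of s.
    static boolean rollingMatchesDirect(String s) {
        int code = 0;
        for (int i = 0; i < s.length(); i++) {
            code = roll(code, s.charAt(i));
            if (i >= 9 && code != encodeWindow(s, i - 9)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT";
        System.out.println(rollingMatchesDirect(s)); // prints true
    }
}
```

Because each step costs O(1) instead of re-encoding 10 letters, the whole scan stays linear in the length of the input.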

 
