Cody

Problem 79. DNA N-Gram Distribution

Given a string s and a number n, find the most frequently occurring n-gram in the string. An n-gram may begin at any position, so consecutive n-grams overlap, and a string of length L contains L - n + 1 of them (for n no larger than L). This comes up in DNA analysis, where the 3-base reading frame for a codon can begin at any point in the sequence.

So for

 s = 'AACTGAACG' 

and

 n = 3 

we get the following n-grams (trigrams):

 AAC, ACT, CTG, TGA, GAA, AAC, ACG

Since AAC appears twice and every other trigram appears only once, the answer, hifreq, is AAC. There will always be exactly one highest-frequency n-gram.
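One possible way to work through the example is sketched below. It is not a reference solution: the function name mostFrequentNgram is hypothetical (the problem statement only names the output, hifreq), and the sketch assumes s is a char row vector with n no larger than its length. It builds every overlapping window, groups identical rows, and returns the group with the largest count.

```matlab
function hifreq = mostFrequentNgram(s, n)
% mostFrequentNgram  Hypothetical name; only the output name hifreq
% comes from the problem statement.
% Assumes s is a char row vector and 1 <= n <= length(s).

    numGrams = length(s) - n + 1;       % number of overlapping windows

    % Row k of idx holds the positions k, k+1, ..., k+n-1.
    % (Implicit expansion needs R2016b or later; bsxfun(@plus, ...) is
    % the older equivalent.)
    idx = (1:numGrams)' + (0:n-1);

    % reshape keeps one n-gram per row even when n == 1 or numGrams == 1,
    % where plain vector indexing would otherwise follow the shape of s.
    grams = reshape(s(idx), numGrams, n);

    % Collapse identical rows; groupIdx maps each window to its unique row.
    [uniqueGrams, ~, groupIdx] = unique(grams, 'rows');

    % Count how many windows fall into each group.
    counts = accumarray(groupIdx, 1);

    % The problem guarantees a single maximum, so ties are not handled.
    [~, best] = max(counts);
    hifreq = uniqueGrams(best, :);
end
```

With the example above, mostFrequentNgram('AACTGAACG', 3) returns 'AAC'.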

This problem was originally inspired by a MATLAB Newsgroup discussion.

Solution Stats

49.65% Correct | 50.35% Incorrect
Last Solution submitted on Nov 18, 2019
