DNA N-Gram Distribution - MATLAB Cody - MATLAB Central

Problem 79. DNA N-Gram Distribution

Given a string s and a number n, find the most frequently occurring n-gram (substring of length n) in s. The n-grams can begin at any position in the string, so consecutive n-grams overlap. This comes up in DNA analysis, where the 3-base reading frame for a codon can begin at any point in the sequence.
So for s = 'AACTGAACG' and n = 3, we get the following n-grams (trigrams):
AAC, ACT, CTG, TGA, GAA, AAC, ACG
Since AAC appears twice and every other trigram appears only once, the answer, hifreq, is 'AAC'. There will always be exactly one highest-frequency n-gram.
This problem was originally inspired by a MATLAB Newsgroup discussion.
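The counting in the example can be sketched as a sliding window: a string of length L yields L - n + 1 overlapping n-grams (7 for the 9-base example), and the answer is the one with the highest count. The following is a minimal Python sketch of that idea, not a MATLAB Cody submission; the function name hifreq_ngram is hypothetical, and it assumes 1 <= n <= length(s) plus the problem's guarantee of a unique winner.

```python
from collections import Counter


def hifreq_ngram(s: str, n: int) -> str:
    """Return the most frequent overlapping n-gram of s (hypothetical helper).

    Assumes 1 <= n <= len(s) and, as the problem guarantees, a unique winner.
    """
    if not 1 <= n <= len(s):
        raise ValueError("n must satisfy 1 <= n <= len(s)")
    # Slide a window of width n one character at a time: len(s) - n + 1 n-grams.
    grams = [s[i:i + n] for i in range(len(s) - n + 1)]
    # Count each distinct n-gram; most_common(1) returns [(gram, count)].
    gram, _count = Counter(grams).most_common(1)[0]
    return gram


if __name__ == "__main__":
    s = "AACTGAACG"
    # Lists the seven trigrams from the example, in order.
    print([s[i:i + 3] for i in range(len(s) - 3 + 1)])
    print(hifreq_ngram(s, 3))
```

Counting with a hash map keeps the work linear in the number of windows (times n for slicing). If the uniqueness guarantee did not hold, Counter.most_common would break ties by first occurrence, which the problem does not specify.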

Solution Stats

63.32% Correct | 36.68% Incorrect
Last Solution submitted on May 09, 2025


Recent Solvers: 1358
