Problem 79. DNA N-Gram Distribution

MathWorks Cody Team

Given a string s and a number n, find the most frequently occurring n-gram in the string, where the n-grams can begin at any point in the string. This comes up in DNA analysis, where the 3-base reading frame for a codon can begin at any point in the sequence.
So for
 s = 'AACTGAACG'
and
 n = 3
we get the following n-grams (trigrams):
 AAC, ACT, CTG, TGA, GAA, AAC, ACG
Since AAC appears twice, the answer, hifreq, is AAC. There will always be exactly one highest-frequency n-gram.
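Below is a minimal Python sketch of the sliding-window count described above. It is not a MATLAB Cody submission, and the function name most_frequent_ngram is hypothetical. It counts every length-n window starting at positions 0 through len(s) - n, so overlapping n-grams are included. It then returns the top entry, relying on the problem's guarantee that the maximum is unique.

```python
from collections import Counter


def most_frequent_ngram(s: str, n: int) -> str:
    """Return the most frequent overlapping n-gram of s (hypothetical helper).

    A window starts at every index 0..len(s)-n, so n-grams may overlap,
    matching the "can begin at any point" rule. Assumes the problem's
    guarantee that exactly one n-gram has the highest count.
    """
    if n <= 0 or n > len(s):
        raise ValueError("n must be between 1 and len(s)")
    # Count each window s[i:i+n]; Counter maps n-gram -> occurrences.
    counts = Counter(s[i:i + n] for i in range(len(s) - n + 1))
    # most_common(1) yields a one-element list [(ngram, count)].
    gram, _count = counts.most_common(1)[0]
    return gram


if __name__ == "__main__":
    print(most_frequent_ngram('AACTGAACG', 3))
```

On the example above this returns 'AAC', which occurs at the first and sixth window positions.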
This problem was originally inspired by a MATLAB Newsgroup discussion.

Solution Stats

2419 Solutions

1375 Solvers

Last Solution submitted on Feb 04, 2026

Problem Comments

1 Comment

E Chang on 22 Oct 2018

Note that spaces should be ignored; otherwise tests 3 and 5 fail.
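One reading of "ignore spaces" is to delete them before forming n-grams. The Python sketch below assumes that interpretation, which means an n-gram may span a removed space. The comment does not say whether n-grams containing spaces should instead be skipped, so treat this as one possible approach. The function name is hypothetical.

```python
from collections import Counter


def most_frequent_ngram_ignoring_spaces(s: str, n: int) -> str:
    """Hypothetical helper: most frequent overlapping n-gram after removing spaces.

    Assumption: "ignoring spaces" means deleting ' ' characters before
    sliding the window, so windows are taken over the compacted string.
    """
    t = s.replace(' ', '')
    if n <= 0 or n > len(t):
        raise ValueError("n must be between 1 and the length of s without spaces")
    # Count every overlapping window of the space-free string.
    counts = Counter(t[i:i + n] for i in range(len(t) - n + 1))
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    print(most_frequent_ngram_ignoring_spaces('AACTG AACG', 3))
```

Under this assumption, 'A A A' with n = 2 yields 'AA'. Keeping the spaces would instead produce a tie between 'A ' and ' A'.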

Suggested Problems

Determine whether a vector is monotonically increasing

22903 Solvers
Make a Palindrome Number

2466 Solvers
Project Euler: Problem 1, Multiples of 3 and 5

3675 Solvers
Back to basics 19 - character types

273 Solvers
Compute Fibonacci Number

518 Solvers

More from this Author (96)

Remove the small words from a list of words.

1560 Solvers
Nearest Numbers

5035 Solvers
Find all elements less than 0 or greater than 10 and replace them with NaN

15782 Solvers
Find the two most distant points

2952 Solvers
Most nonzero elements in row

7467 Solvers