Given a string s and a number n, find the most frequently occurring n-gram in the string, where the n-grams can begin at any point in the string. This comes up in DNA analysis, where the 3-base reading frame for a codon can begin at any point in the sequence.
So for
s = 'AACTGAACG'
and
n = 3
we get the following n-grams (trigrams):
AAC, ACT, CTG, TGA, GAA, AAC, ACG
Since AAC appears twice, the answer, hifreq, is AAC. There will always be exactly one highest-frequency n-gram.
This problem was originally inspired by a MATLAB Newsgroup discussion.
Note that spaces must be ignored; otherwise tests 3 and 5 of the test suite fail.
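For readers who want to see the sliding-window idea spelled out, here is a minimal sketch in Python (not MATLAB, and not any submitted solution). The function name `hifreq_ngram` is hypothetical, and it assumes that "ignoring spaces" means removing them before the n-grams are formed.

```python
from collections import Counter


def hifreq_ngram(s, n):
    """Return the most frequent n-gram of s (hypothetical helper).

    Assumption: spaces are stripped before windowing.
    """
    s = s.replace(" ", "")
    if n <= 0 or len(s) < n:
        raise ValueError("string is too short to contain an n-gram of length n")
    # One n-gram starts at every position 0 .. len(s) - n.
    ngrams = [s[i:i + n] for i in range(len(s) - n + 1)]
    counts = Counter(ngrams)
    # The problem guarantees a unique maximum, so tie-breaking is not handled.
    return counts.most_common(1)[0][0]


print(hifreq_ngram("AACTGAACG", 3))  # AAC
```

A string of length L yields L - n + 1 windows, which matches the seven trigrams listed for the nine-character example above.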
Good use of the 'hankel' function.
Cool solution.
Sorry about this, but I got stuck and I want to learn how to do it. After looking at several solutions, I found my mistake and was able to create my own solution :)
What happens if the test suite changes in the future?
This solution is not correct in general: the way hankel is used here generates n-1 fake fragments.
Clever use of the Hankel matrix. I don't automatically think of the Hankel matrix for this application, but it works really well. Thanks, I've learned something.
What's the point of a 'solution' like this? It passes the test suite, but in what way was it interesting for you to write it?