How to cluster discrete data
Hi!
I have a database containing discrete features, for example the number of hairpin loops, the number of elements, the length of a sequence, and the percentage of A nucleotides. I would now like to apply some clustering algorithms. Does anyone know which algorithms in MATLAB are suited to discrete data?
Thanks a lot, Iene
Answers (1)
Purvaja
on 5 Feb 2025
There are various ways to obtain clusters. You can refer to the following methods:
- K-Means clustering: The function “kmeans” partitions data into k mutually exclusive clusters and returns the index of the cluster to which it assigns each observation. It requires the number of clusters in advance. (https://www.mathworks.com/help/stats/k-means-clustering.html)
[idx, C] = kmeans(data, k); % k is the number of clusters
- K-medoids clustering: “K-medoids” is like “K-means” but is more robust to noise and outliers. It also requires the number of clusters in advance. (https://www.mathworks.com/help/stats/kmedoids.html)
[idx, C] = kmedoids(data, k); % k is the number of clusters
- DBSCAN (Density-Based Spatial Clustering of Applications with Noise): Unlike “k-means” clustering, the “DBSCAN” algorithm does not require prior knowledge of the number of clusters. It works with distance metrics and can be applied to discrete data. (https://www.mathworks.com/help/stats/dbscan-clustering.html)
epsilon = 0.5; % Distance threshold
minPts = 5; % Minimum number of points to form a cluster
idx = dbscan(data, epsilon, minPts);
- Gaussian Mixture Models (GMM): “GMM” clustering can accommodate clusters that have different sizes and correlation structures within them. (https://www.mathworks.com/help/stats/clustering-using-gaussian-mixture-models.html)
gm = fitgmdist(data, k); % k is the number of clusters
idx = cluster(gm, data);
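The snippets above assume that "data" and "k" already exist. As a rough sketch of how they might be obtained, the example below builds a made-up feature matrix with the four features named in the question, standardizes it with "zscore" so that sequence length does not dominate the distances, lets "evalclusters" suggest k, and plots a k-distance curve that can help in picking "epsilon" for "dbscan". The feature values, the range 2:6 for k, the silhouette criterion, and the city-block distance are all assumptions chosen for illustration, not recommendations for your data.

```matlab
% Hypothetical example data: rows are sequences, columns are the features
% named in the question. All values are made up for illustration.
rng(1);                                                       % reproducible random numbers
nLoops = [randi([0 3], 50, 1);    randi([5 9], 50, 1)];       % number of hairpin loops
nElems = [randi([1 10], 50, 1);   randi([20 40], 50, 1)];     % number of elements
seqLen = [randi([50 100], 50, 1); randi([300 500], 50, 1)];   % sequence length
pctA   = [20 + 5*rand(50, 1);     35 + 5*rand(50, 1)];        % percentage of A nucleotides
data   = [nLoops, nElems, seqLen, pctA];

% Put all features on a comparable scale (zero mean, unit variance per column).
dataStd = zscore(data);

% Compare k = 2..6 using the silhouette criterion and take the suggested k.
eva = evalclusters(dataStd, 'kmeans', 'silhouette', 'KList', 2:6);
k   = eva.OptimalK;

% Cluster with k-medoids. The city-block distance is an assumption here;
% any metric supported by kmedoids could be substituted.
[idx, medoids] = kmedoids(dataStd, k, 'Distance', 'cityblock');

% k-distance curve for DBSCAN: for every point, take the minPts-th smallest
% distance to the data set (the point itself counts, at distance 0), sort
% the values, and look for the "knee" as a candidate epsilon.
minPts = 5;
kD = pdist2(dataStd, dataStd, 'euclidean', 'Smallest', minPts);
plot(sort(kD(end, :)));
xlabel('Points sorted by distance');
ylabel(sprintf('%d-th smallest distance', minPts));
```

With your own data, replace the synthetic block with your feature matrix and keep the remaining steps. Whether standardization is appropriate depends on how much weight each feature should carry.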
To check out more methods, you can refer to the following resource:
You can also access release-specific documentation using these commands in your MATLAB command window:
web(fullfile(docroot, 'stats/k-means-clustering.html'))
web(fullfile(docroot, 'stats/kmedoids.html'))
web(fullfile(docroot, 'stats/dbscan-clustering.html'))
web(fullfile(docroot, 'stats/clustering-using-gaussian-mixture-models.html'))
Hope this helps you!