wrong values in histogram plotting
    Elinor Ginzburg
 on 27 Dec 2023
  
    
    
    
    
            Hello,
I'm trying to plot a histogram of an array. I have a CSV file with a list of double values, and I want to see how many elements have a value less than or equal to 10% of the maximal value, 20%, 30%, and so on. I tried the code further below, but I get wrong statistics. When I count how many elements are less than or equal to 10% of the maximal element, I find 11173940 such elements. I counted them with the following code:
maxElement = max(array);
elementCount = sum(array < maxElement * 0.1);
However, when I plot the histogram, it shows fewer than 180 elements satisfying this condition. This is the code I used (I have many CSV files that I want to read and analyze in the same manner, hence the loop over file names):
clear; clc;
dataDir = 'hist_res_rel';
fileList = dir(strcat(dataDir, '/*.csv'));
plotDir = 'plot_dir_rel';
for i = 1:numel(fileList)
    fileName = fileList(i).name;
    epoch = fileName(length(fileName)-5:length(fileName)-4);
    % Square brackets are used instead of strcat, which strips the trailing space after the colon.
    if contains(fileName,'a_rel')
        plot_title = ['A Relative Value Change Between Epochs: ', epoch, '-', num2str(str2double(epoch)+10)];
    end
    if contains(fileName,'b_rel')
        plot_title = ['B Relative Value Change Between Epochs: ', epoch, '-', num2str(str2double(epoch)+10)];
    end
    rel_val = readmatrix(fullfile(dataDir, fileName)); % fullfile inserts the missing path separator
    rel_val = abs(rel_val);
    Max = max(rel_val);
    p = 0.1;
    x = zeros(10, 1);
    y = zeros(10, 1);
    for index = 1:10
        percentage = Max * p;
        x(index) = percentage;
        if index == 1
            y(index) = sum(rel_val <= x(index));
        else
            y(index) = sum(rel_val <= x(index) & rel_val > x(index-1));
        end
        p = p + 0.1;
    end
    f = histogram(rel_val, x);
    xticks(x);
    title(plot_title);
    xlabel('Percentage of Relative Change');
    ylabel('Amount of Parameters');
    xticklabels({'0', '10','20','30', '40', '50', '60', '70', '80', '90', '100'});
    saveas(f, fullfile(plotDir, ['plot_', fileName(1:end-4), '.jpg'])); % end-4 drops '.csv', avoiding a double dot
end
This is the histogram that I get:

And this is the CSV file that I'm trying to analyze, just to make sure everything works (sorry, it's so large that I had to use an external site for the upload):
Thank you so much for your time and attention, I appreciate your help.
Accepted Answer
  Ganesh
      
 on 27 Dec 2023
        I understand that your histogram is inconsistent with the counts you computed from the data. The issue can be resolved by adding 0 at the start of the variable "x".
When you pass a vector of edges, histogram counts the data points that fall between consecutive edges. Because your variable "x" begins with Max*0.1, the first bin spans Max*0.1 to Max*0.2, the second spans Max*0.2 to Max*0.3, and so on. Every value below Max*0.1 lies outside the edges and is not counted at all. By adding 0 at the start, the first bin spans 0 to Max*0.1, which will give you the right result.
x = [0;x] % Add this line before plotting the histogram
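The effect of the extra edge can be checked without plotting. The following is a minimal sketch, assuming a small synthetic vector in place of the CSV data (the values and the names edgesOriginal, edgesFixed, countsOriginal, countsFixed and manualFirstBin are made up for illustration). It uses histcounts, which applies the same binning as histogram but returns the counts directly.

```matlab
% Synthetic data for illustration only (not the asker's CSV).
rel_val = [0.01; 0.02; 0.05; 0.15; 0.25; 0.95; 1.00];
Max = max(rel_val);

% Edges as in the question: the first edge is 10% of the maximum.
% Computed as Max*(1:10)/10 so the last edge equals Max exactly.
edgesOriginal = Max * (1:10).' / 10;

% Edges with the fix: prepend 0 so the first bin covers 0 to 10%.
edgesFixed = [0; edgesOriginal];

% histcounts uses the same bin rules as histogram.
countsOriginal = histcounts(rel_val, edgesOriginal);  % 9 bins
countsFixed    = histcounts(rel_val, edgesFixed);     % 10 bins

% Manual count for the first bin, as done in the question.
manualFirstBin = sum(rel_val <= 0.1 * Max);

fprintf('Counted without 0 edge: %d of %d values\n', sum(countsOriginal), numel(rel_val));
fprintf('Counted with 0 edge:    %d of %d values\n', sum(countsFixed), numel(rel_val));
fprintf('First bin: %d (histcounts) vs %d (manual)\n', countsFixed(1), manualFirstBin);
```

With these sample values, the three entries below 10% of the maximum are missing from countsOriginal and appear in the first bin of countsFixed. Two caveats apply to real data. Each bin includes its left edge and excludes its right edge (only the last bin includes both), so a value exactly equal to an edge is counted one bin higher than in the manual "<=" check. Also, building the edges by repeatedly adding 0.1 to p, as in the question, can leave the last edge slightly below Max because of floating-point rounding, which would drop the maximum itself from the last bin.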
Kindly refer to the following document for more information and examples on using the "histogram()" function:
Hope this helps!