Finding the percentage of NaN cells in columns

Question

BN il 15 Gen 2020

0
Link

Link diretto a questa domanda

https://it.mathworks.com/matlabcentral/answers/500425-finding-the-percentage-of-nan-cells-in-columns

Commentato: Star Strider il 15 Gen 2020

Risposta accettata: Star Strider

Abadan.xlsx

I have an excel file. I want to know how much (%) of my data are NaN.

I want to found that for columns 9 to 12 separately, and have the results something like this:

I attached my excel file. this is just a sample and I have more than 125 excel files which I gonna do this for all. any help is appreciated.

thank you all

0 Commenti
Mostra -2 commenti meno recentiNascondi -2 commenti meno recenti

Accedi per commentare.

Accedi per rispondere a questa domanda.

Answer 1

Star Strider il 15 Gen 2020

1
Link

Link diretto a questa risposta

https://it.mathworks.com/matlabcentral/answers/500425-finding-the-percentage-of-nan-cells-in-columns#answer_410251

Apri in MATLAB Online

This does everything for all the tables in ‘C’:

D = load('C.mat');
C = D.C;
for k = 1:numel(C)
    C{k}(:,1:4) = fillmissing(C{k}(:,1:4),'nearest');                           % Fills Missing Data
    filename = C{k}{1,2};                                                       % Input Table Name
    fn{k,:} = sprintf('%s.xlsx',filename{:});                                   % Output File Name
    writetable(C{k},sprintf('%s.xlsx',filename{:}))                             % Write Each Table To Separate File
    rowlen{k,:} = size(C{k}(:,9:12),1);                                         % Row Lengtth Of Each Table
    NaN912{k,:} = [varfun(@isnan,C{k}(:,9:12))];                                % Number Of ‘NaN’ Values In Each Table
    PctNaN(k,:) = table2array(varfun(@(x)sum(x)./size(x,1),NaN912{k,:}));       % Percent ‘NaN’ Values In Selected Variables (Columns)
end
VarNms = compose('NAN in %s', string(C{1}.Properties.VariableNames(:,9:12)));   % Variable Names For ‘NaNPercent’ Table
filnam = cell2table(fn,'VariableNames',{'File name'});                          % File Names Table
NaNPercent = array2table(PctNaN, 'VariableNames',VarNms);                       % ‘PctNaN’ Initial Table
NaNPercent = [filnam NaNPercent];                                               % ‘PctNaN’ Final (Output) Table
FirstTen = NaNPercent(1:10,:)                                                   % Display Sample (Delete)

Producing:

FirstTen =
  10×5 table
           File name           NAN in tmax_m    NAN in tmin_m    NAN in rrr24    NAN in tm_m
    _______________________    _____________    _____________    ____________    ___________
    {'Abadan.xlsx'        }      0.088235           0.1299          0.13971         0.1152  
    {'Abadeh.xlsx'        }       0.10539          0.13235           0.1299        0.19118  
    {'Abali.xlsx'         }       0.12162          0.12162          0.15315        0.15766  
    {'Abumusa Island.xlsx'}        0.1464          0.14865          0.11486        0.16892  
    {'Ahar.xlsx'          }       0.21171           0.2027          0.21396        0.24099  
    {'Ahvaz.xlsx'         }      0.051471         0.058824         0.058824       0.056373  
    {'Aligudarz.xlsx'     }        0.1982          0.22523          0.23198        0.23874  
    {'Anar.xlsx'          }       0.22297          0.18919          0.19369        0.23649  
    {'Arak.xlsx'          }      0.068627         0.078431         0.088235       0.093137  
    {'Ardakan (Yazd).xlsx'}       0.21114          0.20882          0.34339         0.2065  

The code retains the original data in ‘rowlen’ and ‘NaN912’ as well as the percent NaN results in the ‘NaNPercent’ table.

2 Commenti
Mostra NessunoNascondi Nessuno

BN il 15 Gen 2020

Apri in MATLAB Online

Dear Star Strider

Really thank you, You identified what I want. I don't know why after running for loop part of your code successfully, the second part gives me an error:

Error using cell2table (line 57)
'File name' is not a valid variable name. 

I have tried some edits but unfortunately, I was unable to fix the error.

Thank you again.

Star Strider il 15 Gen 2020

Apri in MATLAB Online

Dear Behzad Navidi —

My pleasure!

R2019b allows spaces and other characters in table variable names.

This should work in every release:

filnam = cell2table(fn,'VariableNames',{'File_name'});                          % File Names Table

It was two separate words in the example you posted, so I assumed your version of MATLAB supported it.

My apologies.

Accedi per commentare.

Answer 2

Jakob B. Nielsen il 15 Gen 2020

1
Link

Link diretto a questa risposta

https://it.mathworks.com/matlabcentral/answers/500425-finding-the-percentage-of-nan-cells-in-columns#answer_410241

Apri in MATLAB Online

Lets say you have a vector A. isnan(A) gives you a vector of same dimension of A, with 0 on all indexes with a value, and 1 in all indexes with NaN. Therefore,

sum(isnan(A));

gives you the amount of NaN entries. And

(sum(isnan(A))/numel(A))*100;

gives the percentage.

Now, for doing it "easily" with excel, here is what you do. Put all excel files in the same folder, and then implement this. (This is a present from me to you, please look it through and understand it if you want to learn how to use it for other stuff :) )

[Name,Path]=uigetfile({'*.xls*'},'MultiSelect','on');
entirefile =fullfile(Path,Name); 
filecount=size(Name,2);
% Now do a for loop to read all the selected files. 
for i=1:filecount
Excel = actxserver('Excel.Application');   
Excel.visible = false;
Excel.DisplayAlerts = false;
Excel.EnableSound = false;
Excel.Workbooks.Open(entirefile);
Workbook = Excel.ActiveWorkbook;
sheetIndex=1;
Worksheets.Item(sheetIndex).Activate;
Datainfo = get(Excel.ActiveSheet, 'Range', 'I2', 'L361');  %this is the range in the example file you attached... See comment below***
Data(j).yourdata=cell2mat(Datainfo.value);
Excel.Quit;
Excel.delete;
clear Excel;
end

You will end up with a data structure where, if you have 125 files, the Data structure will contain 125 entries of 360x4.

Data(1).yourdata will display all data from the first excel file, corresponding to Name{1}, and so on.

Then, you simply run the isnan function on all your data. You can even do that inside the above loop if you want.

*** IF your various excel files are not all of the same dimension, it gets a little hairy. I refer you to this post, which gives a code for finding the next empty row. Save the script as a sub function, and then call it this way:

lastrow = GoToNextRowInColumn(Excel,'A');  %DONT call it with column I or some such, because if your very last entry is a NaN you will skip it!
%%% and then change your Datainfo line above to this: 
endrange=strcat('L',num2str(lastrow-1));
Datainfo = get(Excel.ActiveSheet, 'Range', 'I2', endrange); 

https://se.mathworks.com/matlabcentral/answers/285463-how-find-last-row-in-excel#answer_223169

1 Commento
Mostra -1 commenti meno recentiNascondi -1 commenti meno recenti

BN il 15 Gen 2020

Dear Jakob B. Nielsen

I really appreciate your time and your answer. I was read it carefully and learned useful points. Thank you

Accedi per commentare.

Answer 3

Adam Danz il 15 Gen 2020

1
Link

Link diretto a questa risposta

https://it.mathworks.com/matlabcentral/answers/500425-finding-the-percentage-of-nan-cells-in-columns#answer_410265

Modificato: Adam Danz il 15 Gen 2020

Apri in MATLAB Online

Using the C.mat file from your previous questions, here's a simple way to produce the table you're describing.

load('C.mat');
% Identify columns to analyze 
col = [9 10 11 12];
% Compute percent-nan for each table & chosen column
nanPercent = cell2mat(cellfun(@(x)mean(isnan(x{:,col}),1),C,'UniformOutput',false)'); 
% If you want those values to be percentages rather than decimals (ie, 10.5 instead of 0.105)
% nanPercent = nanPercent * 100; 
% Get the station_name from each table
stationNames = cellfun(@(x)unique(x.station_name(~ismissing(x.station_name))),C)'; 
% Summarize results in table
varNames = C{1}.Properties.VariableNames(col); % Get var names from first table
T = [table(stationNames), array2table(nanPercent,'VariableNames',varNames)]; 

Result

head(T) %show first few rows of table

stationNames tmax_m tmin_m rrr24 tm_m

__________________ ________ ________ ________ ________

{'Abadan' } 0.098684 0.13377 0.14254 0.12719

{'Abadeh' } 0.10965 0.12939 0.125 0.19079

{'Abali' } 0.14474 0.14474 0.17544 0.17982

{'Abumusa Island'} 0.16886 0.17105 0.13816 0.19079

{'Ahar' } 0.23246 0.22368 0.23465 0.26096

{'Ahvaz' } 0.059211 0.065789 0.065789 0.063596

{'Aligudarz' } 0.2193 0.24561 0.25219 0.25877

{'Anar' } 0.24342 0.21053 0.21491 0.25658

Note that percentages in this answer and Star Strider's answer differ only because we're using different inputs (apparently the C.mat files we're using are not the same file). When I test his code using my C.mat file, the outputs are the same.

0 Commenti
Mostra -2 commenti meno recentiNascondi -2 commenti meno recenti

Accedi per commentare.

Finding the percentage of NaN cells in columns

0 Commenti
Mostra -2 commenti meno recentiNascondi -2 commenti meno recenti

Risposta accettata

2 Commenti
Mostra NessunoNascondi Nessuno

Più risposte (2)

1 Commento
Mostra -1 commenti meno recentiNascondi -1 commenti meno recenti

0 Commenti
Mostra -2 commenti meno recentiNascondi -2 commenti meno recenti

Vedere anche

Categorie

Tag

Community Treasure Hunt

Finding the percentage of NaN cells in columns

0 Commenti Mostra -2 commenti meno recentiNascondi -2 commenti meno recenti

Risposta accettata

2 Commenti Mostra NessunoNascondi Nessuno

Più risposte (2)

1 Commento Mostra -1 commenti meno recentiNascondi -1 commenti meno recenti

0 Commenti Mostra -2 commenti meno recentiNascondi -2 commenti meno recenti

Vedere anche

Categorie

Tag

Community Treasure Hunt

0 Commenti
Mostra -2 commenti meno recentiNascondi -2 commenti meno recenti

2 Commenti
Mostra NessunoNascondi Nessuno

1 Commento
Mostra -1 commenti meno recentiNascondi -1 commenti meno recenti

0 Commenti
Mostra -2 commenti meno recentiNascondi -2 commenti meno recenti