How I can eliminate days with incomplete hours?

2 visualizzazioni (ultimi 30 giorni)
Miguel L
Miguel L il 21 Set 2016
Risposto: Andrei Bobrov il 22 Set 2016
Hi everyone. I can't find a solution to my issue in the web. I have hourly averages data from a meteorological station. However there are some missing hours due to the lack of measurements. In this cases, I would like to remove the complete day, in order to keep only the days in which there are measurements in a whole day (24 data per day). Look at this example; The meteorological station is giving the next output, in which columns 1,2,3 and 4 corresponds to year, month, day and hour, respectively, and column 5 correspond to the measurement:
2010 1 1 0 12.3
2010 1 1 1 12.4
2010 1 1 2 3
2010 1 1 3 34.7
2010 1 1 4 54
2010 1 1 5 565
2010 1 1 6 2
2010 1 1 7 334
2010 1 1 8 23
2010 1 1 9 223
2010 1 1 10 32
2010 1 1 11 233
2010 1 1 12 544
2010 1 1 13 23
2010 1 1 16 12
2010 1 1 17 12
2010 1 1 18 3
2010 1 1 19 34
2010 1 1 20 54
2010 1 1 21 565
2010 1 1 22 2
2010 1 1 23 334
2010 1 2 0 23
2010 1 2 1 223
2010 1 2 2 32
2010 1 2 3 233
2010 1 2 4 544
2010 1 2 5 23
2010 1 2 6 12
2010 1 2 7 12
2010 1 2 8 3
2010 1 2 9 34
2010 1 2 10 54
2010 1 2 11 565
2010 1 2 12 2
2010 1 2 13 334
2010 1 2 14 23
2010 1 2 15 223
2010 1 2 16 32
2010 1 2 17 233
2010 1 2 18 544
2010 1 2 19 12
2010 1 2 20 12
2010 1 2 21 3
2010 1 2 22 34
2010 1 2 23 54
2010 1 3 0 565
2010 1 3 1 2
2010 1 3 2 334
2010 1 3 3 12
2010 1 3 5 12
2010 1 3 6 3
2010 1 3 7 34
2010 1 3 8 54
2010 1 3 9 565
2010 1 3 10 2
2010 1 3 11 334
2010 1 3 12 23
2010 1 3 13 223
2010 1 3 14 32
2010 1 3 15 233
2010 1 3 16 544
2010 1 3 17 23
2010 1 3 18 12
2010 1 3 19 12
2010 1 3 23 3
2010 1 4 0 34
2010 1 4 1 54
2010 1 4 2 565
2010 1 4 3 2.2
2010 1 4 4 334
2010 1 4 5 23
2010 1 4 6 223
2010 1 4 7 32
2010 1 4 8 233
2010 1 4 9 544
2010 1 4 10 12
2010 1 4 11 12
2010 1 4 12 3
2010 1 4 13 34
2010 1 4 14 54.7
2010 1 4 15 565
2010 1 4 16 2
2010 1 4 17 334
2010 1 4 18 23
2010 1 4 19 223
2010 1 4 20 32
2010 1 4 21 233
2010 1 4 22 544.6
2010 1 4 23 344.4
In the table of above, there were some missing hours; in day 1: hour 14 and hour 15; in day 3: hour 4, hour 20, hour 21 and hour 22. For that reason, is necessary don't take into account day 1 and day 3, and generate the next output:
2010 1 2 0 23
2010 1 2 1 223
2010 1 2 2 32
2010 1 2 3 233
2010 1 2 4 544
2010 1 2 5 23
2010 1 2 6 12
2010 1 2 7 12
2010 1 2 8 3
2010 1 2 9 34
2010 1 2 10 54
2010 1 2 11 565
2010 1 2 12 2
2010 1 2 13 334
2010 1 2 14 23
2010 1 2 15 223
2010 1 2 16 32
2010 1 2 17 233
2010 1 2 18 544
2010 1 2 19 12
2010 1 2 20 12
2010 1 2 21 3
2010 1 2 22 34
2010 1 2 23 54
2010 1 4 0 34
2010 1 4 1 54
2010 1 4 2 565
2010 1 4 3 2
2010 1 4 4 334
2010 1 4 5 23
2010 1 4 6 223
2010 1 4 7 32
2010 1 4 8 233
2010 1 4 9 544
2010 1 4 10 12
2010 1 4 11 12
2010 1 4 12 3
2010 1 4 13 34
2010 1 4 14 54
2010 1 4 15 565
2010 1 4 16 2
2010 1 4 17 334
2010 1 4 18 23
2010 1 4 19 223
2010 1 4 20 32
2010 1 4 21 233
2010 1 4 22 544
2010 1 4 23 344
Thanks in advance for your help!

Risposte (4)

Marieke Klijn
Marieke Klijn il 21 Set 2016
Hi Miguel,
For your question I loaded your copied data and named it 'data'. I assume that every day contains 24 hours (0-23) and each day has a number which is unique per dataset. So I loop over each day and check if there are 24 hour entries. If this is not the case then the rows are removed from the dataset.
days = unique(data(:,3));
for ii = 1:length(days)
hours_in_day = find(data(:,3) == days(ii));
if length(hours_in_day) < 24
data(hours_in_day,:) = [];
end
clear hours_in_day
end
Hope this works for you!
Cheers, Marieke
  3 Commenti
Miguel L
Miguel L il 22 Set 2016
I found one solution: first of all I filtered the year and secondly filtered each month; after this, I applied your scipt to each month.
Marieke Klijn
Marieke Klijn il 22 Set 2016
Very good, glad I could help!

Accedi per commentare.


Geoff Hayes
Geoff Hayes il 21 Set 2016
Miguel - depending upon your version of MATLAB, you could use hist or http://www.mathworks.com/help/matlab/ref/histogram.html to determine the number of recorded hours in each day. For example, using your above data and hist we would see that
[hourCount,days] = hist(data(:,3),1:max(data(:,3)))
returns
hourCount =
22 24 20 24
days =
1 2 3 4
So we know that the first and third days are incomplete. We could then remove these days from the data by checking to see which element of column three is a member of those days that are incomplete and set it to an empty element (which removes it from the array)
data(ismember(data(:,3),days(hourCount ~= 24)),:) = [];
In the above, we rely on the fact that
ismember(data(:,3),days(hourCount ~= 24))
returns a logical array of ones and zeros for each row of data where a one indicate that the row is missing an hour and a zero indicates otherwise. There may be a more efficient way of doing the above..I seem to recall that ismember can be expensive.
  1 Commento
Miguel L
Miguel L il 21 Set 2016
Hi Geoff thanks for your response!
But in my case my data is covering one year. For that reason when I execute the code, it doesn't work properly due to it counts all days 1, all days 2... and so on until day 31... I mean it counts every day 1 from every month, every day 2 from every month and so on.
I attach to this message the file in which I am trying to ignore the incomplete days (<24 hours). Could you help me one more time?
I have 56 files like this! For that reason it is convenient to count with a script which could make the process of information easier!
Thanks in advance!

Accedi per commentare.


George
George il 21 Set 2016
Modificato: George il 21 Set 2016
You can do this with findgroups() and splitapply(). Conceptually you want to do the following:
  1. Find groups on year, month, day.
  2. Validate that your hours meets what you need (e.g., numel == 24 or somesuch)
  3. Extract valid dates
This should mostly do it.
weather = readtable('weather.xlsx');
G = findgroups(weather.Year, weather.Month, weather.Day);
validDays = splitapply(@(x) (numel(x) == 24), weather.Hour, G);
weather(validDays(G),:)
Where this may fall down would be if you had days with a dupe reading, but still 24 readings. You can do different anonymous functino if that concerns you.

Andrei Bobrov
Andrei Bobrov il 22 Set 2016
[~,~,c] = unique(EXAMPLE(:,1:3),'rows');
t = nonzeros((1:max(c))'.*(accumarray(c,1) == 24));
out = EXAMPLE(ismember(c,t),:);

Categorie

Scopri di più su Characters and Strings in Help Center e File Exchange

Prodotti

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by