problem with converting dates to numbers

Question

01AA002_Daily_Flow_ts.csv

Hi,

I want to convert the third column ("date") to numbers using the datenum function in a loop.

I have a few problems.

1. The datenum function does not read the third column properly (see the attached CSV file and the image for the error).
2. I need to perform this in a loop since I have a large number of files.

Please see my code below.

tr = readtable('01AA002_Daily_Flow_ts.csv','Delimiter',',','ReadVariableNames',false);    % Load Data
tr(1,:)=[];
%fn='01AA002_Daily_Flow_ts.csv';
dn = datenum(tr.Var3,'yyyy/mm/dd');

Answer 1

Peter Perkins on 3 Dec 2015

The version of 01AA002_Daily_Flow_ts.csv that you've attached has column headings and freeform text at the bottom, so

tr = readtable('01AA002_Daily_Flow_ts.csv','Delimiter',',','ReadVariableNames',false);

will return a table with one (cellstr) variable. Assuming you added that disclaimer text just for the purposes of posting the file and simply forgot to tell us to take it out, using 'ReadVariableNames',false will give you a table with five cellstr variables. That's almost surely NOT what you want, although in the case of the date strings, they're strings either way.

When I do this

>> tr = readtable('01AA002_Daily_Flow_ts.csv');
>> dn = datenum(tr.Date);
>> dn(1:5)
ans =
      718672
      718690
      718691
      718692
      718693

everything works fine. Do you get something different when you do that? In your latest code, it fails because you've used the wrong format string when calling datenum.
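A small sketch of why the format string matters, using made-up date strings rather than rows from the attached file: datenum parses a cellstr directly as long as the format matches the text layout, and the result can be cross-checked against the numeric year, month, day form of the same dates.

```matlab
% Hypothetical sample strings in year/month/day order
s = {'1969/01/01'; '1969/01/02'; '1969/12/31'};

% The format string must match the text layout: yyyy/mm/dd here
dn = datenum(s, 'yyyy/mm/dd');

% Cross-check against the numeric form of the same dates
expected = datenum([1969 1 1; 1969 1 2; 1969 12 31]);
isequal(dn, expected)

% A mismatched format such as 'mm/dd/yyyy' may error or give wrong
% dates, so wrap the call while experimenting
try
    bad = datenum(s, 'mm/dd/yyyy');
catch err
    disp(err.message)
end
```

If the cross-check fails on your own data, the format string and the text in the column disagree.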

Damith on 3 Dec 2015

Edited: Damith on 3 Dec 2015

The code below gives me an error. See the code and the error message below.

clear all
tr = readtable('01AA002_Daily_Flow_ts.csv','Delimiter',',','ReadVariableNames',false,'Format','%s%s%s%s%*[^\n]');    % Load Data
tr(1,:)=[];
I = ismember(tr.Var1(:,:),'');
tr(I,:)=[];
II = ismember(tr.Var1(:,:),'DISCLAIMER');
tr(II,:)=[];
III = ismember(tr.Var1(:,:),'"NOTICE: This application and its data are provided AS-IS.  In no event shall Environment Canada be liable for any damages whatsoever (including');
tr(III,:)=[];
tc = table2cell(tr);
dn = datenum(tc(:,3),'mm/dd/yyyy');  
dv = datevec(dn);
yr = unique(dv(:,1));
[tf, loc] = ismember(dv(:,1), yr);
nyr = length(yr);
nnonnegperyr = accumarray(loc, tc(:,4), [nyr 1], @(x) sum(x>=0), NaN);
  isleap = ((mod(yr,4) == 0 & mod(yr,100) ~= 0) | mod(yr,400) == 0);
  iscomplete = (~isleap & nnonnegperyr == 365) | ...
               (isleap & nnonnegperyr == 366);
  % Filter out dates and values for complete years only
  isin = ismember(dv(:,1), yr(iscomplete));
  newdata = [cellstr(datestr(dn(isin), 'mm/dd/yyyy')) num2cell(tc(:,4)(isin))];

Error:

Error using accumarray
Second input VAL must be a full numeric, logical, or char vector or scalar.

NB: I should mention that I need to do this for 2000+ CSV files.

Damith on 4 Dec 2015

Edited: Damith on 4 Dec 2015

I am kind of lost here. Can you please guide/help me with what you mentioned here?

I have revised the code now. See below.

clear all
cd ('C:\Users\Desktop')
myFolder = 'C:\Users\Desktop\test_avg';
if ~isdir(myFolder)
  errorMessage = sprintf('Error: The following folder does not exist:\n%s', myFolder);
  uiwait(warndlg(errorMessage));
  return;
end
filePattern = fullfile(myFolder, '*.csv');
csvFiles = dir(filePattern);
for k = 1:length(csvFiles)
  fid(k) = fopen(fullfile(myFolder,csvFiles(k).name));
  out{k} = textscan(fid(k),'%s%*s%s%f%*[^\n]','delimiter',',','headerlines',1);
  fclose(fid(k));    
end
for jj=1:length(csvFiles)
   for ij=1:3
   if ij==3
      out{1,jj}{1,ij}(end)=[];
   else
      for i=1:-1:0
      out{1,jj}{1,ij}(end-i)=[];
      clear i
      end
   end
   end
end

"Return the year via datevec and count the number for each unique year found. If not nDays(yr) for the given year then throw that one out"

Finally, I want to write a CSV file with the station ID, the years of complete data available, and the maximum value for each complete year. (Please see the attached CSV file; the flow values and years in it are not correct.)

dpb on 4 Dec 2015

This again seems to have transmuted the original question into asking how to solve another particular processing problem... I posted another code snippet that illustrates this as another Answer...

Answer 2

dpb on 2 Dec 2015

Edited: dpb on 3 Dec 2015

The problem isn't datenum, it's that you're trying to pass a cellstr array into it. If you're going to use readtable, use a specific format string to convert the dates on input; see doc for details and an example (albeit it converts a non-US date to US as well, but it does show the date formatting string).

Failing that, revert to the way I showed previously to parse the .csv file directly into numeric arrays, bypassing all the higher-level abstractions but leaving you with a set of double arrays that can be handled pretty simply for your needs.

ADDENDUM

Please cut 'n paste text instead of the images -- they're exceedingly difficult to read plus one can't select them to try to repeat anything you've done...

Anyway, I seem to have misspoken re: cellstring arrays and datenum; it actually accepts them just fine.

I used the import tool and retrieved both files (I have R2012b, so I don't have readtable and can't test it directly); they have different forms for the date string. However, each worked just fine with datenum, even though the time portion of the one file is ignored.

>> whos VarName5
  Name              Size              Bytes  Class    Attributes
    VarName5      32142x1             3535620  cell               
>> VarName5(1:4)
ans = 
    '1920-01-01T07:00:00+07:00'
    '1920-01-02T07:00:00+07:00'
    '1920-01-03T07:00:00+07:00'
    '1920-01-04T07:00:00+07:00'
>> datestr(datenum(VarName5(1:4),'yyyy-mm-dd'))
ans =
01-Jan-1920
02-Jan-1920
03-Jan-1920
04-Jan-1920
>> datestr(datenum(VarName5(1:4),'yyyy-mm-ddTHH:MM:SS'))
ans =
01-Jan-1920 07:00:00
02-Jan-1920 07:00:00
03-Jan-1920 07:00:00
04-Jan-1920 07:00:00
>>

With the new datetime '%d' format string as noted you can interpret the rest of the time string as well but datenum doesn't have that facility.
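As a hedged illustration of that point, the sketch below parses two of the strings from the listing above with datetime called directly (rather than through a readtable format). It assumes R2014b or later, and the choice of UTC as the target time zone is mine, not something stated in the thread.

```matlab
% Strings copied from the listing above
s = {'1920-01-01T07:00:00+07:00'; '1920-01-02T07:00:00+07:00'};

% XXX consumes the +07:00 offset; the literal T is quoted in the format
t = datetime(s, 'InputFormat', 'yyyy-MM-dd''T''HH:mm:ssXXX', ...
                'TimeZone', 'UTC');

% datenum accepts datetime values if serial date numbers are still needed
dn = datenum(t);
```

Unlike the datenum-only route, the offset is actually interpreted here instead of being ignored.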

The other file is an "ordinary" YYYY-MM-DD string; should be no issues whatever with it.

Whatever problem you're having seems to be associated with the "how" of how you're reading the files, but can't see what Matlab actually complained about from the pictures without the full error text in context.

Damith on 2 Dec 2015

Edited: Damith on 2 Dec 2015

OK. For example, I will show you the evaluation of the first two lines of the code for the attached KH_100303.csv file. (See the images below)

tr = readtable('KH_100303.csv','Delimiter',',','ReadVariableNames',false,'Format','%s%s%s%s%s%s%s%s');    % Load Data
dn = datenum(tr.Var5, 'yyyy-mm-ddTHH:MM:SS');

But, this does not work for 01AA002_Daily_Flow_ts.csv file.

tr = readtable('01AA002_Daily_Flow_ts.csv','Delimiter',',','ReadVariableNames',false,'Format','%s%s%s%s%s');    % Load Data
tr(1,:)=[];
dn = datenum(tr.Var3,'mm/dd/yyyy');  
Error:
Error using datenum (line 178)
DATENUM failed.
Caused by:
    Error using dtstr2dtnummx
    Failed on converting date string to date number.

Answer 3

dpb on 4 Dec 2015

Edited: dpb on 5 Dec 2015

You seem to keep retrogressing past what we've already solved/shown solutions for. Why not build on the previously working solution in the previous thread remove-rows-text-at-the-bottom-of-a-csv-file? There I showed a simple way to return the values from the .csv file that mitigates the trailing disclaimer text essentially automagically. Instead you've returned to the previous case of holding all the file content in a cell array of cells which is exceedingly difficult to address owing to the need to get all the curlies and parens correct plus you can't do global addressing of cells with two-layer addressing to get subsets.

The previous file in the above thread used '-' as the date separator whereas this one uses '/', so that's one modification if you choose to return the dates as y,m,d values rather than as the string. That might avoid the string handling, although otherwise you'd have to fix up the format string for datenum, so it's likely a wash in writing generic code; you'll have to deal with the specific format at some point, anyway.

All that aside, start by first reading a single file and returning the specific information needed; namely the max for complete years, then look at wrapping that functionality over the files...

fmt='%*s %*d %4f/%2f/%2f %f %*[^\n]';
for i=1:length(d)
  fid=fopen(d(i).name);
  c=cell2mat(textscan(fid,fmt,'headerlines',1, ...
                              'collectoutput',1, ...
                              'delimiter',','));
  fid=fclose(fid);                        % close input file
  c(all(isnan(c),2),:)=[];
  yr=unique(c(:,1));                      % unique years in file
  n=histc(c(:,1),yr);                     % count entries by year
  yr=yr(n==(365+isleapyr(yr)));           % years that are complete
  i1=find(c(:,1)==yr(1),1);               % first complete year in dataset
  i2=find(c(:,1)==yr(end),1,'last');      % last of last complete year
  c=c(i1:i2,:);                           % save only those entries
  [~,~,iy]=unique(c(:,1));                % indices vector for grouping
  mx=accumarray(iy,c(:,end),[],@max);     % get maximum for each year
  stn=strtok(d(i).name,'_');              % parse station name from file
  % write out the results in other file (presume already open)
  fprintf(fido,'%s,%d\n',stn,length(yr)); % output station, # years
  fprintf(fido,'%4d,%.1f\n',[yr mx].');   % year, max for each
end

You'll have to put in the housekeeping to create and open the output file(s*) then close after done and such, but the basic processing should be taken care of in the above...

You'll note I didn't bother to parse the station name from the file; that's just a complication of a bunch of meaningless text; I just parsed it from the input file name. The output file format is

StationName,#years
year,max
year,max
...

(*) I basically presumed in the above that the idea is to consolidate all these into a single file; hence the station and number of entries in each section to aid reading. If you again want one file per station, then, as the sample in the other thread demonstrated, find some common name-generating pattern here as well.

Also note the utility function isleapyr is one of my little helpers...

function is=isleapyr(yr)
%  returns T for input year being a leapyear
    is=eomday(yr,2)==29;
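To see the complete-year test in isolation, here is a minimal sketch on synthetic daily dates (not the station data). It writes the leap-year check inline as an anonymous function so it runs on its own, and it counts days per year with accumarray instead of histc; both are substitutions of mine for illustration.

```matlab
% Same test as the helper above, written inline so the sketch is self-contained
isleapyr = @(yr) eomday(yr, 2) == 29;

% Synthetic daily dates: all of 1971 and 1972, plus a partial 1973
dn = (datenum(1971,1,1):datenum(1973,3,31)).';
dv = datevec(dn);                       % columns: y m d H M S
years = dv(:,1);

yr = unique(years);                     % unique years present
[~, loc] = ismember(years, yr);         % group index for each row
n  = accumarray(loc, 1);                % number of days per year
complete = yr(n == 365 + isleapyr(yr))  % 1973 is dropped as incomplete
```

The comparison against 365 or 366 is what rejects years with gaps, wherever in the record they occur.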

Damith on 4 Dec 2015

Noticed that it does not store each file into an array. "c" stores only the last file. How can I modify to store multiple files without assigning into a cell array?

I simply modified your code and ran it, but there was an error message.

See the code and error below.

N.B.: I need to write the output to a different .csv file ('test.csv') that holds the station ID, the number of complete years, and the maximum values. I guess the code tries to write to multiple CSV files, one per station name, whereas I need to write everything into one CSV file.

clear all
cd ('C:\Users\Desktop\test_avg')
myFolder = 'C:\Users\Desktop\test_avg';
if ~isdir(myFolder)
  errorMessage = sprintf('Error: The following folder does not exist:\n%s', myFolder);
  uiwait(warndlg(errorMessage));
  return;
end
filePattern = fullfile(myFolder, '*.csv');
d = dir(filePattern);
fmt='%*s %*d %4f/%2f/%2f %f %*[^\n]';
for i=1:length(d)
  fid = fopen(fullfile(myFolder,d(i).name));
  c=cell2mat(textscan(fid,fmt,'headerlines',1, ...
                              'collectoutput',1, ...
                              'delimiter',','));
  fid=fclose(fid);                        % close input file
  c(all(isnan(c),2),:)=[];
  yr=unique(c(:,1));                      % unique years in file
  n=histc(c(:,1),yr);                     % count entries by year
  yr=yr(n==(365+isleapyr(yr)));           % years that are complete
  i1=find(c(:,1)==yr(1),1);               % first complete year in dataset
  i2=find(c(:,1)==yr(end),1,'last');      % last of last complete year
  c=c(i1:i2,:);                           % save only those entries
  [~,~,iy]=unique(c(:,1));                % indices vector for grouping
  mx=accumarray(iy,c(:,end),[],@max);     % get maximum for each year
  stn=strtok(d.name,'_');                 % parse station name from file
  % write out the results in other file (presume already open)
  fprintf(fid,'%s','%d\n',stn,length(yr)) % output station, # years
  fprintf(fid,'%4d,%f.2\n', [yr mx],:)  % year, max for each
end
Error: 
Attempted to access yr(1); index out of bounds because numel(yr)=0.

dpb on 4 Dec 2015

Edited: dpb on 5 Dec 2015

In the snippet I posted c holds the data from each input file in succession and writes all to a single output file. Logic to name and open that file is left as "exercise for the student"; as the comments state the snippet presumes that is already done prior to beginning the loop.

You modified the fprintf lines however and used the same file handle variable as that used for the input file, and I see no code to open any different output file.

While again you don't post the full context of the error, in this case there is only one line that references yr(1), and since the message says the array is empty, that indicates no full years were found for that particular file. Set a breakpoint and use the debugger to see precisely what's happening; I ran the script on the sample file, so I know the logic works for the case where there is at least one complete year. If it's possible there are no complete years, you need to add a test for that where the count is done or, possibly, use a try...catch block. It would likely be of interest to display [unique(c(:,1)) n] in the event this occurs so you can decide what to do with that file; or maybe you do know that is a possibility and don't care, in which case you can simply put a continue in the catch clause and go on.
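A minimal sketch of such a test, run on a synthetic stand-in for c that covers only part of one year. The variable skipped and the random flow values are hypothetical; inside the real file loop the guarded branch is where continue would go.

```matlab
% Synthetic stand-in for the matrix c built in the loop: [year month day flow]
dn = (datenum(1980,6,1):datenum(1980,8,31)).';   % partial year only
dv = datevec(dn);
c  = [dv(:,1:3) rand(numel(dn),1)];

yrAll = unique(c(:,1));                          % years present in the file
[~, loc] = ismember(c(:,1), yrAll);
n  = accumarray(loc, 1);                         % entries per year
yr = yrAll(n == 365 + (eomday(yrAll,2) == 29));  % complete years only

skipped = isempty(yr);
if skipped
    % Inside the file loop this is where `continue` would go
    disp([yrAll n])                              % year and day count, for diagnosis
else
    % ... carry on with i1, i2, accumarray and fprintf as in the Answer
end
```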

I did see a problem in a couple of the formatting strings for the output file; I had just looked at the [yr mx] array at the command line and typed those on the fly. I did edit them in the Answer so pick up those mods from there.

For the given file the script gives the following result--

>> dai
01AA002,8
1969,161.00
1970,280.00
1971,213.00
1972,168.00
1973,198.00
1974,255.00
1975,128.00
1976,281.00
>> type dai
fmt='%*s %*d %4f/%2f/%2f %f %*[^\n]';
for i=1:length(d)
  fid = fopen(d(i).name);
  c=cell2mat(textscan(fid,fmt,'headerlines',1, ...
                              'collectoutput',1, ...
                              'delimiter',','));
  fid=fclose(fid);                        % close input file
  c(all(isnan(c),2),:)=[];
  yr=unique(c(:,1));                      % unique years in file
  n=histc(c(:,1),yr);                     % count entries by year
  yr=yr(n==(365+isleapyr(yr)));           % years that are complete
  i1=find(c(:,1)==yr(1),1);               % first complete year in dataset
  i2=find(c(:,1)==yr(end),1,'last');      % last of last complete year
  c=c(i1:i2,:);                           % save only those entries
  [~,~,iy]=unique(c(:,1));                % indices vector for grouping
  mx=accumarray(iy,c(:,end),[],@max);     % get maximum for each year
  stn=strtok(d(i).name,'_');              % parse station name from file
  % write out the results in other file (presume already open)
  fprintf('%s,%d\n',stn,length(yr))       % output station, # years
  fprintf('%4d,%.2f\n', [yr mx].')        % year, max for each
end
>>

ADDENDUM

NB: The above was tweaked to simply output the results to the command line rather than write an output file; again you've got to open a new output file with a different handle than that used for the input files before starting the loop.

dpb on 5 Dec 2015

"Noticed that it does not store each file into an array. "c" stores only the last file. How can I modify to store multiple files without assigning into a cell array?"

There's no reason to store more than one file at a time; you have no need for any data other than the one over which you're doing the processing at any one time. Ergo, don't make things more difficult than needs must be.

IF you were to need multiple years at one time, then it might be necessary to use cell arrays to hold disparate sizes/dates at one time, granted, but don't worry about solving a problem of that type until it's necessary. At that point I'd likely either

1. Do the reduction as shown to a minimal dataset for the file first, then assign that array to a cell array element, or, alternatively,
2. Create a single 3D array that grows to hold each station by plane, with empty placeholders for those locations without data at any given station/plane.

The choice would depend upon just what would be needed to be done with the data simultaneously and just how disparate those datasets might be so as to how much wasted space would be needed to do the second option.

Other ideas for storage would also likely present themselves for any specific case as well that might be better than either of the above.

Damith on 5 Dec 2015

Edited: Damith on 5 Dec 2015

I figured out the cause for the error "yr". I simply moved the " fclose" to the end so it calculates all the intermediate outputs correctly and output to the .csv file. But this works for a single .csv file.

See the code I modified and the image of output csv file. (again thankful to your snippets and help).

But the code below works fine ONLY with ONE .csv file. I am in the process of modifying it to read multiple CSV files. Your thoughts and help are appreciated here.

N.B: FINALLY, I need to write all the station outputs in ONE .csv file (as shown on gage.csv) file. (See the attached sample of 10 station data and gage.csv file)

clear all
cd ('C:\Users\Desktop\test_file')
myFolder = 'C:\Users\Desktop\test_file';
if ~isdir(myFolder)
  errorMessage = sprintf('Error: The following folder does not exist:\n%s', myFolder);
  uiwait(warndlg(errorMessage));
  return;
end
filePattern = fullfile(myFolder, '*.csv');
d = dir(filePattern);
fmt='%*s %*d %4f/%2f/%2f %f %*[^\n]';
for i=1:length(d)
  fid = fopen(fullfile(myFolder,d(i).name));
  c=cell2mat(textscan(fid,fmt,'headerlines',1, ...
                              'collectoutput',1, ...
                              'delimiter',','));
  c(all(isnan(c),2),:)=[];
  yr=unique(c(:,1));                      % unique years in file
  n=histc(c(:,1),yr);                     % count entries by year
  yr=yr(n==(365+isleapyr(yr)));           % years that are complete
  i1=find(c(:,1)==yr(1),1);               % first complete year in dataset
  i2=find(c(:,1)==yr(end),1,'last');      % last of last complete year
  c=c(i1:i2,:);                           % save only those entries
  [~,~,iy]=unique(c(:,1));                % indices vector for grouping
  mx=accumarray(iy,c(:,end),[],@max);     % get maximum for each year
  stn=strtok(d(i).name,'_');                 % parse station name from file
  fid=fclose(fid);
  fileID = fopen('new.csv','w');
  if fileID ~= -1
     fprintf(fileID, 'Station_ID  Data_Avail\n');
     fprintf(fileID,'%s,%d\n', stn,length(yr))
     fprintf(fileID,'%4d,%.2f\n', [yr mx].')  
  fclose(fileID);
  end
end

Final output needed (gage.csv)

dpb on 6 Dec 2015

Edited: dpb on 6 Dec 2015

That's exactly what my example does if you'll simply open the output file first, before the loop and not close it until everything is done excepting I'd not choose to build a text file with all that missing data as you've shown the one record; that's basically the option outlined above as #2. I'd likely only save the valid data initially, then build a (probably sparse) array from it for processing. There's not much chance one would look at such a file manually, anyway, there's too much stuff there to deal with by hand so why not be more concise?

What's the next step; that would likely again control what I'd think would be the more suitable file format.

But, if you're adamant (or somebody else has made the requirement to put the file out in that specific [silly imo :) ] format), then creating an array of nan(nSta,maxYr) and populating it by row in the loop, saving the station in a linear vector since it's string, not numeric, and then writing it at the end will leave you with the full dataset in memory for further analyses as well. Better might be sparse depending upon just how many stations (rows) there are; 165 * 2000 is "only" 2+ MB, however, which is not a terribly large dataset by today's standards to handle but the storage is quite inefficient. Again, it all depends upon where you're headed in the end.
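A rough sketch of that nan(nSta,maxYr) layout, with made-up per-station results standing in for what the loop would compute. The year range 1850:2014, the cell array res, and the values in it are assumptions for illustration only.

```matlab
years = 1850:2014;                     % column headings (assumed range)
nSta  = 2;                             % hypothetical number of stations
mxary = nan(nSta, numel(years));       % one row per station, NaN = no data
stn   = cell(nSta, 1);                 % station IDs kept separately (text)

% Hypothetical per-file results, as the loop would produce them
res = {'01AA002', [1969; 1970; 1972], [161; 280; 168]; ...
       '01AF003', [2001; 2002],       [55.5; 60.1]};

for i = 1:nSta
    stn{i} = res{i,1};
    yr = res{i,2};                     % complete years for this station
    mx = res{i,3};                     % annual maxima, same order as yr
    [tf, col] = ismember(yr, years);   % column position of each year
    mxary(i, col(tf)) = mx(tf);        % years without data stay NaN
end
```

Looking up yr in years (rather than the other way round) gives one column index per maximum, so gaps in the middle of a record need no special handling.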

Damith on 7 Dec 2015

Again the same problem arises. I just ran the code below and the "yr" vector is empty. I cannot figure out why. I just moved the "fclose" as shown in your example.

See the screenshot as well.

clear all
cd ('C:\Users\i54814\Desktop\test_avg')
myFolder = 'C:\Users\i54814\Desktop\test_avg';
if ~isdir(myFolder)
  errorMessage = sprintf('Error: The following folder does not exist:\n%s', myFolder);
  uiwait(warndlg(errorMessage));
  return;
end
filePattern = fullfile(myFolder, '*.csv');
d = dir(filePattern);
fmt='%*s %*d %2f/%2f/%4f %f %*[^\n]';
for i=1:length(d)
  fid = fopen(fullfile(myFolder,d(i).name));
  c=cell2mat(textscan(fid,fmt,'collectoutput',true,'headerlines',1,'delimiter',','));
  fid=fclose(fid);                        % close input file
  yr=unique(c(:,1));                      % unique years in file
  n=histc(c(:,1),yr);                     % count entries by year
  yr=yr(n==(365+isleapyr(yr)));           % years that are complete
%   i1=find(c(:,1)==yr(1),1);               % first complete year in dataset
%   i2=find(c(:,1)==yr(end),1,'last');      % last of last complete year
%   c=c(i1:i2,:);                           % save only those entries
%   [~,~,iy]=unique(c(:,1));                % indices vector for grouping
%   mx=accumarray(iy,c(:,end),[],@max);     % get maximum for each year
%   stn=strtok(d(i).name,'_');              % parse station name from file  
end

Damith on 7 Dec 2015

Edited: Damith on 7 Dec 2015

I fixed the issue; the code is below. Now it's reading all the files. But now the problem is how to store the year information in a matrix ("gage") before I call the csvwrite function. Please see my code below.

I am having difficulty thinking this through logically and implementing what you mentioned here:

"creating an array of nan(nSta,maxYr) and populating it by row in the loop, saving the station in a linear vector since it's string, not numeric, and then writing it at the end will leave you with the full dataset in memory for further analyses as well"

clear all
cd ('C:\Users\Desktop\test_avg')
myFolder = 'C:\Users\Desktop\test_avg';
if ~isdir(myFolder)
  errorMessage = sprintf('Error: The following folder does not exist:\n%s', myFolder);
  uiwait(warndlg(errorMessage));
  return;
end
filePattern = fullfile(myFolder, '*.csv');
d = dir(filePattern);
fmt='%*s %*d %4f/%2f/%2f %f %*[^\n]';
year=1850:1:2014;
gage=nan(length(d),length(year));
for i=1:length(d)
  fid = fopen(fullfile(myFolder,d(i).name));
  c=cell2mat(textscan(fid,fmt,'collectoutput',true,'headerlines',1,'delimiter',','));
  fid=fclose(fid);                        % close input file
  c(all(isnan(c),2),:)=[];
  yr=unique(c(:,1));                      % unique years in file
  n=histc(c(:,1),yr);                     % count entries by year
  yr=yr(n==(365+isleapyr(yr)));           % years that are complete
   i1=find(c(:,1)==yr(1),1);               % first complete year in dataset
   i2=find(c(:,1)==yr(end),1,'last');      % last of last complete year
   c=c(i1:i2,:);                           % save only those entries
   [~,~,iy]=unique(c(:,1));                % indices vector for grouping
   mx=accumarray(iy,c(:,end),[],@max);     % get maximum for each year
   stn=strtok(d(i).name,'_');              % parse station name from file 
   I = ismember(year, yr);
   idx=find(I(1,:)==1);
   ....
   ....
end

Damith on 8 Dec 2015

Getting an error:

clear all
cd ('C:\Users\Desktop\test_avg')
myFolder = 'C:\Users\Desktop\test_avg';
if ~isdir(myFolder)
  errorMessage = sprintf('Error: The following folder does not exist:\n%s', myFolder);
  uiwait(warndlg(errorMessage));
  return;
end
filePattern = fullfile(myFolder, '*.csv');
d = dir(filePattern);
fmt='%*s %*d %4f/%2f/%2f %f %*[^\n]';
year=1850:1:2014;
mxary=nan(length(d),length(year));
for i=1:length(d)
  fid = fopen(fullfile(myFolder,d(i).name));
  c=cell2mat(textscan(fid,fmt,'collectoutput',true,'headerlines',1,'delimiter',','));
  fid=fclose(fid);                        % close input file
  c(all(isnan(c),2),:)=[];
  yr=unique(c(:,1));                      % unique years in file
  n=histc(c(:,1),yr);                     % count entries by year
  yr=yr(n==(365+isleapyr(yr)));           % years that are complete
   i1=find(c(:,1)==yr(1),1);               % first complete year in dataset
   i2=find(c(:,1)==yr(end),1,'last');      % last of last complete year
   c=c(i1:i2,:);                           % save only those entries
   [~,~,iy]=unique(c(:,1));                % indices vector for grouping
   mx=accumarray(iy,c(:,end),[],@max);     % get maximum for each year
   stn=strtok(d(i).name,'_');              % parse station name from file 
   [~,iy]=ismember(year,yr); 
   mxary(i,iy)=mx;     
end

I get the following error when I include this line

mxary(i,iy)=mx;
Error:
Subscript indices must either be real positive integers
or logicals.

Damith on 8 Dec 2015

Edited: Damith on 8 Dec 2015

01AF003_Daily_Flow_ts.csv

Thanks for your guidance again. I figured it out. Here is the code below and it works.

There is a problem, though. As you mentioned earlier, one file (see the attached file) has an incomplete year of data in the middle of the array, so the code does not work when that condition occurs.

clear all
cd ('C:\Users\Desktop\test_file3')
myFolder = 'C:\Users\Desktop\test_file3';
if ~isdir(myFolder)
  errorMessage = sprintf('Error: The following folder does not exist:\n%s', myFolder);
  uiwait(warndlg(errorMessage));
  return;
end
filePattern = fullfile(myFolder, '*.csv');
d = dir(filePattern);
fmt='%*s %*d %4f/%2f/%2f %f %*[^\n]';
year=1850:1:2014;
mxary=nan(length(d),length(year));
for i=1:length(d)
  fid = fopen(fullfile(myFolder,d(i).name));
  c=cell2mat(textscan(fid,fmt,'collectoutput',true,'headerlines',1,'delimiter',','));
  fid=fclose(fid);                        % close input file
  c(all(isnan(c),2),:)=[];
  yr=unique(c(:,1));                      % unique years in file
  n=histc(c(:,1),yr);                     % count entries by year
  yr=yr(n==(365+isleapyr(yr)));           % years that are complete      
  i1=find(c(:,1)==yr(1),1);               % first complete year in dataset
  i2=find(c(:,1)==yr(end),1,'last');      % last of last complete year
  c=c(i1:i2,:);                           % save only those entries
  [~,~,iy]=unique(c(:,1));                % indices vector for grouping
  mx=accumarray(iy,c(:,end),[],@max);     % get maximum for each year
  [~,iy]=ismember(year,yr); 
  mxary(i,logical(iy))=mx;  
  stn=strtok(d(i).name,'_');              % parse station name from file    
end

So, I am trying to remove incomplete years from "c" in the middle of the array. I created an index (idx):

idx=ismember((n<365),c);

But how can I use "idx" to remove incomplete years from "c" before it calculates i1 and i2?

Any thoughts and help are appreciated here.

Damith on 10 Dec 2015

Thanks, and apologies for the late reply. It worked. Now I am having some trouble writing this to a CSV file using the "fprintf" function. Please see the code below. Any help is appreciated.

clear all
cd ('C:\Users\Desktop\test_avg')
myFolder = 'C:\Users\Desktop\test_avg';
if ~isdir(myFolder)
  errorMessage = sprintf('Error: The following folder does not exist:\n%s', myFolder);
  uiwait(warndlg(errorMessage));
  return;
end
filePattern = fullfile(myFolder, '*.csv');
d = dir(filePattern);
fmt='%*s %*d %4f/%2f/%2f %f %*[^\n]';
year=1850:1:2014;
mxary=nan(length(d),length(year));
for i=1:length(d)
  fid = fopen(fullfile(myFolder,d(i).name));
  c=cell2mat(textscan(fid,fmt,'collectoutput',true,'headerlines',1,'delimiter',','));
  fid=fclose(fid);                        % close input file
  c(all(isnan(c),2),:)=[];
  yr=unique(c(:,1));                      % unique years in file
  n=histc(c(:,1),yr);                     % count entries by year
  yr=yr(n==(365+isleapyr(yr)));           % years that are complete
  i1=ismember(c(:,1),yr);
  c=c(i1,:);
  [~,~,iy]=unique(c(:,1)); 
  mx=accumarray(iy,c(:,end),[],@max);
  [~,iy]=ismember(year,yr); 
  mxary(i,logical(iy))=mx;  
  stn=strtok(d(i).name,'_');              % parse station name from file 
  fileID = fopen('new.csv','w');
  if fileID ~= -1
     for row = 1 : size(mxary, 1)
         fprintf(fileID,'%s,%d,%f\n',stn,length(yr(:,1)),mxary(row,:));
     end
    fclose(fileID);
  end
end

dpb on 10 Dec 2015

Edited: dpb on 10 Dec 2015

How many times do you have to be told to open the output file first, not inside the loop? This isn't rocket science...

And, of course, don't close it until after done writing into it...
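The reason it matters: fopen with 'w' discards the file's existing contents on every call, so opening inside the loop leaves only the last station in the file. A minimal sketch of the open-once pattern follows; the output path, station names, and year counts are placeholders, not values from the real run.

```matlab
outName = [tempname '.csv'];           % hypothetical output path
fido = fopen(outName, 'w');            % open ONCE, before the loop
assert(fido ~= -1, 'Could not open %s', outName);

stations = {'01AA002', '01AF003'};     % stand-ins for the parsed file names
nyears   = [8 5];                      % made-up counts for illustration
for i = 1:numel(stations)
    % per-file processing would go here, using its own input handle
    fprintf(fido, '%s,%d\n', stations{i}, nyears(i));
end

fclose(fido);                          % close ONCE, after the loop
type(outName)                          % both stations are in the file
```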

Damith on 10 Dec 2015

OK. Figured it out. But, having a hard time writing the " mxary" row by row in to the same csv file corresponding to each row. See the image for station name and year columns.

See the code below:

clear all
cd ('C:\Users\Desktop\test_avg')
myFolder = 'C:\Users\Desktop\test_avg';
if ~isdir(myFolder)
  errorMessage = sprintf('Error: The following folder does not exist:\n%s', myFolder);
  uiwait(warndlg(errorMessage));
  return;
end
filePattern = fullfile(myFolder, '*.csv');
d = dir(filePattern);
fmt='%*s %*d %4f/%2f/%2f %f %*[^\n]';
year=1850:1:2014;
mxary=nan(length(d),length(year));
filename='new.csv';
fileID = fopen(filename,'w');
for i=1:length(d)
  fid = fopen(fullfile(myFolder,d(i).name));
  c=cell2mat(textscan(fid,fmt,'collectoutput',true,'headerlines',1,'delimiter',','));
  fid=fclose(fid);                        % close input file
  c(all(isnan(c),2),:)=[];
  yr=unique(c(:,1));                      % unique years in file
  n=histc(c(:,1),yr);                     % count entries by year
  yr=yr(n==(365+isleapyr(yr)));           % years that are complete
  i1=ismember(c(:,1),yr);
  c=c(i1,:);
  [~,~,iy]=unique(c(:,1)); 
  mx=accumarray(iy,c(:,end),[],@max);
  [~,iy]=ismember(year,yr); 
  mxary(i,logical(iy))=mx;  
  stn=strtok(d(i).name,'_');             % parse station name from file
  fprintf(fileID,'%s,%d\n', stn,length(yr));
end
fclose(fileID);
