Problem with converting dates to numbers

Hi,
I want to convert the third column ("date") to date numbers using the datenum function in a loop.
I have a few problems:
1. The datenum function does not read the third column properly (see the attached CSV file and the image for the error).
2. I need to perform this in a loop since I have a large number of files.
Please see my code below.
tr = readtable('01AA002_Daily_Flow_ts.csv','Delimiter',',','ReadVariableNames',false); % Load Data
tr(1,:)=[];
%fn='01AA002_Daily_Flow_ts.csv';
dn = datenum(tr.Var3,'yyyy/mm/dd');

Accepted Answer

The version of 01AA002_Daily_Flow_ts.csv that you've attached has column headings and freeform text at the bottom, so
tr = readtable('01AA002_Daily_Flow_ts.csv','Delimiter',',','ReadVariableNames',false);
will return a table with one (cellstr) variable. Assuming you added that disclaimer text just for the purposes of posting the file and simply forgot to tell us to take it out, using 'ReadVariableNames',false will give you a table with five cellstr variables. That's almost surely NOT what you want, although in the case of the date strings, they're strings either way.
When I do this
>> tr = readtable('01AA002_Daily_Flow_ts.csv');
>> dn = datenum(tr.Date);
>> dn(1:5)
ans =
718672
718690
718691
718692
718693
everything works fine. Do you get something different when you do that? In your latest code, it fails because you've used the wrong format string when calling datenum.
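To illustrate the format-string point, here is a minimal sketch. It assumes the date column holds text such as '1967/08/28'; the sample strings are made up to match the serial numbers shown above, not read from the actual file.

```matlab
% Sample date strings in the layout assumed for the third column (yyyy/mm/dd).
dates = {'1967/08/28'; '1967/09/15'; '1967/09/16'};

% The format string must describe the text as it appears in the file.
dn = datenum(dates, 'yyyy/mm/dd');

% Convert back to text to confirm the round trip.
disp(datestr(dn, 'yyyy/mm/dd'));
```

A format such as 'mm/dd/yyyy' describes a different layout, so it cannot be expected to parse these strings correctly.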

9 Comments

This format has been ongoing in a bunch of posts going back some time, Peter, which is why I gave him a textscan solution that worked to parse a file way back when...it ignores the trailing text (actually it fails in a known, repeatable pattern).
Damith on 3 Dec 2015
Edited: Damith on 3 Dec 2015
I was able to use textread function and eliminate the text at the bottom of the file. As you mentioned it is easy to parse the file. Here is code below:
U=textread('01AA002_Daily_Flow_ts.csv','%s','delimiter','\n','whitespace',' ','headerlines',1);
for i=2:-1:0
U(end-i)=[];
clear i
end
Now, how can I convert this cell array (U) to a numeric/double array arranged in columns?
@Peter: How did you obtain tr.Date? I don't get that output when I read the CSV file.
Damith on 3 Dec 2015
Edited: Damith on 3 Dec 2015
I basically need to use the '01AA002_Daily_Flow_ts.csv' file to filter out the dates (3rd col) and values (4th col) for COMPLETE YEARS ONLY. Any incomplete years should not be in the output.
Any thoughts are highly appreciated.
dpb on 3 Dec 2015
Edited: dpb on 3 Dec 2015
>> datestr(718672)
ans =
28-Aug-1967
>> datestr(718690)
ans =
15-Sep-1967
>>
That's what I see in the file; what do you get?
Return the year via datevec and count the number of entries for each unique year found. If the count is not nDays(yr) for a given year, throw that year out. It's another selection very similar to that demonstrated earlier with ismember and friends.
ADDENDUM
Actually, find diff datenums and locations that aren't nDays(yr) apart are short entries...
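The counting idea can be sketched as follows. This is only an illustration on a synthetic daily series (all of 1969 plus the first 100 days of 1970); the variable names are placeholders, and nDays is computed here from eomday rather than taken from any existing helper.

```matlab
% Synthetic daily series: all of 1969 plus the first 100 days of 1970.
dn = (datenum(1969,1,1):datenum(1970,1,1)+99).';

dv = datevec(dn);                  % columns: year, month, day, h, m, s
[yr, ~, g] = unique(dv(:,1));      % unique years and a group index per row
n = accumarray(g, 1);              % number of daily entries in each year

% Expected days per year: February has 29 days in a leap year.
nDays = 365 + (eomday(yr, 2) == 29);
completeYears = yr(n == nDays);    % years whose count matches the calendar
disp(completeYears);
```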
Damith on 3 Dec 2015
Edited: Damith on 3 Dec 2015
I have the code below but it gives me an error: See the code and error below
clear all
tr = readtable('01AA002_Daily_Flow_ts.csv','Delimiter',',','ReadVariableNames',false,'Format','%s%s%s%s%*[^\n]'); % Load Data
tr(1,:)=[];
I = ismember(tr.Var1(:,:),'');
tr(I,:)=[];
II = ismember(tr.Var1(:,:),'DISCLAIMER');
tr(II,:)=[];
III = ismember(tr.Var1(:,:),'"NOTICE: This application and its data are provided AS-IS. In no event shall Environment Canada be liable for any damages whatsoever (including');
tr(III,:)=[];
tc = table2cell(tr);
dn = datenum(tc(:,3),'mm/dd/yyyy');
dv = datevec(dn);
yr = unique(dv(:,1));
[tf, loc] = ismember(dv(:,1), yr);
nyr = length(yr);
nnonnegperyr = accumarray(loc, tc(:,4), [nyr 1], @(x) sum(x>=0), NaN);
isleap = ((mod(yr,4) == 0 & mod(yr,100) ~= 0) | mod(yr,400) == 0);
iscomplete = (~isleap & nnonnegperyr == 365) | ...
(isleap & nnonnegperyr == 366);
% Filter out dates and values for complete years only
isin = ismember(dv(:,1), yr(iscomplete));
newdata = [cellstr(datestr(dn(isin), 'mm/dd/yyyy')) num2cell(tc(:,4)(isin))];
Error:
Error using accumarray
Second input VAL must be a full numeric, logical, or char vector or scalar.
N.B.: Not to mention that I need to do this for 2000+ CSV files.
dpb on 4 Dec 2015
Edited: dpb on 4 Dec 2015
tc = table2cell(tr);
That won't be what's needed; it's a cell array.
tc{:,4}, on the other hand, might be, although what specifically you're after via accumarray here I'm not sure. If it's attempting to compute the number of entries in a given year, see the ADDENDUM on an alternate technique.
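The accumarray error comes from passing a cell array where a numeric vector is required. A minimal sketch of one way around it, assuming the fourth column was read as text with %s (the values and the group labels below are invented for illustration):

```matlab
% Stand-in for the fourth column of table2cell(tr): numbers stored as text.
flowText = {'1.5'; '2.25'; ''; '0.8'};

% str2double maps each string to a double and anything unparseable to NaN.
flow = str2double(flowText);

% Group labels (for example a year index) and a per-group count of valid values.
g = [1; 1; 2; 2];
nValid = accumarray(g, flow, [2 1], @(x) sum(x >= 0));
disp(nValid);
```

Because NaN >= 0 is false, empty or malformed entries simply do not count toward the total for their group.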
Damith on 4 Dec 2015
Edited: Damith on 4 Dec 2015
I am kind of lost here. Can you please guide/help me with what you mentioned here?
I have revised the code now. See below.
clear all
cd ('C:\Users\Desktop')
myFolder = 'C:\Users\Desktop\test_avg';
if ~isdir(myFolder)
errorMessage = sprintf('Error: The following folder does not exist:\n%s', myFolder);
uiwait(warndlg(errorMessage));
return;
end
filePattern = fullfile(myFolder, '*.csv');
csvFiles = dir(filePattern);
for k = 1:length(csvFiles)
fid(k) = fopen(fullfile(myFolder,csvFiles(k).name));
out{k} = textscan(fid(k),'%s%*s%s%f%*[^\n]','delimiter',',','headerlines',1);
fclose(fid(k));
end
for jj=1:length(csvFiles)
for ij=1:3
if ij==3
out{1,jj}{1,ij}(end)=[];
else
for i=1:-1:0
out{1,jj}{1,ij}(end-i)=[];
clear i
end
end
end
end
"Return the year via datevec and count the number for each unique year found. If not nDays(yr) for the given year then throw that one out"
Finally, I want to write to csv file with station ID, years of data available (complete) and maximum value of each complete year. (Please see the attached csv file). Flow values and years are not correct in the attached csv file.
This again seems to have transmuted the original question into asking how to solve another particular processing problem...I posted another code snippet that illustrates as another Answer...


More Answers (2)

dpb on 2 Dec 2015
Edited: dpb on 3 Dec 2015
The problem isn't datenum, it's that you're trying to pass a cellstr array into it. If you're going to use readtable, use a specific format string to convert the dates on input; see doc for details and an example (albeit it converts a non-US date to US as well, but it does show the date formatting string).
Failing that, revert to the way I showed previously to parse the .csv file directly into numeric arrays, bypassing all the higher-level abstractions but leaving you with a set of double arrays that can be handled pretty simply for your needs.
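As a rough sketch of converting the dates on input: readtable accepts a %{...}D specifier in its format string in releases that have datetime (R2014b or later). The file written below is a tiny stand-in, and its header names and column layout are assumptions, not the real data file.

```matlab
% Write a tiny stand-in file; the header names are assumptions for this sketch.
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'ID,PARAM,Date,Flow,SYM\n');
fprintf(fid, '01AA002,1,1967/08/28,1.50,B\n');
fprintf(fid, '01AA002,1,1967/09/15,2.25,B\n');
fclose(fid);

% %{yyyy/MM/dd}D asks readtable to parse the third column as datetime on input.
tr = readtable(fname, 'Delimiter', ',', ...
    'Format', '%s%f%{yyyy/MM/dd}D%f%s');

dn = datenum(tr.Date);    % serial date numbers, if those are still needed
disp(dn);
```

Note that the datetime format uses MM for months, unlike the mm used in datenum format strings.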
ADDENDUM
Please cut 'n paste text instead of the images -- they're exceedingly difficult to read plus one can't select them to try to repeat anything you've done...
Anyway, I seem to have misspoken re: cellstring arrays and datenum; it actually accepts them just fine.
I used the import tool and retrieved both files (I have R2012b, so I don't have readtable and can't test it directly); they have different forms for the date string. However, each worked just fine with datenum, even though the time portion of the one file is ignored.
>> whos VarName5
Name Size Bytes Class Attributes
VarName5 32142x1 3535620 cell
>> VarName5(1:4)
ans =
'1920-01-01T07:00:00+07:00'
'1920-01-02T07:00:00+07:00'
'1920-01-03T07:00:00+07:00'
'1920-01-04T07:00:00+07:00'
>> datestr(datenum(VarName5(1:4),'yyyy-mm-dd'))
ans =
01-Jan-1920
02-Jan-1920
03-Jan-1920
04-Jan-1920
>> datestr(datenum(VarName5(1:4),'yyyy-mm-ddTHH:MM:SS'))
ans =
01-Jan-1920 07:00:00
02-Jan-1920 07:00:00
03-Jan-1920 07:00:00
04-Jan-1920 07:00:00
>>
With the new datetime type and its '%D' format specifier, as noted, you can interpret the rest of the time string as well, but datenum doesn't have that facility.
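For reference, a hedged sketch of parsing the full timestamp, offset included, with datetime (R2014b or later). The choice of UTC as the target time zone is an assumption made only for the example.

```matlab
% Timestamps in the layout shown above, including the +07:00 offset.
ts = {'1920-01-01T07:00:00+07:00'; '1920-01-02T07:00:00+07:00'};

% 'T' is quoted as a literal; XXX matches an offset written as +hh:mm.
t = datetime(ts, 'InputFormat', 'yyyy-MM-dd''T''HH:mm:ssXXX', ...
    'TimeZone', 'UTC');
disp(t);
```

With these inputs, 07:00 at +07:00 corresponds to midnight UTC on the same calendar day.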
The other file is an "ordinary" YYYY-MM-DD string; should be no issues whatever with it.
Whatever problem you're having seems to be associated with the "how" of how you're reading the files, but can't see what Matlab actually complained about from the pictures without the full error text in context.

1 Comment

Damith on 2 Dec 2015
Edited: Damith on 2 Dec 2015
OK. For example, I will show you the evaluation of the first two lines of the code for the attached KH_100303.csv file. (See the images below)
tr = readtable('KH_100303.csv','Delimiter',',','ReadVariableNames',false,'Format','%s%s%s%s%s%s%s%s'); % Load Data
dn = datenum(tr.Var5, 'yyyy-mm-ddTHH:MM:SS');
But, this does not work for 01AA002_Daily_Flow_ts.csv file.
tr = readtable('01AA002_Daily_Flow_ts.csv','Delimiter',',','ReadVariableNames',false,'Format','%s%s%s%s%s'); % Load Data
tr(1,:)=[];
dn = datenum(tr.Var3,'mm/dd/yyyy');
Error:
Error using datenum (line 178)
DATENUM failed.
Caused by:
Error using dtstr2dtnummx
Failed on converting date string to date number.


dpb on 4 Dec 2015
Edited: dpb on 5 Dec 2015
You seem to keep retrogressing past what we've already solved/shown solutions for. Why not build on the previously working solution in the previous thread remove-rows-text-at-the-bottom-of-a-csv-file? There I showed a simple way to return the values from the .csv file that mitigates the trailing disclaimer text essentially automagically. Instead you've returned to the previous case of holding all the file content in a cell array of cells which is exceedingly difficult to address owing to the need to get all the curlies and parens correct plus you can't do global addressing of cells with two-layer addressing to get subsets.
The previous file in the above thread used '-' as the date separator whereas this one uses '/', so that's one modification if you choose to return the dates as y,m,d values rather than as strings. Reading the strings instead would avoid that change, although you'd probably have to fix up the format string for datenum, so it's likely a wash in writing generic code; you'll have to deal with the specific format at some point anyway.
All that aside, start by first reading a single file and returning the specific information needed; namely the max for complete years, then look at wrapping that functionality over the files...
fmt='%*s %*d %4f/%2f/%2f %f %*[^\n]';
for i=1:length(d)
fid=fopen(d(i).name);
c=cell2mat(textscan(fid,fmt,'headerlines',1, ...
'collectoutput',1, ...
'delimiter',','));
fid=fclose(fid); % close input file
c(all(isnan(c),2),:)=[];
yr=unique(c(:,1)); % unique years in file
n=histc(c(:,1),yr); % count entries by year
yr=yr(n==(365+isleapyr(yr))); % years that are complete
i1=find(c(:,1)==yr(1),1); % first complete year in dataset
i2=find(c(:,1)==yr(end),1,'last'); % last of last complete year
c=c(i1:i2,:); % save only those entries
[~,~,iy]=unique(c(:,1)); % indices vector for grouping
mx=accumarray(iy,c(:,end),[],@max); % get maximum for each year
stn=strtok(d(i).name,'_'); % parse station name from file
% write out the results in other file (presume already open)
fprintf(fido,'%s,%d\n',stn,length(yr)); % output station, # years
fprintf(fido,'%4d,%.1f\n',[yr mx].'); % year, max for each
end
You'll have to put in the housekeeping to create and open the output file(s*) then close after done and such, but the basic processing should be taken care of in the above...
You'll note I didn't bother to parse the station name from the file; that's just a complication of a bunch of meaningless text; I just parsed it from the input file name. The output file format is
StationName,#years
year,max
year,max
...
(*) I basically presumed in the above that the idea is to consolidate all these into a single file; hence the station and number of entries in each section to aid reading. If you again want one file per station, then, as the sample in the other thread demonstrated, find some common name-generating pattern here as well.
Also note the utility function isleapyr is one of my little helpers...
function is=isleapyr(yr)
% returns T for input year being a leapyear
is=eomday(yr,2)==29;

20 Comments

I noticed that it does not store each file in an array; "c" stores only the last file. How can I modify it to store multiple files without assigning them to a cell array?
I simply modified your code and ran it, but there was an error message.
See the code and error below.
N.B.: I need to write the output to a different .csv file ('test.csv') that has the station ID, the number of complete years, and the maximum values. I guess the code tries to write to multiple CSV files, one per station name, whereas I need to write everything into one CSV file.
clear all
cd ('C:\Users\Desktop\test_avg')
myFolder = 'C:\Users\Desktop\test_avg';
if ~isdir(myFolder)
errorMessage = sprintf('Error: The following folder does not exist:\n%s', myFolder);
uiwait(warndlg(errorMessage));
return;
end
filePattern = fullfile(myFolder, '*.csv');
d = dir(filePattern);
fmt='%*s %*d %4f/%2f/%2f %f %*[^\n]';
for i=1:length(d)
fid = fopen(fullfile(myFolder,d(i).name));
c=cell2mat(textscan(fid,fmt,'headerlines',1, ...
'collectoutput',1, ...
'delimiter',','));
fid=fclose(fid); % close input file
c(all(isnan(c),2),:)=[];
yr=unique(c(:,1)); % unique years in file
n=histc(c(:,1),yr); % count entries by year
yr=yr(n==(365+isleapyr(yr))); % years that are complete
i1=find(c(:,1)==yr(1),1); % first complete year in dataset
i2=find(c(:,1)==yr(end),1,'last'); % last of last complete year
c=c(i1:i2,:); % save only those entries
[~,~,iy]=unique(c(:,1)); % indices vector for grouping
mx=accumarray(iy,c(:,end),[],@max); % get maximum for each year
stn=strtok(d.name,'_'); % parse station name from file
% write out the results in other file (presume already open)
fprintf(fid,'%s','%d\n',stn,length(yr)) % output station, # years
fprintf(fid,'%4d,%f.2\n', [yr mx],:) % year, max for each
end
Error:
Attempted to access yr(1); index out of bounds because numel(yr)=0.
dpb on 4 Dec 2015
Edited: dpb on 5 Dec 2015
In the snippet I posted c holds the data from each input file in succession and writes all to a single output file. Logic to name and open that file is left as "exercise for the student"; as the comments state the snippet presumes that is already done prior to beginning the loop.
You modified the fprintf lines however and used the same file handle variable as that used for the input file, and I see no code to open any different output file.
While again you don't post the full context of the error, in this case there is only one line that references yr(1), and since the message indicates that the array is empty, there were no full years found for that particular file. Set a breakpoint and use the debugger to see precisely what's happening; I ran the script on the sample file, so I know the logic works for the case where there is at least one complete year. If it's possible there are no complete years, you need to add a test for that where the count is done or, possibly, use a try...catch block. It would likely be of interest to display [unique(c(:,1)) n] in the event this occurs so you can decide what to do with that file; or maybe you do know that is a possibility and don't care, in which case you can simply put a continue in the catch clause and go on.
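A guard of the kind described might look like the sketch below. It runs on two synthetic year columns standing in for files (the second has no complete year); the names files and skipped are placeholders for the example, not part of the posted script.

```matlab
% Two synthetic year columns standing in for files; the second has no complete year.
files = {repmat(1969, 365, 1), repmat(1970, 100, 1)};
skipped = false(1, numel(files));

for k = 1:numel(files)
    yrs = files{k};
    [yr, ~, g] = unique(yrs);                     % unique years and group index
    n = accumarray(g, 1);                         % entries per year
    yr = yr(n == 365 + (eomday(yr, 2) == 29));    % keep complete years only

    if isempty(yr)
        % Nothing to index with yr(1); report and move on to the next file.
        fprintf('File %d: no complete years, skipping\n', k);
        skipped(k) = true;
        continue
    end

    fprintf('File %d: %d complete year(s)\n', k, numel(yr));
end
```

Testing explicitly with isempty keeps the skip decision visible, whereas a try...catch around the whole body would also hide unrelated errors.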
I did see a problem in a couple of the formatting strings for the output file; I had just looked at the [yr mx] array at the command line and typed those on the fly. I did edit them in the Answer so pick up those mods from there.
For the given file the script gives the following result--
>> dai
01AA002,8
1969,161.00
1970,280.00
1971,213.00
1972,168.00
1973,198.00
1974,255.00
1975,128.00
1976,281.00
>> type dai
fmt='%*s %*d %4f/%2f/%2f %f %*[^\n]';
for i=1:length(d)
fid = fopen(d(i).name);
c=cell2mat(textscan(fid,fmt,'headerlines',1, ...
'collectoutput',1, ...
'delimiter',','));
fid=fclose(fid); % close input file
c(all(isnan(c),2),:)=[];
yr=unique(c(:,1)); % unique years in file
n=histc(c(:,1),yr); % count entries by year
yr=yr(n==(365+isleapyr(yr))); % years that are complete
i1=find(c(:,1)==yr(1),1); % first complete year in dataset
i2=find(c(:,1)==yr(end),1,'last'); % last of last complete year
c=c(i1:i2,:); % save only those entries
[~,~,iy]=unique(c(:,1)); % indices vector for grouping
mx=accumarray(iy,c(:,end),[],@max); % get maximum for each year
stn=strtok(d(i).name,'_'); % parse station name from file
% write out the results in other file (presume already open)
fprintf('%s,%d\n',stn,length(yr)) % output station, # years
fprintf('%4d,%.2f\n', [yr mx].') % year, max for each
end
>>
ADDENDUM
NB: The above was tweaked to simply output the results to the command line rather than write an output file; again you've got to open a new output file with a different handle than that used for the input files before starting the loop.
"Noticed that it does not store each file into an array. "c" stores only the last file. How can I modify to store mutliple files without assigning into a cell array?"
There's no reason to store more than one file at a time; you have no need for any data other than the one over which you're doing the processing at any one time. Ergo, don't make things more difficult than needs must be.
IF you were to need multiple years at one time, then it might be necessary to use cell arrays to hold disparate sizes/dates at one time, granted, but don't worry about solving a problem of that type until it's necessary. At that point I'd likely either
  1. Do the reduction as shown to minimal dataset for the file first, then assign that array to a cell array element, or, alternatively,
  2. Create a single 3D array that grows to hold each station by plane with empty placeholders for those locations without data at any given station/plane.
The choice would depend upon just what would be needed to be done with the data simultaneously and just how disparate those datasets might be so as to how much wasted space would be needed to do the second option.
Other ideas for storage would also likely present themselves for any specific case as well that might be better than either of the above.
Damith on 5 Dec 2015
Edited: Damith on 5 Dec 2015
I figured out the cause of the "yr" error. I simply moved the "fclose" to the end so that it calculates all the intermediate outputs correctly and writes them to the .csv file. But this works only for a single .csv file.
See the code I modified and the image of the output CSV file (again, I'm thankful for your snippets and help).
The code below works fine with ONLY ONE .csv file. I am in the process of modifying it to read multiple CSV files. Your thoughts and help are appreciated here.
N.B.: FINALLY, I need to write all the station outputs to ONE .csv file (as shown in the gage.csv file). (See the attached sample of 10 station data files and the gage.csv file.)
clear all
cd ('C:\Users\Desktop\test_file')
myFolder = 'C:\Users\Desktop\test_file';
if ~isdir(myFolder)
errorMessage = sprintf('Error: The following folder does not exist:\n%s', myFolder);
uiwait(warndlg(errorMessage));
return;
end
filePattern = fullfile(myFolder, '*.csv');
d = dir(filePattern);
fmt='%*s %*d %4f/%2f/%2f %f %*[^\n]';
for i=1:length(d)
fid = fopen(fullfile(myFolder,d(i).name));
c=cell2mat(textscan(fid,fmt,'headerlines',1, ...
'collectoutput',1, ...
'delimiter',','));
c(all(isnan(c),2),:)=[];
yr=unique(c(:,1)); % unique years in file
n=histc(c(:,1),yr); % count entries by year
yr=yr(n==(365+isleapyr(yr))); % years that are complete
i1=find(c(:,1)==yr(1),1); % first complete year in dataset
i2=find(c(:,1)==yr(end),1,'last'); % last of last complete year
c=c(i1:i2,:); % save only those entries
[~,~,iy]=unique(c(:,1)); % indices vector for grouping
mx=accumarray(iy,c(:,end),[],@max); % get maximum for each year
stn=strtok(d(i).name,'_'); % parse station name from file
fid=fclose(fid);
fileID = fopen('new.csv','w');
if fileID ~= -1
fprintf(fileID, 'Station_ID Data_Avail\n');
fprintf(fileID,'%s,%d\n', stn,length(yr))
fprintf(fileID,'%4d,%.2f\n', [yr mx].')
fclose(fileID);
end
end
Final output needed (gage.csv)
dpb on 6 Dec 2015
Edited: dpb on 6 Dec 2015
That's exactly what my example does if you'll simply open the output file first, before the loop and not close it until everything is done excepting I'd not choose to build a text file with all that missing data as you've shown the one record; that's basically the option outlined above as #2. I'd likely only save the valid data initially, then build a (probably sparse) array from it for processing. There's not much chance one would look at such a file manually, anyway, there's too much stuff there to deal with by hand so why not be more concise?
What's the next step; that would likely again control what I'd think would be the more suitable file format.
But, if you're adamant (or somebody else has made the requirement to put the file out in that specific [silly imo :) ] format), then creating an array of nan(nSta,maxYr) and populating it by row in the loop, saving the station in a linear vector since it's string, not numeric, and then writing it at the end will leave you with the full dataset in memory for further analyses as well. Better might be sparse depending upon just how many stations (rows) there are; 165 * 2000 is "only" 2+ MB, however, which is not a terribly large dataset by today's standards to handle but the storage is quite inefficient. Again, it all depends upon where you're headed in the end.
Again the same problem arises. I just ran the code below and the "yr" vector is empty. I cannot figure out why. I just moved the "fclose" as shown in your example.
See the screenshot as well.
clear all
cd ('C:\Users\i54814\Desktop\test_avg')
myFolder = 'C:\Users\i54814\Desktop\test_avg';
if ~isdir(myFolder)
errorMessage = sprintf('Error: The following folder does not exist:\n%s', myFolder);
uiwait(warndlg(errorMessage));
return;
end
filePattern = fullfile(myFolder, '*.csv');
d = dir(filePattern);
fmt='%*s %*d %2f/%2f/%4f %f %*[^\n]';
for i=1:length(d)
fid = fopen(fullfile(myFolder,d(i).name));
c=cell2mat(textscan(fid,fmt,'collectoutput',true,'headerlines',1,'delimiter',','));
fid=fclose(fid); % close input file
yr=unique(c(:,1)); % unique years in file
n=histc(c(:,1),yr); % count entries by year
yr=yr(n==(365+isleapyr(yr))); % years that are complete
% i1=find(c(:,1)==yr(1),1); % first complete year in dataset
% i2=find(c(:,1)==yr(end),1,'last'); % last of last complete year
% c=c(i1:i2,:); % save only those entries
% [~,~,iy]=unique(c(:,1)); % indices vector for grouping
% mx=accumarray(iy,c(:,end),[],@max); % get maximum for each year
% stn=strtok(d(i).name,'_'); % parse station name from file
end
I observe you took out the line to clean up the end of the file on reading after the disclaimer text altho don't think that should cause this problem specifically.
But, I notice for the given file in the variable editor that c is only [19,NaN,NaN,NaN] so clearly something is either wrong with the input data file not following the rules for the others or somesuch. Maybe you're back to another one of those tab-delimited instead of comma-delimited files, I don't know but that's the cause for the failure; '19' won't match any year.
Now, why that's the returned data is something else related to the input file. You do need to reinsert the cleanup line, however.
Oh, I see you still haven't opened an output file to collect the results prior to the loop...don't know why this seems such a difficult concept to get across.
And also didn't check before but I see that i is 20 so it's the last file in the list that's got "issues"...
Damith on 7 Dec 2015
Edited: Damith on 7 Dec 2015
I fixed the issue; the code is below. Now it's reading all the files. But the problem now is how to store the year information in a matrix ("gage") before I call the csvwrite function. Please see my code below.
I am having difficulty thinking this through logically and implementing what you mentioned here:
"creating an array of nan(nSta,maxYr) and populating it by row in the loop, saving the station in a linear vector since it's string, not numeric, and then writing it at the end will leave you with the full dataset in memory for further analyses as well"
clear all
cd ('C:\Users\Desktop\test_avg')
myFolder = 'C:\Users\Desktop\test_avg';
if ~isdir(myFolder)
errorMessage = sprintf('Error: The following folder does not exist:\n%s', myFolder);
uiwait(warndlg(errorMessage));
return;
end
filePattern = fullfile(myFolder, '*.csv');
d = dir(filePattern);
fmt='%*s %*d %4f/%2f/%2f %f %*[^\n]';
year=1850:1:2014;
gage=nan(length(d),length(year));
for i=1:length(d)
fid = fopen(fullfile(myFolder,d(i).name));
c=cell2mat(textscan(fid,fmt,'collectoutput',true,'headerlines',1,'delimiter',','));
fid=fclose(fid); % close input file
c(all(isnan(c),2),:)=[];
yr=unique(c(:,1)); % unique years in file
n=histc(c(:,1),yr); % count entries by year
yr=yr(n==(365+isleapyr(yr))); % years that are complete
i1=find(c(:,1)==yr(1),1); % first complete year in dataset
i2=find(c(:,1)==yr(end),1,'last'); % last of last complete year
c=c(i1:i2,:); % save only those entries
[~,~,iy]=unique(c(:,1)); % indices vector for grouping
mx=accumarray(iy,c(:,end),[],@max); % get maximum for each year
stn=strtok(d(i).name,'_'); % parse station name from file
I = ismember(year, yr);
idx=find(I(1,:)==1);
....
....
end
Well, you know you have length(d) files and length(year) years so the first part should be pretty obvious--
mxary=nan(length(d),length(year));
Then
[~,iy]=ismember(year,yr);
mxary(i,iy)=mx;
should put them in the right locations (NB: air-code, untested).
The stn variable this way could become a cellstring array to hold the station IDs, as I presume, as per the previous thread, they're not the same length, so a character string array would require padding.
So what was wrong with the file???
Btw, the above assumes the complete years are not necessarily contiguous in the dataset; if they're known to be, then all you need is the first location index and length to place them in the proper location. This is simply an offset calculation based on the starting years.
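For the contiguous case, the offset calculation could be sketched like this. The year axis 1850:2014 is the one used in the thread; the yr and mx values are taken from the sample output shown earlier, and a single station row is used to keep the example short.

```matlab
year = 1850:2014;                  % reference year axis, as in the thread
mxary = nan(1, numel(year));       % one station row for the sketch

yr = (1969:1976).';                % contiguous complete years for this station
mx = [161 280 213 168 198 255 128 281].';   % yearly maxima from the sample output

col0 = yr(1) - year(1) + 1;        % column of the first complete year
mxary(1, col0:col0 + numel(yr) - 1) = mx.';
disp(mxary(col0:col0 + numel(yr) - 1));
```

This only holds if no year between yr(1) and yr(end) is missing; otherwise the values would land in the wrong columns.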
Also, you mention csvwrite above; you can't write mixed-content or nonnumeric data with it, so you'll have to use fprintf to output the file.
Getting an error:
clear all
cd ('C:\Users\Desktop\test_avg')
myFolder = 'C:\Users\Desktop\test_avg';
if ~isdir(myFolder)
errorMessage = sprintf('Error: The following folder does not exist:\n%s', myFolder);
uiwait(warndlg(errorMessage));
return;
end
filePattern = fullfile(myFolder, '*.csv');
d = dir(filePattern);
fmt='%*s %*d %4f/%2f/%2f %f %*[^\n]';
year=1850:1:2014;
mxary=nan(length(d),length(year));
for i=1:length(d)
fid = fopen(fullfile(myFolder,d(i).name));
c=cell2mat(textscan(fid,fmt,'collectoutput',true,'headerlines',1,'delimiter',','));
fid=fclose(fid); % close input file
c(all(isnan(c),2),:)=[];
yr=unique(c(:,1)); % unique years in file
n=histc(c(:,1),yr); % count entries by year
yr=yr(n==(365+isleapyr(yr))); % years that are complete
i1=find(c(:,1)==yr(1),1); % first complete year in dataset
i2=find(c(:,1)==yr(end),1,'last'); % last of last complete year
c=c(i1:i2,:); % save only those entries
[~,~,iy]=unique(c(:,1)); % indices vector for grouping
mx=accumarray(iy,c(:,end),[],@max); % get maximum for each year
stn=strtok(d(i).name,'_'); % parse station name from file
[~,iy]=ismember(year,yr);
mxary(i,iy)=mx;
end
I get the following error when I include this line
mxary(i,iy)=mx;
Error:
Subscript indices must either be real positive integers
or logicals.
Debugger to the rescue...
Oh, as said, "air code". It's the first return value from ismember that's the logical array instead of the second...
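To spell that out: the second output of ismember holds positions, with 0 for non-members, and 0 is not a valid subscript, hence the error. The first output is a logical mask that can index directly. A small sketch with made-up years and maxima:

```matlab
year = 1850:2014;                  % reference year axis
yr = [1969; 1970; 1973];           % complete years, not necessarily contiguous
mx = [161; 280; 198];              % one maximum per complete year (made up)

mxary = nan(1, numel(year));
tf = ismember(year, yr);           % logical mask over the columns of year
mxary(1, tf) = mx.';               % fills the columns for 1969, 1970 and 1973
disp(find(tf));
```

This relies on yr being sorted in ascending order, as returned by unique, so that the order of mx matches the order of the true entries in the mask.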
Damith on 8 Dec 2015
Edited: Damith on 8 Dec 2015
Thanks for your guidance again. I figured it out. Here is the code below, and it works.
There is a problem, though. As you mentioned earlier, one file (see the attached file) has an incomplete year of data in the middle of the array, so the code does not work when that condition occurs.
clear all
cd ('C:\Users\Desktop\test_file3')
myFolder = 'C:\Users\Desktop\test_file3';
if ~isdir(myFolder)
errorMessage = sprintf('Error: The following folder does not exist:\n%s', myFolder);
uiwait(warndlg(errorMessage));
return;
end
filePattern = fullfile(myFolder, '*.csv');
d = dir(filePattern);
fmt='%*s %*d %4f/%2f/%2f %f %*[^\n]';
year=1850:1:2014;
mxary=nan(length(d),length(year));
for i=1:length(d)
fid = fopen(fullfile(myFolder,d(i).name));
c=cell2mat(textscan(fid,fmt,'collectoutput',true,'headerlines',1,'delimiter',','));
fid=fclose(fid); % close input file
c(all(isnan(c),2),:)=[];
yr=unique(c(:,1)); % unique years in file
n=histc(c(:,1),yr); % count entries by year
yr=yr(n==(365+isleapyr(yr))); % years that are complete
i1=find(c(:,1)==yr(1),1); % first complete year in dataset
i2=find(c(:,1)==yr(end),1,'last'); % last of last complete year
c=c(i1:i2,:); % save only those entries
[~,~,iy]=unique(c(:,1)); % indices vector for grouping
mx=accumarray(iy,c(:,end),[],@max); % get maximum for each year
[~,iy]=ismember(year,yr);
mxary(i,logical(iy))=mx;
stn=strtok(d(i).name,'_'); % parse station name from file
end
So, I am trying to remove incomplete years from "c" in the middle of the array. So, i created an index (idx)
idx=ismember((n<365),c);
but how can I use "idx" to remove incomplete years from "c" before it calculates i1 and i2.?
Any thoughts and help is appreciated here.
You already know which are the complete years; just don't use yr(1):yr(end) but will have to locate those whose years match the remaining values in the data array. Use ismember instead with the yr vector on the year column in the data array. This shouldn't take long to do a sample case at the command line to understand the logic.
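A command-line sized example of that selection, with invented data in which 1970 is incomplete and sits between two complete years (the complete years are simply given here rather than computed):

```matlab
% Columns: year, value. 1970 is incomplete and sits between two kept years.
c = [1969 5; 1969 9; 1970 7; 1971 4; 1971 6];
yr = [1969; 1971];                 % years already found to be complete

keep = ismember(c(:,1), yr);       % true for rows whose year is complete
c = c(keep, :);                    % drops the 1970 row wherever it occurs

[~, ~, g] = unique(c(:,1));        % group index per remaining row
mx = accumarray(g, c(:,2), [], @max);
disp([yr mx]);
```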
Thanks, and apologies for the late reply. It worked. Now I am having some trouble writing this to a CSV file using the "fprintf" function. Please see the code below. Any help is appreciated here.
clear all
cd ('C:\Users\Desktop\test_avg')
myFolder = 'C:\Users\Desktop\test_avg';
if ~isdir(myFolder)
errorMessage = sprintf('Error: The following folder does not exist:\n%s', myFolder);
uiwait(warndlg(errorMessage));
return;
end
filePattern = fullfile(myFolder, '*.csv');
d = dir(filePattern);
fmt='%*s %*d %4f/%2f/%2f %f %*[^\n]';
year=1850:1:2014;
mxary=nan(length(d),length(year));
for i=1:length(d)
fid = fopen(fullfile(myFolder,d(i).name));
c=cell2mat(textscan(fid,fmt,'collectoutput',true,'headerlines',1,'delimiter',','));
fid=fclose(fid); % close input file
c(all(isnan(c),2),:)=[];
yr=unique(c(:,1)); % unique years in file
n=histc(c(:,1),yr); % count entries by year
yr=yr(n==(365+isleapyr(yr))); % years that are complete
i1=ismember(c(:,1),yr);
c=c(i1,:);
[~,~,iy]=unique(c(:,1));
mx=accumarray(iy,c(:,end),[],@max);
[~,iy]=ismember(year,yr);
mxary(i,logical(iy))=mx;
stn=strtok(d(i).name,'_'); % parse station name from file
fileID = fopen('new.csv','w');
if fileID ~= -1
for row = 1 : size(mxary, 1)
fprintf(fileID,'%s,%d,%f\n',stn,length(yr(:,1)),mxary(row,:));
end
fclose(fileID);
end
end
dpb on 10 Dec 2015
Edited: dpb on 10 Dec 2015
How many times do you have to be told to open the output file first, not inside the loop? This isn't rocket science...
And, of course, don't close it until after done writing into it...
OK. Figured it out. But, having a hard time writing the " mxary" row by row in to the same csv file corresponding to each row. See the image for station name and year columns.
See the code below:
clear all
cd ('C:\Users\Desktop\test_avg')
myFolder = 'C:\Users\Desktop\test_avg';
if ~isdir(myFolder)
errorMessage = sprintf('Error: The following folder does not exist:\n%s', myFolder);
uiwait(warndlg(errorMessage));
return;
end
filePattern = fullfile(myFolder, '*.csv');
d = dir(filePattern);
fmt='%*s %*d %4f/%2f/%2f %f %*[^\n]';
year=1850:1:2014;
mxary=nan(length(d),length(year));
filename='new.csv';
fileID = fopen(filename,'w');
for i=1:length(d)
fid = fopen(fullfile(myFolder,d(i).name));
c=cell2mat(textscan(fid,fmt,'collectoutput',true,'headerlines',1,'delimiter',','));
fid=fclose(fid); % close input file
c(all(isnan(c),2),:)=[];
yr=unique(c(:,1)); % unique years in file
n=histc(c(:,1),yr); % count entries by year
yr=yr(n==(365+isleapyr(yr))); % years that are complete
i1=ismember(c(:,1),yr);
c=c(i1,:);
[~,~,iy]=unique(c(:,1));
mx=accumarray(iy,c(:,end),[],@max);
[~,iy]=ismember(year,yr);
mxary(i,logical(iy))=mx;
stn=strtok(d(i).name,'_'); % parse station name from file
fprintf(fileID,'%s,%d\n', stn,length(yr));
end
fclose(fileID);
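One possible way to write each station's row of mxary next to its station ID is to build the fprintf format from the number of year columns. The sketch below is self-contained and uses invented station IDs, a three-column mxary, and a temporary output file; in the loop above the same fprintf call would use stn and mxary(i,:).

```matlab
stations = {'01AA002'; '01AB001'};          % hypothetical station IDs
mxary = [161 NaN 213; NaN 280 NaN];         % one row of yearly maxima per station

outName = [tempname '.csv'];                % temporary output file for the sketch
fileID = fopen(outName, 'w');               % open once, before the loop

% One %s for the station, then one ',%.2f' per year column, then a newline.
rowFmt = ['%s' repmat(',%.2f', 1, size(mxary, 2)) '\n'];
for k = 1:numel(stations)
    fprintf(fileID, rowFmt, stations{k}, mxary(k, :));
end
fclose(fileID);                             % close once, after the loop

disp(fileread(outName));
```

Missing years are written as NaN here; whether the file should carry NaN, blanks, or some other marker depends on what will read it next.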
