Preallocating a complex structure

Hi! This is my first time asking a question here so forgive me if I am breaking any rules.
I am dealing with 19,000+ samples of oil production data. My objective is to perform various calculations and curve fitting, then plot and save the results of each sample in a separate figure. However, this process in a loop takes 15-16 hours! I tried putting 'cla' after each figure was generated and saved, based on answers to a similar question here, but that did not help me much. Reading a little further, looks like the issue is because I am not pre allocating my structure. I know how to preallocate a simple structure, but I am not sure if I am doing it right in my case, where there are structures and matrices within the structure.
clc
clear
close all
%%loading and defining data
load('middleton_nuM.mat')
load('middleton_date.mat')
[n,m]=size(middleton_nuM);
thold=30;
ft=fittype('power1');
data=struct('well_id',zeros(1,n),'start_date',cell(1,n),'end_date',cell(1,n), ...
    'prod',zeros(1,n),'keep',zeros(1,n),'diff',zeros(1,n),'dxdy_sort',zeros(1,n), ...
    'dxdy2',zeros(1,n),'f',zeros(1,n),'GOF',zeros(1,n),'Rsq2',zeros(1,n), ...
    'cumfit',zeros(1,n),'cumall',zeros(1,n),'der',zeros(1,n),'std',zeros(1,n), ...
    'ratio',zeros(1,n));
This is what I did but it clearly is not working to give me any advantage in processing time. Any help in sorting this out will be greatly appreciated! I have attached a screenshot of my structure.

9 Comments

I'd edited your post to format it correctly, but you must have edited it at the same time and given it another incorrect format.
Please remove all these extra blank lines you've just added, then select all the lines of code and press the {} Code button to get it formatted properly.
Thank you Guillaume! Sorry about that!
Stephen23 on 14 Feb 2018
Edited: Stephen23 on 14 Feb 2018
Have you run the profiler? Which parts of your code run the slowest? Have you written your code following the recommendations in the MATLAB documentation?:
"I tried putting 'cla' after each figure was generated and saved"
It often helps to create just one figure at the start of processing, and thereafter update/replace its contents. Continually creating and deleting figures is slow.
"Reading a little further, looks like the issue is because I am not pre allocating my structure..."
Do you mean the structure itself, or the fields within the structure? Often it is not required to preallocate the fields of a structure at all: "Of course it depends on your specifics, but since each field is its own MATLAB array, there is not necessarily a need to initialize them all up front"
"Any help in sorting this out will be greatly appreciated! I have attached a screenshot of my structure."
Your structure in and of itself is not the reason for slow code (it is not very large). We need to see your code if you want help with this.
Amlan Rath on 14 Feb 2018
Edited: Stephen23 on 16 Feb 2018
This is the full code, in case you are wondering. No, I haven't tried the profiler. Let me do that! Thank you for your suggestion.
Edit: attached code as file
Guillaume on 14 Feb 2018
Edited: Guillaume on 14 Feb 2018
I haven't looked into detail at the processing loop code and it's time for bed here but the whole if ... elseif ... elseif can be replaced by
cats = {'0.0-0.8', '0.8-0.9', '0.9-1.0', etc.}; %outside the loop!
%... inside the loop
catidx = discretize(ratio, [0, 0.8:0.1:2]);
data(i).cat = cats{catidx};
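A minimal runnable sketch of the discretize idea above. The category list in the comment is elided ("etc."), so this uses only the three labels that are spelled out, with matching edges; the sample ratios and the 'out of range' fallback label are made up for illustration. Note that discretize returns NaN for values outside the edges, which would make cats{catidx} error, so the sketch guards against that.

```matlab
% Illustrative subset of categories only; the full list is elided in the comment above.
cats  = {'0.0-0.8', '0.8-0.9', '0.9-1.0'};   % define once, outside the loop
edges = [0, 0.8, 0.9, 1.0];                  % one more edge than there are labels

ratios = [0.5, 0.8, 0.95, 1.0, 1.7];         % made-up sample values
labels = cell(size(ratios));
for i = 1:numel(ratios)
    catidx = discretize(ratios(i), edges);   % bin index, or NaN if outside the edges
    if isnan(catidx)
        labels{i} = 'out of range';          % hypothetical fallback label
    else
        labels{i} = cats{catidx};
    end
end
disp(labels)
```

By default each bin includes its left edge, and the last bin also includes its right edge, so 0.8 falls in '0.8-0.9' and 1.0 falls in '0.9-1.0'.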
This is strange code. You calculate some values, plot them, and then read the XData and YData back from the plotted lines, but you already have the data!
[data(i).f,data(i).GOF] = fit(xthold,ythold,ft, opts);
plot(data(i).f(1:length(data(i).prod)))
figh=gcf;
H=findobj(figh,'type','line');
x_data=get(H,'xdata');
y_data=get(H,'ydata');
What about:
[data(i).f,data(i).GOF] = fit(xthold,ythold,ft, opts);
x_data = 1:length(data(i).prod);
y_data = data(i).f(x_data);
This can be simplified:
if (0.0<= ratio)&&(ratio <0.8)
data(i).cat='0.0-0.8';
elseif (0.8<= ratio)&&(ratio <0.9)
data(i).cat='0.8-0.9';
...
to
if ratio < 0.8
data(i).cat = '0.0-0.8';
elseif ratio <0.9
data(i).cat='0.8-0.9';
...
I assume that most of the time is spent inside fit(), but first use the profiler to find the bottleneck of the code. Optimizing a piece of the code that takes only 2% of the processing time can save at most 2% of the run time.
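For reference, a rough sketch of collecting profiler results programmatically instead of reading the viewer. The workload here is a made-up stand-in (sorting random vectors), not the code from the question; in practice the processing loop would go between profile on and profile off.

```matlab
profile on
for k = 1:200                          % stand-in workload, purely illustrative
    v = sort(rand(1000, 1));
    s = std(v);
end
profile off

stats = profile('info');               % struct with a FunctionTable field
totals = [stats.FunctionTable.TotalTime];
[~, order] = sort(totals, 'descend');
top = order(1:min(5, numel(order)));   % up to five slowest entries
for idx = top
    f = stats.FunctionTable(idx);
    fprintf('%-40s %8.3f s  (%d calls)\n', f.FunctionName, f.TotalTime, f.NumCalls);
end

% Upper bound on the gain from optimizing a part that takes fraction p of the run:
p = 0.02;
bestCaseSpeedup = 1 / (1 - p);         % even if that part took zero time afterwards
fprintf('Best-case speedup for p = %.2f: %.4fx\n', p, bestCaseSpeedup);
```

The last lines put a number on the 2% remark: removing a 2% part entirely makes the whole run only about 1.02 times faster.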
Amlan Rath on 15 Feb 2018
Edited: Amlan Rath on 15 Feb 2018
Jay Simmons The curve fitting was only done to a certain number of months and then extrapolated when the curve changed shape (due to sudden increase in production).
The xdata and ydata are the extrapolated values assuming that the curve never changed shape and there was no sudden increase in production.
I ran the profiler for the full length of data (screenshot attached). saveas takes almost 16 hours! followed by print.
But I do need all the graphs for visual inspection so I am guessing there is no way around it?
@Amlan Rath: if you really are serious about reducing the runtime then you need to make some changes to how you write code. The slowest parts of your code are related to the graphics, so that is where you should focus your attention:
  • Convert all scripts to functions.
  • Do NOT call clear, close, or clc.
  • Do NOT print/display anything in the command window.
  • Instead of creating a new figure for each plot use just one figure and update its contents.
  • Obtain and use explicit graphics handles for all relevant graphics objects.
  • Save the figures as .fig files first (you can easily post-process to convert to raster image format).
  • Always load into an output variable (which is a structure): S = load(...).
Read the first link that I gave you in my earlier comment.
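As a hedged illustration of the graphics points in the list above (one figure, explicit handles, save as .fig first), here is a minimal sketch. The data, the output folder, and the file naming are all made up; it is not the code from the question.

```matlab
% Made-up stand-in data: three short series of different lengths.
series = {rand(1, 12), rand(1, 20), rand(1, 8)};
outdir = tempname;                      % hypothetical output folder
mkdir(outdir);

fig    = figure('Visible', 'off');      % one figure for the whole run
ax     = axes('Parent', fig);
hLine  = plot(ax, NaN, NaN, '-o');      % one line object, reused in every iteration
hTitle = title(ax, '');

files = cell(1, numel(series));
for k = 1:numel(series)
    y = series{k};
    % Update the existing graphics objects instead of creating new ones.
    set(hLine, 'XData', 1:numel(y), 'YData', y);
    set(hTitle, 'String', sprintf('Sample %d', k));
    files{k} = fullfile(outdir, sprintf('sample_%05d.fig', k));
    savefig(fig, files{k});             % .fig now; convert to images later if needed
end
close(fig);
```

Setting XData and YData in a single set call keeps the two vectors the same length at all times, which matters here because each series has a different size.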
Jan on 16 Feb 2018
@Amlan Rath: My name is "Jan Simon". Calling me "Jan" in the forum is short and polite.
The output of the profiler shows that most of the time is spent inside saveas, printingGenerateOutput and loadPrefs. This is strange. Are you working on a network drive with a very slow connection speed? I do not see why loadPrefs is called 11730 times. I also do not see why saving these fairly simple figures takes so much time.

Answers (1)

Guillaume on 14 Feb 2018
Edited: Guillaume on 14 Feb 2018
The code you wrote does preallocate the structure with the correct size, though probably by mistake. It also unnecessarily preallocates a fair number of big vectors that are going to get replaced by your loop.
A more efficient way of allocating that structure would be simply:
data = struct('well_id', cell(1, n), 'start_date', [], 'end_date', [], ... and so on for the other fields regardless of their type)
You only need one of the inputs to be a cell array to create a structure array. If you do so, you do not want any of the other inputs to be a vector, as it will create n of these 1xn vectors, one for each of your structure elements.
While both the above and your original code allocate a 1xn structure, neither preallocates any of the vectors, cell arrays, structures, etc. that are going to go into each of the fields. These are still going to be created in your loop, and unfortunately there's nothing you can do about that because they're all different sizes.
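A small sketch that makes the difference visible, using a made-up n and only a few of the field names from the question. It assumes nothing beyond the documented behaviour of struct: cell array inputs are distributed across elements, while non-cell inputs are copied whole into every element.

```matlab
n = 4;   % small made-up size, just for inspection

% Pattern from the question: zeros(1,n) is copied whole into every element.
heavy = struct('well_id', zeros(1, n), 'start_date', cell(1, n));

% Suggested pattern: one cell input sets the size, the rest are empty.
light = struct('well_id', cell(1, n), 'start_date', [], 'end_date', []);

fprintf('heavy is %dx%d, heavy(1).well_id is %dx%d\n', ...
    size(heavy), size(heavy(1).well_id));
fprintf('light is %dx%d, light(1).well_id is %dx%d\n', ...
    size(light), size(light(1).well_id));
```

Both calls produce a 1xn structure array, but in the first one every element carries its own 1xn vector of zeros that the loop will overwrite anyway.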
The slow speed of your loop probably does not come from the lack of preallocation in any case.
edit: Now that you've posted the code
Most of the structure could be created without a loop, but it would still need a loop of some sort for the nan so I'm not sure you could gain much speed for the filling of the structure. I strongly suspect that the slow part is the processing loop, which is completely independent of the way the structure is allocated and filled.

1 Comment

Amlan Rath on 14 Feb 2018
Edited: Amlan Rath on 14 Feb 2018
So is there any other way to speed things up?

Asked: 14 Feb 2018
Commented: Jan on 16 Feb 2018
