Piecewise Polynomial fitting for data

I need to use curve fitting on time-series data. As the data is large (a week), fitting a single polynomial curve will not represent the true data. Therefore, the solution I can come up with (I don't know if such a solution exists) is to fit the data for the first 24 hours (a reading every half hour, so 48 data points per day) and keep the loop running for all 7 days (the data is in a single file). I have tried to code it but am getting an error, and secondly I cannot understand how to save the results from all loop iterations: I will need to reconstruct the fitted curve, so I need the fitted data, and I will need the RMSE for each curve. Also, is there a way to determine programmatically which polynomial fits the data best, i.e. minimum RMSE, without applying all of the polynomial fits?
The code I could think of is given below; I would appreciate it if someone could help me with it.
i=7
j=48
for i=1:7
    for j=1:48:48
        [Fit5, gof5] = fit( x(1:j), y(1:j), 'poly5' );
        coeff5=coeffvalues(Fit5);
    end
end

Answers (3)

Star Strider
Star Strider on 12 May 2017
Consider the Signal Processing Toolbox sgolayfilt (link) function. It may do exactly what you want.

1 Comment

dpb
dpb on 12 May 2017
Edited: dpb on 12 May 2017
Given OP's followup, +1
And, if the OP doesn't have the Signal Processing Toolbox but does have Curve Fitting, then there's the smooth function, which has that as one of many optional forms.


dpb
dpb on 12 May 2017
Fitting a polynomial is probably not going to work all that well and certainly a fifth-order one is likely quite unstable.
W/o data to see what the curve looks like, it's hard to give any real conclusive answer, but if the purpose is interpolation, consider piecewise splines instead.
That aside, your code above has a problem with j=1:48:48 as a loop-count expression; it is just j=1.
Although it is not my recommendation to do this (see the note above), a more concise way to operate over your set of equi-spaced data would be to reshape the vectors to a 2D array and operate by column...
x=reshape(x,48,[]); % arrange x, y by day (48 observations/day)
y=reshape(y,48,[]); % as column arrays, one column per day
nc=size(x,2);       % number of columns --> days
nDeg=5;             % the poly degree (your value, see notes above)
b=zeros(nDeg+1,nc); % allocate room for the coefficients
for i=1:nc
    b(:,i)=polyfit(x(:,i),y(:,i),nDeg); % and do the fitting
end
Undoubtedly the above will give some numerical-conditioning warnings (see notes above), but if you're adamant about trying, use polyfit's optional outputs as documented, [p,S,mu] = polyfit(...), to at least center and scale the design matrix before solving. Then you'll need to save the returned S and mu outputs alongside the coefficients array above, since polyval needs them for the evaluation.
Did I say I don't recommend this, yet? :) Look at splines and give us a sample (smallish) dataset...
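A hypothetical sketch of that scaled fit, assuming the per-day columns produced by the reshape snippet above (the RMSE line is an addition, since that is what the OP asked for):

```matlab
% Centered-and-scaled fit of day 1; mu = [mean; std] of x(:,1).
nDeg = 5;
[p, S, mu] = polyfit(x(:,1), y(:,1), nDeg);
yhat = polyval(p, x(:,1), S, mu);        % evaluate with the same scaling
rmse = sqrt(mean((y(:,1) - yhat).^2));   % per-segment RMSE
```

The same three lines inside the day loop, with yhat and rmse stored per column, give the reconstructed curves and per-day RMSE values the question asked about.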

13 Comments

Thanks for the reply. I do understand that a polynomial can become unstable at the boundaries; however, I really am not using it to forecast anything, rather I am just trying to smooth shapes. A spline is a good option, but the level of variation in the spline is too much, and I want to reduce variation; that is why I can accept a degree of error. Is there any particular reason for not recommending polynomials apart from them being unstable at the ends?
It ain't necessarily the ends alone...it'll have wiggles all along depending upon the shape of the input. It is also totally unconstrained as to how big or where those inflections are.
Secondly, I have implemented your code and it works very well. But how can I determine the RMSE of different degrees of polynomial? I mean, I want to start with a polynomial of second degree and select the lowest degree of polynomial with a reasonably good RMSE. Can it be coded for automation?
I'd feel much more at ease here if had some data to observe... :)
If you have the Curve Fitting TB, look at
doc fit % more general fit object has more outputs
If not, the optional second output of polyfit is a structure whose normr field is the norm of the residuals, which depends on the quality of the fit. Of course, computing Rsq is pretty much trivial:
Rsq = 1 - SSE/SST
and
SST=sum((y-mean(y)).^2);
SSE=S.normr^2;
As you add terms, Rsq becomes more biased, since it doesn't account for the number of terms in the fit, so you'll probably want to look at Radj. Once you've got Rsq, this too is trivial:
N=length(y);
Radj = 1 - (1-Rsq)*(N-1)/(N-k-1);
where k is degree of the polynomial.
Did I say I'd be a lot more at ease with providing this if could see some sample data, yet...???
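Putting those pieces together, a hypothetical automation sketch (the degree range is illustrative; x and y are one day's column, as before):

```matlab
% Scan candidate polynomial degrees and keep the best adjusted R^2.
N    = length(y);
SST  = sum((y - mean(y)).^2);
Radj = -inf(1, 6);                       % -inf so unfitted slots never win
for k = 2:6                              % candidate degrees
    [p, S]  = polyfit(x, y, k);
    Rsq     = 1 - S.normr^2/SST;
    Radj(k) = 1 - (1 - Rsq)*(N - 1)/(N - k - 1);
end
[~, kBest] = max(Radj);                  % chosen degree
```

Note the overfitting caveats elsewhere in this thread still apply; the loop only ranks degrees, it does not decide whether a polynomial is appropriate at all.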
Note that I very deliberately choose not to wade into this question, even though I know a moderate amount about piecewise modeling. :)
A big reason for that is because I expect the OP will not (SHOULD not) be happy with the result of such a piecewise fit, since each segment is being computed with no relation to its neighbors. Since higher order polynomial segments will have their largest uncertainty at the ends of the support, the fit will be crappy across each arbitrarily chosen break point.
That will surely be a problem down the line. Effectively, the OP is trying to use a home-brewed spline fit, but without any kind of apparent understanding of what splines are or how to fit them.
The reason I'm commenting now is because of the request for automatic selection of model order here. Note that it is very easy to overfit polynomial models, especially if you leave the task to be done automatically.
My recommendation would be to use a spline of some ilk, so a smoothing or a least-squares spline are options. A Savitzky-Golay smoother is another good choice, depending on the goals for the result. If you are just looking to smooth some data for a better plot, then just use a Savitzky-Golay, or some other simple filter. If the goal is predictive ability, then a spline is arguably correct.
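As a sketch of the smoothing-spline option (assuming the Curve Fitting Toolbox's csaps and fnval; the smoothing parameter value is purely illustrative):

```matlab
% Smoothing spline as an alternative to piecewise polynomials.
% p in (0,1]: p -> 0 tends to a straight line, p -> 1 interpolates the data.
p    = 1e-4;              % illustrative; tune by eye or cross-validation
sp   = csaps(x, y, p);    % fit the smoothing spline
yfit = fnval(sp, x);      % evaluate at the original abscissae
```

Unlike the per-day polynomial segments, the spline is continuous across day boundaries, which addresses the break-point objection above.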
dpb
dpb on 12 May 2017
Edited: dpb on 12 May 2017
+~23... :)
Only sometimes it seems folks can only be convinced of a bad idea after it fails... :(
dpb and John D'Errico, both of you are convincing. First of all, please find attached my data, which is the y data; the x data is 0.5, 1, 1.5, ..., 8760. The y data has two columns, and each column needs to be fitted (each column represents one sample, and I have a population with more than 100 such data samples). I need to fit this data; can you guys recommend a suitable alternative? At the end I will need the fitted values for each sample or column and will need to select a fit which suits the data best. What is your recommendation? And as I am not good with coding, can you give me a little help with the code as well, please?
"i will need the fitted values for each sample or column and will need to select a fit which suits the data best." <=== What is the difference between those two needs? A fit will be the best fit for the assumptions you are making, so it will already be the best. Those two needs are one and the same. You could make other fits for each column, like using different window widths or different polynomial orders. But you have to define what the overall best means to you. Like I said, each one is the best for the assumptions/parameters you used. Just selecting the parameter set with the lowest mean absolute deviation or RMS does not mean it is necessarily the best (because the highest-order fit that hugs the noisy curve almost exactly will, of course, have the lowest MAD).
Ugh.
Crappy data. An apparent discontinuity in the blue curve. A clear outlier in the green curve. Note that outliers will blow away any polynomial fit. Even a straight line fit will be hurt badly. A robust fitting tool can be a good idea, like robustfit from the stats toolbox. But robustfit is not designed to fit a polynomial like polyfit does, so you would need to feed it the proper matrix.
As well, robustfit cannot be of much help in deciding which polynomial order to use, were you to use a polynomial. Worse, you may be lacking any good statistical methods to decide a reasonable polynomial order, in the presence of significant outliers.
Worse yet. The presence of significant outliers in a short series of points will cause even tools like robustfit to have problems trying to fit a polynomial model.
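To illustrate the "proper matrix" point above, a hypothetical sketch of feeding robustfit a polynomial design matrix (Statistics Toolbox assumed; the degree is illustrative):

```matlab
% Robust polynomial fit: build the design matrix ourselves, since
% robustfit adds the intercept column automatically.
nDeg = 3;                              % illustrative low degree
X    = bsxfun(@power, x(:), 1:nDeg);   % columns [x, x.^2, x.^3]
b    = robustfit(X, y(:));             % b(1) = intercept, b(2:end) = poly terms
yhat = [ones(numel(x),1), X] * b;      % robust fitted values
```

This downweights the outliers rather than removing them, but, per the caveats above, it does not solve the order-selection problem.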
UGH.
sgolayfilt and the smooth function seem to produce reasonably good results. But for sgolayfilt we again need to decide a degree of polynomial, whereas the smooth function with a moving average seems to be a good option for smoothing. I am not interested in the coefficients; all I need is the final fitted data, and the smooth function gives the data. When you say a moderately high frame size, what would you say is a suitable frame size for data with a dimension of 1×17520? All this is experimentation, and the data is energy-consumption data. I am intending to use it to generate smoother data which can be used for load modelling. Although I am not intending to do it, if we can produce a good model it could be used for forecasting purposes as well. So, per the previous discussions, polynomial fitting is out of the window, i.e. using high-degree polynomials. This leaves me with polynomials of lower degree via sgolayfilt, but I am not sure about the frame length and degree of polynomial. In the case of the smooth function I just need to understand how the frame size should be decided. Do you guys think I am going in the right direction?
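One natural way to pick the span for half-hourly data is to tie it to the daily cycle (48 samples/day); a minimal sketch, with purely illustrative span values:

```matlab
% Moving-average smoothing with spans tied to the half-hourly sampling.
spanDay = 48;                      % one day of half-hour readings
yDaily  = smooth(y, spanDay);      % Curve Fitting Toolbox moving average
yWeekly = smooth(y, 7*spanDay);    % much heavier smoothing over a week
```

The wider the span, the more of the daily load shape is averaged away, so the choice depends on which features the load model needs to keep.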
Ewww...there's a typo in the title--that's for 1830 hr, not 1630...
The data is volatile, so you will see some seasonal patterns, but not necessarily. Fitting a distribution is not a very good idea in my case. I will still prefer to smooth the data. I will leave the idea of prediction for now; smoothing with a moving average produces good results, which is sufficient for load modelling.
Well, whatever, it is your problem. I've removed the annoying info.


Image Analyst
Image Analyst on 13 May 2017
See my canned Savitzky-Golay filter demos, attached. Don't use a filter order more than about 2 or 3 or you won't see much smoothing. A 5th order polynomial will hug the curve fairly closely and not provide much smoothing.
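A minimal sketch of that order comparison (Signal Processing Toolbox assumed; the frame length is illustrative, and must be odd and larger than the order):

```matlab
frame   = 49;                        % ~one day of half-hour samples, odd
ySmooth = sgolayfilt(y, 2, frame);   % order 2: noticeable smoothing
yHug    = sgolayfilt(y, 5, frame);   % order 5: follows the noise, smooths little
```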
