How to interpolate 12 CSV files with different x,y,z into a common grid and average them in MATLAB?

95 views (last 30 days)
Hello,
I have 12 CSV files from a chromatic confocal sensor.
Each file has three columns: x, y, z.
The problem:
  • Each measurement has different x, y and z positions, since the sensor scans in a spiral pattern. I found that x deviates more than y and z.
  • I want to create a virtual reference surface by averaging these 12 measurements.
  • To do that, I need to interpolate each dataset onto the same (x,y) grid before averaging.
My questions:
  1. How can I load all 12 CSVs and interpolate them onto a common regular grid in MATLAB?
  2. What is the best function for interpolation in this case (griddata, scatteredInterpolant, or something else)?
  3. Once all are on the same grid, how do I average the z-values to get one final reference surface and save it as a CSV?
Any example code or workflow would be greatly appreciated.
Thanks in advance!
  6 Comments
dpb
dpb on 26 Sep 2025 at 15:55
Edited: dpb about 2 hours ago
d=dir('*.csv');
N=numel(d);
tl=tiledlayout('flow');
mn=nan(N+1,3); mx=mn; % min, max each data set, overall
for i=1:N
data=readmatrix(fullfile(d(i).folder,d(i).name));
%whos data
mn(i,:)=min(data); % minima, by column
mx(i,:)=max(data); % maxima, by column
ax=nexttile(tl);
scatter3(ax,data(:,1),data(:,2),data(:,3),3,'marker','.')
end
mn(end,:)=min(mn,[],'omitnan'); % minima, by column
mx(end,:)=max(mx,[],'omitnan'); % maxima, by column
[mn mx]
ans = 6×6
  -10.9484  -10.9726   -2.9383   10.9476   10.9234   -0.0009
  -10.9484  -10.9726   -2.9394   10.9476   10.9234   -0.0013
  -10.9484  -10.9726   -2.9397   10.9476   10.9234   -0.0015
  -10.9500  -10.9747   -2.9390   10.9497   10.9251   -0.0016
  -10.9455  -10.9712   -2.9393   10.9462   10.9212   -0.0018
  -10.9500  -10.9747   -2.9397   10.9497   10.9251   -0.0009
% look at RSM -- crude, just rereads the data
figure
tl=tiledlayout('flow');
for i=1:N
data=readmatrix(fullfile(d(i).folder,d(i).name));
fit22=fit([data(:,1:2)],data(:,3),'poly22')
ax=nexttile(tl);
plot(fit22,[data(:,1:2)],data(:,3),'style','residuals');
end
fit22 =
     Linear model Poly22:
     fit22(x,y) = p00 + p10*x + p01*y + p20*x^2 + p11*x*y + p02*y^2
     Coefficients (with 95% confidence bounds):
       p00 =     0.01599  (0.01568, 0.01629)
       p10 =   -0.000242  (-0.0002866, -0.0001973)
       p01 =   0.0004827  (0.000438, 0.0005273)
       p20 =    -0.02409  (-0.0241, -0.02409)
       p11 =   4.456e-07  (-1.006e-05, 1.095e-05)
       p02 =    -0.02409  (-0.0241, -0.02408)
fit22 =
     Linear model Poly22:
     fit22(x,y) = p00 + p10*x + p01*y + p20*x^2 + p11*x*y + p02*y^2
     Coefficients (with 95% confidence bounds):
       p00 =     0.01562  (0.01532, 0.01592)
       p10 =  -0.0002422  (-0.0002869, -0.0001975)
       p01 =   0.0004829  (0.0004382, 0.0005275)
       p20 =    -0.02409  (-0.0241, -0.02408)
       p11 =    2.87e-07  (-1.022e-05, 1.08e-05)
       p02 =    -0.02409  (-0.0241, -0.02408)
fit22 =
     Linear model Poly22:
     fit22(x,y) = p00 + p10*x + p01*y + p20*x^2 + p11*x*y + p02*y^2
     Coefficients (with 95% confidence bounds):
       p00 =     0.01548  (0.01518, 0.01578)
       p10 =  -0.0002419  (-0.0002866, -0.0001972)
       p01 =   0.0004831  (0.0004384, 0.0005278)
       p20 =    -0.02409  (-0.0241, -0.02409)
       p11 =   2.565e-07  (-1.026e-05, 1.077e-05)
       p02 =    -0.02409  (-0.0241, -0.02408)
fit22 =
     Linear model Poly22:
     fit22(x,y) = p00 + p10*x + p01*y + p20*x^2 + p11*x*y + p02*y^2
     Coefficients (with 95% confidence bounds):
       p00 =     0.01529  (0.01499, 0.01559)
       p10 =  -0.0002429  (-0.0002876, -0.0001982)
       p01 =   0.0004829  (0.0004383, 0.0005276)
       p20 =    -0.02409  (-0.0241, -0.02409)
       p11 =   5.449e-07  (-9.965e-06, 1.105e-05)
       p02 =    -0.02409  (-0.0241, -0.02408)
fit22 =
     Linear model Poly22:
     fit22(x,y) = p00 + p10*x + p01*y + p20*x^2 + p11*x*y + p02*y^2
     Coefficients (with 95% confidence bounds):
       p00 =     0.01511  (0.01481, 0.01541)
       p10 =  -0.0002428  (-0.0002875, -0.0001982)
       p01 =   0.0004833  (0.0004386, 0.000528)
       p20 =    -0.02409  (-0.0241, -0.02409)
       p11 =   4.298e-07  (-1.008e-05, 1.094e-05)
       p02 =    -0.02409  (-0.0241, -0.02408)
figure
z=fit22([data(:,1:2)]);
whos z
  Name        Size         Bytes   Class     Attributes
  z       23690x1         189520   double
[mxz,ixz]=max(z)
mxz = 0.0151
ixz = 11837
[data(ixz,:) mxz]
ans = 1×4
-0.0089 0.0099 -0.0018 0.0151
subplot(1,2,1)
plot(fit22,[data(:,1:2)],data(:,3));
xlabel('X'), ylabel('Y'), zlabel('Z')
subplot(1,2,2)
plot(fit22,[data(:,1:2)],data(:,3)); view(90,0), ylabel('Y'), zlabel('Z')
zlim([-0.1 0.02]), ylim([-3 3])
hAx=gca;
hObj=hAx.Children;
hS=hObj(1); hS.FaceAlpha=0.3;
hL=hObj(2); hL.Color='r';
From there, the datasets appear similar enough that regular sampling between the overall min and max ranges should work just fine. I hadn't expected the sampling density to be anywhere near so high, so I had thought it might be necessary to triangulate rather than use a regular grid. So, pick the overall minimum and maximum from the last line above.
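As a minimal sketch of the interpolate-then-average workflow the question asks for, reusing the d, N, mn, mx variables from the loop above (the grid size and the 'natural'/'none' method choices are my assumptions, not from the thread):

```matlab
% Sketch: interpolate each file onto one common grid, then average.
Ngrid = 200;                                  % arbitrary grid density
xg = linspace(mn(end,1), mx(end,1), Ngrid);   % overall x range
yg = linspace(mn(end,2), mx(end,2), Ngrid);   % overall y range
[Xg,Yg] = meshgrid(xg, yg);
Zsum = zeros(size(Xg)); Zcnt = zeros(size(Xg));
for i = 1:N
  data = readmatrix(fullfile(d(i).folder, d(i).name));
  F  = scatteredInterpolant(data(:,1), data(:,2), data(:,3), 'natural', 'none');
  Zi = F(Xg, Yg);                 % NaN outside this dataset's convex hull
  ok = ~isnan(Zi);
  Zsum(ok) = Zsum(ok) + Zi(ok);
  Zcnt(ok) = Zcnt(ok) + 1;
end
Zavg = Zsum ./ Zcnt;              % average wherever at least one file has data
```

Using 'none' for extrapolation corresponds to Walter's approach 1 below; swapping it for 'linear' would give approach 2.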
ADDENDUM
Sorry, had to run before... The RSM (response surface model) isn't a bad idea; besides @John D'Errico's most excellent FEX package, if one has the Curve Fitting Toolbox there are some built-in features with it that can simplify use and evaluation.
The above shows the quadratic coefficients are all significant (confidence intervals don't include zero) except the x*y interaction term, p11, and are all quite similar across files.
As @Mathieu NOE notes, the max residual is about 0.05, at the outer edges; that is about 0.05/3*100 --> ~1.7%.
Unfortunately, the maximum of the RSM will be at the origin, and those average about 0.015, whereas the measured maxima are all essentially zero. This overshoot can be seen in the RH subplot above; looking at the surface along the Z-Y axes shows the data points in red while the surface is blue. The overshoot at the maximum is clearly visible and may not be acceptable for the given purpose. That the quadratic over-predicts the interior and under-predicts the outer observations is also evident in the residual plots; the observed data are less strongly peaked than the quadratic fits.
An interpolating spline surface might be better, or revert to the sampled grid and averaging as in your initial proposal.
Unfortunately, I've gotta' run again now...

Sign in to comment.

Accepted Answer

Mathieu NOE
Mathieu NOE on 26 Sep 2025 at 14:35
Edited: Mathieu NOE on 26 Sep 2025 at 15:49
hello again
this would be my suggestion:
once we have loaded all the data, we can see the shape looks pretty much like a paraboloid
so my suggestion here, instead of trying to average the data, is to fit a second-order polynomial to your data - this can be done using this excellent tool available on the FEX: polyfitn - File Exchange - MATLAB Central
as a result you now have the equation of the best-fitted paraboloid, and you can use the model parameters to compute any value on your grid
see code below
here is a plot of the result - the fitted model is evaluated on a new gridded x,y dataset (up to you to modify it)
a second plot shows the error between the fitted surface and your raw data - here the max error in the z direction is about 0.05 (= 1% of your z data amplitude)
NB that even if you reduce the amount of data, the fit is still good. To demonstrate that, you can use the commented portion of the code that lets you play with a decimation factor (optional). I tested it up to decim_factor = 100 without noticeable loss of accuracy (of the fit)
hope it helps!
%% main loop - retrieve data and store
% A simple, efficient solution is to use the structure returned by DIR:
P = pwd; % get working directory (or specify it)
S = dir(fullfile(P, '*.csv'));
% init arrays
x = [];
y = [];
z = [];
for k = 1:numel(S)
F = fullfile(S(k).folder, S(k).name);
data = readmatrix(F);
% aggregate data
x = [x; data(:,1)];
y = [y; data(:,2)];
z = [z; data(:,3)];
end
% % decimation (optional)
% n = numel(x);
% decim_factor = 10;
%
% x = x(1:decim_factor:n);
% y = y(1:decim_factor:n);
% z = z(1:decim_factor:n);
%% surface fit - polynomial model
% here we fit a paraboloid surface (order 2 polynomial)
s = polyfitn([x(:) y(:)], z(:), 2); % File Exchange by John D'Errico : https://fr.mathworks.com/matlabcentral/fileexchange/34765-polyfitn
[xx,yy] = meshgrid(linspace(min(x),max(x),25), linspace(min(y),max(y),25));
zz = polyvaln(s, [xx(:) yy(:)]);
zz = reshape(zz, size(xx));
% plot points and surface
figure('Renderer','opengl')
hold on
plot3(x, y, z,'r.'); % comment this line if you don't want the points
mesh(xx, yy, zz, 'FaceAlpha',0.4)
surface(xx, yy, zz,'FaceColor','interp', 'EdgeColor','b', 'FaceAlpha',0.4)
hold off
AZ = -37.5;
EL = 30;
view(AZ,EL)
grid on;
colormap(jet);
shading interp
xlabel('X')
ylabel('Y')
zlabel('Z')
colorbar;
% estimate error between data points and fitted curve
zfit = polyvaln(s, [x(:) y(:)]);
max_error = max(abs(zfit-z));
% or as a plot
figure
axis square
scatter(x, y, [], zfit-z ,'filled')
colorbar
grid on;
%% NOTA : difference between surf and surface matlab functions
% The difference is that surf checks first to see if the axes Nextplot property says to replace plot,
% and if so then it does a cla. Then either way it calls surface()
% In other words if "hold on" is not in effect (or the internal equivalent)
% then surf will erase the current plot. surface() will not do that.
% The difference is sort of like the difference between line() and plot(). line() never erases the current plot.
% plot() might erase the current plot, and whether it does or not it will call line to do the drawing work.
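To cover the last part of the question (saving the result as a CSV), a minimal sketch that flattens the evaluated grid from the code above back into three x,y,z columns (the output filename is my own placeholder):

```matlab
% Write the fitted/gridded surface out as an x,y,z CSV.
out = [xx(:) yy(:) zz(:)];                 % one row per grid node
writematrix(out, 'reference_surface.csv'); % readmatrix() reads it back
```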
  1 Comment
Mathieu NOE
Mathieu NOE about 14 hours ago
I believe I prefer this slightly modified code
NB that I finally opted for a decimation factor of 10, as the number of points remaining is more than enough to perform a good surface fit
just for demo's sake, I also changed the meshgrid from rectangular to polar coordinates
and last but not least, this should also interest you (maybe): 3D Polar Plotting » Pick of the Week - MATLAB & Simulink
updated code :
% main loop - retrieve data and store
% A simple, efficient solution is to use the structure returned by DIR:
P = pwd; % get working directory (or specify it)
S = dir(fullfile(P, '*.csv'));
% init arrays
x = [];
y = [];
z = [];
% decimation (optional)
decim_factor = 10;
for k = 1:numel(S)
F = fullfile(S(k).folder, S(k).name);
data = readmatrix(F);
% decimation (optional) & aggregate data
n = size(data,1);
ind = 1:decim_factor:n;
x = [x; data(ind,1)];
y = [y; data(ind,2)];
z = [z; data(ind,3)];
end
%% surface fit - polynomial model
% here we fit a paraboloid surface (order 2 polynomial)
s = polyfitn([x(:) y(:)], z(:), 2); % File Exchange by John D'Errico : https://fr.mathworks.com/matlabcentral/fileexchange/34765-polyfitn
% % rectangular (regular) meshgrid
% [xx,yy] = meshgrid(linspace(min(x),max(x),25), linspace(min(y),max(y),25));
% polar coordinates meshgrid
rr = 0:1:15;
thth = (0:.05:1)*2*pi;
[r,th] = meshgrid(rr,thth);
xx = r.*cos(th);
yy = r.*sin(th);
zz = polyvaln(s, [xx(:) yy(:)]);
zz = reshape(zz, size(xx));
% plot points and surface
figure('Renderer','opengl')
hold on
plot3(x, y, z,'r.'); % comment this line if you don't want the points
mesh(xx, yy, zz, 'FaceAlpha',0.4)
surface(xx, yy, zz,'FaceColor','interp', 'EdgeColor','b', 'FaceAlpha',0.4)
hold off
AZ = -37.5;
EL = 30;
view(AZ,EL)
grid on;
colormap(jet);
shading interp
xlabel('X')
ylabel('Y')
zlabel('Z')
colorbar;
% estimate error between data points and fitted curve
zfit = polyvaln(s, [x(:) y(:)]);
max_error = max(abs(zfit-z));
% or as a plot
figure
axis square
scatter(x, y, [], zfit-z ,'filled')
colorbar
grid on;
%% NOTA : difference between surf and surface matlab functions
% The difference is that surf checks first to see if the axes Nextplot property says to replace plot,
% and if so then it does a cla. Then either way it calls surface()
% In other words if "hold on" is not in effect (or the internal equivalent)
% then surf will erase the current plot. surface() will not do that.
% The difference is sort of like the difference between line() and plot(). line() never erases the current plot.
% plot() might erase the current plot, and whether it does or not it will call line to do the drawing work.

Sign in to comment.

More Answers (2)

Walter Roberson
Walter Roberson on 24 Sep 2025 at 17:56
There are a few main approaches:
  1. find the aggregate minimum and aggregate maximum over all of the x (y, z) values. Use linspace() to create a uniform vector. Interpolate the actual readings, using NaN for the case where an individual dataset did not extend to the full range
  2. find the aggregate minimum and aggregate maximum over all of the x (y, z) values. Use linspace() to create a uniform vector. Interpolate the actual readings, using extrapolation for the case where an individual dataset did not extend to the full range
  3. find the range of x (y, z) values that is common to all of the files. Use linspace() to create a uniform vector, discarding the data that lies outside of the common range
  4. find the union of all of the x (y, z) values. Interpolate the actual readings to the union grid, using NaN for the case where an individual dataset does not extend to the full range. This approach has the advantage of exactly representing all of the original data, with interpolation between samples to accommodate the data from the other datasets. This approach has the disadvantage that the resulting grid will not be regular.
  5. find the union of all of the x (y, z) values. Interpolate the actual readings to the union grid, using extrapolation for the case where an individual dataset does not extend to the full range. This approach has the advantage of exactly representing all of the original data, with interpolation between samples to accommodate the data from the other datasets. This approach has the disadvantage that the resulting grid will not be regular.
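Approaches 1-3 differ only in which range the common vector spans. A minimal sketch of computing both candidate ranges for x, assuming the files have been read into a cell array named datasets (my own placeholder) of N-by-3 matrices:

```matlab
% Aggregate (union) vs. common (intersection) x-range over all datasets.
mins = cellfun(@(d) min(d(:,1)), datasets);
maxs = cellfun(@(d) max(d(:,1)), datasets);
xAggregate = linspace(min(mins), max(maxs), 500);  % approaches 1 and 2
xCommon    = linspace(max(mins), min(maxs), 500);  % approach 3
```

The same pattern applies to y (and z if needed); the 500-point density is arbitrary.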
  1 Commento
Jes
Jes on 26 Sep 2025 at 12:32
Thank you for the answer @Walter Roberson. Actually I am new to MATLAB and I created a code with the help of AI. But I am not sure whether that is OK or not:

Sign in to comment.


dpb
dpb on 27 Sep 2025 at 16:24
Edited: dpb on 28 Sep 2025 at 18:57
OK, I'm back!!! <g>
d=dir('*.csv');
N=numel(d);
mn=nan(1,3); mx=mn; % min, max overall
for i=1:N
data=readmatrix(fullfile(d(i).folder,d(i).name));
mn=min(mn,min(data),'omitnan');
mx=max(mx,max(data),'omitnan');
end
%[mn mx]
Nx=100; Ny=Nx; % pick a number of bins each direction
xe=linspace(mn(1),mx(1),Nx+1); % set bin edges each direction for that many bins
ye=linspace(mn(2),mx(2),Ny+1);
C=zeros(Nx,Ny); % overall counts per bin accumulator
A=C; % overall averaged response each bin
for i=1:N
data=readmatrix(fullfile(d(i).folder,d(i).name));
[c,~,~,ix,iy]=histcounts2(data(:,1),data(:,2),xe,ye); % discretize each array into same bins
C=C+c; % accumulate total number of counts in each bin
A=A+accumarray([ix iy],data(:,3),size(c)); % and the height values
end
A=A./C; % the average is the total divided by count
x=movmean(xe,2,'Endpoints','discard'); % convert bin edges to midpoints
y=movmean(ye,2,'Endpoints','discard');
surf(x,y,A)
max(A,[],'all','omitnan') % check what the averaged max actually is
ans = -0.0018
does the interpolation by aggregation of the data; it has the advantage of being regular, and of neither extrapolating nor introducing a fitting error. It has the disadvantage that as the binning becomes finer there are missing bins, but one does have a pretty smooth surface to infill with interpolation.
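For that infill step, one hedged option is fillmissing2, assuming a recent release (it was introduced in R2023a); on older releases scatteredInterpolant over the non-NaN bins is an alternative:

```matlab
% Empty bins end up NaN in A (0/0 from A./C); interpolate across them.
Afilled = fillmissing2(A, 'linear');     % requires R2023a or later
% Pre-R2023a alternative (x along rows of A, matching histcounts2):
% [X,Y] = ndgrid(x, y);
% ok = ~isnan(A);
% F = scatteredInterpolant(X(ok), Y(ok), A(ok));
% Afilled = F(X, Y);
```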
The script is somewhat inefficient since it reads the data twice: first to find the overall range and then again to do the binning. This approach was taken in case the total number of files to process can't all be held in memory at once for a global search.
One could arbitrarily set the min/max range a priori or from other source and avoid the first loop, of course, or get more sophisticated and use some of the MATLAB tools for processing large datasets.
The only real "trick" here is the use of accumarray with the 2-D indexing vectors; its doc is rather dense, but it is a very powerful tool. Much of its capability has more recently been provided in higher-level tools within tables and/or timetables, but for plain vector data it remains key and is quick even for large datasets.
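A toy example of the accumarray pattern used above (summing values grouped by their 2-D bin subscripts), with made-up numbers:

```matlab
% Three points; the first and third fall in the same (row,col) bin.
ix = [1; 2; 1];                     % row bin index per point
iy = [1; 1; 1];                     % column bin index per point
v  = [10; 20; 5];                   % value carried by each point
S  = accumarray([ix iy], v, [2 2]); % S = [15 0; 20 0]
```

Dividing such a sum by the matching count matrix, exactly as A./C does above, turns the per-bin totals into per-bin means.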
ADDENDUM:
The above set bin edges to the min/max of the observed data; alternatively, one could set the first and last bin centers to that range (or, of course, use arbitrary values). To set bin centers to match the range, use
Nx=100; Ny=Nx; % pick a number of bins each direction
dxy=(mx(1:2)-mn(1:2))./([Nx Ny]-1)/2; % half a bin width to match range each direction
xe=linspace(mn(1)-dxy(1),mx(1)+dxy(1),Nx+1); % set bin edges each direction for that many bins
ye=linspace(mn(2)-dxy(2),mx(2)+dxy(2),Ny+1);
x=movmean(xe,2,'Endpoints','discard'); % convert bin edges to midpoints
y=movmean(ye,2,'Endpoints','discard');
C=zeros(Nx,Ny); % overall counts per bin accumulator
A=C; % overall averaged response each bin
for i=1:N
data=readmatrix(fullfile(d(i).folder,d(i).name));
[c,~,~,ix,iy]=histcounts2(data(:,1),data(:,2),xe,ye); % discretize each array into same bins
C=C+c; % accumulate total number of counts in each bin
A=A+accumarray([ix iy],data(:,3),size(c)); % and the height values
end
A=A./C; % the average is the total divided by count
surf(x,y,A)
which sets the first and last bin edges at a half-bin width below and above the observed min and max values, respectively.
As a side note on histcounts2, I had thought one should be able to specify bin midpoints as well as the number of bins and edges. However, on looking at the documentation more closely, I noticed the first case of using the min/max as the bin edge limits can be done with a name-value pair, as
[c,~,~,ix,iy]=histcounts2(data(:,1),data(:,2),[Nx Ny], ...
'XBinLimits',[mn(1) mx(1)], ...
'YBinLimits',[mn(2) mx(2)]);
  8 Comments
dpb
dpb about 3 hours ago
As @Mathieu NOE's comment noted, polyfitn is from the FEX; you have to download it and have it somewhere on the search path to use it.
As I illustrated, you can use the Curve Fitting Toolbox fit function's built-in model to do the same thing if you already have that toolbox.

Sign in to comment.
