Extracting Data from Cell array with multiple values per column

5 visualizzazioni (ultimi 30 giorni)
Hi, I have this cell array of 4 columns, but in each column there are 3 values.
RESHAPED
D2 =
195×4 cell array
{[" 95.288 95.307 95.…"]} {[" 131.028 131.028 131.…"]} {[" 0.745516 0.745517 0.745…"]} {["-3.99239E-008-3.99239E-008-…"]}
{[" 95.4433 95.4633 95.4…"]} {[" 131.028 131.028 131.…"]} {[" 0.745511 0.745509 0.745…"]} {["-3.99239E-008-3.99239E-008-…"]}
{[" 95.6074 95.6285 95.6…"]} {[" 131.028 131.028 131.…"]} {[" 0.745486 0.745481 0.745…"]} {["-3.99239E-008-3.99239E-008-…"]}
{[" 95.7799 95.802 95.8…"]} {[" 131.028 131.028 131.…"]} {[" 0.745438 0.745432 0.745…"]} {["-3.99239E-008-3.99239E-008-…"]}
{[" 95.9603 95.9834 96.0…"]} {[" 131.028 131.028 131.…"]} {[" 0.745381 0.745373 0.745…"]} {["-3.99239E-008-3.99239E-008-…"]}
How can I create 4 columns of 1D vectors, (and numeric values not strings), where each of the 3 values in a row are stacked under each other vertically like this
95.288 131.028 0.745516 -3.99239E-008
95.307 131.028 0.745517 -3.99239E-008
95.… 131..... 0.745… -3.99239E-008
95.4433 131.028 0.745511 -3.99239E-008
  9 Commenti
the cyclist
the cyclist il 19 Giu 2025
That's all well and good, but I don't see how it avoids the hard problem. Look at row 125 of D2 (which I transposed here, so it is easier to see):
load("testdata.mat","data")
D2=reshape(data,[],4);
D2(125,:)'
ans = 4×1 cell array
{[" 117.76 117.79 117.82 117.85 117.88 117.91 117.94 117.97"]} {[" 131.021 131.021 131.021 131.021 131.021 131.021 131.021 131.021"]} {[" 0.740383 0.740383 0.740383 0.740383 0.740383 0.740383 0.740383 0.740383"]} {["5.53099E-0066.03304E-0066.53553E-0067.04676E-0067.57499E-0068.1279E-006" ]}
We need to parse this row, because the first cell has values in 100-120. But 4th cell has the problem I mentioned, that there is no whitespace separting the numbers. And there seem to be only 6 numbers there, not the 8 I might expect from the other elements. And the numbers in the 4th cell don't occupy just 8 characters.
My impression is that you need to inspect your data more carefully, come up with all the steps of the algorithm (i.e. the "rules") that gives what you need, and then get the coding help you need here.
J. Alex Lee
J. Alex Lee il 24 Giu 2025
Modificato: J. Alex Lee il 25 Giu 2025
To take nothing for granted and leave no stone unturned: if you have control over the data output, work on that rather than undoing the mess at the data import stage.
That said, starting with the 4 columns, here's a start for the parsing problem the cyclist highlighted: assume that the exponent always has 3 digits (with an optional minus sign), then my mind goes to regexp. The following works, but returns different size arrays for each column in your reshaped cell array - if you are expecting 1560 rows from the data set, then it seems the 3rd and 4th columns are so malformed they don't even return the right number of data points (assuming the parsing worked). If that is the case, then even if you only want rows where some conditions are met on the first 2 columns, you would not be sure that you can safely associate the rows.
load("testdata.mat")
% reshaped dimensions are given/known
data_r = reshape(data,[],4);
col = cell(1,4); % pre-allocate
for i = 1:4
% convert the cell column to string
tmp = string(data_r(:,i));
% join everything together into a super string to parse
tmp = join(tmp);
% parse using regexp
% key assumption: exponent always has 3 digits
parsed = reshape(regexp(tmp,"[+-]?(\d+\.\d+|\d+)([eE][+-]?\d{3})?","match"),[],1);
parsed = str2double(parsed);
parsed(parsed==-999) = nan;
nVars(i) = numel(parsed);
col{i} = parsed;
end
% pad all data to max number of rows to inspect in table form
nVarsMax = max(nVars)
nVarsMax = 1560
for i = 1:4
p = nVarsMax - nVars(i);
col{i} = padarray(col{i},p,nan,'post');
end
T = table(col{:});
head(T,15)
Var1 Var2 Var3 Var4 ______ ______ _______ ___________ 93.993 131.03 0.74552 -3.9924e-08 93.993 131.03 0.74552 -3.9924e-08 93.993 131.03 0.74552 -3.9924e-08 93.993 131.03 0.74552 -3.9924e-08 93.993 131.03 0.74552 -3.9924e-08 93.993 131.03 0.74552 -3.9924e-08 93.993 131.03 0.74552 -3.9924e-08 93.993 131.03 0.74552 -3.9924e-08 93.993 131.03 0.74552 -3.9924e-08 93.993 131.03 0.74552 -3.9924e-08 93.993 131.03 0.74552 -3.9924e-08 93.993 131.03 0.74552 -3.9924e-08 93.993 131.03 0.74552 -3.9924e-08 93.993 131.03 0.74552 -3.9924e-08 93.993 131.03 0.74552 -3.9924e-08
tail(T,15)
Var1 Var2 Var3 Var4 ______ _______ ____ ____ 131.03 0.74552 NaN NaN 131.03 0.74552 NaN NaN 131.03 0.74552 NaN NaN 131.03 0.74552 NaN NaN 131.03 0.74552 NaN NaN 131.03 0.74552 NaN NaN 131.03 0.74552 NaN NaN 131.03 0.74552 NaN NaN 131.03 0.74552 NaN NaN 131.03 0.74552 NaN NaN 131.03 0.74552 NaN NaN 131.03 0.74552 NaN NaN 131.03 0.74552 NaN NaN 131.03 0.74552 NaN NaN 131.03 0.74552 NaN NaN

Accedi per commentare.

Risposta accettata

Pavel
Pavel il 25 Giu 2025
I noticed big problems with array dimensions in columns 3 and 4, this will not allow to get arrays of type double. Here is some combinations of lengths:
C =
8 8 6 8
8 8 8 6
8 8 8 7
8 8 8 8
For this task I got the result by expanding arrays to the same size (filling missing elements with zeros), but I would recommend to check the correctness of the initial data recording, maybe there are problems there.
%% Source data reading
load('testdata.mat')
D2=reshape(data,[],4)
%% Converting data
% add spaces for all problem strings in cellarray, "5.53099E-0066.03304E-0066.53553E-0067.04676E-0067.57499E-0068.1279E-006"
D2 = cellfun(@(x) regexprep(x,'([-\d\.]+E-?\d{3})|(.+)','$1 '),D2,'UniformOutput',false);
% convert strings to nums inside cells
D2 = cellfun(@(x) str2num(x),D2,'UniformOutput',false);
%% Check cells data lenght
data_length = cellfun(@(x) length(x),D2); % аггау with length of all elements
C= unique(data_length,'rows') % unique combinations
C =
8 8 6 8
8 8 8 6
8 8 8 7
8 8 8 8
max(data_length) % now we can see that max length is 8 - it will be our target length
ans =
8 8 8 8
%% Add zeroes to short rows (when length of row < 8)
D2 = cellfun(@(cell_element) fill_with_zeros_broken_data(cell_element), D2,'UniformOutput',false)
%% Reshaping
result = reshape([D2{:}],[],4)
%% Local function
function cell_element = fill_with_zeros_broken_data(cell_element)
if length(cell_element)<8
cell_element(8)=0;
end
end

Più risposte (1)

Pavel
Pavel il 24 Giu 2025
Modificato: Pavel il 24 Giu 2025
Hi! Here is a short 3 line solution, but it may be a bit hard to understand in some places. I will leave comments, but feel free to ask me questions if anything is unclear.
Sample data:
testdata = ...
{[" 95.288 95.307 95.1"], [" 131.028 131.028 131.1"], [" 0.745516 0.745517 0.745"], ["-3.99239E-008-3.99239E-008-3.99239E-008"];
[" 95.4433 95.4633 95.4"], [" 131.028 131.028 131.1"], [" 0.745511 0.745509 0.745"], ["-3.99239E-008-3.99239E-008-3.99239E-008"];
[" 95.6074 95.6285 95.6"], [" 131.028 131.028 131.1"], [" 0.745486 0.745481 0.745"], ["-3.99239E-008-3.99239E-008-3.99239E-008"];
[" 95.7799 95.802 95.8"], [" 131.028 131.028 131.1"], [" 0.745438 0.745432 0.745"], ["-3.99239E-008-3.99239E-008-3.99239E-008"];
[" 95.9603 95.9834 96.1"], [" 131.028 131.028 131.1"], [" 0.745381 0.745373 0.745"], ["-3.99239E-008-3.99239E-008-3.99239E-008"]}
testdata = 5×4 cell array
{[" 95.288 95.307 95.1"]} {[" 131.028 131.028 131.1"]} {[" 0.745516 0.745517 0.745"]} {["-3.99239E-008-3.99239E-008-3.99239…"]} {[" 95.4433 95.4633 95.4" ]} {[" 131.028 131.028 131.1"]} {[" 0.745511 0.745509 0.745"]} {["-3.99239E-008-3.99239E-008-3.99239…"]} {[" 95.6074 95.6285 95.6" ]} {[" 131.028 131.028 131.1"]} {[" 0.745486 0.745481 0.745"]} {["-3.99239E-008-3.99239E-008-3.99239…"]} {[" 95.7799 95.802 95.8" ]} {[" 131.028 131.028 131.1"]} {[" 0.745438 0.745432 0.745"]} {["-3.99239E-008-3.99239E-008-3.99239…"]} {[" 95.9603 95.9834 96.1" ]} {[" 131.028 131.028 131.1"]} {[" 0.745381 0.745373 0.745"]} {["-3.99239E-008-3.99239E-008-3.99239…"]}
Solution
The data in the last column needs to be split: we'll add a space between the string numbers.
{["-3.99239E-008-3.99239E-008-3.99239E-008"]} % we'll repalce this
{["-3.99239E-008 -3.99239E-008 -3.99239E-008"]} % with this
The fast way to do this is with regular expressions. To apply the regular expression to each cell, we'll use the cellfun():
testdata(:,end) = cellfun(@(x) regexprep(x,'(-.+?E.+?)(?=-|$)','$1 '),testdata(:,end),'UniformOutput',false)
testdata = 5×4 cell array
{[" 95.288 95.307 95.1"]} {[" 131.028 131.028 131.1"]} {[" 0.745516 0.745517 0.745"]} {["-3.99239E-008 -3.99239E-008 -3.992…"]} {[" 95.4433 95.4633 95.4" ]} {[" 131.028 131.028 131.1"]} {[" 0.745511 0.745509 0.745"]} {["-3.99239E-008 -3.99239E-008 -3.992…"]} {[" 95.6074 95.6285 95.6" ]} {[" 131.028 131.028 131.1"]} {[" 0.745486 0.745481 0.745"]} {["-3.99239E-008 -3.99239E-008 -3.992…"]} {[" 95.7799 95.802 95.8" ]} {[" 131.028 131.028 131.1"]} {[" 0.745438 0.745432 0.745"]} {["-3.99239E-008 -3.99239E-008 -3.992…"]} {[" 95.9603 95.9834 96.1" ]} {[" 131.028 131.028 131.1"]} {[" 0.745381 0.745373 0.745"]} {["-3.99239E-008 -3.99239E-008 -3.992…"]}
OK, now we can convert strings to numbers inside cells:
testdata = cellfun(@(x) str2num(x),testdata,'UniformOutput',false)
testdata = 5×4 cell array
{[95.2880 95.3070 95.1000]} {[131.0280 131.0280 131.1000]} {[0.7455 0.7455 0.7450]} {[-3.9924e-08 -3.9924e-08 -3.9924e-08]} {[95.4433 95.4633 95.4000]} {[131.0280 131.0280 131.1000]} {[0.7455 0.7455 0.7450]} {[-3.9924e-08 -3.9924e-08 -3.9924e-08]} {[95.6074 95.6285 95.6000]} {[131.0280 131.0280 131.1000]} {[0.7455 0.7455 0.7450]} {[-3.9924e-08 -3.9924e-08 -3.9924e-08]} {[95.7799 95.8020 95.8000]} {[131.0280 131.0280 131.1000]} {[0.7454 0.7454 0.7450]} {[-3.9924e-08 -3.9924e-08 -3.9924e-08]} {[95.9603 95.9834 96.1000]} {[131.0280 131.0280 131.1000]} {[0.7454 0.7454 0.7450]} {[-3.9924e-08 -3.9924e-08 -3.9924e-08]}
Now we need to get the data out of cells. The main hack here is to use [testdata{:}] which creates a row of doubles. Then all that remains is to split it into 4 columns with reshape():
result = reshape([testdata{:}],[],4)
result = 15×4
95.2880 131.0280 0.7455 -0.0000 95.3070 131.0280 0.7455 -0.0000 95.1000 131.1000 0.7450 -0.0000 95.4433 131.0280 0.7455 -0.0000 95.4633 131.0280 0.7455 -0.0000 95.4000 131.1000 0.7450 -0.0000 95.6074 131.0280 0.7455 -0.0000 95.6285 131.0280 0.7455 -0.0000 95.6000 131.1000 0.7450 -0.0000 95.7799 131.0280 0.7454 -0.0000 95.8020 131.0280 0.7454 -0.0000 95.8000 131.1000 0.7450 -0.0000 95.9603 131.0280 0.7454 -0.0000 95.9834 131.0280 0.7454 -0.0000 96.1000 131.1000 0.7450 -0.0000
<mw-icon class=""></mw-icon>
<mw-icon class=""></mw-icon>
Zeros in the last column may be a little confusing, but when changing the display format we see that the result is correct:
format shortE
result
result = 15×4
9.5288e+01 1.3103e+02 7.4552e-01 -3.9924e-08 9.5307e+01 1.3103e+02 7.4552e-01 -3.9924e-08 9.5100e+01 1.3110e+02 7.4500e-01 -3.9924e-08 9.5443e+01 1.3103e+02 7.4551e-01 -3.9924e-08 9.5463e+01 1.3103e+02 7.4551e-01 -3.9924e-08 9.5400e+01 1.3110e+02 7.4500e-01 -3.9924e-08 9.5607e+01 1.3103e+02 7.4549e-01 -3.9924e-08 9.5629e+01 1.3103e+02 7.4548e-01 -3.9924e-08 9.5600e+01 1.3110e+02 7.4500e-01 -3.9924e-08 9.5780e+01 1.3103e+02 7.4544e-01 -3.9924e-08 9.5802e+01 1.3103e+02 7.4543e-01 -3.9924e-08 9.5800e+01 1.3110e+02 7.4500e-01 -3.9924e-08 9.5960e+01 1.3103e+02 7.4538e-01 -3.9924e-08 9.5983e+01 1.3103e+02 7.4537e-01 -3.9924e-08 9.6100e+01 1.3110e+02 7.4500e-01 -3.9924e-08
<mw-icon class=""></mw-icon>
<mw-icon class=""></mw-icon>
  4 Commenti
Pavel
Pavel il 25 Giu 2025
@J. Alex Lee @Stephen23 Sorry, I haven't read the thread until now and didn't notice the message with the source file. I'll redo the regular expression now, but I see that there are big problems with the dimensions of the arrays in the cells...
J. Alex Lee
J. Alex Lee il 25 Giu 2025
Rather than use regexp to find only E's and add a space and split the string up later, you can just match regexp instances directly on the full expression of a value in scientific notation (with or without E).
Google search gave me this expression - modified at the end with \d{3} to assert that there will always be 3 digits after the "e".
"[+-]?(\d+\.\d+|\d+)([eE][+-]?\d{3})?"
As Pavel noted, the number of data in each column are inconsistent as demonstrated in my comment.

Accedi per commentare.

Categorie

Scopri di più su Numeric Types in Help Center e File Exchange

Tag

Prodotti


Release

R2024b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by