Extracting Data from Cell array with multiple values per column

Question

Jason il 19 Giu 2025

0
Link

Link diretto a questa domanda

https://it.mathworks.com/matlabcentral/answers/2177913-extracting-data-from-cell-array-with-multiple-values-per-column

Commentato: J. Alex Lee il 25 Giu 2025

Risposta accettata: Pavel

Apri in MATLAB Online

Hi, I have this cell array of 4 columns, but in each column there are 3 values.

RESHAPED

D2 =

195×4 cell array

{[" 95.288 95.307 95.…"]} {[" 131.028 131.028 131.…"]} {[" 0.745516 0.745517 0.745…"]} {["-3.99239E-008-3.99239E-008-…"]}

{[" 95.4433 95.4633 95.4…"]} {[" 131.028 131.028 131.…"]} {[" 0.745511 0.745509 0.745…"]} {["-3.99239E-008-3.99239E-008-…"]}

{[" 95.6074 95.6285 95.6…"]} {[" 131.028 131.028 131.…"]} {[" 0.745486 0.745481 0.745…"]} {["-3.99239E-008-3.99239E-008-…"]}

{[" 95.7799 95.802 95.8…"]} {[" 131.028 131.028 131.…"]} {[" 0.745438 0.745432 0.745…"]} {["-3.99239E-008-3.99239E-008-…"]}

{[" 95.9603 95.9834 96.0…"]} {[" 131.028 131.028 131.…"]} {[" 0.745381 0.745373 0.745…"]} {["-3.99239E-008-3.99239E-008-…"]}

How can I create 4 columns of 1D vectors, (and numeric values not strings), where each of the 3 values in a row are stacked under each other vertically like this

95.288 131.028 0.745516 -3.99239E-008

95.307 131.028 0.745517 -3.99239E-008

95.… 131..... 0.745… -3.99239E-008

95.4433 131.028 0.745511 -3.99239E-008

9 Commenti
Mostra 7 commenti meno recentiNascondi 7 commenti meno recenti

the cyclist il 19 Giu 2025

Apri in MATLAB Online

testdata.mat

That's all well and good, but I don't see how it avoids the hard problem. Look at row 125 of D2 (which I transposed here, so it is easier to see):

load("testdata.mat","data")
D2=reshape(data,[],4);
D2(125,:)'
ans = 4×1 cell array
    {["    117.76    117.79    117.82    117.85    117.88    117.91    117.94    117.97"]}
    {["   131.021   131.021   131.021   131.021   131.021   131.021   131.021   131.021"]}
    {["  0.740383  0.740383  0.740383  0.740383  0.740383  0.740383  0.740383  0.740383"]}
    {["5.53099E-0066.03304E-0066.53553E-0067.04676E-0067.57499E-0068.1279E-006"         ]}

We need to parse this row, because the first cell has values in 100-120. But 4th cell has the problem I mentioned, that there is no whitespace separting the numbers. And there seem to be only 6 numbers there, not the 8 I might expect from the other elements. And the numbers in the 4th cell don't occupy just 8 characters.

My impression is that you need to inspect your data more carefully, come up with all the steps of the algorithm (i.e. the "rules") that gives what you need, and then get the coding help you need here.

J. Alex Lee il 24 Giu 2025

Modificato: J. Alex Lee il 25 Giu 2025

Apri in MATLAB Online

testdata.mat

To take nothing for granted and leave no stone unturned: if you have control over the data output, work on that rather than undoing the mess at the data import stage.

That said, starting with the 4 columns, here's a start for the parsing problem the cyclist highlighted: assume that the exponent always has 3 digits (with an optional minus sign), then my mind goes to regexp. The following works, but returns different size arrays for each column in your reshaped cell array - if you are expecting 1560 rows from the data set, then it seems the 3rd and 4th columns are so malformed they don't even return the right number of data points (assuming the parsing worked). If that is the case, then even if you only want rows where some conditions are met on the first 2 columns, you would not be sure that you can safely associate the rows.

load("testdata.mat")
% reshaped dimensions are given/known
data_r = reshape(data,[],4);
col = cell(1,4); % pre-allocate
for i = 1:4
	% convert the cell column to string
	tmp = string(data_r(:,i));
	% join everything together into a super string to parse
	tmp = join(tmp);
	% parse using regexp
	% key assumption: exponent always has 3 digits
	parsed = reshape(regexp(tmp,"[+-]?(\d+\.\d+|\d+)([eE][+-]?\d{3})?","match"),[],1);
	parsed = str2double(parsed);
	parsed(parsed==-999) = nan;
	nVars(i) = numel(parsed);
	col{i} = parsed;
end
% pad all data to max number of rows to inspect in table form
nVarsMax = max(nVars)
nVarsMax = 1560
for i = 1:4
	p = nVarsMax - nVars(i);
	col{i} = padarray(col{i},p,nan,'post');
end
T = table(col{:});
head(T,15)
     Var1      Var2      Var3         Var4    
    ______    ______    _______    ___________

    93.993    131.03    0.74552    -3.9924e-08
    93.993    131.03    0.74552    -3.9924e-08
    93.993    131.03    0.74552    -3.9924e-08
    93.993    131.03    0.74552    -3.9924e-08
    93.993    131.03    0.74552    -3.9924e-08
    93.993    131.03    0.74552    -3.9924e-08
    93.993    131.03    0.74552    -3.9924e-08
    93.993    131.03    0.74552    -3.9924e-08
    93.993    131.03    0.74552    -3.9924e-08
    93.993    131.03    0.74552    -3.9924e-08
    93.993    131.03    0.74552    -3.9924e-08
    93.993    131.03    0.74552    -3.9924e-08
    93.993    131.03    0.74552    -3.9924e-08
    93.993    131.03    0.74552    -3.9924e-08
    93.993    131.03    0.74552    -3.9924e-08
tail(T,15)
     Var1      Var2      Var3    Var4
    ______    _______    ____    ____

    131.03    0.74552    NaN     NaN 
    131.03    0.74552    NaN     NaN 
    131.03    0.74552    NaN     NaN 
    131.03    0.74552    NaN     NaN 
    131.03    0.74552    NaN     NaN 
    131.03    0.74552    NaN     NaN 
    131.03    0.74552    NaN     NaN 
    131.03    0.74552    NaN     NaN 
    131.03    0.74552    NaN     NaN 
    131.03    0.74552    NaN     NaN 
    131.03    0.74552    NaN     NaN 
    131.03    0.74552    NaN     NaN 
    131.03    0.74552    NaN     NaN 
    131.03    0.74552    NaN     NaN 
    131.03    0.74552    NaN     NaN 

Accedi per commentare.

Accedi per rispondere a questa domanda.

Answer 1

Pavel il 25 Giu 2025

0
Link

Link diretto a questa risposta

https://it.mathworks.com/matlabcentral/answers/2177913-extracting-data-from-cell-array-with-multiple-values-per-column#answer_1567120

Apri in MATLAB Online

I noticed big problems with array dimensions in columns 3 and 4, this will not allow to get arrays of type double. Here is some combinations of lengths:

C =

8 8 6 8

8 8 8 6

8 8 8 7

8 8 8 8

For this task I got the result by expanding arrays to the same size (filling missing elements with zeros), but I would recommend to check the correctness of the initial data recording, maybe there are problems there.

%% Source data reading
load('testdata.mat')
D2=reshape(data,[],4)
%% Converting data
% add spaces for all problem strings in cellarray, "5.53099E-0066.03304E-0066.53553E-0067.04676E-0067.57499E-0068.1279E-006"
D2 = cellfun(@(x) regexprep(x,'([-\d\.]+E-?\d{3})|(.+)','$1 '),D2,'UniformOutput',false);
% convert strings to nums inside cells
D2 = cellfun(@(x) str2num(x),D2,'UniformOutput',false);
%% Check cells data lenght
data_length = cellfun(@(x) length(x),D2); % аггау with length of all elements
C= unique(data_length,'rows') % unique combinations

C =

8 8 6 8

8 8 8 6

8 8 8 7

8 8 8 8

max(data_length) % now we can see that max length is 8 - it will be our target length

ans =

8 8 8 8

%% Add zeroes to short rows (when length of row < 8)
D2 = cellfun(@(cell_element) fill_with_zeros_broken_data(cell_element), D2,'UniformOutput',false)
%% Reshaping
result = reshape([D2{:}],[],4)
%% Local function
function cell_element = fill_with_zeros_broken_data(cell_element)
if length(cell_element)<8
    cell_element(8)=0;
end
end

0 Commenti
Mostra -2 commenti meno recentiNascondi -2 commenti meno recenti

Accedi per commentare.

Answer 2

Pavel il 24 Giu 2025

0
Link

Link diretto a questa risposta

https://it.mathworks.com/matlabcentral/answers/2177913-extracting-data-from-cell-array-with-multiple-values-per-column#answer_1567073

Modificato: Pavel il 24 Giu 2025

Apri in MATLAB Online

Hi! Here is a short 3 line solution, but it may be a bit hard to understand in some places. I will leave comments, but feel free to ask me questions if anything is unclear.

Sample data:

testdata = ...
{["    95.288    95.307    95.1"],    ["   131.028   131.028   131.1"],    ["  0.745516  0.745517  0.745"],    ["-3.99239E-008-3.99239E-008-3.99239E-008"];
 ["   95.4433   95.4633   95.4"],    ["   131.028   131.028   131.1"],    ["  0.745511  0.745509  0.745"],    ["-3.99239E-008-3.99239E-008-3.99239E-008"];
 ["   95.6074   95.6285   95.6"],    ["   131.028   131.028   131.1"],    ["  0.745486  0.745481  0.745"],    ["-3.99239E-008-3.99239E-008-3.99239E-008"];
 ["   95.7799    95.802   95.8"],    ["   131.028   131.028   131.1"],    ["  0.745438  0.745432  0.745"],    ["-3.99239E-008-3.99239E-008-3.99239E-008"];
 ["   95.9603   95.9834   96.1"],    ["   131.028   131.028   131.1"],    ["  0.745381  0.745373  0.745"],    ["-3.99239E-008-3.99239E-008-3.99239E-008"]}
testdata = 5×4 cell array
    {["    95.288    95.307    95.1"]}    {["   131.028   131.028   131.1"]}    {["  0.745516  0.745517  0.745"]}    {["-3.99239E-008-3.99239E-008-3.99239…"]}
    {["   95.4433   95.4633   95.4" ]}    {["   131.028   131.028   131.1"]}    {["  0.745511  0.745509  0.745"]}    {["-3.99239E-008-3.99239E-008-3.99239…"]}
    {["   95.6074   95.6285   95.6" ]}    {["   131.028   131.028   131.1"]}    {["  0.745486  0.745481  0.745"]}    {["-3.99239E-008-3.99239E-008-3.99239…"]}
    {["   95.7799    95.802   95.8" ]}    {["   131.028   131.028   131.1"]}    {["  0.745438  0.745432  0.745"]}    {["-3.99239E-008-3.99239E-008-3.99239…"]}
    {["   95.9603   95.9834   96.1" ]}    {["   131.028   131.028   131.1"]}    {["  0.745381  0.745373  0.745"]}    {["-3.99239E-008-3.99239E-008-3.99239…"]}

Solution

The data in the last column needs to be split: we'll add a space between the string numbers.

{["-3.99239E-008-3.99239E-008-3.99239E-008"]} % we'll repalce this

{["-3.99239E-008 -3.99239E-008 -3.99239E-008"]} % with this

The fast way to do this is with regular expressions. To apply the regular expression to each cell, we'll use the cellfun():

testdata(:,end) = cellfun(@(x) regexprep(x,'(-.+?E.+?)(?=-|$)','$1 '),testdata(:,end),'UniformOutput',false)
testdata = 5×4 cell array
    {["    95.288    95.307    95.1"]}    {["   131.028   131.028   131.1"]}    {["  0.745516  0.745517  0.745"]}    {["-3.99239E-008 -3.99239E-008 -3.992…"]}
    {["   95.4433   95.4633   95.4" ]}    {["   131.028   131.028   131.1"]}    {["  0.745511  0.745509  0.745"]}    {["-3.99239E-008 -3.99239E-008 -3.992…"]}
    {["   95.6074   95.6285   95.6" ]}    {["   131.028   131.028   131.1"]}    {["  0.745486  0.745481  0.745"]}    {["-3.99239E-008 -3.99239E-008 -3.992…"]}
    {["   95.7799    95.802   95.8" ]}    {["   131.028   131.028   131.1"]}    {["  0.745438  0.745432  0.745"]}    {["-3.99239E-008 -3.99239E-008 -3.992…"]}
    {["   95.9603   95.9834   96.1" ]}    {["   131.028   131.028   131.1"]}    {["  0.745381  0.745373  0.745"]}    {["-3.99239E-008 -3.99239E-008 -3.992…"]}

OK, now we can convert strings to numbers inside cells:

testdata = cellfun(@(x) str2num(x),testdata,'UniformOutput',false)
testdata = 5×4 cell array
    {[95.2880 95.3070 95.1000]}    {[131.0280 131.0280 131.1000]}    {[0.7455 0.7455 0.7450]}    {[-3.9924e-08 -3.9924e-08 -3.9924e-08]}
    {[95.4433 95.4633 95.4000]}    {[131.0280 131.0280 131.1000]}    {[0.7455 0.7455 0.7450]}    {[-3.9924e-08 -3.9924e-08 -3.9924e-08]}
    {[95.6074 95.6285 95.6000]}    {[131.0280 131.0280 131.1000]}    {[0.7455 0.7455 0.7450]}    {[-3.9924e-08 -3.9924e-08 -3.9924e-08]}
    {[95.7799 95.8020 95.8000]}    {[131.0280 131.0280 131.1000]}    {[0.7454 0.7454 0.7450]}    {[-3.9924e-08 -3.9924e-08 -3.9924e-08]}
    {[95.9603 95.9834 96.1000]}    {[131.0280 131.0280 131.1000]}    {[0.7454 0.7454 0.7450]}    {[-3.9924e-08 -3.9924e-08 -3.9924e-08]}

Now we need to get the data out of cells. The main hack here is to use [testdata{:}] which creates a row of doubles. Then all that remains is to split it into 4 columns with reshape():

result = reshape([testdata{:}],[],4)
result = 15×4
2880  131.0280    0.7455   -0.0000
3070  131.0280    0.7455   -0.0000
1000  131.1000    0.7450   -0.0000
4433  131.0280    0.7455   -0.0000
4633  131.0280    0.7455   -0.0000
4000  131.1000    0.7450   -0.0000
6074  131.0280    0.7455   -0.0000
6285  131.0280    0.7455   -0.0000
6000  131.1000    0.7450   -0.0000
7799  131.0280    0.7454   -0.0000
8020  131.0280    0.7454   -0.0000
8000  131.1000    0.7450   -0.0000
9603  131.0280    0.7454   -0.0000
9834  131.0280    0.7454   -0.0000
1000  131.1000    0.7450   -0.0000
<mw-icon class=""></mw-icon>
<mw-icon class=""></mw-icon>

Zeros in the last column may be a little confusing, but when changing the display format we see that the result is correct:

format shortE
result
result = 15×4
5288e+01   1.3103e+02   7.4552e-01  -3.9924e-08
5307e+01   1.3103e+02   7.4552e-01  -3.9924e-08
5100e+01   1.3110e+02   7.4500e-01  -3.9924e-08
5443e+01   1.3103e+02   7.4551e-01  -3.9924e-08
5463e+01   1.3103e+02   7.4551e-01  -3.9924e-08
5400e+01   1.3110e+02   7.4500e-01  -3.9924e-08
5607e+01   1.3103e+02   7.4549e-01  -3.9924e-08
5629e+01   1.3103e+02   7.4548e-01  -3.9924e-08
5600e+01   1.3110e+02   7.4500e-01  -3.9924e-08
5780e+01   1.3103e+02   7.4544e-01  -3.9924e-08
5802e+01   1.3103e+02   7.4543e-01  -3.9924e-08
5800e+01   1.3110e+02   7.4500e-01  -3.9924e-08
5960e+01   1.3103e+02   7.4538e-01  -3.9924e-08
5983e+01   1.3103e+02   7.4537e-01  -3.9924e-08
6100e+01   1.3110e+02   7.4500e-01  -3.9924e-08
<mw-icon class=""></mw-icon>
<mw-icon class=""></mw-icon>

4 Commenti
Mostra 2 commenti meno recentiNascondi 2 commenti meno recenti

Pavel il 25 Giu 2025

@J. Alex Lee @Stephen23 Sorry, I haven't read the thread until now and didn't notice the message with the source file. I'll redo the regular expression now, but I see that there are big problems with the dimensions of the arrays in the cells...

J. Alex Lee il 25 Giu 2025

Apri in MATLAB Online

Rather than use regexp to find only E's and add a space and split the string up later, you can just match regexp instances directly on the full expression of a value in scientific notation (with or without E).

Google search gave me this expression - modified at the end with \d{3} to assert that there will always be 3 digits after the "e".

"[+-]?(\d+\.\d+|\d+)([eE][+-]?\d{3})?"

As Pavel noted, the number of data in each column are inconsistent as demonstrated in my comment.

Accedi per commentare.

Extracting Data from Cell array with multiple values per column

9 Commenti
Mostra 7 commenti meno recentiNascondi 7 commenti meno recenti

Risposta accettata

0 Commenti
Mostra -2 commenti meno recentiNascondi -2 commenti meno recenti

Più risposte (1)

4 Commenti
Mostra 2 commenti meno recentiNascondi 2 commenti meno recenti

Vedere anche

Categorie

Tag

Prodotti

Release

Community Treasure Hunt

Extracting Data from Cell array with multiple values per column

9 Commenti Mostra 7 commenti meno recentiNascondi 7 commenti meno recenti

Risposta accettata

0 Commenti Mostra -2 commenti meno recentiNascondi -2 commenti meno recenti

Più risposte (1)

4 Commenti Mostra 2 commenti meno recentiNascondi 2 commenti meno recenti

Vedere anche

Categorie

Tag

Prodotti

Release

Community Treasure Hunt

9 Commenti
Mostra 7 commenti meno recentiNascondi 7 commenti meno recenti

0 Commenti
Mostra -2 commenti meno recentiNascondi -2 commenti meno recenti

4 Commenti
Mostra 2 commenti meno recentiNascondi 2 commenti meno recenti