Convert chars into formatted numbers

Hello everyone,
I am working on code that parses a .header file in order to interpret a large database stored in a .data file (HITRAN, for those familiar with it).
From the header file I can work out where to split each line of the dataset into variables and which format each variable is in. Here is an example of the data:
% ... Parse the .header file to get variable names (.Names) and their numerical
% formatting (.Values) in C. Some of them are double, some of them are integer numbers
% related to quantum states. Note that Names and Values are not in the same
% order as the columns of the .data file.
FormatBlock.Names = {'a', 'gamma_air', 'gp', 'local_iso_id', 'molec_id', 'sw', 'local_lower_quanta', 'local_upper_quanta', 'gpp', 'elower', 'n_air', 'delta_air', 'global_upper_quanta', 'iref', 'line_mixing_flag', 'ierr', 'nu', 'gamma_self', 'global_lower_quanta'};
FormatBlock.Values = {'%10.3E', '%5.4f', '%7.1f', '%1d', '%2d', '%10.3E', '%15s', '%15s', '%7.1f', '%10.4f', '%4.2f', '%8.6f', '%15s', '%12s', '%1s', '%6s', '%12.6f', '%5.3f', '%15s'};
% ... Parse the .data file, dividing it into lines and separating values
% into columns keeping them as char. Here an example of one line
DataBlock.Names = {'molec_id', 'local_iso_id', 'nu', 'sw', 'a', 'gamma_air', 'gamma_self', 'elower', 'n_air', 'delta_air', 'global_upper_quanta', 'global_lower_quanta', 'local_upper_quanta', 'local_lower_quanta', 'ierr', 'iref', 'line_mixing_flag', 'gp', 'gpp'}
DataBlock.Columns = {' 1', '1', ' 2800.033883', ' 1.303E-29', ' 1.003E-04', '.0664', '0.298', ' 2705.1396', '0.65', '0.005780', ' 0 2 0', ' 0 1 0', ' 11 6 5 ', ' 10 1 10 ', '434233', '807294713152', ' ', ' 69.0', ' 63.0'};
The question is: assuming that I am able to reorganise Names and Values into the same order as the data file, how can I convert the DataBlock.Columns chars into numbers according to the matching FormatBlock.Values entry?
For example:
'molec_id' = ' 1' has formatting '%2d', hence: "molec_id" = 1
'global_lower_quanta' = ' 0 1 0' has formatting '%15s', hence 'global_lower_quanta' = [0 1 0]
'nu' = ' 2800.033883' has formatting '%12.6f', hence 'nu' = 2.800033883e3
etc...
Thank you in advance for your help!

Accepted Answer

I am not certain what result you want.
Try something like this —
% ... Parse the .header file to get variable names (.Names) and their numerical
% formatting (.Values) in C. Some of them are double, some of them are integer numbers
% related to quantum states. Note that Names and Values are not in the same
% order as the columns of the .data file.
FormatBlock.Names = {'a', 'gamma_air', 'gp', 'local_iso_id', 'molec_id', 'sw', 'local_lower_quanta', 'local_upper_quanta', 'gpp', 'elower', 'n_air', 'delta_air', 'global_upper_quanta', 'iref', 'line_mixing_flag', 'ierr', 'nu', 'gamma_self', 'global_lower_quanta'};
FormatBlock.Values = {'%10.3E', '%5.4f', '%7.1f', '%1d', '%2d', '%10.3E', '%15s', '%15s', '%7.1f', '%10.4f', '%4.2f', '%8.6f', '%15s', '%12s', '%1s', '%6s', '%12.6f', '%5.3f', '%15s'};
% ... Parse the .data file, dividing it into lines and separating values
% into columns keeping them as char. Here an example of one line
DataBlock.Names = {'molec_id', 'local_iso_id', 'nu', 'sw', 'a', 'gamma_air', 'gamma_self', 'elower', 'n_air', 'delta_air', 'global_upper_quanta', 'global_lower_quanta', 'local_upper_quanta', 'local_lower_quanta', 'ierr', 'iref', 'line_mixing_flag', 'gp', 'gpp'}
DataBlock = struct with fields:
Names: {1x19 cell}
DataBlock.Columns = {' 1', '1', ' 2800.033883', ' 1.303E-29', ' 1.003E-04', '.0664', '0.298', ' 2705.1396', '0.65', '0.005780', ' 0 2 0', ' 0 1 0', ' 11 6 5 ', ' 10 1 10 ', '434233', '807294713152', ' ', ' 69.0', ' 63.0'}
DataBlock = struct with fields:
Names: {1x19 cell}
Columns: {1x19 cell}
format shortG
DBC = cellfun(@(x)sscanf(x,'%g'),DataBlock.Columns,Unif=0);
disp(DBC)
Columns 1 through 13
{[1]} {[1]} {[2800]} {[1.303e-29]} {[0.0001003]} {[0.0664]} {[0.298]} {[2705.1]} {[0.65]} {[0.00578]} {3x1 double} {3x1 double} {3x1 double}
Columns 14 through 19
{3x1 double} {[434233]} {[8.0729e+11]} {0x0 double} {[69]} {[63]}
for k = 1:numel(DBC)
DBC{k}.'
end
ans =
1
ans =
1
ans =
2800
ans =
1.303e-29
ans =
0.0001003
ans =
0.0664
ans =
0.298
ans =
2705.1
ans =
0.65
ans =
0.00578
ans = 1×3
0 2 0
ans = 1×3
0 1 0
ans = 1×3
11 6 5
ans = 1×3
10 1 10
ans =
434233
ans =
8.0729e+11
ans = []
ans =
69
ans =
63
You can format them at your leisure. Use either sprintf or fprintf depending on what you want to do.
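To connect this back to the format specifiers in the header, here is a minimal sketch of one way to put the formats into the column order of the data and then print each converted value with its own format. The short cell arrays (fmtNames, fmtValues, datNames, datCols) are hypothetical stand-ins for FormatBlock and DataBlock, cut down to three numeric fields; the '%s' fields would need separate handling.

% Hypothetical, shortened inputs in the same shape as FormatBlock / DataBlock
fmtNames  = {'nu', 'molec_id', 'sw'};
fmtValues = {'%12.6f', '%2d', '%10.3E'};
datNames  = {'molec_id', 'nu', 'sw'};
datCols   = {' 1', ' 2800.033883', ' 1.303E-29'};

% loc(k) is the position of datNames{k} inside fmtNames
[found, loc] = ismember(datNames, fmtNames);
assert(all(found), 'Every data column needs a format entry');
fmtOrdered = fmtValues(loc);

% Convert the char columns to numbers, then print each with its own format
vals = cellfun(@(x) sscanf(x, '%g'), datCols, 'UniformOutput', false);
txt  = cellfun(@(f, v) sprintf(f, v), fmtOrdered, vals, 'UniformOutput', false);
disp(txt)

Using ismember this way avoids sorting either list, so the data keep the column order of the .data file.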

4 Comments

Thank you very much for your answer. This is 99% of what I was looking for. The only variable that is not converted is the third from last, which results in an empty array. I will try to find a workaround.
As always, my pleasure!
The third-from-the-last element is a space. There is nothing there to convert, so it produces an empty cell. You can assign it any value you want, including NaN.
Also, I transposed the vector elements so that they displayed as they do in the original vector.
If you leave them un-transposed and use the vertcat function, they will form individual elements of a column vector that you can then use as the numeric argument in fprintf or sprintf.
Example —
DataBlock.Columns = {' 1', '1', ' 2800.033883', ' 1.303E-29', ' 1.003E-04', '.0664', '0.298', ' 2705.1396', '0.65', '0.005780', ' 0 2 0', ' 0 1 0', ' 11 6 5 ', ' 10 1 10 ', '434233', '807294713152', ' ', ' 69.0', ' 63.0'}
DataBlock = struct with fields:
Columns: {1x19 cell}
format shortG
DBC = cellfun(@(x)sscanf(x,'%g'),DataBlock.Columns,Unif=0);
DBC = DBC.';
disp(DBC) % Up To Here, My Previous Code
{[ 1]} {[ 1]} {[ 2800]} {[ 1.303e-29]} {[ 0.0001003]} {[ 0.0664]} {[ 0.298]} {[ 2705.1]} {[ 0.65]} {[ 0.00578]} {3x1 double } {3x1 double } {3x1 double } {3x1 double } {[ 434233]} {[8.0729e+11]} {0x0 double } {[ 69]} {[ 63]}
Lv = cell2mat(cellfun(@isempty,DBC,Unif=0));
disp(Lv)
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
DBC(Lv) = {NaN}; % Assign Every Empty Element As ‘NaN’ (Also Works With Several Empty Cells)
DBCv = vertcat(DBC{:}); % Use ‘vertcat’
disp(DBCv) % Display All The Resulting Elements
1 1 2800 1.303e-29 0.0001003 0.0664 0.298 2705.1 0.65 0.00578 0 2 0 0 1 0 11 6 5 10 1 10 4.3423e+05 8.0729e+11 NaN 69 63
You can detect the NaN value by using the isnan function to create a logical vector that gives its logical index, or use the ‘Lv’ vector I created, then replace it with anything you want, except an empty value, since numeric arrays do not permit that. With the NaN value, it is simply considered ‘missing’.
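As a small illustration of the isnan approach, here is a sketch on a hypothetical five-element vector that mimics the tail of the result above. The fill value 0 is only an assumption for the example; use whatever suits your data.

% Hypothetical vector with one missing entry, as left by the NaN assignment
v = [434233; 8.0729e+11; NaN; 69; 63];

missingIdx = isnan(v);   % logical index of the missing entries
v(missingIdx) = 0;       % fill value chosen only for illustration
disp(v.')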
Thank you so much!
As always, my pleasure!


Release

R2024a
