Extracting column from text file

The NOAA atmospheric data file with the entries defined by the header row comes in the following format:
USAF   WBAN  STATION NAME                  CTRY ST CALL  LAT     LON      ELEV(M) BEGIN    END
007018 99999 WXPOD 7018                                  +00.000 +000.000 +7018.0 20110309 20130730
007026 99999 WXPOD 7026                    AF            +00.000 +000.000 +7026.0 20120713 20170822
007070 99999 WXPOD 7070                    AF            +00.000 +000.000 +7070.0 20140923 20150926
008260 99999 WXPOD8270                                   +00.000 +000.000 +0000.0 20050101 20100920
I am trying to extract a given column, which can be 'ELEV(M)' or 'USAF' or 'STATION NAME' etc. It is known a priori which column needs to be extracted, for example column #1 (USAF). I am running into problems because 'STATION NAME' sometimes has a blank inside its alphanumeric code and sometimes is a single code without any blanks. Other fields, for example CTRY, can also be blank. In the 4 data lines of the shortened input file above, 'ST' and 'CALL' are empty, but they can be filled (usually with alphabetic codes).
Also,
(1) How can I extract the USAF entries corresponding to only CTRY==AF?
(2) How can I extract all the rows from rowNumber=10000 to rowNumber=20000 (say)?
Thanks.

Accepted Answer

per isakson on 29 May 2020
Edited: per isakson on 30 May 2020
This is a fixed-width text file. The MATLAB documentation includes a good description of how to read fixed-width text files.
Be careful to get the column widths right.
Also,
  1. How to extract the USAF entries corresponding to only CTRY==AF?
  2. How to extract all the rows with rowNumber=10000 to rowNumber=20000 (say)?
Use readtable() and read all rows (if that doesn't cause memory problems). The tools you need come with the table data type: logical indexing on a table variable handles the first task, and row indexing handles the second.
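A minimal sketch of both tasks, assuming the file has already been read into a table whose variables are named USAF (double) and CTRY (char). The small table built here is a hypothetical stand-in for the result of readtable(), using values from the sample rows in the question, and the row range is shrunk to fit it.

```matlab
% Hypothetical stand-in for the table returned by readtable(); the values
% come from the four sample rows in the question.
USAF = [7018; 7026; 7070; 8260];
CTRY = {''; 'AF'; 'AF'; ''};
tbl  = table(USAF, CTRY);

% (1) USAF entries whose CTRY is 'AF': build a logical mask, then index.
isAF   = strcmp(tbl.CTRY, 'AF');
usafAF = tbl.USAF(isAF);

% (2) A block of rows by row number. The question asks for 10000:20000;
% a small range is used here and clamped to the table height so that the
% index stays valid for a short test file.
firstRow = 2;
lastRow  = min(3, height(tbl));
rowBlock = tbl(firstRow:lastRow, :);
```

For the full file the same indexing would read tbl(10000:20000, :), provided the table has at least 20000 rows.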
In response to comments
Since there are no delimiters in the data file, I find the message
Line 3 has 9 delimiters, while preceding lines have 8.
misleading. Even if one sequence of char(32) is counted as one delimiter, the numbers 9 and 8 don't make sense.
I created the script below in three steps:
  1. Create the object, opts, with default values and inspect it.
  2. Type opts.<tab> in the Command Window (<tab> stands for tab completion). I identified four properties whose default values were not meaningful and added statements to the script to assign the values I found in the comments. (To save myself some trouble in the future, I modified the names to make them legal MATLAB names.)
  3. Read the file with readtable().
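%% Create import options with default values
ffs  = fullfile('d:\m\cssm\noaa1lineHeaderFirst15lines.txt');
opts = fixedWidthImportOptions; % default values
%% Describe the file layout
opts.DataLines = [ 2, inf ];    % skip the header row on line 1
% Legal MATLAB names: 'STATION NAME' -> STATION_NAME, 'ELEV(M)' -> ELEV_M_
opts.VariableNames = { 'USAF','WBAN','STATION_NAME','CTRY','ST' ...
                     , 'CALL','LAT','LON','ELEV_M_','BEGIN','END' };
opts.VariableTypes = { 'double','double','char','char','char','char' ...
                     , 'double','double','double','double','double' };
% Column widths in characters, including the blank that follows each field
opts.VariableWidths = [ 7, 6, 30, 5, 3, 6, 8, 9, 8, 9, 9 ];
%% Read the file
tbl = readtable( ffs, opts );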
No error messages so far.
>> tbl
tbl =
14×11 table
USAF   WBAN   STATION_NAME           CTRY  ST  CALL    LAT     LON     ELEV_M_  BEGIN       END
_____  _____  _____________________  ____  __  ______  ______  ______  _______  __________  __________
7018   99999  'WXPOD 7018'           ''    ''  ''      0       0       7018     2.011e+07   2.0131e+07
7026   99999  'WXPOD 7026'           'AF'  ''  ''      0       0       7026     2.0121e+07  2.0171e+07
7070   99999  'WXPOD 7070'           'AF'  ''  ''      0       0       7070     2.0141e+07  2.0151e+07
8260   99999  'WXPOD8270'            ''    ''  ''      0       0       0        2.005e+07   2.0101e+07
8268   99999  'WXPOD8278'            'AF'  ''  ''      32.95   65.567  1156.7   2.0101e+07  2.012e+07
8307   99999  'WXPOD 8318'           'AF'  ''  ''      0       0       8318     2.01e+07    2.01e+07
8411   99999  'XM20'                 ''    ''  ''      NaN     NaN     NaN      2.016e+07   2.016e+07
...
Looks ok

10 Comments

Already tried readtable. It was giving the following error:
Error using readtable (line 216)
Reading failed at line 3. All lines of a text file must have the same number of delimiters. Line 3 has 9 delimiters, while preceding lines have 8.
Note: readtable detected the following parameters:
'Delimiter', '\t ', 'MultipleDelimsAsOne', true, 'HeaderLines', 1, 'ReadVariableNames', false, 'Format', '%f%f%q%f%f%f%f%f%f'
Error in noaaExtractCol (line 3)
data=readtable('noaaFile.txt');
I thought this was because of blank spaces.
For the readtable options, I had used the following :
DataStartLine = 2;
NumVariables = 11;
VariableNames = {'USAF','WBAN','STATION NAME','CTRY','ST',...
'CALL','LAT','LON','ELEV(M)','BEGIN','END'};
VariableWidths = [ 7, 5, 30, 5, 3, 5, 8, 9, 8, 9, 9 ] ;
DataType = {'double','double','char','char','char','char',...
'double','double','double','double','double'};
opts = fixedWidthImportOptions('NumVariables',NumVariables,...
'DataLines',DataStartLine,...
'VariableNames',VariableNames,...
'VariableWidths',VariableWidths,...
'VariableTypes',DataType);
noaaTable = readtable(filename,opts)
which had given the error:
Error using matlab.io.text.FixedWidthImportOptions (line 46)
Expected a cell array of valid variable names.
Error in fixedWidthImportOptions (line 37)
opts = matlab.io.text.FixedWidthImportOptions(varargin{:});
Error in noaaExtract (line 13)
opts = fixedWidthImportOptions('NumVariables',NumVariables,...
I thought this was due either to the space in the variable name 'STATION NAME' or to the blank fields.
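A quick way to check that suspicion, as a sketch only: test each header label with isvarname and, where it fails, derive a legal name. Using matlab.lang.makeValidName here is an assumption about one convenient repair, not the method used elsewhere in this thread, where the names were edited by hand.

```matlab
% Header labels as they appear in the file. Two of them are not valid
% MATLAB identifiers: 'STATION NAME' contains a blank and 'ELEV(M)'
% contains parentheses.
rawNames = {'USAF','WBAN','STATION NAME','CTRY','ST', ...
            'CALL','LAT','LON','ELEV(M)','BEGIN','END'};

% Flag the labels that cannot be used as variable names.
isValid  = cellfun(@isvarname, rawNames);
badNames = rawNames(~isValid);

% One possible repair: let MATLAB derive legal names automatically.
validNames = matlab.lang.makeValidName(rawNames);
```

The resulting validNames cell array could then be passed as 'VariableNames' in place of the raw header labels.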
It is impossible for me to know the format of the file for sure. Do you know that it's not a fixed-width text file?
How did you use readtable()? You have obviously not created a FixedWidthImportOptions object using either the fixedWidthImportOptions function or the detectImportOptions function.
"I thought this was because of blank spaces." Not likely, since the error message says it's because
Line 3 has 9 delimiters, while preceding lines have 8.
Proposal: upload a sample of the data file. Use the paper clip icon.
I think the VariableWidths are
7, 6, 30, 5, 3, 6, 8, 9, 8, 9, 9
b on 29 May 2020
Yes, the variable widths are 7,6,30,5,3,6,... instead of 7,5,30,5,3,5,... But it made no difference in the error with fixedWidthImportOptions. Error remained the same.
Line 3 has 9 delimiters, while preceding lines have 8.
I thought the blank space was the cause because the only difference between data row 1 and data row 2 is that the CTRY column is blank in row 1, whereas in row 2 it is 'AF'.
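The counts in the message can be checked with a small sketch, assuming readtable treats each run of blanks as a single delimiter (as the detected 'MultipleDelimsAsOne', true setting suggests). The two strings are the first two data rows from the question, which are lines 2 and 3 of the file; countDelims is a hypothetical helper name.

```matlab
% First two data rows of the sample file (file lines 2 and 3).
line2 = '007018 99999 WXPOD 7018 +00.000 +000.000 +7018.0 20110309 20130730';
line3 = '007026 99999 WXPOD 7026 AF +00.000 +000.000 +7026.0 20120713 20170822';

% Count runs of whitespace: regexp returns the start index of every
% match, so numel gives the number of runs between the fields.
countDelims = @(s) numel(regexp(strtrim(s), '\s+'));

n2 = countDelims(line2);   % 8
n3 = countDelims(line3);   % 9
```

Under that assumption the extra 'AF' token in line 3 accounts for the ninth delimiter, which is consistent with the blank CTRY field being what triggers the error.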
Cris LaPierre on 29 May 2020
Edited: Cris LaPierre on 29 May 2020
Why not use the Import Tool to figure out how to load the data? I will admit the code might not be as readable as what could be done by hand, but it allows you to set up the properties interactively, and then create an import function, which can then be used whenever you need.
A couple videos you might find helpful
  1. How to use the import tool
  2. Generating and reusing code
b on 30 May 2020
Thanks, but importData doesn't capture the variables properly. I have attached the jpeg figure of how it has broken down the "Station Name" into 4 columns, and merged other variables into one column too.
See my answer, I've added a script that reads your sample data file.
b on 30 May 2020
Works perfectly.
Some people on this site (MATLAB Central Answers) are extremely helpful. How can their help ever be repaid?


More Answers (1)

b on 29 May 2020
Edited: b on 29 May 2020

Thanks for suggesting the paper clip icon.
I have attached the file (short one), but it contains almost everything the rest of the (very) big file contains.
The errors occur when this short test file is used as the input.

Asked: b on 29 May 2020

Commented: b on 30 May 2020
