Reading specific data from formatted txt files - looks very difficult

Hi,
I have 15 ASCII files. The file names are 1948_1950, 1950_1955, 1955_1960, 1960_1965, ..., 2010_2014 (all files except the first and the last have a 5-year span in the name). The 15th file is Kos.txt, which has only dates and hours from 1948 to 2010 (but not all dates in that period). I've attached the 1948_1950.txt and Kos.txt files.
When you open the files with years in their name, you'll see that the date and time appear next to the word 'CENTERS'. So, the first file has 480101 0000, indicating the date Jan 01, 1948 (date format is 'yymmdd') at hour 00. About 40 lines below is the following line:
"k io lon lat f c dp rd zs up vp lonv latv".
I need data that are below that line, in this case it would be:
"1 11 63.14 50.20 ..." etc.
If you go through the file you'll see that pattern. The date and time are next to the word CENTERS, and then about 40 lines below is the data that I need (always below the line that starts with "k io lon lat ...").
However, there are three additional problems:
  1. I need this information only for the dates and hours specified in the Kos.txt file
  2. The date formats in Kos.txt and in the files with years in their name are not the same
  3. The date format in the files with years in their name is 'yymmdd', but for the years 2000-2010 the leading zeros are missing: Jan 1st, 2000 appears as 101 instead of 000101.
I know that this might be very challenging, but I would very much appreciate help.
Thanks in advance, Djordje
  4 Comments
per isakson on 3 Aug 2014
Edited: per isakson on 3 Aug 2014
Did you specify
  • in what form you want the result
  • the date format used in Kos.txt
dpb on 3 Aug 2014
As I showed you before, the easiest will likely be to read the whole file into memory and then select those wanted (or eliminate the unwanted).
The rest is pretty much as IA says: more or less trivial grunt work of counting lines, creating format strings and using textscan and/or other I/O functions.
I don't see a piece of the puzzle that hasn't been addressed in one of the previous postings other than perhaps finding the given line. That's pretty much either
a) use a fixed headerlines count if the offset is fixed or
b) read line-by-line until you find the string. That is indeed pretty simple...
while ~feof(fid)
    l = fgetl(fid);                % read one line, without the newline
    if ~isempty( strfind( l, 'a unique pattern in the target string' ) )
        break                      % l now holds the line that was searched for
    end
end
If the sections are separated by a fixed (but initially unknown) number of lines, just add a counter to the loop the first time through; the count can then be used as the headerlines value for the later sections.
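A minimal sketch of the counter variant, assuming the marker text occurs only on the header line. The function name count_header_lines and its two arguments are made up for illustration; they do not come from the earlier postings.

```matlab
% Sketch: count the lines up to and including the first line that contains
% the marker text. Returns NaN if the marker never shows up.
function nHeader = count_header_lines(filespec, marker)
    fid = fopen(filespec, 'r');
    assert(fid ~= -1, 'Cannot open %s', filespec);
    closer = onCleanup(@() fclose(fid));   % close the file even on error
    nHeader = 0;
    found = false;
    while ~feof(fid)
        tline = fgetl(fid);
        if ~ischar(tline), break, end      % fgetl returns -1 at end of file
        nHeader = nHeader + 1;             % the counter added to the loop
        if ~isempty(strfind(tline, marker))
            found = true;
            break
        end
    end
    if ~found
        nHeader = NaN;
    end
end
```

The count could then be handed to textscan through its 'HeaderLines' option, with a format string that fits the data lines.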


Accepted Answer

per isakson on 3 Aug 2014
Edited: per isakson on 3 Aug 2014
I disagree, it's not that simple. Ok, it depends.
I've chosen to divide the task into two steps
  1. Read the data-file and put the required data into a containers.Map object. The object may be saved to a mat-file. More data can be added to the object later. There are methods with which one may inspect data interactively.
  2. Loop over the "keys" of the key-file and print result to the screen. It's a demo after all.
Questions on performance and memory usage are postponed.
Error handling and more remains, e.g. testing and documentation.
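To make the first step more concrete, here is a small sketch of storing, saving and inspecting data with a containers.Map object. The keys and numbers are invented for illustration, and the mat-file goes to a temporary location; none of this comes from the attached files.

```matlab
% Sketch only: the keys and values below are made up.
lib = containers.Map('KeyType', 'char', 'ValueType', 'any');
lib('19490101T1200') = [1 0 59.6 43.6];      % one block stored under its time key
lib('19490102T0000') = [];                   % a block without data rows

save_file = [tempname, '.mat'];              % hypothetical location
save(save_file, 'lib');                      % the Map object goes into a mat-file
S = load(save_file);                         % ... and comes back as S.lib
lib2 = S.lib;

lib2('19490102T0600') = [2 1 60.1 44.0];     % more data can be added later
all_keys = keys(lib2);                       % cell array with all keys
has_key  = isKey(lib2, '19490101T1200');     % is a given time present?
n_items  = lib2.Count;                       % number of stored blocks
delete(save_file);
```

Typing keys(lib2) or lib2('19490101T1200') at the command prompt is one way to look at the content interactively.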
Demo:
>> specific_data
Key: 19490101T0000, Data:
Key: 19490101T0600, Data:
Key: 19490101T1200, Data:
1.0e+03 *
Columns 1 through 9
0.0010 0 0.0596 0.0436 1.0379 -0.0002 0.0072 0.0093 0.1614
Columns 10 through 13
0.0006 0.0004 0.0560 0.0483
Key: 19490101T1800, Data:
Key: 19490102T0000, Data:
....
where
function specific_data
    key_filespec = 'h:\m\cssm\Kos.txt';
    met_filespec = 'h:\m\cssm\1948_1950.txt';
    % Step 1: read the data file into a Map keyed by 'yyyymmddTHHMM'
    lib = containers.Map( 'KeyType', 'char', 'ValueType', 'any' );
    lib = met2lib( met_filespec, lib );
    % Step 2: read the wanted dates, one string per entry
    fid = fopen( key_filespec );
    cac = textscan( fid, '%s' );
    fclose(fid);
    for kk = 1 : length( cac{1} )
        % convert the key-file date to the same format as the Map keys
        key = datestr( datevec( cac{1}(kk), 'ddmmyyyyHH' ) ...
                     , 'yyyymmddTHHMM' );
        if not(isrow( key ))
            keyboard    % debugging stop: the key should be a single row
        end
        fprintf( '\nKey: %s, Data: \n', key )
        if isKey( lib, key )
            disp( lib( key ) )
        end
        pause(0.1)
    end
end
and
function lib = met2lib( filespec, lib )
    str = fileread( filespec );
    % one cell per block; each block starts right after 'CENTRES:'
    cac = strtrim( strsplit( str, 'CENTRES:' ) );
    cac(1) = [];    % drop the text before the first 'CENTRES:'
    for bb = 1 : length( cac )
        block_str = cac{bb};
        % left-pad the date/time with zeros to the width of 'yymmdd HHMM'
        datetime_str = repmat( '0', 1, 11 );
        str = strtrim( block_str(1:12) );
        datetime_str( end-length(str)+1 : end ) = str;
        % pivot year 1940: two-digit years are read as 1940..2039
        timekey = datestr( datevec(datetime_str,'yymmdd HHMM',1940)...
                         , 'yyyymmddTHHMM' );
        colhead_xpr ...
            = 'k\s+io\s+lon\s+lat\s+f\s+c\s+dp\s+rd\s+zs\s+up\s+vp\s+lonv\s+latv\s+';
        % everything after the column header line, to the end of the block
        str = regexp( block_str, ['(?<=',colhead_xpr,').+$'], 'match' );
        if not( isempty( str ) )
            num_val = str2num( str{:} );
        else
            num_val = [];
        end
        lib( timekey ) = num_val;
    end
end
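The handling of the missing leading zeros in met2lib may be easier to follow in isolation. A sketch, assuming the raw text next to the block marker is '101 0000' for Jan 1st, 2000 at hour 00 (that sample value is taken from the question, not read from a file):

```matlab
% Sketch: restore the lost leading zeros, then build the Map key.
raw = '101 0000';                          % assumed sample, year 2000
padded = repmat('0', 1, 11);               % as wide as 'yymmdd HHMM'
padded(end-length(raw)+1 : end) = raw;     % right-align the raw text
% pivot year 1940: two-digit years are read as 1940..2039
key = datestr(datevec(padded, 'yymmdd HHMM', 1940), 'yyyymmddTHHMM');
fprintf('%s -> %s -> %s\n', raw, padded, key);
```

With the pivot year set to 1940, a two-digit year of 48 is read as 1948 and 00 as 2000, which covers the 1948 to 2014 span of the files.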
  5 Comments
per isakson on 5 Aug 2014
Edited: per isakson on 5 Aug 2014
  • "But it gives me a specification, like [3x13 double]": This comment indicates that you badly need to do some getting-started exercises with the Matlab Desktop before you start experimenting with deeply nested cell arrays.
  • "I figured out": The MathWorks forbids me to use the acronym RTFM. Even after 20+ years with Matlab I read the on-line help all the time.
  • "is there a way to make it automatic for all files": Yes, my code is the start of something automatic. But since you did not indicate how you will use the data, I just dumped it on the screen.
  • "very sophisticated": I tried to structure the code somewhat and I use regular expressions. Structured programming was regarded as sophisticated in the late seventies. My use of regular expressions might be sophisticated in the Matlab world.
  • My lib is way better than your eval.
/ not so humble
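As a possible next step towards "automatic for all files", a sketch that lists the data files in one folder. The function name list_met_files and the '*_*.txt' pattern are assumptions based on file names like 1948_1950.txt; Kos.txt is skipped because its name has no underscore.

```matlab
% Sketch: return the full file specs of all files matching '*_*.txt'
% in the given folder (the pattern is an assumption, see above).
function filespecs = list_met_files(folder)
    listing = dir(fullfile(folder, '*_*.txt'));
    listing = listing(~[listing.isdir]);          % keep files only
    filespecs = cellfun(@(n) fullfile(folder, n), {listing.name}, ...
                        'UniformOutput', false);
end
```

Each entry of filespecs could then be passed to met2lib in a loop, always with the same lib, so that all files end up in one Map.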
djr on 5 Aug 2014
Sorry if you are offended. As I said before, I just started using Matlab and I have to do this asap. I know that most of my questions are maybe even stupid but I have like 2 weeks to finish this and 2 weeks of Matlab experience so far.
Thanks... P.S. It's a way better...
