How do I use textscan to extract some of the numbers with a certain pattern (not all the numbers) from one sentence in a text file?

I am new to MATLAB and really struggling with this. I am trying to extract numbers at certain positions, or matching certain patterns, from the sentences in a text file, but I have no clue how to filter out the other numbers. I only know how to extract all the numbers. For example, I have the following data in a text file named ABC.txt:
TJX was in top-20 3 times and got higher 2 times within 1 day(s), 66.67%. It went 6.23% higher on average
TJX was in top-100 32 times and got higher 22 times within 1 day(s), 68.75%. It went 2.80% higher on average
TJX was in top-200 56 times and got higher 43 times within 1 day(s), 76.79%. It went 2.63% higher on average
Your choice on 2021-03-19: TJX(-) 1599/1962
Your choice on 2021-03-18: TJX(-) 1365/2029
Your choice on 2021-03-17: TJX(+) 497/1898
Your choice on 2021-03-16: TJX(-) 1721/1973
Your choice on 2021-03-15: TJX(+) 369/2039
Your choice: AMT since 2020-01-14
AMT was in top-20 1 times and got higher 0 times within 1 day(s), 0.00%. It went 0.00% higher on average
AMT was in top-100 11 times and got higher 8 times within 1 day(s), 72.73%. It went 1.31% higher on average
AMT was in top-200 20 times and got higher 16 times within 1 day(s), 80.00%. It went 2.03% higher on average
Your choice on 2021-03-19: AMT(+) 437/1962
Your choice on 2021-03-18: AMT(N) 1818/2029
Your choice on 2021-03-17: AMT(-) 1738/1898
Your choice on 2021-03-16: AMT(-) 1807/1973
Your choice on 2021-03-15: AMT(N) 259/2039
I want to extract all the information underlined above (I marked it manually) and get the data sorted into this form in a text file named ABC_Reduced.txt:
TJX.. 20:03/67% 100:32/69% 200:56/77% 0319(-)0318(-)0317(+)0316(-)0315(+)
AMT.. 20:01/00% 100:11/73% 200:20/80% 0319(+)0318(N)0317(-)0316(-)0315(N)
Any help or hint would be appreciated.
Thanks,
Esther
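
One possible way to pick only some of the numbers from a line like the ones above is sscanf with skipped fields. This is a minimal sketch, not the accepted answer's method: it assumes every "was in top" line follows exactly the wording shown in the question, and the variable names (line1, fmt, v, summary) are made up for illustration.

```matlab
% Sample line copied from the question (no file I/O, so the sketch runs standalone)
line1 = 'TJX was in top-20 3 times and got higher 2 times within 1 day(s), 66.67%. It went 6.23% higher on average';

% %*s and %*d read a field and discard it; the literal text must match the line.
% Only three values are kept: top-N, times in top, and the hit rate.
fmt = '%*s was in top-%d %d times and got higher %*d times within %*d day(s), %f';
v = sscanf(line1, fmt);   % numeric column vector with the kept values

% Format as in the requested output: count and rounded rate padded to two digits
summary = sprintf('%d:%02d/%02.0f%%', v(1), v(2), v(3));
disp(summary)
```

Because the format string hard-codes the sentence, any change in wording stops the scan early, so it is worth checking numel(v) before using the values.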

Accepted Answer

Mathieu NOE on 22 Mar 2021
Hello Esther,
That was not my easiest code of the day, but I finally managed to get it done!
I have attached my input / output text files.
Hope it helps!
clc
clearvars
Filename_in = 'dataABC.txt';
Filename_out = 'dataABC_reduced.txt';
[Names, str_all] = extract_data(Filename_in);
% export to text file (one summary line per name)
writecell(str_all, Filename_out, "FileType", "text");
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [Names, str_all] = extract_data(Filename)
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % 1st loop to collect all the names
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    fid = fopen(Filename);
    if fid == -1
        error('Could not open file %s', Filename);
    end
    Name_list = {};
    tline = fgetl(fid);
    while ischar(tline)
        if contains(tline, 'was in top')
            Name_list{end+1} = strtrim(extractBefore(tline, 'was in top'));
        end
        tline = fgetl(fid);
    end
    fclose(fid); % close the file before reopening it for the 2nd pass
    Names = unique(Name_list, 'stable'); % keep the order of first appearance
    Names = Names';
    if isempty(Names)
        str_all = {};
        return
    end
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % 2nd loop to do the hard work
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % initialization
    str_all = cell(numel(Names), 1);
    str1 = ''; % "top-N:count/rate%" part of the current name
    str2 = ''; % "MMDD(flag)" part of the current name
    Name_old = '';
    row = 1;
    fid = fopen(Filename);
    tline = fgetl(fid);
    while ischar(tline)
        if contains(tline, 'was in top')
            Name = strtrim(extractBefore(tline, 'was in top'));
            % a new name starts: store the finished row, then reset
            if ~isempty(Name_old) && ~strcmp(Name, Name_old)
                str_all{row} = [Names{row} '..' str1 ' ' str2];
                str1 = '';
                str2 = '';
                row = row + 1;
            end
            Name_old = Name; % for the check of name change (increment row index)
            % retrieve all numerical contents after the name, in order:
            % top-N, times in top, times higher, days, hit rate, average gain
            A = regexp(extractAfter(tline, 'was in top'), '\d+(?:\.\d+)?', 'match');
            % keep top-N, times in top and the rounded hit rate,
            % zero-padded to two digits as in the requested output
            str1 = [str1 sprintf(' %s:%02d/%02d%%', A{1}, str2double(A{2}), round(str2double(A{5})))];
        end
        if contains(tline, 'Your choice on ')
            % e.g. 'Your choice on 2021-03-19: TJX(-) 1599/1962'
            dateStr = strtrim(char(extractBetween(tline, 'Your choice on', ':')));
            parts = strsplit(dateStr, '-'); % {'2021','03','19'}
            flag = char(extractBetween(tline, '(', ')')); % '+', '-' or 'N'
            str2 = [str2 parts{2} parts{3} '(' flag ')'];
        end
        tline = fgetl(fid);
    end
    fclose(fid);
    % last and final concatenation (no following name triggers it)
    str_all{row} = [Names{row} '..' str1 ' ' str2];
end
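
The code above picks the wanted numbers by their position in the list of all numbers on the line. As a possible alternative, regexp with named tokens can capture only the fields of interest. The following is a hedged sketch run on two sample lines from the question; the patterns and the field names (name, top, count, rate, month, day, flag) are assumptions chosen for illustration, and they rely on the exact wording of the sample lines.

```matlab
% Sample lines copied from the question (no file I/O, so the sketch runs standalone)
statLine   = 'TJX was in top-100 32 times and got higher 22 times within 1 day(s), 68.75%. It went 2.80% higher on average';
choiceLine = 'Your choice on 2021-03-19: TJX(-) 1599/1962';

% Named tokens: only the wanted fields are captured, the rest is matched and dropped
statPat = '^(?<name>\S+) was in top-(?<top>\d+) (?<count>\d+) times.*?, (?<rate>\d+(?:\.\d+)?)%';
s = regexp(statLine, statPat, 'names', 'once');

choicePat = '^Your choice on \d{4}-(?<month>\d{2})-(?<day>\d{2}): \S+\((?<flag>.)\)';
c = regexp(choiceLine, choicePat, 'names', 'once');

% Build the two pieces of the reduced line (all captured fields are char vectors)
statPart   = sprintf('%s:%02d/%02d%%', s.top, str2double(s.count), round(str2double(s.rate)));
choicePart = sprintf('%s%s(%s)', c.month, c.day, c.flag);
disp([s.name '.. ' statPart ' ' choicePart])
```

Named fields make it harder to mix up which number is which if the sentence ever gains another number, at the cost of a pattern that has to be kept in sync with the text.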

Release

R2020a
