
Use terminal to speed up file removal

Pete on 17 Oct 2017
Answered: Stephen23 on 17 Oct 2017
Hi all, I've got a large number of CSVs, generated each time a system changes state. Basically, each CSV starts as a single-row [1x3] array, and any data is added as a new row. I've written a simple loop that checks for any "empty" CSVs (containing only that single row) and removes those files. However, this takes a long time (>10 minutes) to complete, and I want to try the same thing in the terminal. Here is the code:
CSV_Filenames_STRUCT = dir(sprintf('%s/*.csv',ResultDirectory));
CSV_Filenames_CELL = {CSV_Filenames_STRUCT.name};
StartingNumberOfFiles = size(CSV_Filenames_CELL,2);
for NthFile = 1:StartingNumberOfFiles
    NumberOfPeaks = size(textread(sprintf('%s/%s',ResultDirectory,CSV_Filenames_CELL{1,NthFile}),'%s'),1) - 1; % Number of rows less one for the 'x,y,value'
    if ~NumberOfPeaks % Essentially empty
        delete(sprintf('%s/%s',ResultDirectory,CSV_Filenames_CELL{1,NthFile}));
    end
end
I haven't used the terminal much, and I'm wondering whether it would be faster for the above when there are many files to process, and how to code the single-line check. So far, I've got something like:
for f in *.csv;
do
    L=`wc -l "$f" | awk '{print $1}'`
    if test $L -eq 1
    then
        mv $f ./MT;
    fi
done
which isn't quite working (the filenames contain spaces, see the example below), but I'm out of my depth here, so I'm asking for help on how to use the "system"/"unix" options from MATLAB. I'm running OS X and Kubuntu Linux. I should also mention that the filenames have spaces in them, for example: "Filter 0000001 Fwd,Alignment Black Screen - Ref_01 Input_19 (2017-10-17 @ 13.30.20.103).csv"
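A minimal sketch of a corrected loop, assuming bash and that an "empty" CSV is one with no second line. The original loop breaks because $f is unquoted in the mv line, so the shell splits a name like the one above at every space; quoting "$f" everywhere fixes that. The sketch also avoids starting wc and awk for every file by reading at most two lines with the builtin read, and it does not depend on the last line ending in a newline (wc -l counts newline characters, so a header with no trailing newline reports 0, not 1). The function names has_at_most_one_line and move_empty_csvs are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: move header-only CSVs into a separate folder instead of deleting them.
# Assumption: an "empty" CSV is one that has no second line (header row only).

# Succeeds (returns 0) if the file has no second line.
# Uses the builtin read, so no wc/awk process is started per file.
has_at_most_one_line() {
  local first second=""
  [ -r "$1" ] || return 1            # unreadable files are never treated as empty
  {
    IFS= read -r first               # consume the header line (may lack a newline)
    # A second line exists if read succeeds, or if it hit EOF after reading
    # some text (a last line without a trailing newline).
    if IFS= read -r second || [ -n "$second" ]; then
      return 1
    fi
  } < "$1"
  return 0
}

# Hypothetical helper: move_empty_csvs DIR DEST
# Moves every header-only *.csv directly inside DIR into DEST.
move_empty_csvs() {
  local dir=$1 dest=$2 f
  mkdir -p -- "$dest"
  for f in "$dir"/*.csv; do
    [ -f "$f" ] || continue          # skip the literal pattern when nothing matches
    if has_at_most_one_line "$f"; then
      mv -- "$f" "$dest/"            # quoted, so spaces in names are preserved
    fi
  done
}
```

Usage would be along the lines of move_empty_csvs "$ResultDirectory" "$ResultDirectory/MT", where ResultDirectory is a shell variable standing in for the MATLAB one. From MATLAB, the functions could be saved in a script that calls move_empty_csvs and then run through system or unix; whether this beats a MATLAB loop that reads only two lines per file would need measuring.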
3 Comments
Pete on 17 Oct 2017
I've just started a set with 2,000,000 files, but I only expect about 10% of these (200k) to have genuine results, so the rest are just 'empty' CSVs (one row of header data). Looking at the profiler, I think the MATLAB functions called by textread may be what's taking the time. I've removed sprintf's and replaced with concatenation strings, i.e. [PathPart1 '/' PathPart2] etc. That sped things up a bit, but processing still takes a long time. Any other suggestions?
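At the scale of millions of files, one option worth trying is to let find stream the file names to several bash workers, each reading at most two lines per file with the builtin read and deleting header-only files in batches. This is a sketch under assumptions: bash is available, and find and xargs support -maxdepth, -print0, -0 and -P (the GNU and macOS/BSD versions do). The function name remove_empty_csvs_parallel, the batch size of 1000 and the default of 4 jobs are hypothetical choices. It deletes files permanently, so running it on a copy of the directory first is sensible.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove_empty_csvs_parallel DIR [JOBS]
# Deletes every *.csv directly inside DIR that has no second line (header only).
remove_empty_csvs_parallel() {
  local dir=$1 jobs=${2:-4}
  # -print0 / -0 keep names with spaces and parentheses intact.
  find "$dir" -maxdepth 1 -type f -name '*.csv' -print0 |
    xargs -0 -n 1000 -P "$jobs" bash -c '
      empties=()
      for f in "$@"; do
        [ -r "$f" ] || continue          # never treat unreadable files as empty
        first="" second=""
        {
          IFS= read -r first             # header row
          IFS= read -r second || [ -n "$second" ]
        } < "$f" && continue             # a second line exists: keep the file
        empties+=("$f")
      done
      # One rm per batch of up to 1000 names instead of one per file.
      if [ "${#empties[@]}" -gt 0 ]; then
        rm -f -- "${empties[@]}"
      fi
    ' _
}
```

For example, remove_empty_csvs_parallel "$ResultDirectory" 4. Whether extra jobs help depends mostly on the storage: the check is I/O-bound, so on a single spinning disk more jobs may not speed it up.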
Jan on 17 Oct 2017
You mean "shell", not "terminal".


Answers (2)

Jan on 17 Oct 2017
I'm not sure I understand your question correctly: you want to delete all files that contain only one line, correct?
FULLFILE is a smarter way to build file names than sprintf():
CSV_Filenames_STRUCT = dir(fullfile(ResultDirectory, '*.csv'));
CSV_Filenames_CELL = {CSV_Filenames_STRUCT.name};
StartingNumberOfFiles = numel(CSV_Filenames_CELL);
for NthFile = 1:StartingNumberOfFiles
    File = fullfile(ResultDirectory, CSV_Filenames_CELL{NthFile});
    fid = fopen(File, 'r');
    if fid == -1, error('Cannot open file: %s', File); end
    line1 = fgetl(fid);  % header row
    line2 = fgetl(fid);  % first data row, or -1 at end of file
    fclose(fid);
    if ~ischar(line2)    % no second line: header-only file
        delete(File);
    end
end
Is this faster? It reads only the first two lines of each file.

Stephen23 on 17 Oct 2017
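Remove the textread call and replace it with something like this (a sketch; File stands for the full path of the current CSV, e.g. built with fullfile):
fid = fopen(File,'rt');    % File: full path of the current CSV (illustrative name)
fgetl(fid);                % read first row
isHeaderOnly = feof(fid);  % check if end of file
fclose(fid);               % close before deleting, so file handles don't pile up
if isHeaderOnly
    delete(File)
end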
"I've removed sprintf's and replaced with concatenation strings "
I would recommend using fullfile: it actually makes the intention clearer.
