
The function textscan is problematic and has a serious bug!

The function textscan is really problematic and gives me a lot of trouble, especially for my nerves!
I uploaded two text files (sliceGood.txt and sliceBad.txt). They contain the same number of figures (16512) and store the figures in the same fashion.
Then I have two MATLAB scripts (counterGoodAndBadSlice.m and compaireGoodBadLineByLine.m). They count the number of figures in the two text files using two different algorithms.
For short, I will call them counter and compaire. Both algorithms use the textscan function:
  1. counter uses textscan to read the floating-point figures one by one until the end of the file.
  2. compaire reads one line at a time, uses textscan to identify the floating-point figures in that line, and repeats this procedure until the end of the file.
The two scripts yield different results: the counter script indicates that sliceGood.txt has 16512 figures and sliceBad.txt has 16544 figures, but the compaire script indicates that both text files have the same number of figures: 16512.
If I am not wrong, this is a big bug in textscan. Could the staff please verify this problem and respond to me?
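For readers who want to reproduce the discrepancy, here is a minimal sketch of the two counting strategies described above. It is not the asker's actual scripts: the variable names, the temporary file, and the two-line sample (one well-formed line assembled from values quoted later in this thread, plus line 391 of sliceBad.txt) are assumptions made for illustration.

```matlab
% Hypothetical two-line sample: a well-formed line, then line 391 of sliceBad.txt.
sample = sprintf('%s\n%s\n', ...
    ' 0.1316473E-83 0.1951292E-86 -0.2101776E-88 0.1684242E-98 0.2073446E-99', ...
    ' 0.2848764-119 0.1045924-120 0.1045924-120 0.2848764-119 -0.6596584-109');
tmpFile = [tempname '.txt'];
fid = fopen(tmpFile, 'w');
fprintf(fid, '%s', sample);
fclose(fid);

% Strategy 1 ("counter"): read one number at a time until nothing more is returned.
fid = fopen(tmpFile, 'r');
countOneByOne = 0;
while true
    c = textscan(fid, '%f', 1);
    if isempty(c{1})
        break
    end
    countOneByOne = countOneByOne + 1;
end
fclose(fid);

% Strategy 2 ("compaire"): read a line, then ask textscan for at most 5 numbers from it.
fid = fopen(tmpFile, 'r');
countPerLine = 0;
txtLine = fgetl(fid);
while ischar(txtLine)
    c = textscan(txtLine, '%f', 5);
    countPerLine = countPerLine + numel(c{1});
    txtLine = fgetl(fid);
end
fclose(fid);
delete(tmpFile);

fprintf('one at a time: %d, line by line: %d\n', countOneByOne, countPerLine);
```

On this sample the first strategy should count 15 numbers and the second 10, which mirrors the 16544 versus 16512 mismatch on a small scale.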
  1 Comment
Xiaoqiu on 22 Sep 2023
Thanks very much, Walter Roberson and Voss!
The problem is in the text file, which was produced by a Fortran code. The missing E is due to overflow of the data: when the exponent overflows the output format, Fortran just eats the E. What an ancient fossil of a language it is! That bug tortured me for three whole days, and you finally freed me from it!
Thank you very much! God bless you!


Accepted Answer

Voss on 21 Sep 2023
Edited: Voss on 21 Sep 2023
The problem is due to errors in the sliceBad.txt file, like these on line 391:
0.2848764-119 0.1045924-120 0.1045924-120 0.2848764-119 -0.6596584-109
Notice: There are no "E"s.
So when you call textscan(__,'%f',5) on this line:
result = textscan(' 0.2848764-119 0.1045924-120 0.1045924-120 0.2848764-119 -0.6596584-109','%f',5)
result = 1×1 cell array
{5×1 double}
result{:}
ans = 5×1
    0.2849
 -119.0000
    0.1046
 -120.0000
    0.1046
You get 5 numbers, but they are not what was supposed to be written there. Notice also that you miss the rest of the line beyond the first 5 numbers read; that part is not counted.
On the other hand, when you call textscan(__,'%f',1) to read the numbers one at a time and encounter a problematic line like this in the file, you count 10 numbers.
It appears that whatever program created these files omits the "E" when the exponent takes up more than two digits.
This explains all the behavior you observe:
  • When you textscan each line, specifying to read 5 floating-point numbers each time, your counts from the good file and the bad file are the same because you are failing to read the rest of the numbers on lines where this missing "E" problem happens.
  • When you textscan one floating-point number from the file at a time, you get some extra counts from the bad file because of this missing "E" problem.
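If the files cannot be regenerated, one possible workaround is to put the missing "E" back before parsing. The sketch below applies regexprep to line 391 as quoted above. It assumes every damaged token has the form mantissa, sign, three-digit exponent; the variable names are hypothetical, and this has only been reasoned through for that one line.

```matlab
% Line 391 of sliceBad.txt as quoted in this answer.
badLine = ' 0.2848764-119 0.1045924-120 0.1045924-120 0.2848764-119 -0.6596584-109';

% Insert "E" between a digit and a sign that is followed by exactly three digits.
% Assumption: no valid token in the file already matches this pattern.
fixedLine = regexprep(badLine, '(\d)([+-]\d{3})(?!\d)', '$1E$2');

% With the exponent markers restored, the line parses as 5 numbers.
c = textscan(fixedLine, '%f', 5);
values = c{1};

disp(fixedLine);
disp(numel(values));
```

Well-formed tokens such as 0.1684242E-98 are left alone, because their sign is preceded by "E" rather than by a digit.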

More Answers (1)

Walter Roberson on 21 Sep 2023
You coded
if totalNumFigureBad == totalNumFigureGood
    disp('The textscan is problematic!');
end
and
if counterBad == counterGood
    disp('The textscan is problematic!');
end
but if the counts are equal then textscan is NOT problematic: it would be problematic if the counts were NOT equal.
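A minimal sketch of the comparison with the test inverted, so that a message appears only when the counts disagree. The variable names come from the code quoted above; the values are the counts reported in the question, and assigning them to counterGood and counterBad is an assumption about which script produced which variable. The message text is made up for illustration.

```matlab
% Counts reported in the question (assumed to come from the one-at-a-time script).
counterGood = 16512;
counterBad  = 16544;

% Report a problem only when the two counts disagree.
if counterBad ~= counterGood
    msg = sprintf('Counts differ: good = %d, bad = %d', counterGood, counterBad);
else
    msg = 'Counts agree';
end
disp(msg);
```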
  5 Comments
Walter Roberson on 21 Sep 2023
countagain2()
bad line #378 has 9 entries
content is: 0.2073446E-99 -0.6596584-109 -0.4864104-110 -0.4864104-110 -0.6596584-109
bad line #390 has 6 entries
content is: 0.1316473E-83 0.1951292E-86 -0.2101776E-88 0.1684242E-98 -0.6596584-109
bad line #391 has 10 entries
content is: 0.2848764-119 0.1045924-120 0.1045924-120 0.2848764-119 -0.6596584-109
bad line #403 has 7 entries
content is: 0.2690198E-88 -0.1869932E-89 0.1400264E-99 -0.4864104-110 0.1045924-120
bad line #404 has 9 entries
content is: -0.3134086-131 -0.3134086-131 0.1045924-120 -0.4864104-110 0.1400264E-99
bad line #416 has 8 entries
content is: -0.1869932E-89 0.1400264E-99 -0.4864104-110 0.1045924-120 -0.3134086-131
bad line #417 has 8 entries
content is: -0.3134086-131 0.1045924-120 -0.4864104-110 0.1400264E-99 -0.1869932E-89
bad line #429 has 9 entries
content is: 0.1684242E-98 -0.6596584-109 0.2848764-119 0.1045924-120 0.1045924-120
bad line #430 has 7 entries
content is: 0.2848764-119 -0.6596584-109 0.1684242E-98 -0.2101776E-88 0.1951292E-86
bad line #442 has 9 entries
content is: 0.2073446E-99 -0.6596584-109 -0.4864104-110 -0.4864104-110 -0.6596584-109
bad line #820 has one entry
bad line #3304 has one entry
good line #820 has one entry
good line #3304 has one entry
The textscan is problematic!
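The source of countagain2 is not shown in this thread. The following is only a sketch of how a per-line check of this kind might be written, run on two of the lines reported above. It assumes each regular line should hold 5 numbers; the variable names are hypothetical.

```matlab
% Lines 390 and 391 of sliceBad.txt as reported in the output above.
sampleLines = { ...
    ' 0.1316473E-83 0.1951292E-86 -0.2101776E-88 0.1684242E-98 -0.6596584-109', ...
    ' 0.2848764-119 0.1045924-120 0.1045924-120 0.2848764-119 -0.6596584-109'};
expectedPerLine = 5;   % assumption: a regular line holds 5 numbers

counts = zeros(1, numel(sampleLines));
for k = 1:numel(sampleLines)
    % No count limit: let textscan read every number it can find on the line.
    c = textscan(sampleLines{k}, '%f');
    counts(k) = numel(c{1});
    if counts(k) ~= expectedPerLine
        fprintf('sample line %d has %d entries, content is: %s\n', ...
            k, counts(k), sampleLines{k});
    end
end
```

The two sample lines should report 6 and 10 entries, matching the counts listed for lines #390 and #391.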
Xiaoqiu on 22 Sep 2023
You are all correct! The files are problematic! Thank you very much! God bless you!
