fwrite fread consistency issue
I have some input numbers, for example, [128 129 139 255 256].
They are written to a file with format 'char', and then the file is read back with the same format.
But the output is different: [26 129 26 255 26].
What is wrong?
The machine runs Windows 10 and the MATLAB version is R2019b.
The simple test code:
xInput = [128 129 139 255 256]';
filename = 'test.bin';
formt = 'char';
fid = fopen(filename, 'w');
fwrite(fid, xInput, formt);
fclose(fid);
fid = fopen(filename, 'r');
xOutput = fread(fid, inf, formt);
fclose(fid);
[xInput xOutput]
Accepted Answer
Guillaume
on 8 Jan 2020
Edited: Guillaume
on 8 Jan 2020
It's all to do with encoding. Note that the behaviour you see is going to be OS dependent and locale dependent in some ways. In addition, MATLAB's documentation does a very poor job of explaining how character encoding works in MATLAB.
First, you don't specify an encoding in any of your fopen calls, so MATLAB will use whatever your system's default encoding is. Chances are it's windows-1252, but that will vary with OS and localisation. Windows-1252 is a single-byte encoding, so you already have a problem: character 256 does not even exist in that encoding. The range is 0-255.
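To see which encoding MATLAB actually picked, one option is to ask fopen for it. The following is a minimal sketch, assuming the four-output form of fopen(fid) is available in your release; the value printed depends on the OS, locale and MATLAB version, so no particular result is implied.

```matlab
% Open the file exactly as in the question, with no encoding argument.
filename = 'test.bin';
fid = fopen(filename, 'w');

% The fourth output of fopen(fid) is the encoding associated with the open file.
[~, ~, ~, encoding] = fopen(fid);
fclose(fid);

disp(encoding)   % system dependent, e.g. a windows-125x code page or UTF-8
```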
However, internally MATLAB uses Unicode (specifically UTF-16) to store characters, so when you write:
fwrite(fid, xInput, 'char')
you're really saying: write the Unicode (UTF-16) characters [128 129 139 255 256] as windows-1252 characters. So there is a conversion from Unicode to another encoding. Indeed:
>> unicode2native(char(xInput), 'windows-1252')
ans =
5×1 uint8 column vector
26
129
26
255
26
is the result of the conversion, which is exactly what you see. So your observation is what is expected to happen. Note: most Unicode characters below 256 have the same code in windows-1252; that is the case for characters 129 and 255. A few (characters 128, 130:140, 142, 145:156, 158, 159) don't exist in windows-1252 and are replaced by character 26, as you see for 128 and 139. The majority of Unicode characters above 255 don't exist in windows-1252 either, so they are also replaced by 26 (the SUB, or substitute, control character).
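If you want to check on your own machine which code points below 256 get substituted, a short sketch along these lines should do it. It assumes the 'windows-1252' converter is available to unicode2native; the variable names are only illustrative, and the exact list may depend on the converter tables shipped with your MATLAB release.

```matlab
% Convert every code point 0..255 to windows-1252 and compare byte values.
cp = 0:255;
bytes = unicode2native(char(cp), 'windows-1252');

% Code points whose byte differs from the code point were substituted.
substituted = cp(double(bytes) ~= cp);
disp(substituted)
```

Code point 26 itself maps to byte 26, so it does not show up as a false positive in the list.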
On the other hand, the character string char([128 129 139 255 256]) does not make any sense textually, so is it really what you meant to write in the file?
If it is, then the best thing is to write these characters as Unicode instead of using the local code page, since Unicode supports all the characters that MATLAB supports:
fid = fopen(filename, 'w', 'n', 'UTF-8'); % ironically, while MATLAB uses UTF-16 internally, it doesn't officially support UTF-16 for file I/O!
fwrite(fid, xInput, formt);
fclose(fid);
fid = fopen(filename, 'r', 'n', 'UTF-8'); % read back with the same encoding
xOutput = fread(fid, inf, formt);
fclose(fid);
[xInput xOutput]
will preserve your text. However, note that the actual bytes written to the file are [194 128 194 129 194 139 195 191 196 128], since that's how your original characters are encoded in UTF-8.
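To look at those bytes yourself, you can reopen the file and read it as uint8, which bypasses character decoding. This is a minimal sketch; the file name test_utf8.bin is just an illustrative choice, and it assumes fopen accepts 'UTF-8' as the encoding name.

```matlab
xInput = [128 129 139 255 256]';
filename = 'test_utf8.bin';   % hypothetical file name for this example

% Write the values as characters, encoded as UTF-8.
fid = fopen(filename, 'w', 'n', 'UTF-8');
fwrite(fid, xInput, 'char');
fclose(fid);

% Read the file back as raw bytes, so no decoding takes place.
fid = fopen(filename, 'r');
rawBytes = fread(fid, inf, 'uint8')';
fclose(fid);

disp(rawBytes)   % ten bytes for five characters, two bytes each
```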
Again, are you sure that you meant to write these values as characters (as opposed to 16-bit integers for example)?
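If the values are really numbers rather than text, a sketch like the following avoids encoding altogether by writing them as 16-bit unsigned integers. The 'uint16' precision is an assumption about what you want to store (it covers 0 to 65535), and the file name is illustrative.

```matlab
xInput = [128 129 139 255 256]';
filename = 'test_uint16.bin';   % hypothetical file name for this example

% Write each value as a 16-bit unsigned integer; no character conversion is involved.
fid = fopen(filename, 'w');
fwrite(fid, xInput, 'uint16');
fclose(fid);

% Read the values back with the same precision.
fid = fopen(filename, 'r');
xOutput = fread(fid, inf, 'uint16');
fclose(fid);

disp([xInput xOutput])   % both columns should match
```

The file then holds two bytes per value in the machine's native byte order, so specify the machine format in fopen if the file has to be read on a different platform.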