Failed to read xml error when using xmlread

I am trying to read several XML files in a loop using xmlread. The error 'Failed to read xml file' occurs. On examining the XML file, I noticed that if I change ISO8859-1 to ISO-8859-1 in the first line, <?xml version="1.0" encoding="ISO8859-1"?>, xmlread works. Is there an automated way to correct this, or any other way to read the files in bulk without having to manually correct the header in each file?

Answers (3)

...
try
    DOMnode=xmlread(filename(i)); % try to read the file (use filename{i} if filename is a cell array)
catch ME % catch the failure; fixup
    fidi=fopen(filename(i),'r'); % open the original file for reading
    fido=fopen('tmp','w'); % open a scratch temp file for writing
    while ~feof(fidi)
        l=fgetl(fidi); % read one line; fgetl strips the newline
        if ~isempty(strfind(l,'ISO8859'))
            l=strrep(l,'ISO8859','ISO-8859'); % fixup the record
        end
        fprintf(fido,'%s\n',l); % write the line as data and restore the newline
    end
    fclose(fidi);
    fclose(fido);
    copyfile('tmp',filename(i)) % and copy over the original
    DOMnode=xmlread(filename(i)); % and try again with corrected file...
end

2 Comments

Thanks a lot for your help.
Just a clarification: in the command fprintf(fid0,l), only the content of 'l' will be written to the tmp file? How do we get back all the remaining content of the original file, please?
It is within the loop, so eventually the entire content is written.
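However, the line should be written as data rather than used as the format string, because any % or \ in the text would otherwise be interpreted by fprintf. Since fgetl strips the line terminator, it also has to be added back. So
fprintf(fid0, l)
should be
fprintf(fido, '%s\n', l)
or, equivalently, fwrite(fido, [l newline]). Note that the file identifier is fido, not fid0.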


Walter Roberson on 14 Aug 2020
Edited: Walter Roberson on 14 Aug 2020
filename = 'InputFileName.xml';
S = fileread(filename); % read the whole file as text
SS = regexprep(S, 'encoding="ISO8859-', 'encoding="ISO-8859-', 'once'); % fix the header only
if strcmp(S, SS)
    remove = false; %optimization, do not write new file if not needed
    tname = filename;
else
    tname = tempname();
    fid = fopen(tname, 'w');
    fwrite(fid, SS); % write the corrected content, not the file name
    fclose(fid);
    remove = true;
end
DOMnode = xmlread(tname);
if remove; delete(tname); end
This code is deliberate in narrowing down to encoding= and only doing the first instance, so as to avoid accidentally changing any ISO8859 that might happen to be part of the data.
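Since the question asks about reading files in bulk, the same idea can be wrapped in a loop. The following is only a minimal sketch, not a tested solution for any particular data set: the names sample, files and docs are hypothetical, the sample file is created purely so the snippet runs on its own, and the '.xml' extension on the temporary file is an assumption taken from the suggestion later in this thread. It also assumes the file content survives a fileread/fwrite round trip (plain ASCII or Latin-1 text).

```matlab
% Create a small sample file with the malformed header (demonstration only).
sample = [tempname() '.xml'];
fid = fopen(sample, 'w');
fwrite(fid, ['<?xml version="1.0" encoding="ISO8859-1"?>' newline ...
             '<root><item>1</item></root>']);
fclose(fid);

files = {sample};            % hypothetical list; in practice e.g. built from dir('*.xml')
docs  = cell(size(files));   % one DOM node per input file

for i = 1:numel(files)
    S  = fileread(files{i});
    % Only touch the first encoding= declaration, never the data itself.
    SS = regexprep(S, 'encoding="ISO8859-', 'encoding="ISO-8859-', 'once');
    if strcmp(S, SS)
        docs{i} = xmlread(files{i});      % header already acceptable
    else
        tname = [tempname() '.xml'];      % assumed: give the temp file an .xml extension
        fid = fopen(tname, 'w');
        fwrite(fid, SS);                  % write the corrected text
        fclose(fid);
        docs{i} = xmlread(tname);
        delete(tname);                    % the original file is left untouched
    end
end

delete(sample);              % remove the demonstration file
```

Unlike the first answer, this leaves the original files unmodified, which may or may not be what you want.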

3 Comments

Hi Walter, thanks a lot for your response.
I tried the above code. To clarify: the replaced content is still within 'SS', and assuming strcmp(S,SS) is false, 'tname = filename' is executed with filename still referring to the original (faulty) file with ISO8859, isn't it? How are the contents actually replaced within the faulty XML file? Can you clarify this for me, please?
tname is set to filename when strcmp is true, not when it is false.
The comparison is true when the two strings S and SS are exactly the same, which happens if regexprep did not make a change, for example for a file that already has the right pattern or that uses a different encoding. In this situation the original file name is used directly for the later xmlread.
When the strcmp is false, the original and regexprep versions are different, which means that regexprep made a new string. In that situation, a temporary file name is fetched, the file is opened, the new content is written, and the temporary file is closed. It is this temporary file whose name is passed to xmlread. After the read, the temporary file is deleted. The original file is never modified.
See also https://www.mathworks.com/matlabcentral/answers/101632-how-can-i-use-a-function-such-as-xmlread-to-parse-xml-data-from-a-string-instead-of-from-a-file-i#comment_972999 which shows a Java related method. To use it you would do the fileread(), regexprep(), and then java.io.StringBufferInputStream() the result, and xmlread() what you get from that.
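A minimal sketch of that Java-based route follows, assuming that xmlread accepts a Java InputStream in place of a file name, as the linked comment describes. The inline xmlText stands in for the result of fileread(filename) so that the snippet runs on its own; xmlText, fixed, stream and rootName are hypothetical names. java.io.StringBufferInputStream keeps only the low 8 bits of each character, which is acceptable for ISO-8859-1 text but not for arbitrary Unicode.

```matlab
% Stand-in for S = fileread(filename); the header is deliberately malformed.
xmlText = ['<?xml version="1.0" encoding="ISO8859-1"?>' newline ...
           '<root><item>1</item></root>'];

% Correct only the first encoding= declaration.
fixed = regexprep(xmlText, 'encoding="ISO8859-', 'encoding="ISO-8859-', 'once');

% Wrap the corrected text in a Java input stream and parse it directly,
% so no temporary file is needed.
stream  = java.io.StringBufferInputStream(fixed);
DOMnode = xmlread(stream);

rootName = char(DOMnode.getDocumentElement().getNodeName());
```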


Sarah Immanuel on 14 Aug 2020
Thanks a lot Walter, yes, that makes sense. One last question, I hope! I am using MATLAB R2020a. The command tempfile() doesn't seem to work?

3 Comments

Sorry, should be tempname() instead of tempfile()
Hi Walter, thanks - the tempname() creates a tempfile but is not handled by the xmlread. It shows an error again. Can you help?
Maybe
tname = [tempname() '.xml'];


Asked: 13 Aug 2020

Commented: 17 Aug 2020
