Requirements for Converting HTML to DOM Objects
To convert HTML content to an mlreportgen.dom.HTML or mlreportgen.dom.HTMLFile object, the HTML content must be XML parsable. HTML content is XML parsable when it complies with the rules for properly formed XML, such as:
Include a closing tag for all elements.
Use lower case for the opening and closing (start and end) tags of an element. For example, use <p> and </p> for a paragraph element, not <P> and </P>.
Nest elements properly. If you open an element inside another element, close the inner element before you close the containing element.
Enclose attribute values in quotation marks. For example, use <p align="center"></p>.
For details, see the W3Schools summary of XML rules at www.w3schools.com/xml/xml_syntax.asp.
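One way to pre-check content against the rules above is to run it through any strict XML parser. The following is a minimal sketch that uses Python's standard xml.etree.ElementTree module as a stand-in; it is not the parser that the DOM API uses, and the helper name is_xml_parsable is hypothetical. A generic XML parser checks well-formedness only, so it does not enforce the lower-case tag rule, and it expects the content to have a single root element.

```python
import xml.etree.ElementTree as ET


def is_xml_parsable(html_text):
    """Return True if html_text is well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(html_text)
        return True
    except ET.ParseError:
        return False


# Closed, properly nested elements with a quoted attribute value.
print(is_xml_parsable('<p align="center">Hello <b>world</b></p>'))  # True

# <br> has no closing tag.
print(is_xml_parsable('<p>line one<br>line two</p>'))  # False

# <i> is opened inside <b> but closed after it.
print(is_xml_parsable('<p><b><i>text</b></i></p>'))  # False

# The attribute value is not enclosed in quotation marks.
print(is_xml_parsable('<p align=center>text</p>'))  # False
```

Content that fails such a check is a candidate for the preparation utilities described in the tip that follows.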
Tip
To make HTML content XML parsable, you can use mlreportgen.utils.html2dom.prepHTMLString, mlreportgen.utils.html2dom.prepHTMLFile, and mlreportgen.utils.tidy. See Prepare HTML Before Conversion.
Supported HTML Elements and Attributes
This table shows the HTML elements and attributes that are supported when you convert HTML to a DOM object. Unsupported elements and attributes are ignored.
HTML Element | Attributes |
---|---|
a | class, style, href, name |
address | class, style |
b | class, style |
big | class, style |
blockquote | class, style |
body | class, style |
br | n/a |
center | class, style |
cite | class, style |
code | class, style |
dd | class, style |
del | class, style |
dfn | class, style |
div | class, style |
dl | class, style |
dt | class, style |
em | class, style |
font | class, style, color, face, size |
h1, h2, h3, h4, h5, h6 | class, style, align |
hr | class, style, align |
i | class, style |
ins | class, style |
img | class, style, src, height, width |
kbd | class, style |
li | class, style |
mark | class, style |
nobr | class, style |
ol | class, style |
p | class, style, align |
pre | class, style |
s | class, style |
samp | class, style |
small | class, style |
span | class, style |
strike | class, style |
strong | class, style |
sub | class, style |
sup | class, style |
table | class, style, align, bgcolor, border, cellspacing, cellpadding, frame, rules, width |
tbody | class, style, align, valign |
tfoot | class, style, align, valign |
thead | class, style, align, valign |
td | class, style, bgcolor, height, width, colspan, rowspan, align, valign, nowrap |
th | class, style, bgcolor, height, width, colspan, rowspan, align, valign, nowrap |
tr | class, style, align, bgcolor, valign |
tt | class, style |
u | class, style |
ul | class, style |
var | class, style |
For information about these elements, see https://developer.mozilla.org/en-US/docs/Web/HTML/Element.
Supported HTML CSS Style Attributes for All Elements
You can use HTML style attributes to format HTML content that you append to a DOM report. A style attribute is a string of Cascading Style Sheets (CSS) formats.
These CSS formats are supported:
background-color
border
border-bottom
border-bottom-color
border-bottom-style
border-bottom-width
border-color
border-left
border-left-color
border-left-style
border-left-width
border-right
border-right-color
border-right-style
border-right-width
border-style
border-top
border-top-color
border-top-style
border-top-width
border-width
color
counter-increment
counter-reset
display
font-family
font-size
font-style
font-weight
height
line-height
list-style-type
margin
margin-bottom
margin-left
margin-right
margin-top
padding
padding-bottom
padding-left
padding-right
padding-top
text-align
text-decoration
text-indent
vertical-align
white-space
width
For information about these formats, see https://developer.mozilla.org/en-US/docs/Web/CSS/Reference.
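To see which formats in an existing style attribute fall outside the list above, you can split the attribute value into its declarations and compare the property names. The following Python sketch assumes the standard CSS spellings of the listed formats; the names SUPPORTED_CSS and unsupported_css_properties are hypothetical. The parsing is deliberately naive: it splits on semicolons and does not handle values that themselves contain a semicolon.

```python
# Property names taken from the list of supported CSS formats.
SUPPORTED_CSS = {
    "background-color", "border", "border-bottom", "border-bottom-color",
    "border-bottom-style", "border-bottom-width", "border-color",
    "border-left", "border-left-color", "border-left-style",
    "border-left-width", "border-right", "border-right-color",
    "border-right-style", "border-right-width", "border-style",
    "border-top", "border-top-color", "border-top-style",
    "border-top-width", "border-width", "color", "counter-increment",
    "counter-reset", "display", "font-family", "font-size", "font-style",
    "font-weight", "height", "line-height", "list-style-type", "margin",
    "margin-bottom", "margin-left", "margin-right", "margin-top",
    "padding", "padding-bottom", "padding-left", "padding-right",
    "padding-top", "text-align", "text-decoration", "text-indent",
    "vertical-align", "white-space", "width",
}


def unsupported_css_properties(style):
    """Return the property names in style that are not in SUPPORTED_CSS."""
    names = []
    for declaration in style.split(";"):
        name, sep, _value = declaration.partition(":")
        name = name.strip().lower()
        # Skip empty fragments such as the one after a trailing semicolon.
        if sep and name and name not in SUPPORTED_CSS:
            names.append(name)
    return names


print(unsupported_css_properties("color: red; float: left; margin-top: 4pt;"))
# ['float']
```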
Support for HTML Character Entities
You can append HTML content that includes special characters, such as the British pound sign, the U.S. dollar sign, or reserved XML markup characters. The XML markup special characters are >, <, &, ", and '. To include special characters, use HTML named or numeric character references. For example, to include the left angle bracket (<) in HTML content that you want to append, use one of these character entity references:
The named character entity reference &lt;
The numeric character entity reference &#x003c;
For more information, see https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references.
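As a quick illustration of how named and numeric references relate, the following sketch uses Python's standard html module, not the DOM API, to decode three references for the left angle bracket and to escape the reserved markup characters in raw text. The sample strings are made up for the example.

```python
import html

# The named, hexadecimal, and decimal references all decode to "<".
named = html.unescape("&lt;")
hex_ref = html.unescape("&#x3c;")
dec_ref = html.unescape("&#60;")
print(named == hex_ref == dec_ref == "<")  # True

# The British pound sign has both a named and a numeric reference.
print(html.unescape("&pound;") == html.unescape("&#163;") == "£")  # True

# Going the other way: replace reserved markup characters in raw text
# with character references before embedding the text in HTML content.
raw = 'if a < b & b > c then "ok"'
print(html.escape(raw, quote=True))
# if a &lt; b &amp; b &gt; c then &quot;ok&quot;
```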
DOCTYPE Declaration
The HTML content that you append to a DOM report does not need to include a document type declaration (see https://en.wikipedia.org/wiki/Document_type_declaration). If the content includes a document type declaration, it must meet the following conditions:
If the content includes character entity references (special characters), the document type declaration must reference a document type definition (DTD) that defines the referenced entities. For example, the following declaration specifies a DTD file that defines all HTML character entities:
<!DOCTYPE html SYSTEM "html.dtd">
The html.dtd file is included in the MATLAB® Report Generator™ software.
If the document type declaration references a DTD file, a valid DTD file must exist at the path specified by the declaration. Otherwise, appending the content causes a DTD parse error. For example, the following declaration causes a parse error:
<!DOCTYPE html SYSTEM "foo.dtd">
If the content to be appended does not include character entity references, the document type declaration does not need to reference a DTD file. For example, the following declaration works for content that does not use special characters:
<!DOCTYPE html>
Tip
To avoid document type declaration issues, remove declarations from existing HTML content that you intend to append to DOM reports. If the content does not include a declaration, the DOM prepends a valid declaration that defines the entire HTML character entity set.
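Removing a declaration as the tip suggests can be done with a small text filter before the content is appended. The following Python sketch is one possible approach under stated assumptions: the helper name strip_doctype is hypothetical, and the pattern only handles simple declarations such as the ones shown above. It leaves a declaration with an internal subset (square brackets) untouched.

```python
import re

# Matches a simple declaration such as <!DOCTYPE html> or
# <!DOCTYPE html SYSTEM "html.dtd">, plus any whitespace that follows it.
# Declarations that contain an internal subset ([...]) are not matched.
_DOCTYPE_RE = re.compile(r"<!DOCTYPE[^>\[]*>\s*", re.IGNORECASE)


def strip_doctype(html_text):
    """Remove the first simple DOCTYPE declaration from html_text."""
    return _DOCTYPE_RE.sub("", html_text, count=1)


page = '<!DOCTYPE html SYSTEM "foo.dtd">\n<html><body><p>Hi</p></body></html>'
print(strip_doctype(page))
# <html><body><p>Hi</p></body></html>
```

Content without a declaration passes through unchanged, so the filter can be applied unconditionally.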