mlreportgen.dom.HTMLFile Class

Namespace: mlreportgen.dom
Superclasses: mlreportgen.dom.HTML

Convert an HTML file to a DOM document

Description

Converts the contents of an HTML file to an mlreportgen.dom.HTMLFile object containing DOM objects having the same content and format. You can append the HTMLFile object to a DOM document of any type, including Word and PDF documents.

The mlreportgen.dom.HTMLFile class is a handle class.

Creation

Description

htmlFileObj = HTMLFile(htmlFile) converts the HTML file to an HTMLFile object containing DOM objects having the same content and format.

HTMLFile objects accept HTML that contains custom CSS properties, which begin with a hyphen. Custom CSS properties are supported in HTML, Microsoft® Word, and PDF output.

An HTMLFile object supports these HTML elements and attributes:

HTML Element              Attributes
a                         class, style, href, name
address                   class, style
b                         class, style
big                       class, style
blockquote                class, style
body                      class, style
br                        n/a
center                    class, style
cite                      class, style
code                      class, style
dd                        class, style
del                       class, style
dfn                       class, style
div                       class, style
dl                        class, style
dt                        class, style
em                        class, style
font                      class, style, color, face, size
h1, h2, h3, h4, h5, h6    class, style, align
hr                        class, style, align
i                         class, style
ins                       class, style
img                       class, style, src, height, width
kbd                       class, style
li                        class, style
mark                      class, style
nobr                      class, style
ol                        class, style
p                         class, style, align
pre                       class, style
s                         class, style
samp                      class, style
small                     class, style
span                      class, style
strike                    class, style
strong                    class, style
sub                       class, style
sup                       class, style
table                     class, style, align, bgcolor, border, cellspacing, cellpadding, frame, rules, width
tbody                     class, style, align, valign
tfoot                     class, style, align, valign
thead                     class, style, align, valign
td                        class, style, bgcolor, height, width, colspan, rowspan, align, valign, nowrap
th                        class, style, bgcolor, height, width, colspan, rowspan, align, valign, nowrap
tr                        class, style, align, bgcolor, valign
tt                        class, style
u                         class, style
ul                        class, style
var                       class, style

For information about these elements, see https://developer.mozilla.org/en-US/docs/Web/HTML/Element.

These CSS formats are supported:

  • background-color

  • border

  • border-bottom

  • border-bottom-color

  • border-bottom-style

  • border-bottom-width

  • border-color

  • border-left

  • border-left-color

  • border-left-style

  • border-left-width

  • border-right

  • border-right-color

  • border-right-style

  • border-right-width

  • border-style

  • border-top

  • border-top-color

  • border-top-style

  • border-top-width

  • border-width

  • color

  • counter-increment

  • counter-reset

  • display

  • font-family

  • font-size

  • font-style

  • font-weight

  • height

  • line-height

  • list-style-type

  • margin

  • margin-bottom

  • margin-left

  • margin-right

  • margin-top

  • padding

  • padding-bottom

  • padding-left

  • padding-right

  • padding-top

  • text-align

  • text-decoration

  • text-indent

  • vertical-align

  • white-space

  • width

For information about these formats, see https://developer.mozilla.org/en-US/docs/Web/CSS/Reference.
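
As a rough illustration of the supported elements and CSS formats, the following sketch writes a small HTML file and converts it to a PDF report. The file name cssDemo.html, the report name CssDemo, and the sample markup are assumptions made for this example; only element selectors and properties from the lists above are used.

```matlab
import mlreportgen.dom.*

% Build a small HTML file that uses only elements and CSS properties
% from the supported lists. The file name is hypothetical.
htmlLines = { ...
    '<html>', ...
    '<head><style>', ...
    'p {font-family:Arial; font-size:12pt; margin-left:24pt;}', ...
    'td {border:1px solid black; padding:4pt; text-align:center;}', ...
    '</style></head>', ...
    '<body>', ...
    '<p style="color:navy; background-color:#eeeeee">Styled note</p>', ...
    '<table><tr><td>A</td><td>B</td></tr></table>', ...
    '</body>', ...
    '</html>'};
fid = fopen('cssDemo.html','w');
fprintf(fid,'%s\n',htmlLines{:});
fclose(fid);

% Convert the file to DOM objects and append them to a PDF report.
rpt = Document('CssDemo','pdf');
append(rpt,HTMLFile('cssDemo.html'));
close(rpt);
```

To inspect the result, call rptview(rpt.OutputPath) after the report is closed.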

Input Arguments

htmlFile: HTML file path, specified as a character vector or string scalar.

Properties

Note

For HTML markup to display correctly in your report, you must include end tags for empty elements and enclose attribute values in quotation marks. If you want to show a reserved XML markup character as text, you must use its equivalent named or numeric XML character.

Reserved Character    Description              Equivalent Character
>                     Greater than             &gt;
<                     Less than                &lt;
&                     Ampersand                &amp;
"                     Double quotation mark    &quot;
'                     Single quotation mark    &apos;
%                     Percent                  &#37;
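
One way to apply these substitutions before building markup is a chain of strrep calls, sketched here with base MATLAB functions only. The variable names and the sample text are assumptions for illustration. The ampersand is replaced first so that the ampersands introduced by the later replacements are not escaped a second time.

```matlab
% Sample text that contains reserved characters (hypothetical content).
raw = 'Growth > 5% & rising';

% Replace the ampersand first, then the remaining reserved characters.
escaped = strrep(raw,'&','&amp;');
escaped = strrep(escaped,'<','&lt;');
escaped = strrep(escaped,'>','&gt;');
escaped = strrep(escaped,'"','&quot;');
escaped = strrep(escaped,'''','&apos;');
escaped = strrep(escaped,'%','&#37;');

% Wrap the escaped text in a paragraph with an explicit end tag.
htmlText = ['<p>' escaped '</p>'];
```

The resulting htmlText can be written to an HTML file for use with HTMLFile.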

HTMLTag: HTML tag name of container, specified as a character vector or string scalar. The name must be an HTML element, such as "div", "section", or "article".

Note

Microsoft Word output ignores the HTML tag name.

Data Types: char | string

StyleName: Style name, specified as a character vector or string scalar. The style name is the name of a style specified in the style sheet of the document or document part to which this element is appended. The specified style defines the appearance of this element in the output document unless overridden by the formats specified by the Style property of this element. To learn more about using style sheets, see Use Style Sheet Styles.

Note

Microsoft Word output ignores the style name.

Attributes:

NonCopyable: true

Data Types: char | string

Style: Formatting to apply to this document element object, specified as a cell array of DOM format objects. The formats specified by this property override corresponding formats specified by the StyleName property of this element. Formats that do not apply to this element are ignored.

Note

The children of this document element object inherit any of these formats that they do not override.

Attributes:

NonCopyable: true

Data Types: cell
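
A minimal sketch of setting formats on the object, assuming that the Style property behaves as described above. The file name styleDemo.html and the report name StyleDemo are hypothetical. Under that assumption, the first paragraph should pick up the color from the HTMLFile object, while the second paragraph keeps the color that it specifies itself.

```matlab
import mlreportgen.dom.*

% Write a small HTML file so that this sketch is self-contained.
% The file name is hypothetical.
fid = fopen('styleDemo.html','w');
fprintf(fid,'%s', ...
    '<html><body><p>Inherited color</p><p style="color:green">Own color</p></body></html>');
fclose(fid);

% Convert the file and assign a format for its children to inherit.
htmlFileObj = HTMLFile('styleDemo.html');
htmlFileObj.Style = {Color('blue')};

rpt = Document('StyleDemo','pdf');
append(rpt,htmlFileObj);
close(rpt);
```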

Parent: Parent of the mlreportgen.dom.HTMLFile object, specified as a document element object. A document element must have only one parent.

Attributes:

SetAccess: private
NonCopyable: true

Children: Child elements of this HTMLFile object, specified as an array of mlreportgen.dom.Element objects.

Attributes:

SetAccess: private

Tag: Tag for the mlreportgen.dom.HTMLFile object, specified as a character vector or string scalar. The DOM API generates a session-unique tag as part of the creation of this object. The generated tag has the form CLASS:ID, where CLASS is the object class and ID is the value of the Id property of the object. Specify your own tag value to help you identify where to look when an issue occurs during document generation.

Attributes:

NonCopyable: true

Data Types: char | string

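Id: Session-unique identifier of the mlreportgen.dom.HTMLFile object, specified as a character vector or string scalar. The DOM API generates the identifier as part of the creation of this object. You can specify your own value for Id.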

Attributes:

NonCopyable: true

Data Types: char | string

Note

HTMLFile ignores the KeepInterElementWhiteSpace property. If you want to preserve white space, use fileread to read your HTML file as text and then follow the procedure described for the KeepInterElementWhiteSpace property of the mlreportgen.dom.HTML class.
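
The workaround in this note might look like the following sketch. It assumes that the procedure for mlreportgen.dom.HTML is to create an empty HTML object, set KeepInterElementWhiteSpace to true, and then append the markup; check the mlreportgen.dom.HTML reference page to confirm. The file name wsDemo.html and the report name WhiteSpaceDemo are hypothetical.

```matlab
import mlreportgen.dom.*

% Write a small HTML file so that this sketch is self-contained.
% The file name is hypothetical.
fid = fopen('wsDemo.html','w');
fprintf(fid,'%s','<p><b>Hello</b> <i>World</i></p>');
fclose(fid);

% Read the file as text instead of passing it to HTMLFile.
htmlText = fileread('wsDemo.html');

% Create an empty HTML object, turn on white-space preservation,
% and only then append the markup (assumed procedure).
htmlObj = HTML();
htmlObj.KeepInterElementWhiteSpace = true;
append(htmlObj,htmlText);

rpt = Document('WhiteSpaceDemo','docx');
append(rpt,htmlObj);
close(rpt);
```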

Methods

Examples

Create a text file named myHTML.html and save it in the current folder. Add this text into the file:

<html>
<head>
<style>p {font-size:14pt;}</style>
</head>
<body>
<p style='white-space:pre'><b>Hello</b><i style='color:green'> World</i></p>
<p>This is <u>me</u> speaking</p>
</body>
</html>

To convert the myHTML.html file to a Word report, run these commands:

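import mlreportgen.dom.*;

% Create a Word (docx) report named MyReport.
rpt = Document('MyReport','docx');

% Convert the HTML file to DOM objects and add them to the report.
htmlFile = HTMLFile('myHTML.html');
append(rpt,htmlFile);

% Close the report to write it to disk, then open it in a viewer.
close(rpt);
rptview(rpt.OutputPath);

The resulting Word report contains the text that you specified in the HTML file.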

Figure: Word report showing the text "Hello World" as the first paragraph, with "Hello" in bold and "World" in green italic. The second paragraph reads "This is me speaking", with "me" underlined.

Tips

  • MATLAB® Report Generator™ mlreportgen.dom.HTML and mlreportgen.dom.HTMLFile objects typically cannot accept the raw HTML output of third-party applications, such as Microsoft Word, that export native documents as HTML markup. In these cases, your report generation program can use the mlreportgen.utils.html2dom.prepHTMLString and mlreportgen.utils.html2dom.prepHTMLFile functions to prepare the raw HTML for use with these objects. Typically, your program must further process the prepared HTML to remove valid but undesirable content, such as line feeds that were in the raw content.

  • Word and PDF documents require inline elements, such as text and links, to be contained in a paragraph. To meet this requirement, the HTML parser creates wrapper paragraphs to contain inline elements that are not already in a paragraph. If you create an mlreportgen.dom.HTML or mlreportgen.dom.HTMLFile object from HTML that contains inline elements that are not in paragraphs and add the object to an HTML document, the generated HTML can differ from the input HTML. To generate the inline elements without the added wrapper paragraphs, insert the HTML markup into an HTML document by using an mlreportgen.dom.RawText object.

  • By default, the DOM API uses a base font size of 12 points to convert em units to actual font sizes. For example, a font size specified as 2em converts to 24 points. To specify a different base font size, add your content to a report by using an mlreportgen.dom.HTML object. Set the EMBaseFontSize property of the object to the base font size. For example, if you set the EMBaseFontSize property to 14, a font size of 2em converts to 28 points.
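
The three tips above can be sketched together as follows. This is an illustration under stated assumptions, not a definitive recipe: it assumes that prepHTMLString accepts the raw markup as its only argument and returns the prepared markup, that EMBaseFontSize must be set on an empty HTML object before content is appended, and that RawText accepts 'html' as its second argument. The sample markup and the report names TipsDemo and InlineDemo are hypothetical.

```matlab
import mlreportgen.dom.*

% Tip 1: prepare raw third-party HTML before converting it.
% rawStr stands in for markup exported by another application.
rawStr = '<p>First line<br>Second line</p>';
preppedStr = mlreportgen.utils.html2dom.prepHTMLString(rawStr);

rpt = Document('TipsDemo','pdf');
append(rpt,HTML(preppedStr));

% Tip 3: use a base font size of 14 points for em units, so that
% 2em converts to 28 points. The property is set before the content
% is appended (assumed to be required).
emObj = HTML();
emObj.EMBaseFontSize = 14;
append(emObj,'<p style="font-size:2em">Large text</p>');
append(rpt,emObj);
close(rpt);

% Tip 2: insert inline markup into an HTML document without the
% wrapper paragraphs by using a RawText object.
htmlRpt = Document('InlineDemo','html');
append(htmlRpt,RawText('<b>Bold</b> and <i>italic</i> text','html'));
close(htmlRpt);
```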

Version History

Introduced in R2015a