The dataset data type might be removed in a future release. To work with heterogeneous data, use the MATLAB® table data type instead. See the MATLAB table documentation for more information.
Statistics and Machine Learning Toolbox™ has dataset arrays for storing variables with heterogeneous data types. For example, you can combine numeric data, logical data, cell arrays of character vectors, and categorical arrays in one dataset array variable.
Within a dataset array, each variable (column) must be one homogeneous data type, but the different variables can be of heterogeneous data types. A dataset array is usually interpreted as a set of variables measured on many units of observation. That is, each row in a dataset array corresponds to an observation, and each column to a variable. In this sense, a dataset array organizes data like a typical spreadsheet.
Dataset arrays are a unique data type with a corresponding set of valid operations. Even if a dataset array contains only numeric variables, you cannot operate on the dataset array like a numeric variable. The valid operations for dataset arrays are the methods of the dataset class.
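The following minimal sketch illustrates this restriction, assuming Statistics and Machine Learning Toolbox is installed. The variable names x and y are illustrative only, and converting with double is shown as one possible way to get at the numeric contents before doing arithmetic.

```matlab
% Two numeric workspace variables (names are illustrative).
x = [1; 2; 3];
y = [10; 20; 30];

% dataset uses the workspace names x and y as the variable names.
ds = dataset(x, y);

% The result is a dataset array, not a numeric array.
dsClass = class(ds);

% Extract the numeric contents before doing arithmetic.
colMeans = mean(double(ds));   % convert the whole array to a numeric matrix
xDoubled = 2 * ds.x;           % or operate on a single variable
```

Extracting one variable with dot indexing, as in ds.x, avoids converting the whole array when only one column is needed.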
You can create a dataset array by combining variables that exist in the MATLAB workspace, or by directly importing data from a file, such as a text file or spreadsheet. This table summarizes the functions you can use to create dataset arrays.
|Data Source|Conversion to Dataset Array|
|Data from a file|dataset|
|Heterogeneous collection of workspace variables|dataset|
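As a sketch of both creation routes, the example below first combines workspace variables and then imports a small comma-delimited file. The data, the variable names, and the file name example_patients.csv are all hypothetical; the file is written to the temporary folder only so that the example is self-contained.

```matlab
% Workspace variables of different types (illustrative data).
name   = {'Ann'; 'Bob'; 'Cho'};
age    = [38; 43; 29];
smoker = [true; false; false];

% Combine workspace variables and use name for the observation names.
dsVars = dataset(age, smoker, 'ObsNames', name);

% Write a small comma-delimited text file to import (hypothetical file name).
fname = fullfile(tempdir, 'example_patients.csv');
fid = fopen(fname, 'w');
fprintf(fid, 'age,weight\n38,71\n43,69\n29,64\n');
fclose(fid);

% Import the file; the header line supplies the variable names.
dsFile = dataset('File', fname, 'Delimiter', ',');
```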
You can export dataset arrays to text or spreadsheet files using export.
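A minimal sketch of writing a dataset array to a comma-delimited text file with the export function follows. The data and the output file name example_export.csv are hypothetical, and the temporary folder is used only to keep the example self-contained.

```matlab
% Small dataset array to write out (illustrative data).
height = [1.70; 1.82];
weight = [65; 80];
ds = dataset(height, weight);

% Export to a comma-delimited text file (hypothetical file name).
outFile = fullfile(tempdir, 'example_export.csv');
export(ds, 'File', outFile, 'Delimiter', ',');
```

To write a spreadsheet instead, export accepts an 'XLSFile' name-value pair in place of 'File'.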
To convert a dataset array to a cell array or structure array, use dataset2cell or dataset2struct.
To convert a dataset array to a table, use dataset2table.
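The conversions above can be sketched as follows, assuming the functions dataset2cell, dataset2struct, and dataset2table are available in your release. The data is illustrative.

```matlab
% Small dataset array to convert (illustrative data).
age    = [38; 43; 29];
smoker = [true; false; false];
ds = dataset(age, smoker);

c = dataset2cell(ds);     % cell array; the first row holds the variable names
s = dataset2struct(ds);   % structure array with one element per observation
t = dataset2table(ds);    % MATLAB table
```

Because the table data type is the recommended replacement, dataset2table is likely the most useful of the three when migrating existing code.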
In addition to storing data in a dataset array, you can store metadata such as:
Variable and observation names
Units of measurement
This information is stored as dataset array properties.
For a dataset array named ds, you can view the dataset array metadata by entering get(ds) at the command line. You can access a specific property, such as variable names, using ds.Properties.VarNames. You can both retrieve and modify property values using this syntax.
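The sketch below shows one way to view, read, and set metadata on a dataset array. The data, description text, units, and observation names are illustrative assumptions.

```matlab
% Small dataset array (illustrative data).
age    = [38; 43; 29];
weight = [71; 69; 64];
ds = dataset(age, weight);

% Display all properties of the dataset array.
get(ds)

% Retrieve a specific property.
varNames = ds.Properties.VarNames;

% Modify properties with the same dot syntax.
ds.Properties.Description = 'Illustrative patient data';
ds.Properties.Units       = {'years', 'kg'};
ds.Properties.ObsNames    = {'Ann'; 'Bob'; 'Cho'};
```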
Variable and observation names are included in the display of a dataset array. Variable names display across the top row, and observation names, if present, appear in the first column. Note that variable and observation names do not affect the size of a dataset array.
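To check that names do not count toward the size, the following sketch compares the same illustrative data with and without observation names.

```matlab
% Same data with and without observation names (illustrative data).
age    = [38; 43; 29];
weight = [71; 69; 64];

dsPlain = dataset(age, weight);
dsNamed = dataset(age, weight, 'ObsNames', {'Ann'; 'Bob'; 'Cho'});

% Both report three observations and two variables.
sizePlain = size(dsPlain);
sizeNamed = size(dsNamed);
```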