bioma.data.DataMatrix

Data structure encapsulating data and metadata from a microarray experiment

Description

A bioma.data.DataMatrix object is a data structure encapsulating measurement data and feature metadata from a microarray experiment so that it can be indexed by gene or probe identifiers and by sample identifiers.

A bioma.data.DataMatrix object stores experimental data in a matrix, with rows typically corresponding to gene names or probe identifiers, and columns typically corresponding to sample identifiers. A DataMatrix object also stores metadata, such as the gene names or probe identifiers and sample identifiers, in row names and column names.
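As a minimal sketch of name-based indexing, the following code builds a small DataMatrix object and selects entries by row and column names. The data values and the labels geneA, geneB, geneC, s1, and s2 are made up for illustration, and the code assumes that Bioinformatics Toolbox is installed.

```matlab
% Hypothetical 3-by-2 expression values; all labels are illustrative only.
data = [1.2 0.5; -0.3 2.1; 0.8 -1.4];
dm = bioma.data.DataMatrix(data, {'geneA','geneB','geneC'}, {'s1','s2'});

% Parenthesis indexing with names returns a smaller DataMatrix object.
sub = dm({'geneA','geneC'}, 's2');

% Convert the selection to a plain numeric array.
vals = double(sub);
```

Parenthesis indexing keeps the row and column labels attached to the selection, so convert with double only when a plain array is needed.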

Creation

Description

DMobj = bioma.data.DataMatrix(Matrix) creates a DataMatrix object from measurement data and feature metadata from a microarray experiment.

DMobj = bioma.data.DataMatrix(Matrix,RowNames,ColumnNames) specifies row and column names. RowNames are typically gene names or probe identifiers. ColumnNames are typically sample identifiers.

DMobj = bioma.data.DataMatrix('File',FileName) creates a bioma.data.DataMatrix object from a tab-delimited TXT or XLS file that contains table-oriented data and metadata.

DMobj = bioma.data.DataMatrix('File',FileName,Name,Value) creates a bioma.data.DataMatrix object from a tab-delimited TXT or XLS file, using additional options specified by one or more Name,Value arguments.

Input Arguments

Matrix

Measurement data and feature metadata from a microarray experiment, specified as a two-dimensional numeric or logical array or a bioma.data.DataMatrix object.

RowNames

Row names for the bioma.data.DataMatrix object, specified as a numeric vector, character array, string vector, or cell array of character vectors. The number of elements in RowNames must equal the number of rows in Matrix. RowNames are typically gene names or probe identifiers from a microarray experiment. Row names do not need to be unique.

Data Types: double | char | string | cell

ColumnNames

Column names for the bioma.data.DataMatrix object, specified as a numeric vector, character array, string vector, or cell array of character vectors. The number of elements in ColumnNames must equal the number of columns in Matrix. ColumnNames are typically sample identifiers from a microarray experiment. Column names do not need to be unique.

Data Types: double | char | string | cell

FileName

File name, or path and file name, of a tab-delimited TXT or XLS file that contains table-oriented data and metadata, specified as a character vector or string.

Typically, the first row of the table contains column names, the first column contains row names, and the numeric data starts at position (2,2). The bioma.data.DataMatrix function detects when the first column does not contain row names and, in that case, reads data starting from the first column. However, if the first row does not contain header text (column names), set the HLine name-value argument to 0.

Data Types: char | string
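The file layout described above can be sketched as follows. This is an illustration under stated assumptions, not a definitive recipe: the file contents, the header label ID, and the names geneA, geneB, sample1, and sample2 are all hypothetical, and the file is written to a temporary location only to keep the example self-contained.

```matlab
% Write a small tab-delimited table to a temporary TXT file.
% Row 1 holds column names and column 1 holds row names, so the
% numeric data starts at position (2,2). All values are made up.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'ID\tsample1\tsample2\n');
fprintf(fid, 'geneA\t1.5\t2.5\n');
fprintf(fid, 'geneB\t3.5\t4.5\n');
fclose(fid);

% Read the file into a DataMatrix object using the 'File' syntax.
dm = bioma.data.DataMatrix('File', fname);

% Inspect the properties of the object that was read.
get(dm)

% Remove the temporary file.
delete(fname);
```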

Name-Value Arguments

RowNames

Row names for the bioma.data.DataMatrix object, specified as one of these values:

  • Numeric vector, character array, string vector, or a cell array of character vectors, whose elements are equal in number to the number of rows of numeric data in the input matrix.

  • A character vector or string, which is used as a prefix for row names. Numbers are appended to the prefix.

  • true — Unique row names are assigned using the formats row1, row2, row3, and so on.

  • false — No row names are assigned.

Row names do not need to be unique.

Data Types: double | logical | char | string | cell

ColNames

Column names for the bioma.data.DataMatrix object, specified as one of these values:

  • Numeric vector, character array, string vector, or a cell array of character vectors, whose elements are equal in number to the number of columns of numeric data in the input matrix.

  • A character vector or string, which is used as a prefix for column names. Numbers are appended to the prefix.

  • true — Unique column names are assigned using the formats col1, col2, col3, and so on.

  • false — No column names are assigned.

Column names do not need to be unique.

Data Types: double | logical | char | string | cell

Name

Name for the bioma.data.DataMatrix object, specified as a character vector or string.

Data Types: char | string

Delimiter

Delimiter symbol to use for the input file, specified as a character vector or string. Typical choices are:

  • ' '

  • '\t' (default)

  • ','

  • ';'

  • '|'

Data Types: char | string

HLine

Row of the input file that contains the column header text (column names), specified as a nonnegative integer. When creating the DataMatrix object, the DataMatrix function reads data from row HLine + 1 to the end of the file. If the input file does not contain column header text (column names), set HLine to 0.

Data Types: double

Rows

Subset of row names in File for the bioma.data.DataMatrix function to use when creating the bioma.data.DataMatrix object, specified as a cell array of character vectors, character array, string vector, or a numeric or logical vector.

Data Types: logical | char | string | cell

Columns

Subset of column names in File for the bioma.data.DataMatrix function to use when creating the bioma.data.DataMatrix object, specified as a cell array of character vectors, character array, string vector, or a numeric or logical vector.

Data Types: logical | char | string | cell

Properties

Name

Name of the bioma.data.DataMatrix object, stored as a character vector.

Data Types: char

RowNames

Row names (typically gene names or probe identifiers), stored as an empty array or a cell array of character vectors. The number of elements in the cell array must equal the number of rows in the matrix.

Data Types: cell

ColNames

Column names (typically sample identifiers), stored as an empty array or a cell array of character vectors. The number of elements in the cell array must equal the number of columns in the matrix.

Data Types: cell

NRows

This property is read-only.

Number of rows in the matrix, stored as a positive number. You cannot modify this property directly. You can access it using the get method.

Data Types: double

NCols

This property is read-only.

Number of columns in the matrix, stored as a positive number. You cannot modify this property directly. You can access it using the get method.

Data Types: double

NDims

This property is read-only.

Number of dimensions in the matrix, stored as a positive number. You cannot modify this property directly. You can access it using the get method.

Data Types: double

ElementClass

This property is read-only.

Class type of the elements in the bioma.data.DataMatrix object, stored as a character vector, such as single or double. You cannot modify this property directly. You can access it using the get method.

Data Types: char
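As a brief sketch of reading the read-only properties, the following code queries NRows, NCols, and ElementClass on a small object, using both the get method and dot notation. The matrix values and the labels r1, r2, c1, c2, and c3 are hypothetical.

```matlab
% Build a 2-by-3 DataMatrix object from made-up values and labels.
dm = bioma.data.DataMatrix([1 2 3; 4 5 6], {'r1','r2'}, {'c1','c2','c3'});

% Read a property with the get method.
nRows = get(dm, 'NRows');

% Read properties with dot notation.
nCols = dm.NCols;
elemClass = dm.ElementClass;
```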

Object Functions

colnames    Retrieve or set column names of DataMatrix object
disp        Display DataMatrix object
dmwrite     Write DataMatrix object to text file
double      Convert DataMatrix object to double-precision array
get         Retrieve information about DataMatrix object
isempty     Determine whether array is empty
isfinite    Determine which array elements are finite
isinf       Determine which array elements are infinite
isnan       Determine which array elements are NaN
isscalar    Determine whether input is scalar
isequal     Test DataMatrix objects for equality
isequaln    Test DataMatrix objects for equality, treating NaNs as equal
isvector    Determine whether input is vector
length      Length of largest array dimension
ndims       Return number of dimensions in DataMatrix object
numel       Return number of elements in DataMatrix object
pdist       Pairwise distance between pairs of observations
plot        Draw 2-D line plot of DataMatrix object
rownames    Retrieve or set row names of DataMatrix object
set         Set property of DataMatrix object
single      Convert DataMatrix object to single-precision array
size        Array size
cat         Concatenate arrays
horzcat     Concatenate DataMatrix objects horizontally
sortcols    Sort columns of DataMatrix object in ascending or descending order
sortrows    Sort rows of DataMatrix object in ascending or descending order
vertcat     Concatenate DataMatrix objects vertically
kmeans      k-means clustering
max         Return maximum values in DataMatrix object
mean        Return average or mean values in DataMatrix object
median      Return median values in DataMatrix object
min         Return minimum values in DataMatrix object
nanmax      (Not recommended) Maximum, ignoring NaN values
nanmean     (Not recommended) Mean, ignoring NaN values
nanmedian   (Not recommended) Median, ignoring NaN values
nanmin      (Not recommended) Minimum, ignoring NaN values
nanstd      (Not recommended) Standard deviation, ignoring NaN values
nansum      (Not recommended) Sum, ignoring NaN values
nanvar      (Not recommended) Variance, ignoring NaN values
pca         Principal component analysis of raw data
std         Return standard deviation values in DataMatrix object
sum         Return sum of elements in DataMatrix object
var         Return variance values in DataMatrix object
exp         Exponential
log         Natural logarithm
log10       Common logarithm (base 10)
log2        Base 2 logarithm and floating-point number dissection
pow2        Base 2 exponentiation and scaling of floating-point numbers
sqrt        Square root
ceil        Round toward positive infinity
fix         Round toward zero
floor       Round toward negative infinity
round       Round to nearest decimal or integer
dmarrayfun  Apply function to each element in DataMatrix object
plus        Add DataMatrix objects
minus       Subtract DataMatrix objects
times       Multiply DataMatrix objects
rdivide     Right array divide DataMatrix objects
ldivide     Left array divide DataMatrix objects
power       Array power DataMatrix objects
lt          Test DataMatrix objects for less than
le          Test DataMatrix objects for less than or equal to
gt          Test DataMatrix objects for greater than
ge          Test DataMatrix objects for greater than or equal to
eq          Test DataMatrix objects for equality
ne          Test DataMatrix objects for inequality
dmbsxfun    Apply element-by-element binary operation to two DataMatrix objects with singleton expansion enabled
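To give a feel for how a few of these functions combine, here is a short sketch that applies log2 element-wise, converts the result with double, and retrieves the row labels with rownames. The values and the labels g1, g2, s1, and s2 are made up for illustration.

```matlab
% Hypothetical 2-by-2 intensities chosen as exact powers of two.
dm = bioma.data.DataMatrix([1 2; 4 8], {'g1','g2'}, {'s1','s2'});

% Element-wise base-2 logarithm of the DataMatrix object.
logged = log2(dm);

% Convert the result to a plain double array.
vals = double(logged);

% Retrieve the row names as a cell array of character vectors.
names = rownames(dm);
```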

Examples

Load the file containing yeast data. This file includes three variables: yeastvalues, a 614-by-7 matrix of gene expression data; genes, a cell array of 614 GenBank® accession numbers for labeling the rows in yeastvalues; and times, a 1-by-7 vector of time values for labeling the columns in yeastvalues.

load filteredyeastdata

Create variables to contain a subset of the data, specifically the first five rows and first four columns of the yeastvalues matrix, the genes cell array, and the times vector.

yeastvalues = yeastvalues(1:5,1:4);
genes = genes(1:5,:);
times = times(1:4);

Import the microarray object package.

import bioma.data.*

Create a DataMatrix object from the gene expression data.

DMobj = DataMatrix(yeastvalues,genes,times)
DMobj = 

                  0       9.5     11.5      13.5  
    SS DNA     -0.131    1.699    -0.026     0.365
    YAL003W     0.305    0.146    -0.129    -0.444
    YAL012W     0.157    0.175     0.467    -0.379
    YAL026C     0.246    0.796     0.384     0.981
    YAL034C    -0.235    0.487    -0.184    -0.669

Display all properties of a DataMatrix object and their current values.

get(DMobj)
            Name: ''
        RowNames: {5x1 cell}
        ColNames: {'   0'  ' 9.5'  '11.5'  '13.5'}
           NRows: 5
           NCols: 4
           NDims: 2
    ElementClass: 'double'

Return all properties of the DataMatrix object and their current values as a scalar structure. Each field name is a property name, and each field contains the value of that property.

DMstruct = get(DMobj)
DMstruct = struct with fields:
            Name: ''
        RowNames: {5x1 cell}
        ColNames: {'   0'  ' 9.5'  '11.5'  '13.5'}
           NRows: 5
           NCols: 4
           NDims: 2
    ElementClass: 'double'

Return the value of a specific property of the DataMatrix object. For example, return the value of RowNames.

NamesOfRows = get(DMobj,'RowNames')
NamesOfRows = 5x1 cell
    {'SS DNA' }
    {'YAL003W'}
    {'YAL012W'}
    {'YAL026C'}
    {'YAL034C'}

Now return the value of NRows.

NumberOfRows = DMobj.NRows
NumberOfRows = 
5

Load the file containing yeast data. This file includes three variables: yeastvalues, a 614-by-7 matrix of gene expression data; genes, a cell array of 614 GenBank® accession numbers for labeling the rows in yeastvalues; and times, a 1-by-7 vector of time values for labeling the columns in yeastvalues.

load filteredyeastdata

Create variables to contain a subset of the data, specifically the first five rows and first four columns of the yeastvalues matrix, the genes cell array, and the times vector.

yeastvalues = yeastvalues(1:5,1:4);
genes = genes(1:5,:);
times = times(1:4);

Import the microarray object package.

import bioma.data.*

Create a DataMatrix object from the gene expression data.

DMobj = DataMatrix(yeastvalues,genes,times)
DMobj = 

                  0       9.5     11.5      13.5  
    SS DNA     -0.131    1.699    -0.026     0.365
    YAL003W     0.305    0.146    -0.129    -0.444
    YAL012W     0.157    0.175     0.467    -0.379
    YAL026C     0.246    0.796     0.384     0.981
    YAL034C    -0.235    0.487    -0.184    -0.669

Display possible values for all properties that have a fixed set of property values in the DataMatrix object.

set(DMobj)
        Name: 'A DataMatrix's 'Name' property does not have a fixed set of values.'
    RowNames: 'Empty, a cell array of strings or a numeric vector.'
    ColNames: 'Empty, a cell array of strings or a numeric vector.'

Display possible values for a specific property that has a fixed set of property values in the DataMatrix object. For example, display possible values for RowNames.

set(DMobj,'RowNames')
Empty, a cell array of strings or a numeric vector.

Load the file containing yeast data. This file includes three variables: yeastvalues, a 614-by-7 matrix of gene expression data; genes, a cell array of 614 GenBank® accession numbers for labeling the rows in yeastvalues; and times, a 1-by-7 vector of time values for labeling the columns in yeastvalues.

load filteredyeastdata

Create variables to contain a subset of the data, specifically the first five rows and first four columns of the yeastvalues matrix, the genes cell array, and the times vector.

yeastvalues = yeastvalues(1:5,1:4);
genes = genes(1:5,:);
times = times(1:4);

Import the microarray object package.

import bioma.data.*

Create a DataMatrix object from the gene expression data.

DMobj = DataMatrix(yeastvalues)
DMobj = 

         1         2        3         4     
    1    -0.131    1.699    -0.026     0.365
    2     0.305    0.146    -0.129    -0.444
    3     0.157    0.175     0.467    -0.379
    4     0.246    0.796     0.384     0.981
    5    -0.235    0.487    -0.184    -0.669

Set the Name property of the DataMatrix object.

DMobj = set(DMobj,'Name','YeastData');
DMobj.Name
ans = 
'YeastData'

Set multiple properties, for example, set the RowNames and ColNames properties.

DMobj = set(DMobj,'RowNames',genes,'ColNames',times)
DMobj = 

                  0       9.5     11.5      13.5  
    SS DNA     -0.131    1.699    -0.026     0.365
    YAL003W     0.305    0.146    -0.129    -0.444
    YAL012W     0.157    0.175     0.467    -0.379
    YAL026C     0.246    0.796     0.384     0.981
    YAL034C    -0.235    0.487    -0.184    -0.669

Version History

Introduced in R2008b
