transform
Description
NewTbl = transform(Transformer,Tbl) returns a table with transformed features generated by the FeatureTransformer object Transformer. The input Tbl must contain the required variables, whose data types must match those of the variables originally passed to gencfeatures or genrfeatures when Transformer was created.
NewTbl = transform(Transformer,Tbl,Index) returns a subset of the transformed features, where Index indicates the features to return.
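The two syntaxes can be compared side by side in a minimal sketch. The variable selection, the request for 10 features, and the names AllTbl and SubTbl are illustrative assumptions, not part of the reference text. The table passed to transform also contains the response variable, as in the second example on this page.

```matlab
load patients                                   % sample data set used on this page
Tbl = table(Age,Diastolic,Height,Weight,Systolic);

rng("default")                                  % for reproducibility
Transformer = genrfeatures(Tbl,"Systolic",10);  % request 10 features (illustrative)

AllTbl = transform(Transformer,Tbl);            % all generated features
SubTbl = transform(Transformer,Tbl,1:3);        % only the first three features

size(AllTbl)                                    % rows match Tbl; one column per feature
size(SubTbl)
```

Both outputs have one row per observation in Tbl. Only the number of columns differs.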
Examples
Compute Cross-Validation Mean Squared Error Using Generated Features
Generate features to train a linear regression model. Compute the cross-validation mean squared error (MSE) of the model by using the crossval function.
Load the patients data set, and create a table containing the predictor data.
load patients
Tbl = table(Age,Diastolic,Gender,Height,SelfAssessedHealthStatus, ...
    Smoker,Weight);
Create a random partition for 5-fold cross-validation.
rng("default") % For reproducibility of the partition
cvp = cvpartition(size(Tbl,1),KFold=5);
Compute the cross-validation MSE for a linear regression model trained on the original features in Tbl and the Systolic response variable.
CVMdl = fitrlinear(Tbl,Systolic,CVPartition=cvp);
cvloss = kfoldLoss(CVMdl)
cvloss = 45.2990
Create the custom function myloss (shown at the end of this example). This function generates 20 features from the training data, and then applies the same training set transformations to the test data. The function then fits a linear regression model to the training data and computes the test set MSE.
Note: If you use the live script file for this example, the myloss function is already included at the end of the file. Otherwise, you need to create this function at the end of your .m file or add it as a file on the MATLAB® path.
Compute the cross-validation MSE for a linear model trained on features generated from the predictors in Tbl.
newcvloss = mean(crossval(@myloss,Tbl,Systolic,Partition=cvp))
newcvloss = 26.9963
function testloss = myloss(TrainTbl,trainY,TestTbl,testY)
[Transformer,NewTrainTbl] = genrfeatures(TrainTbl,trainY,20);
NewTestTbl = transform(Transformer,TestTbl);
Mdl = fitrlinear(NewTrainTbl,trainY);
testloss = loss(Mdl,NewTestTbl,testY);
end
Train Model Using Subset of Generated Features
Train a linear classifier using only the numeric generated features returned by gencfeatures.
Load the patients data set. Create a table from a subset of the variables.
load patients
Tbl = table(Age,Diastolic,Height,SelfAssessedHealthStatus, ...
    Smoker,Systolic,Weight,Gender);
Partition the data into training and test sets. Use approximately 70% of the observations as training data, and 30% of the observations as test data. Partition the data using cvpartition.
rng("default")
c = cvpartition(Tbl.Gender,Holdout=0.30);
TrainTbl = Tbl(training(c),:);
TestTbl = Tbl(test(c),:);
Use the training data to generate 25 new features. Specify the minimum redundancy maximum relevance (MRMR) feature selection method for selecting new features.
Transformer = gencfeatures(TrainTbl,"Gender",25, ...
    FeatureSelectionMethod="mrmr")
Transformer =
  FeatureTransformer with properties:
                     Type: 'classification'
            TargetLearner: 'linear'
    NumEngineeredFeatures: 23
      NumOriginalFeatures: 2
         TotalNumFeatures: 25
Inspect the generated features.
Info = describe(Transformer)
Info=25×4 table
                               Type           IsOriginal    InputVariables              Transformations
                               ___________    __________    ________________________    __________________________________________________________________________________________
zsc(Weight)                    Numeric        true          Weight                      "Standardization with z-score (mean = 153.1571, std = 26.8229)"
eb5(Weight)                    Categorical    false         Weight                      "Equal-width binning (number of bins = 5)"
c(SelfAssessedHealthStatus)    Categorical    true          SelfAssessedHealthStatus    "Variable of type categorical converted from a cell data type"
zsc(sqrt(Systolic))            Numeric        false         Systolic                    "sqrt( ) -> Standardization with z-score (mean = 11.086, std = 0.29694)"
zsc(sin(Systolic))             Numeric        false         Systolic                    "sin( ) -> Standardization with z-score (mean = -0.1303, std = 0.72575)"
zsc(Systolic./Weight)          Numeric        false         Systolic, Weight            "Systolic ./ Weight -> Standardization with z-score (mean = 0.82662, std = 0.14555)"
zsc(Age+Weight)                Numeric        false         Age, Weight                 "Age + Weight -> Standardization with z-score (mean = 191.1143, std = 28.6976)"
zsc(Age./Weight)               Numeric        false         Age, Weight                 "Age ./ Weight -> Standardization with z-score (mean = 0.25424, std = 0.062486)"
zsc(Diastolic.*Weight)         Numeric        false         Diastolic, Weight           "Diastolic .* Weight -> Standardization with z-score (mean = 12864.6857, std = 2731.1613)"
q6(Height)                     Categorical    false         Height                      "Equiprobable binning (number of bins = 6)"
zsc(Systolic+Weight)           Numeric        false         Systolic, Weight            "Systolic + Weight -> Standardization with z-score (mean = 276.1429, std = 28.7111)"
zsc(Diastolic-Weight)          Numeric        false         Diastolic, Weight           "Diastolic - Weight -> Standardization with z-score (mean = -69.4286, std = 26.2411)"
zsc(Age-Weight)                Numeric        false         Age, Weight                 "Age - Weight -> Standardization with z-score (mean = -115.2, std = 27.0113)"
zsc(Height./Weight)            Numeric        false         Height, Weight              "Height ./ Weight -> Standardization with z-score (mean = 0.44797, std = 0.067992)"
zsc(Height.*Weight)            Numeric        false         Height, Weight              "Height .* Weight -> Standardization with z-score (mean = 10291.0714, std = 2111.9071)"
zsc(Diastolic+Weight)          Numeric        false         Diastolic, Weight           "Diastolic + Weight -> Standardization with z-score (mean = 236.8857, std = 29.2439)"
⋮
Transform the training and test sets, but retain only the numeric predictors.
numericIdx = (Info.Type == "Numeric");
NewTrainTbl = transform(Transformer,TrainTbl,numericIdx);
NewTestTbl = transform(Transformer,TestTbl,numericIdx);
Train a linear model using the transformed training data. Visualize the accuracy of the model's test set predictions by using a confusion matrix.
Mdl = fitclinear(NewTrainTbl,TrainTbl.Gender);
testLabels = predict(Mdl,NewTestTbl);
confusionchart(TestTbl.Gender,testLabels)
Input Arguments
Transformer — Feature transformer
FeatureTransformer object
Feature transformer, specified as a FeatureTransformer object.
Tbl — Features to transform
table
Features to transform, specified as a table. The rows must correspond to observations, and the columns must correspond to the predictors used to generate the transformed features stored in Transformer. You can enter describe(Transformer).InputVariables to see the list of features that Tbl must contain.
Data Types: table
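As a rough sketch, the check suggested above can be automated before calling transform. This code assumes that each InputVariables entry is a comma-separated list of variable names, as in the describe output shown in the example above; the names requiredVars, NewData, and isPresent are hypothetical, and the parsing would need adjusting for variable names that themselves contain commas.

```matlab
load patients
Tbl = table(Age,Diastolic,Height,Weight,Systolic);
rng("default")                                   % for reproducibility
Transformer = genrfeatures(Tbl,"Systolic",10);   % 10 features (illustrative)

Info = describe(Transformer);

% Assumption: entries look like "Age, Weight". Join all entries into one
% string, split on commas, and trim the surrounding whitespace.
joined = join(string(Info.InputVariables),",");
requiredVars = unique(strtrim(split(joined,",")));

% Hypothetical new data: here, simply the first five rows of Tbl.
NewData = Tbl(1:5,:);
isPresent = ismember(requiredVars,string(NewData.Properties.VariableNames));

if all(isPresent)
    NewTbl = transform(Transformer,NewData);
else
    disp("Missing variables: " + join(requiredVars(~isPresent),", "))
end
```

The check covers only variable names. Matching data types, which transform also requires, is not verified here.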
Index — Features to return
numeric vector | logical vector | string array | cell array of character vectors
Features to return, specified as a numeric or logical vector indicating the position of the features, or a string array or cell array of character vectors indicating the names of the features.
Example: 1:12
Data Types: single | double | logical | string | cell
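The three forms of Index can be illustrated with a short sketch. It assumes that the row names of the table returned by describe are the feature names that Index accepts; the data, the feature count, and the names byPosition, byMask, and byName are illustrative only.

```matlab
load patients
Tbl = table(Age,Diastolic,Height,Weight,Systolic);
rng("default")                                   % for reproducibility
Transformer = genrfeatures(Tbl,"Systolic",10);   % 10 features (illustrative)
Info = describe(Transformer);

% Numeric vector: positions of the features to return.
byPosition = transform(Transformer,Tbl,[1 2]);

% Logical vector: one element per generated feature.
mask = false(height(Info),1);
mask([1 2]) = true;
byMask = transform(Transformer,Tbl,mask);

% String array of feature names (assumed to match the describe row names).
names = string(Info.Properties.RowNames([1 2]));
byName = transform(Transformer,Tbl,names);
```

A logical index is convenient when the selection comes from a condition on the describe output, as in the Info.Type == "Numeric" example above.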
Output Arguments
NewTbl — Transformed features
table
Transformed features, returned as a table. Each row corresponds to an observation, and each column corresponds to a generated feature.
Version History
Introduced in R2021a