Organize System Data for Diagnostic Feature Designer

The Diagnostic Feature Designer app allows you to interactively analyze data and develop features that can distinguish between data from healthy systems and data from degraded systems. The app operates on a collection of measurement data and information from a set of similar systems, such as machines. To use the app, you must first organize your data into a form that the app can import. One way to organize your data is with numerical matrices, which can capture all your measurement data. However, you can also use more flexible formats such as tables, which allow you to incorporate additional information such as health condition and operating conditions. With this information, you can explore features within the app and assess how well each feature distinguishes between specific conditions.

Data Ensembles

Data analysis is the heart of any condition monitoring and predictive maintenance activity.

The data can come from measurements on systems using sensors such as accelerometers, pressure gauges, thermometers, altimeters, voltmeters, and tachometers. For instance, you might have access to measured data from:

  • Normal system operation

  • The system operating in a faulty condition

  • Lifetime record of system operation (run-to-failure data)

For algorithm design, you can also use simulated data generated by running a Simulink® model of your system under various operating and fault conditions.

Whether using measured data, generated data, or both, you frequently have many signals, ranging over a time span or multiple time spans. You might also have signals from many machines (for example, measurements from a number of separate engines all manufactured to the same specifications). And you might have data representing both healthy operation and fault conditions. Evaluating effective features for predictive maintenance requires organizing and analyzing this data while keeping track of the systems and conditions the data represents.

The main unit for organizing and managing multifaceted data sets in Predictive Maintenance Toolbox™ is the data ensemble. An ensemble is a collection of data sets, created by measuring or simulating a system under varying conditions.

For example, consider a transmission gear box system in which you have an accelerometer to measure vibration and a tachometer to measure the engine shaft rotation. Suppose that you run the engine for five minutes and record the measured signals as a function of time. You also record the engine age, measured in miles driven. Those measurements yield the following data set.

Now suppose that you have a fleet of many identical engines, and you record data from all of them. Doing so yields a family of data sets.

This family of data sets is an ensemble, and each row in the ensemble is a member of the ensemble.

The members in an ensemble are related in that they contain the same data variables. For instance, in the illustrated ensemble, all members include the same four variables: an engine identifier, the vibration and tachometer signals, and the engine age. In that example, each member corresponds to a different machine. Your ensemble might also include that set of data variables recorded from the same machine at different times. For instance, the following illustration shows an ensemble that includes multiple data sets from the same engine recorded as the engine ages.

In practice, the data for each ensemble member is typically stored in a separate data file. Thus, for instance, you might have one file containing the data for engine 01 at 9,500 miles, another file containing the data for engine 01 at 21,250 miles, and so on.
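The fleet ensemble described above can be sketched as a MATLAB table with one row per member. This is a minimal illustration that uses synthetic signals. The sample rate, the shortened signal length, the variable names, and the age values for engines 02 and 03 are assumptions made for the example, not values from a real data set.

```matlab
% Build a small ensemble for a hypothetical fleet of three engines.
rng(0);                               % reproducible synthetic signals
fs = 100;                             % assumed sample rate in Hz
t  = seconds((0:fs*5-1)'/fs);         % 5 s of samples (shortened from 5 min)

engineID = ["01"; "02"; "03"];
age      = [9500; 21250; 44000];      % miles driven (illustrative values)

numMembers = numel(engineID);
vibration  = cell(numMembers, 1);
tacho      = cell(numMembers, 1);
for k = 1:numMembers
    % Each signal is stored as a timetable so that it carries its own time base.
    vibration{k} = timetable(t, randn(numel(t), 1), 'VariableNames', {'Data'});
    tacho{k}     = timetable(t, double(rand(numel(t), 1) > 0.9), 'VariableNames', {'Data'});
end

% One row per member; every member has the same four variables.
ensemble = table(engineID, vibration, tacho, age, ...
    'VariableNames', {'EngineID', 'Vibration', 'Tacho', 'Age'});
disp(ensemble)
```

Storing each signal as a timetable inside a cell keeps the time base with the signal, so members recorded over different time spans can still share one table.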

Ensemble Variables

The variables in your ensemble serve different purposes, and accordingly can be grouped into several types:

  • Data variables (DV) — The main content of the ensemble members, including measured data and derived data that you use for analysis and development of predictive maintenance algorithms. For example, in the illustrated gear-box ensembles, Vibration and Tachometer are the data variables. Data variables can also include derived values, such as the mean value of a signal, or the frequency of the peak magnitude in a signal spectrum.

  • Independent variables (IV) — The variables that identify or order the members in an ensemble, such as timestamps, number of operating hours, or machine identifiers. In the ensemble of measured gear-box data, Age is an independent variable.

  • Condition variables (CV) — The variables that describe the fault condition or operating condition of the ensemble member. Condition variables can record the presence or absence of a fault state, or other operating conditions such as ambient temperature. In the ensemble gear-box data, sensor health might be a condition variable whose state is known for each engine. Condition variables can also be derived values, such as a single scalar value that encodes multiple fault and operating conditions.

Data variables and independent variables typically have many elements. Condition variables are often scalars. In the app, condition variables must be scalars.
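The following sketch shows how the three variable types might appear together in one ensemble table, including a derived data variable and a derived condition variable. The signals, the variable names, and the numeric encoding of the combined condition are assumptions chosen for illustration.

```matlab
% Hypothetical two-member ensemble with one data variable, one independent
% variable, and two raw condition variables.
t = seconds((0:99)'/100);
Vibration    = {timetable(t, 1 + sin(2*pi*5*seconds(t)), 'VariableNames', {'Data'}); ...
                timetable(t, 2 + sin(2*pi*5*seconds(t)), 'VariableNames', {'Data'})};
Age          = [9500; 21250];                       % independent variable
SensorHealth = categorical(["healthy"; "faulty"]);  % condition variable
AmbientTemp  = [20; 35];                            % condition variable (deg C)
ensemble = table(Vibration, Age, SensorHealth, AmbientTemp);

% Derived data variable: one scalar (the signal mean) per member.
ensemble.VibrationMean = cellfun(@(tt) mean(tt.Data), ensemble.Vibration);

% Derived condition variable: a single scalar code that combines the fault
% state with an operating condition. The encoding is arbitrary.
isFaulty = ensemble.SensorHealth == "faulty";
isHot    = ensemble.AmbientTemp > 30;
ensemble.ConditionCode = isFaulty + 2*isHot;
```

Because each condition variable holds one scalar per member, the table satisfies the scalar requirement that the app places on condition variables.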

Representing Ensemble Data for the App

You can use one of three general approaches to combine your ensemble data and import it into the app. All of these approaches require that every ensemble member contain the same variables.

Create Individual Member Datasets

Import your data in the form of individual datasets — one for each member — and let the app combine these datasets into an ensemble.

This approach requires the least setup before importing the data, but it requires you to select each dataset individually during the import process. This approach is practical only when you have a small number of datasets. If you want to update the ensemble with new members, you must import all members again.
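As a minimal sketch of this approach, the following code creates three separate workspace variables, one per member, each a one-row table with identical variable names. The helper `makeMember`, the variable names, and the synthetic signals are hypothetical.

```matlab
% Three hypothetical members, each a one-row table with the same variables.
rng(0);
t = seconds((0:199)'/100);            % 2 s at an assumed 100 Hz
makeMember = @(scale, health) table( ...
    {timetable(t, scale*randn(numel(t), 1), 'VariableNames', {'Data'})}, ...
    categorical(string(health)), ...
    'VariableNames', {'Vibration', 'Health'});

member1 = makeMember(1.0, "healthy");
member2 = makeMember(1.2, "healthy");
member3 = makeMember(3.0, "faulty");

% Members must share the same variables, so a quick check before import
% can catch a mismatch early.
names = {member1.Properties.VariableNames, ...
         member2.Properties.VariableNames, ...
         member3.Properties.VariableNames};
sameVariables = isequal(names{:});
```

During import, you would then select `member1`, `member2`, and `member3` individually.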

Create an Ensemble Dataset

Import a single ensemble dataset that you create from your member datasets. Each row of your ensemble dataset represents one of your members.

This approach requires more setup before importing the data, but it requires you to select only one item during the import process. It is more practical than the individual approach when you have a larger number of members. If you want to update the ensemble with new members, you can do so outside of the app by adding to your existing table. Then import the updated table.

For an example of creating an ensemble dataset from individual member matrices, see Prepare Matrix Data for Diagnostic Feature Designer.
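The following sketch outlines one way to assemble an ensemble table from member matrices and then append a new member outside the app. It is not the procedure from the referenced example. The matrix layout (time in column 1, vibration in column 2), the variable names, and the health labels are assumptions.

```matlab
% Hypothetical member matrices: column 1 is time in seconds, column 2 is
% a vibration signal.
time = (0:0.01:0.99)';
memberMatrices = {[time, sin(2*pi*5*time)], ...
                  [time, 1.5*sin(2*pi*5*time)], ...
                  [time, 4*sin(2*pi*5*time)]};
health = ["healthy"; "healthy"; "faulty"];

% Convert each matrix to a table so that the columns get names, then place
% one member per row in the ensemble table.
numMembers = numel(memberMatrices);
signals = cell(numMembers, 1);
for k = 1:numMembers
    signals{k} = array2table(memberMatrices{k}, ...
        'VariableNames', {'Time', 'Vibration'});
end
ensembleTable = table(signals, categorical(health), ...
    'VariableNames', {'Signal', 'Health'});

% Adding a member later is a row append performed outside the app.
newSignal = array2table([time, 2*sin(2*pi*5*time)], ...
    'VariableNames', {'Time', 'Vibration'});
newRow = table({newSignal}, categorical("healthy"), ...
    'VariableNames', {'Signal', 'Health'});
ensembleTable = [ensembleTable; newRow];
```

After the append, you import `ensembleTable` again as a single item.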

Create an Ensemble Datastore Object

Import an ensemble datastore object that contains only the names and paths of member files rather than importing the data itself. This object also includes the information needed for the app to interact with the external files.

This approach is best when you have large amounts of data and many variables. Ensemble datastores can help you work with such data, whether it is stored locally or in a remote location such as cloud storage using Amazon S3™ (Simple Storage Service), Windows Azure® Blob Storage, or Hadoop® Distributed File System (HDFS™).

Typically, when you begin exploring your data in the app, you want to import a relatively small number of members and variables. However, later, you might want to test your conclusions on feature effectiveness by bringing in a larger sample size. The ensemble datastore is one method for handling the larger amount of data, especially if the data size exceeds memory limitations for MATLAB.

For more information on ensemble datastore objects, see Data Ensembles for Condition Monitoring and Predictive Maintenance.
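As a rough sketch, a file ensemble datastore for a folder of MAT-files might be configured as follows. The code requires Predictive Maintenance Toolbox. The folder, the file names, the variable names, and the read function are all assumptions made for illustration, and a real read function usually needs to match the layout of your own files.

```matlab
% Write three hypothetical member files to a temporary folder.
rng(0);
folder = tempname;
mkdir(folder);
t = seconds((0:99)'/100);
healthLabels = ["healthy", "healthy", "faulty"];
for k = 1:3
    Vibration = timetable(t, k*randn(100, 1), 'VariableNames', {'Data'});
    Health    = healthLabels(k);
    save(fullfile(folder, sprintf('engine%02d.mat', k)), 'Vibration', 'Health');
end

% The datastore records where the member files are, not their contents.
ens = fileEnsembleDatastore(folder, '.mat');
ens.DataVariables      = "Vibration";
ens.ConditionVariables = "Health";
ens.SelectedVariables  = ["Vibration"; "Health"];

% Assumed read function: load the requested variables from one MAT-file
% and return them as a one-row table.
ens.ReadFcn = @(filename, variables) ...
    struct2table(load(filename, variables{:}), 'AsArray', true);
```

You would then import `ens` into the app, which uses the read function to retrieve member data from the files as needed.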

Data Types and Constraints for Dataset Import

The app accepts various data types, including numerical matrices and tables that contain condition-variable scalars and embedded measurement timetables. The app bases the interpretation of the imported data on whether you select Import > Import Single-Member Datasets (individual datasets) or Import > Import Multi-Member Ensemble (ensemble dataset or ensemble datastore).

Before you import your data, it must already be clean, with preprocessing such as outlier removal and missing-value removal already applied. For more information, see Data Preprocessing for Condition Monitoring and Predictive Maintenance.
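For example, a minimal cleaning pass on one signal might fill missing samples and replace outliers before the signal is placed in a member dataset. The signal values here are made up, and linear interpolation with the default outlier detection of `filloutliers` is only one of several reasonable choices.

```matlab
% Hypothetical raw signal with two missing samples and one obvious outlier.
x = [1.0; 1.1; NaN; 0.9; 25; 1.05; 1.0; NaN; 0.95; 1.1];

% Fill missing samples by linear interpolation between neighbors.
xFilled = fillmissing(x, 'linear');

% Replace outliers, using the default median-based detection, by linear
% interpolation between neighboring non-outlier samples.
xClean = filloutliers(xFilled, 'linear');
```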

Import Single-Member Datasets

This option applies to the member datasets approach in the preceding figure. The app accepts individual member table arrays, timetable arrays, or numeric matrices, each containing the same independent variables, data variables, and condition variables.

  • Data variables within these datasets can contain timetables, tables, cell arrays, or numeric arrays.

  • All independent time variables must be of the same type — either all double or all duration or all datetime. If your original data was uniformly sampled, and timestamps were not recorded, the app prompts you to construct a uniform timeline during the import process.

  • Condition variables in a member dataset contain a single scalar. The form of the scalar can be numeric, string, cell, or categorical. You can import condition variables with your data only if your member datasets are tables, timetables, or cell arrays. Matrices cannot accommodate condition variables.

  • Matrices can contain only one independent variable, but can have any number of data variables tied to that independent variable. Matrices cannot accommodate variable names.
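Because matrices cannot carry variable names, one option is to convert a member matrix to a timetable before import. The following sketch does so and then confirms that two members use the same time type. The matrix layout (time in column 1) and the variable names are assumptions.

```matlab
% Hypothetical member stored as a matrix: column 1 is time in seconds
% (double), columns 2 and 3 are data variables.
rng(0);
time = (0:0.001:0.999)';
memberMatrix = [time, sin(2*pi*50*time), cos(2*pi*50*time)];

% Convert to a timetable with duration row times and named data variables.
memberTT = array2timetable(memberMatrix(:, 2:end), ...
    'RowTimes', seconds(memberMatrix(:, 1)), ...
    'VariableNames', {'Vibration', 'Tacho'});

% A second member that already uses duration time stamps.
otherTT = timetable(seconds(time), randn(1000, 1), randn(1000, 1), ...
    'VariableNames', {'Vibration', 'Tacho'});

% Every member's time variable should have the same class.
timeClasses = {class(memberTT.Properties.RowTimes), ...
               class(otherTT.Properties.RowTimes)};
sameTimeType = numel(unique(timeClasses)) == 1;
```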

Import Multi-Member Ensemble

This option applies to the Ensemble Dataset and Ensemble Datastore approaches in the preceding figure. The app accepts:

  • An ensemble table containing table arrays or matrices. Table rows represent individual members.

  • An ensemble cell array containing tables or matrices. Cell array rows represent individual members.

  • An ensemble datastore object that contains the information necessary to interact with files stored externally to the app. The external files have fewer format restrictions than imported datasets. The read function referenced in the ensemble datastore object can adapt to the format of the files.

The members in the collective dataset must all contain the same independent variables, data variables, and condition variables.

  • All independent time variables must be of the same type — either all double or all duration or all datetime. If your original data was uniformly sampled, and timestamps were not recorded, the app prompts you to construct a uniform timeline during the import process.

  • Embedded matrices can contain only one independent variable, but can have any number of data variables tied to that independent variable.

  • Condition variables in a member dataset contain a single scalar. The form of the scalar can be numeric, string, cell, or categorical.
