
Classification on matrices with linearly independent row vectors

To give you some context: I'm trying to train a classification and neighbour-analysis model on a set of aircraft maintenance data. The model should predict, based on rework dimensions, whether a specific component is likely to be reauthorized into service.
Each data point has a label ('Scrap' or 'Reuse') but consists of n row vectors, each representing individual rework dimensions.
My problem is that I can't find a way to map the data points into a decision space, because each data point is itself a subspace consisting of n linearly independent row vectors.
Since the rows of each observation are in random order, my approach was to sort them by the column I presume to be most important, to get some basis for comparison between data points.
My question is whether anyone has a better solution to this problem, as mine is not yet optimal.
Any tips or experience would be great, thanks!

Answers (1)

Shubham on 21 Feb 2024
Hi Finn,
Each of your data points is a set of vectors (rework dimensions) with a single label ('Scrap' or 'Reuse'), and the order of the vectors within each set is not fixed. Most classifiers expect a fixed-size input with a consistent ordering, so you either need to map each set to such a representation or use a model that does not depend on row order. Here are some strategies you might consider:
1. Feature Engineering:
Try to extract meaningful features from each set of rework dimensions that can be used to represent each data point in a fixed-size feature space.
  • Aggregation: Compute statistical measures (mean, median, standard deviation, min, max, etc.) across the vectors in each data point.
  • Dimensionality Reduction: Use techniques like PCA (Principal Component Analysis) to reduce the set of vectors to a smaller set of principal components that capture most of the variance. Refer to this documentation link: Principal component analysis of raw data - MATLAB pca (mathworks.com)
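The aggregation idea can be sketched in a few lines. This is a minimal illustration in plain Python (the thread itself contains no code); the function name `aggregate_features` and the sample values are hypothetical, and the choice of statistics is only an assumption about what might be useful for your data.

```python
import statistics

def aggregate_features(rows):
    """Map a set of equal-length row vectors to one fixed-size feature vector.

    For every column this computes mean, population standard deviation,
    min and max. None of these depend on the order of the rows.
    """
    columns = list(zip(*rows))  # transpose: one tuple per column
    features = []
    for col in columns:
        features.extend([
            statistics.fmean(col),
            statistics.pstdev(col),
            min(col),
            max(col),
        ])
    return features

# Hypothetical component: three rows, two rework dimensions each.
component = [[1.0, 10.0], [3.0, 30.0], [2.0, 20.0]]
shuffled = [component[2], component[0], component[1]]

features = aggregate_features(component)
```

The resulting vector has four entries per column regardless of how many rows the component has, so it can be passed to any standard classifier or nearest-neighbour search.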
2. Sequence Models:
If there is a temporal or logical sequence to the rework dimensions, consider using sequence models such as RNNs, LSTMs, or GRUs, which can handle variable-length input sequences. Refer to this documentation link: Long Short-Term Memory Neural Networks - MATLAB & Simulink (mathworks.com)
  • Sorting: As you mentioned, sorting the vectors by an important column could be a good preprocessing step before feeding them into a sequence model.
  • Padding: If necessary, pad the sequences to a fixed length.
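A minimal sketch of the sorting and padding steps, in plain Python. The names `sort_and_pad`, `key_column`, `max_rows` and `pad_value` are hypothetical, and padding with zeros plus a mask is just one common convention, assumed here for illustration.

```python
def sort_and_pad(rows, key_column, max_rows, pad_value=0.0):
    """Put rows into a canonical order and pad them to a fixed number of rows.

    Rows are sorted by the chosen column; the full row is used as a
    tie-breaker so that equal keys still give a deterministic order.
    Sets with more than max_rows rows are truncated after sorting.
    Returns the padded rows and a mask (1 = real row, 0 = padding).
    """
    width = len(rows[0])
    ordered = sorted(rows, key=lambda row: (row[key_column], tuple(row)))
    ordered = ordered[:max_rows]
    n_pad = max_rows - len(ordered)
    padded = [list(row) for row in ordered]
    padded += [[pad_value] * width for _ in range(n_pad)]
    mask = [1] * len(ordered) + [0] * n_pad
    return padded, mask

# Hypothetical component with two rows, padded to three.
component = [[3.0, 0.2], [1.0, 0.7]]
padded, mask = sort_and_pad(component, key_column=0, max_rows=3)
```

The mask lets a downstream model ignore the padding rows, which matters if zero is also a plausible measured value.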
3. Set Functions:
Explore neural network architectures that are invariant to the order of the input vectors, i.e. that learn functions defined on sets.
  • Deep Sets: This architecture can process sets of vectors and is inherently permutation-invariant. You can refer to this research paper for deep sets: [1703.06114] Deep Sets (arxiv.org)
  • Attention Mechanisms: Use attention to weigh the importance of different vectors within each set, which can help the model focus on the most relevant rework dimensions.
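To make the permutation-invariance idea concrete, here is a rough sketch of the Deep Sets structure (embed each row, sum the embeddings, then map the pooled vector to a score). It is a forward pass only, in plain Python, with hand-picked hypothetical weights; a real model would learn `phi_weights` and `rho_weights` with a deep learning framework.

```python
import math

def phi(row, weights):
    """Per-row embedding: one tanh unit per weight row (no bias, for brevity)."""
    return [math.tanh(sum(w * x for w, x in zip(w_row, row))) for w_row in weights]

def deep_set_score(rows, phi_weights, rho_weights):
    """Score a set of rows in the Deep Sets form rho(sum_i phi(x_i))."""
    embeddings = [phi(row, phi_weights) for row in rows]
    # Sum pooling over the set: the result does not depend on row order.
    pooled = [math.fsum(col) for col in zip(*embeddings)]
    logit = sum(w * p for w, p in zip(rho_weights, pooled))
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid, read as P('Scrap')

# Hypothetical weights and data, chosen only for illustration.
phi_weights = [[0.5, -0.2], [0.1, 0.3]]
rho_weights = [1.0, -1.0]
component = [[1.0, 2.0], [0.5, 0.1], [2.0, 1.5]]

score = deep_set_score(component, phi_weights, rho_weights)
score_reversed = deep_set_score(component[::-1], phi_weights, rho_weights)
```

Because the pooling is a sum, the same function also accepts components with a different number of rows without padding.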
4. Graph-Based Models:
If there is a relationship between the rework dimensions, you could represent each data point as a graph, with dimensions as nodes and some logical relationship as edges.
5. Multiple Instance Learning (MIL):
If each data point can be treated as a "bag" of instances (vectors), MIL is a framework designed for the case where a label is known for the whole bag but not for its individual instances. Refer to this: [1609.07257] Using Neural Network Formalism to Solve Multiple-Instance Problems (arxiv.org)
  • Instance-Level Predictions: Make predictions on each vector and then aggregate these predictions to make a bag-level prediction.
  • Aggregate Features: Extract features from each instance and aggregate them to form a single feature vector for the bag.
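A minimal sketch of the instance-level variant, assuming the classic MIL rule that a bag is positive if at least one instance is positive (here: a component is 'Scrap' if any single row looks bad enough). Whether that assumption holds for your maintenance data is a domain question. The logistic scorer, its weights and the threshold are all hypothetical.

```python
import math

def instance_score(row, weights, bias):
    """Hypothetical per-row scorer: logistic model giving P(row indicates scrap)."""
    logit = sum(w * x for w, x in zip(weights, row)) + bias
    return 1.0 / (1.0 + math.exp(-logit))

def bag_label(rows, weights, bias, threshold=0.5):
    """Bag-level prediction by max pooling over the instance scores."""
    bag_score = max(instance_score(row, weights, bias) for row in rows)
    return "Scrap" if bag_score >= threshold else "Reuse"

# Hypothetical weights and components, for illustration only.
weights, bias = [2.0, 1.0], -4.0
mostly_fine = [[0.5, 0.5], [2.0, 1.0]]  # second row scores about 0.73
all_fine = [[0.5, 0.5], [0.2, 0.1]]     # both rows score below 0.1

label_a = bag_label(mostly_fine, weights, bias)
label_b = bag_label(all_fine, weights, bias)
```

Max pooling is order-invariant, and replacing `max` with a mean would give a softer rule in which a single bad row no longer decides the outcome on its own.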
6. Custom Model:
Design a custom neural network architecture that can handle sets of vectors. This might involve combining different layers and mechanisms to process the input data effectively. Refer to this documentation link: Define Custom Deep Learning Layers - MATLAB & Simulink (mathworks.com)
7. Similarity-Based Methods:
Use similarity or distance metrics to compare the sets of vectors and use these metrics as features for a machine learning model.
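Since you mentioned neighbour analysis, one way to apply this directly is a nearest-neighbour classifier on a set-to-set distance. The sketch below uses a symmetric average nearest-neighbour distance (often called a Chamfer distance); this particular metric, the function names and the toy data are assumptions for illustration, and other set distances such as Hausdorff may fit your data better.

```python
import math
from collections import Counter

def set_distance(a, b):
    """Symmetric average nearest-neighbour (Chamfer-style) distance between two sets of rows."""
    a_to_b = sum(min(math.dist(x, y) for y in b) for x in a) / len(a)
    b_to_a = sum(min(math.dist(y, x) for x in a) for y in b) / len(b)
    return 0.5 * (a_to_b + b_to_a)

def knn_predict(query, training_sets, labels, k=3):
    """Majority vote among the k training sets closest to the query set."""
    ranked = sorted(range(len(training_sets)),
                    key=lambda i: set_distance(query, training_sets[i]))
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training data: sets may have different numbers of rows.
training_sets = [
    [[0.0, 0.0], [0.1, 0.2]],
    [[0.2, 0.1], [0.0, 0.3], [0.1, 0.1]],
    [[5.0, 5.0], [5.2, 4.9]],
    [[4.8, 5.1], [5.1, 5.3]],
]
labels = ["Reuse", "Reuse", "Scrap", "Scrap"]

query = [[0.3, 0.0], [0.1, 0.1]]
prediction = knn_predict(query, training_sets, labels, k=3)
```

The distance needs neither sorting nor padding, and it handles sets of different sizes. Columns measured in different units should be scaled first, otherwise the largest-valued column dominates the Euclidean distance.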
Before implementing any of these strategies, it's crucial to understand the nature of the rework dimensions and the domain knowledge behind the aircraft maintenance data. This understanding can guide the feature engineering process and the choice of model architecture. Additionally, it's often beneficial to start with a simple model to establish a baseline before moving on to more complex models.
