How to filter out useless data

Hi everyone, I need to clean a big dataset (more than 1.5 million observations) to exclude all the meaningless or useless observations. Each observation comes with several variables (price, delta, implied volatility, etc.), and I need to get rid of any observation for which the implied volatility is more than 100%. Moreover, for many observations the implied volatility is simply missing (I have a blank cell). So, for any value in the column "implied volatility" that is missing or >1, I want MATLAB to remove the corresponding observation, that is, the entire row. How could I do that in a smart and quick way? (I am a beginner in MATLAB.) Thanks

4 Comments

Adam on 23 Oct 2015
Is your data in a cell array? If so does it need to be in a cell array rather than a regular numeric array?
I think they are in a cell array. As I explained above, I'm just a beginner and I don't know very well what the difference is.
Start with the "Getting Started" section of the MATLAB documentation and spend a few minutes getting familiar with the basic concepts of arrays, cell notation, etc. It'll be time well spent, in that it'll be much quicker than waiting on answers here, particularly when you don't yet have the vocabulary to describe the problem accurately.
On that last point, what does
whos yourvariablename
return? That'll tell us how the data is currently stored.
yourvariablename is, of course, whatever you are using for the data, be that data, x, or whatever, not a literal string.
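As a rough illustration of the check suggested above, the sketch below inspects how a variable is stored. The variable name data and its contents are made up for the example; substitute your own variable.

```matlab
% Hypothetical sample: a small cell array with a blank entry in column 3
data = {10.5, 0.4, 0.25; 11.0, 0.5, []; 9.8, 0.3, 1.70};

whos data                       % prints name, size, bytes and class

storageClass = class(data);     % 'cell' here; 'double' for a numeric matrix
isCellArray  = iscell(data);    % true only for a cell array
```

If the class is reported as double, logical indexing with isnan works directly; if it is cell, the column has to be tested cell by cell or converted first.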
Nick Hobbs on 27 Oct 2015
Edited: Nick Hobbs on 27 Oct 2015
I understand you want to remove rows from your cell array based on information in your data. The following documentation link may help you with your goal.
The following link provides an example on how to remove a row from a cell array.
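For the cell-array case, a minimal sketch of row removal might look like the following. It assumes (this is not confirmed in the thread) that the implied volatility sits in column 3, that blank entries are empty cells, and that the remaining entries are numeric scalars; the sample values are invented.

```matlab
% Hypothetical sample: columns are price, delta, implied volatility
data = {10.5, 0.4, 0.25;
        11.0, 0.5, [];      % blank cell (missing volatility)
         9.8, 0.3, 1.70;    % volatility above 100%
        10.1, 0.6, 0.90};

iv = data(:, 3);                      % still a cell array (parentheses keep the cells)
isBlank = cellfun(@isempty, iv);      % true where the cell is empty

% Start from the blank rows, then also flag non-blank entries
% that are NaN or greater than 1
isBad = isBlank;
isBad(~isBlank) = cellfun(@(v) isnan(v) || v > 1, iv(~isBlank));

data(isBad, :) = [];                  % delete the flagged rows
```

With the sample above, only the first and last rows remain. For 1.5 million rows, converting the numeric columns to a regular matrix once is likely to be faster than repeated cellfun calls.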


Answers (2)

Check out the "ismissing()" function.
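And to remove rows from your table with volatility more than 100, I think you can do this (untested):
badRows = mytable.volatility > 100; % use > 1 instead if volatility is stored as a fraction rather than in percent
mytable(badRows,:) = []; % delete the flagged rows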
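Combining the two suggestions in this answer, a sketch for a table could look like the one below. It assumes the volatility is stored as a fraction (so 100% is 1, as in the question) and that blank cells were imported as NaN; the table and its variable names are invented for the example.

```matlab
% Hypothetical sample table; NaN stands in for a blank cell
price      = [10.5; 11.0; 9.8; 10.1];
volatility = [0.25; NaN; 1.70; 0.90];
mytable    = table(price, volatility);

% Flag rows whose volatility is missing or above 100%
badRows = ismissing(mytable.volatility) | mytable.volatility > 1;

mytable(badRows, :) = [];   % delete the flagged rows
```

With this sample, the rows with volatility 0.25 and 0.90 are kept. If the column is stored in percent, the threshold would be 100 instead of 1.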
Thorsten on 27 Oct 2015
Edited: Thorsten on 28 Oct 2015
iv = data(:,3); % implied volatility, assumed to be stored in column 3
idx = isnan(iv) | iv > 1; % logical array of indices
data(idx,:) = []; % remove all rows where idx is true


Asked: 23 Oct 2015

Edited: 28 Oct 2015
