Removing Rows from Array Based on Date/Time Value in Cell

7 visualizzazioni (ultimi 30 giorni)
I am trying to remove all lines of data for whom the "last order date" (row 17 of my CSV file) is after 8/31/2019. My code was weeding out some, but there were a few later dates left behind. I switched all the date values to Serial Date Numbers in hopes that would simplify the process, but it's still not weeding out anything after 7183 (the serial date number for 8/31/19). I am out of ideas and would greatly appreciate any feedback.
Code:
segment = readtable('myFile.csv');
a = 1;
rows = height(segment);
newDate = [];
for f = 1:rows
addDate = datenum(segment{f, 17});
newDate(f) = addDate;
end
cutoffDateOG = datetime('08/31/0019','InputFormat','MM/dd/uuuu', 'Format', 'preserveinput');
cutoffDate = datenum(cutoffDateOG);
while a < rows
if newDate(a)>cutoffDate || isnat(segment{a,17}) == 1
segment(a,:) = [];
rows = rows-1;
end
a = a+1;
end
end

Risposta accettata

Cris LaPierre
Cris LaPierre il 30 Lug 2020
Ok, I'll take you through the steps I'd follow. I'll be sharing videos from our Exploratory Data Analysis specialization on Coursera. If you have time, you might find it a great way to learn the latest features in MATLAB for data analysis. You can access all but the graded content for free.
You'll have to get the data into MATLAB first. I suggest using the Import Tool. This video may be helpful for that. If you prefer, you can generate the code used to import the data from the Import Tool. This video on generating and reusing code created by the import tool shows you how.
Make sure to set the data type of the variables you want to use (each column of a table is a variable). Most important is to set the Last Order Date to datetime. You can learn more about working with dates and times here.
Then you'll want to access specific variables from the table for your comparison. This video will explain how to do this. And since you are actually wanting to access a subset of the data in your table, perhaps this video will also be helpful.
% Import data setting date formats
opts = detectImportOptions("MatLab Test Data.csv");
opts = setvaropts(opts,{'x_LastOrderDate','x_FirstShowAttendedDate','x_LastShowAttendedDate','x_NextShowAttendingDate','x_LastCompShowDate'},'InputFormat',"MM/dd/yy");
opts = setvaropts(opts,'DateCreated','InputFormat','MM/dd/yy HH:mm');
segment = readtable("MatLab Test Data.csv",opts);
% This is the part that removes the rows of data with Last Order Date after cutoff date
% Specify the cutoff date
cutoffDate = datetime(2019,08,13);
% Remove rows where Date is after cut off date
segment(segment.x_LastOrderDate>cutoffDate,:)=[];
% Remove rows where Dates has a value of NaT
segment=rmmissing(segment,"DataVariables","x_LastOrderDate")
  1 Commento
Liza Miller
Liza Miller il 30 Lug 2020
Not only did this perfectly do what I needed, I REALLY appreicate the resources for further learning.Thank you so much!

Accedi per commentare.

Più risposte (1)

Cris LaPierre
Cris LaPierre il 29 Lug 2020
Don't use datenums. I would suggest converting your dates to datetimes (segment{f,17}). Then you can just use regular conditional expressions (>,<.==,etc.).
  4 Commenti
Cris LaPierre
Cris LaPierre il 29 Lug 2020
Modificato: Cris LaPierre il 29 Lug 2020
You may be making this more complicated than it needs to be. Without a representative data set, we can only guess, but I suspect something like this should work. Ignore the first few lines. I'm creating a dummy table of various data types for the example.
% Dummy data set including a range of dates including NaT entries.
Dates = [datetime(2019,01,01):days(1):datetime('today') repmat(NaT,1,24)];
segment = table(Dates',zeros(length(Dates),1),zeros(length(Dates),1,'single'),cell(length(Dates),1),strings(length(Dates),1));
segment.Properties.VariableNames=["Dates","Double","Single","Cell","String"];
% Specify the cutoff date
cutoffDate = datetime(2019,08,13);
% Remove rows where Dates has a value of NaT
segment=rmmissing(segment,"DataVariables","Dates")
% Remove rows where Date is after cut off date
segment(segment.Dates>cutoffDate,:)=[];
Since I start Dates on the first day of 2019, and since 8/13/2019 is the 225th day of the year (in 2019), the resulting size of segment should be a table with 225 rows and 5 columns.
day(cutoffDate,'dayofyear')
ans =
225
Liza Miller
Liza Miller il 30 Lug 2020
Modificato: Liza Miller il 30 Lug 2020
Thank you so much; you've been so helpful. I should've prefaced this by saying, I worked with MatLab a little in school, but I'm trying to implement it into my current role as a marketing analyst. I am a super beginner, so after your example I do see how it would be imperative to see the actual data. Here's the spredsheet I'm working off of. The dates I'm trying to sift through is row 17 (last order date). rmmissing is a godsend and is much simpler than what I was doing, but I'm having some trouble translating the row variable name aspect, considering how many rows I have. I tried using a variable for the 17th column of my data, but that didn't work.
You've been so incredibly helpful, and if don't have time/are sick of answering any more I very much understand, but I'd love to be able to utilize this awesome code you've provided, and hopefully implement MatLab into my company!

Accedi per commentare.

Categorie

Scopri di più su Dates and Time in Help Center e File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by