Using a time-stamp to find the median for every hour

Question

Daniel on 5 Nov 2014

https://it.mathworks.com/matlabcentral/answers/161436-using-a-time-stamp-to-find-the-median-for-every-hour

Edited: per isakson on 5 Nov 2014

Hello,

I am working on a project that examines oxygen consumption in the brains of neonates with congenital heart disease. We pull data from a variety of sources that sample at different frequencies, and every entry from every source comes with a time-stamp. There are between 3 and 8 .asc files for each patient, containing around 20,000 rows of data. Currently I have multiple .m files that find the median based on the number of rows that should exist within each hour (i.e. if sampling occurs every 10 seconds, it counts 360 rows). However, this is not reliable, because at higher sampling rates, such as once per second, some data points are missing. Additionally, there are gaps in time between files, from periods when the monitor was not connected for various reasons.

Ideally, I would like a script that combines multiple files into one large data-set for each patient and then uses the time-stamps to return the median for every hour for 3 separate variables (VO2, VCO2 and SvO2). The returned values would be displayed in rows, one per hour, each with its corresponding medians.

If it is not clear from this post, I have practically no experience with MATLAB. I am in contact with some people who understand it much better than I do, but they are all very busy at the moment and I need this to move forward. If anybody has an idea of how I could make this work, please let me know. I would greatly appreciate it! You will be directly helping research that may save children's lives in the future!

1 Comment

per isakson on 5 Nov 2014

Edited: per isakson on 5 Nov 2014

I think that you increase your chances here if you

supply a couple of sample data files (attached via the paper-clip button)
make a rather detailed outline of the result file (I assume some kind of text file)
describe how meta-data should be found and presented
describe how the data files can be found (avoid mixing of files from different children)
etc.

Answer 1

Matt Tearle on 5 Nov 2014

https://it.mathworks.com/matlabcentral/answers/161436-using-a-time-stamp-to-find-the-median-for-every-hour#answer_157807

If you could show a snippet of one of the data files I might be able to give some more specific guidance, but I'd suggest:

f = cellstr(ls('*.asc')); to get the files to read
loop over the files in f
readtable to read in each file, concatenate to the current data (could be a bit slow, but no way around that unless you know how many rows you have in each file a priori)
sortrows to sort by the time stamp
calculate an hour variable -- something like this: hour = floor(timestamp/3600) (assuming that the time stamp is just a linear time in seconds)
use accumarray or grpstats (if you have Statistics TB) to calculate the median, using the hour as the index/grouping variable: grpstats(data,hour,@median)
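
The steps above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a tested solution for the real files: the column names `t` and `VO2`, the comma delimiter, and the demo file names are all hypothetical, and the time stamp is assumed to be elapsed time in seconds. It uses `dir` rather than `ls` because `dir` returns the same kind of output on every platform, and it passes `'FileType','text'` because `readtable` does not recognise the `.asc` extension on its own.

```matlab
% Sketch only: column names (t, VO2), the comma delimiter and the demo file
% names are assumptions. Time stamps are assumed to be elapsed seconds.

% Write two small demo files to a temporary folder so the sketch runs as-is.
folder = tempname;
mkdir(folder);
writetable(table([7300; 10; 3700], [5; 1; 4], 'VariableNames', {'t','VO2'}), ...
    fullfile(folder, 'part2.asc'), 'FileType', 'text', 'Delimiter', ',');
writetable(table([20; 30; 3650], [2; 3; 6], 'VariableNames', {'t','VO2'}), ...
    fullfile(folder, 'part1.asc'), 'FileType', 'text', 'Delimiter', ',');

% 1) List the files to read.
files = dir(fullfile(folder, '*.asc'));

% 2) Read every file, collecting the pieces in a cell array and
%    concatenating once at the end instead of growing a table in the loop.
parts = cell(numel(files), 1);
for k = 1:numel(files)
    parts{k} = readtable(fullfile(folder, files(k).name), ...
        'FileType', 'text', 'Delimiter', ',');
end
data = vertcat(parts{:});

% 3) Sort by the time stamp.
data = sortrows(data, 't');

% 4) Hour index. accumarray needs positive integers, hence the +1.
hourIdx = floor(data.t / 3600) + 1;

% 5) Median per hour. Hours with no samples are filled with NaN.
medVO2 = accumarray(hourIdx, data.VO2, [], @median, NaN);

% One row per hour, with hour 0 being the first hour of the recording.
result = table((0:numel(medVO2)-1)', medVO2, ...
    'VariableNames', {'Hour', 'MedianVO2'});
disp(result)
```

Because the grouping is driven by the time stamp rather than by a row count, missing samples and gaps between files do not shift the hour boundaries. The same `accumarray` call can be repeated for VCO2 and SvO2.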

Depending on how the files are split, you may be able to skip some of the initial steps. Or do different things. Are the different files different times? Different things being measured (VO2 vs VCO2)?

5 Comments

Daniel on 5 Nov 2014

Edited: per isakson on 5 Nov 2014

Sample Data.xlsx

The time stamps represent the actual time of day, spanning over multiple days. So there would be multiple data-points with the same time-stamp. We would want a median for every hour over the duration of the monitoring (which as I mentioned before changes based on each patient).
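
Time-of-day stamps that repeat across days need to be turned into a continuously increasing time before they can be grouped by hour. One possible approach, sketched here under assumptions, is to detect each midnight rollover and add a day's worth of seconds. The sketch assumes the stamps have already been converted to seconds since midnight (the variable name `secOfDay` is hypothetical) and that the rows are in chronological order. It cannot detect a gap of 24 hours or more between two consecutive rows.

```matlab
% Assumption: secOfDay holds seconds since midnight, rows in chronological order.
secOfDay = [86390; 86395; 5; 10; 3605];   % made-up values crossing midnight once

% A drop in the time of day marks the start of a new day.
rollover = [false; diff(secOfDay) < 0];

% Number of midnights passed before each row.
dayCount = cumsum(rollover);

% Seconds since midnight of the first day, now increasing across days.
elapsed = secOfDay + 86400 * dayCount;

% Clock-hour index counted from midnight of the first day (23, 24, 25, ...).
hourOfRecording = floor(elapsed / 3600);
disp([elapsed, hourOfRecording])
```

Grouping by `hourOfRecording` instead of the raw time of day keeps 23:00 on day one separate from 23:00 on day two.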

I do not know how to edit the .asc file, so I copied the data into an Excel file. The layout is exactly the same in every file for the patient, and the headers are at the top of each file that belongs to an individual patient.

The variables of interest are located in columns: BL, ED and EE. The negative numbers that are seen represent a missing value. In this example that does not happen for the variables that we are interested in, but for other patients that is a possibility.

One additional factor that would be incredible if you could account for: if the median value for VO2 occurs at a time point where the RQ value is less than 0.8 or greater than 1.3, then that value is invalid.
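
The missing-value and RQ rules could be applied before taking the median, along the lines of the sketch below. It rests on one interpretation that the post does not confirm: a VO2 sample is discarded whenever the RQ recorded at the same time point is outside 0.8 to 1.3. The values and variable names are made up for illustration, and `hourIdx` stands for an hour index computed beforehand from the time stamps.

```matlab
% Made-up example values; the real columns come from the data files.
hourIdx = [1; 1; 1; 2; 2; 3];
VO2     = [10; 12; -1; 20; 22; 30];       % a negative number marks a missing value
RQ      = [0.9; 1.0; 0.9; 0.7; 1.1; 1.4];

% Treat negative readings as missing.
VO2(VO2 < 0) = NaN;

% Assumed interpretation: drop VO2 samples whose RQ is outside [0.8, 1.3].
VO2(RQ < 0.8 | RQ > 1.3) = NaN;

% Take the median over the remaining samples only. Hours left without any
% valid sample are reported as NaN.
valid  = ~isnan(VO2);
nHours = max(hourIdx);
medVO2 = accumarray(hourIdx(valid), VO2(valid), [nHours 1], @median, NaN);
disp(medVO2)
```

Filtering before the call, rather than inside the function handle, keeps the sketch independent of the Statistics Toolbox and of how `median` treats NaN.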

Let me know if a certain part of this did not make sense. Thanks again for all of the help!

per isakson on 5 Nov 2014

Edited: per isakson on 5 Nov 2014

Two questions regarding the names of the files

is it possible to retrieve an ID of the patient from the name of the file or the name of the folder?
is it possible to retrieve the order in time of the files from their names?

PS. A blank line is needed to separate paragraphs
