Using a time-stamp to find the median for every hour

Daniel on 5 Nov 2014
Edited: per isakson on 5 Nov 2014
Hello,
I am working on a project examining the consumption of oxygen in the brain of neonates with congenital heart disease. We pull data from a variety of sources that sample at different frequencies, and every entry from every source has a time stamp. There are 3-8 .asc files per patient, each containing around 20,000 rows of data. Currently I have several .m files that find the median based on the number of rows expected in each hour (i.e., if sampling occurs every 10 seconds, each hour is taken to be 360 rows). However, this is not reliable, because at higher sampling rates, such as once per second, some data points are missing. In addition, there are gaps in time between files, when the monitor was disconnected for various reasons.
Ideally, I would like a script that combines multiple files into one large data set for each patient and then uses the time stamps to return the median of three variables (VO2, VCO2 and SvO2) for every hour. The output would have one row per hour, with the corresponding medians.
If it is not clear from this post, I have practically no experience with MATLAB. I am in contact with some people who understand it much better than I do, but they are all very busy at the moment and I need this to move forward. If anybody has an idea of how I could make this work, please let me know; I would greatly appreciate it! You will be directly helping research that may save children's lives in the future!
per isakson on 5 Nov 2014
I think that you increase your chances here if you
  • supply a couple of sample data files (attached via the paper-clip button)
  • make a rather detailed outline of the result file (I assume some kind of text file)
  • describe how metadata should be found and presented
  • describe how the data files can be found (to avoid mixing files from different children)
  • etc.


Answers (2)

Matt Tearle on 5 Nov 2014
If you could show a snippet of one of the data files, I might be able to give more specific guidance, but I'd suggest:
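  • f = dir('*.asc'); to get the files to read (more portable than cellstr(ls('*.asc')), because ls returns differently formatted text on Windows and on Linux/macOS)
  • loop over the files in f (f(k).name is the name of the k-th file)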
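  • readtable to read each file (pass 'FileType','text', since .asc is not one of the extensions readtable recognizes as text), then concatenate it to the data read so far (this can be a bit slow, but there is no way around it unless you know a priori how many rows each file has)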
  • sortrows to sort by the time stamp
  • calculate an hour variable -- something like this: hour = floor(timestamp/3600) (assuming that the time stamp is just a linear time in seconds)
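  • use accumarray or grpstats (if you have the Statistics Toolbox) to calculate the median, using the hour as the index/grouping variable: grpstats(data,hour,@median). Note that accumarray needs positive integer indices, so the hour numbers must first be mapped to 1, 2, ... (for example with the third output of unique)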
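The steps above can be sketched as a single script for one patient. This is a minimal sketch under assumptions, not the poster's own code: it assumes tab-delimited .asc files with a header line Time, VO2, VCO2, SvO2, where Time is a linear time stamp in seconds. The folder name, column names, variable names (hourNum, hourKeys) and the two tiny demo files are all hypothetical, written only so the sketch runs as-is.

```matlab
% Hypothetical file layout: tab-delimited, header "Time VO2 VCO2 SvO2",
% Time in seconds. Demo only: write two tiny made-up files so this runs.
% Replace dataDir with the folder holding ONE patient's real files.
dataDir = fullfile(tempdir, 'patient_demo');
if ~exist(dataDir, 'dir'), mkdir(dataDir); end
demo = {'part1.asc', [0 1 2 50; 10 3 4 52; 3600 9 8 70]; ...
        'part2.asc', [7200 5 4 60; 7210 7 6 62]};
for k = 1:size(demo, 1)
    M = demo{k, 2};
    fid = fopen(fullfile(dataDir, demo{k, 1}), 'w');
    fprintf(fid, 'Time\tVO2\tVCO2\tSvO2\n');
    fprintf(fid, '%g\t%g\t%g\t%g\n', M.');   % one matrix row per line
    fclose(fid);
end

% 1) List and read every .asc file, then concatenate once.
files = dir(fullfile(dataDir, '*.asc'));
parts = cell(numel(files), 1);
for k = 1:numel(files)
    % .asc is not an extension readtable treats as text, so say so explicitly
    parts{k} = readtable(fullfile(dataDir, files(k).name), ...
                         'FileType', 'text', 'Delimiter', '\t');
end
T = vertcat(parts{:});      % all files must share the same column names
T = sortrows(T, 'Time');    % chronological order, whatever the file order

% 2) Hour number since time zero (not called "hour" to avoid shadowing hour()).
hourNum = floor(T.Time / 3600);
[hourKeys, ~, idx] = unique(hourNum);   % idx = 1, 2, ... as accumarray requires

% 3) Median per hour for each variable, ignoring NaN samples.
names = {'VO2', 'VCO2', 'SvO2'};
result = table(hourKeys, 'VariableNames', {'Hour'});
for v = 1:numel(names)
    x = T.(names{v});
    ok = ~isnan(x);
    result.(names{v}) = accumarray(idx(ok), x(ok), [numel(hourKeys) 1], @median, NaN);
end
disp(result)
```

Because the hours come from the time stamps rather than from row counts, missing samples and gaps between files do not shift the bins; an hour with no samples simply has no row. Collecting the tables in a cell array and calling vertcat once avoids re-growing the combined table on every iteration. The result table could then be saved with writetable.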
Depending on how the files are split, you may be able to skip some of these steps or take a different approach. Do the different files cover different time periods? Or do they measure different things (VO2 vs. VCO2)?
per isakson on 5 Nov 2014
Two questions regarding the names of the files
  • is it possible to retrieve an ID of the patient from the name of the file or the name of the folder?
  • is it possible to retrieve the order in time of the files from their names?
PS. A blank line is needed to separate paragraphs



Chad Greene on 5 Nov 2014
Edited: Chad Greene on 5 Nov 2014
The hourly median of ~10-second data can be found easily if your time stamps are in MATLAB's datenum format. Use downsample_ts (from the MATLAB File Exchange, not part of MATLAB itself).
If V02_10 and t_10 are your VO2 readings and the corresponding time vector:
V02_hourly_medians = downsample_ts(V02_10,t_10,'median','hour');
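If downsample_ts is not available, the same kind of hourly median can be sketched with built-in functions only. This is a swapped-in technique (datenum arithmetic plus unique and accumarray), not how downsample_ts is implemented; it reuses the variable names V02_10 and t_10 from the answer, fills them with made-up demo values, and bins by clock hour.

```matlab
% Hypothetical demo input: datenum time stamps (in days) and VO2 readings.
t0 = datenum(2014, 11, 5, 8, 0, 0);              % 05-Nov-2014 08:00:00
t_10 = t0 + [0 30 59 60 90 150]' / 1440;         % minutes after t0, as days
V02_10 = [1 3 5 7 9 11]';

% datenum counts days, so convert to whole seconds first: rounding removes
% floating-point error that could push a sample across an hour boundary.
secs = round(t_10 * 86400);
hourNum = floor(secs / 3600);                    % clock hours since datenum 0

% accumarray needs indices 1, 2, ..., so map each hour number to one.
[hourKeys, ~, idx] = unique(hourNum);
V02_hourly_medians = accumarray(idx, V02_10, [], @median);
hourStart = hourKeys / 24;                       % datenum of each hour's start

disp(table(cellstr(datestr(hourStart)), V02_hourly_medians, ...
    'VariableNames', {'HourStart', 'MedianVO2'}))
```

In MATLAB R2016b or later, storing the data in a timetable and calling retime(TT, 'hourly', @median) is another built-in way to get the same clock-hour medians.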
