Best way to read in a massive amount of data from CSV files?

I am working with two CSV files, each about 25 GB. When I read one file in all at once, I get a vector of size 9.8 GB. I only have about 24 GB of RAM, so with two such vectors plus further computations this puts quite a strain on my computer. Is it better in this case to read the files in piece by piece, going back each time for the next data segment, or should I load all the data into memory at once? Either way I have to go through all the data, and timing is a consideration: at the moment it takes nearly 20 minutes for my computer to read one entire file into a vector. I imagine this time would increase if I constantly went back and made more, albeit smaller, calls to csvread with row indexing?

Answers (1)

Stephane Dauvillier on 25 Jun 2019
Hi,
If you have huge data files, you may want to look at datastore.
First, a datastore can be created over multiple files or an entire folder.
Second, unless you configure it otherwise, a datastore does not read a whole file at once but block by block, and it simply moves on to the next file when the current one is finished.
For tabular data, you can also specify which columns to actually import (very effective if you know you only want some of the columns rather than all of them). See the sketch below.
Do your files have the same number of columns, and do they contain the same kind of data (i.e., does column 1 in both files represent the same observation, such as Name, age, height, ...)?
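Here is a minimal sketch of that approach, assuming both files are comma-separated with a header row; the file names and the column names (Name, Age, Height) are placeholders, not taken from the question.

% Create one datastore spanning both files (hypothetical file names).
ds = tabularTextDatastore({'file1.csv', 'file2.csv'});

% Import only the columns you actually need (assumed column names).
ds.SelectedVariableNames = {'Name', 'Age', 'Height'};

% Read 100,000 rows per call instead of a whole file at a time.
ds.ReadSize = 100000;

while hasdata(ds)
    T = read(ds);   % next block, returned as a table
    % ... process this block here, e.g. update running totals ...
end

With ReadSize set, each call to read only holds one block in memory, so peak RAM usage stays far below the 25 GB file size; you can tune the block size to trade memory against the per-call overhead the question worries about.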

