How to deal with a matrix of size 5000000*8760

How can I deal with a matrix of size 5000000*8760? Here 5000000 is the number of vehicles, and 8760 represents the hourly charging status of each vehicle over one year (8760 hours). The non-zero elements account for 10%, so I tried a sparse matrix. However, it doesn't work. I have no idea how to generate, save, and load such a big matrix. Thank you!
% 1 generate weekly charge status (5000000 vehicles x 168 hours)
chargestatusweek=zeros(5000000,168,'single');
% 2 generate yearly charge status from weekly matrix
%   (52 full weeks = 8736 hours, plus the first 24 hours of a 53rd week)
chargestatusyear=zeros(5000000,8760,'single');   % about 163 GB in single precision
chargestatusyear(:,1:8736)=repmat(chargestatusweek,[1,52]);
chargestatusyear(:,8737:end)=chargestatusweek(:,1:24);
% 3 save this matrix and load it in another script
% 4 scan all 8760 hours to update the charging status in chargestatusyear
for t=1:8760
    chargestatusyear(4,13)=0;
    chargestatusyear(344,2300)=0;
    chargestatusyear(3459,5600)=0;
    % ... (more updates)
end
I have tried to use only the weekly data (168 hours) and convert all other hours to the first week. However, I just found that this makes the update the same for t=169 and t=337, since both times are translated to the 1st hour of the 1st week. But the correct updates for t=169 and t=337 should be different. That is why I am now looking for ways to generate the yearly data.
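The collision comes from mapping every absolute hour onto the week with mod: hours 1, 169, 337, ... all land on hour-of-week 1. A minimal sketch of one way around it, assuming the status is 0/1 and the yearly updates only switch charging off (as in the loop in the question): keep the weekly pattern, and record only the hour-specific changes in a sparse logical matrix indexed by the absolute hour. The names base, off, hourOfWeek and status are hypothetical, and the sizes are scaled down.

```matlab
% Hypothetical, scaled-down sizes (the real problem has 5e6 vehicles).
nVeh   = 10;
nHours = 8760;
base   = zeros(nVeh, 168, 'int8');  % weekly charging pattern, 0 or 1
base(:, 1:24) = 1;                  % placeholder: every vehicle charges in hours 1-24 of each week

% Map an absolute hour t (1..8760) to its hour of the week.
% t = 169 and t = 337 both map to 1, so an update written into the
% weekly matrix would apply to both of them.
hourOfWeek = @(t) mod(t - 1, 168) + 1;

% Record year-specific changes separately, indexed by the absolute hour.
% An all-zero sparse matrix only stores its column pointers, so its size
% does not grow with nVeh; each true entry adds a value and a row index.
off = logical(sparse(nVeh, nHours));
off(4, 169) = true;                 % vehicle 4 stops charging at hour 169 only

% Effective status = weekly pattern AND NOT switched off at that hour.
% Define this after all updates: the anonymous function captures a copy of off.
status = @(v, t) full(base(v, hourOfWeek(t)) ~= 0 & ~off(v, t));
```

With this layout, status(4,169) is false while status(4,337) is still true, even though both hours share the same weekly column.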
  2 Comments
James Tursa on 31 Mar 2018
Can you show us some of your code, and tell us how you are getting the data into this variable and what your downstream processing of this data will be?
Chaoyang Jiang on 31 Mar 2018
I have edited my question accordingly. Thank you!


Answers (1)

John D'Errico on 31 Mar 2018
Edited: John D'Errico on 31 Mar 2018
I still do not see where you say what you are doing with the matrix. "Downstream processing" is not sufficient information.
There may be good reasons why you need it as a matrix. Or perhaps there are not.
Remember that this matrix is huge. Even in single precision, it will require something like
5000000*8760*4/2^30
ans =
163.17
163 GIGABYTES of memory to store that matrix.
Even though the matrix is 90% zero, you cannot simply store it in sparse single form, because sparse is not supported for single precision. (At least not in R2017b. I need to download R2018a, but the release notes for R2018a do not indicate support for single sparse arrays.) Therefore you would need to store the matrix as a sparse double precision array.
The memory required for a sparse double of that size would still be on the order of 65 gigabytes of RAM: at 10% density there are about 4.4 billion nonzero elements, and in 64-bit MATLAB each one costs 8 bytes for its value plus 8 bytes for its row index. In order to use it in any way, depending on what you would do with it, MATLAB might even be forced to make copies of the array. While that might be possible, you would need a lot of RAM, and a fast hard drive. An SSD would be useful, because your computer will be doing a lot of memory shuffling.
Next, while you said that you TRIED a sparse matrix, we don't know how you tried to create that sparse matrix. My guess is you did not use sparse correctly, nor did you create the matrix properly. No matter what, it will require a LOT of memory just to create the list of non-zero elements and their positions in that final sparse matrix: as double vectors, the row indices, column indices, and values take 24 bytes per nonzero, roughly 100 GIGABYTES. Then to make the matrix itself, you will create a copy of all that information. So you will end up needing something on the order of 160 GIGABYTES of RAM to create the sparse matrix, about as much as the full single precision array. Again, a lot of memory.
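These memory estimates can be checked with a few lines of arithmetic. A sketch, assuming 64-bit MATLAB's sparse storage layout (one value and one 8-byte row index per nonzero, plus one 8-byte pointer per column) and exactly 10% nonzeros. The sparse logical line only applies if the charging status is purely 0/1, which the question does not state.

```matlab
% Back-of-the-envelope memory estimates for the full 5e6-by-8760 matrix.
nVeh    = 5e6;
nHours  = 8760;
density = 0.10;                     % fraction of nonzeros, from the question

nElem = nVeh * nHours;              % 4.38e10 elements
nz    = density * nElem;            % about 4.38e9 nonzeros
GiB   = 2^30;
colPtrBytes = (nHours + 1) * 8;     % sparse column pointers (negligible here)

fullSingleGiB    = nElem * 4 / GiB;                      % 4 bytes per element
fullInt8GiB      = nElem * 1 / GiB;                      % 1 byte per element
sparseDoubleGiB  = (nz * (8 + 8) + colPtrBytes) / GiB;   % 8-byte value + 8-byte row index
sparseLogicalGiB = (nz * (1 + 8) + colPtrBytes) / GiB;   % 1-byte value + 8-byte row index

fprintf('full single   : %6.1f GiB\n', fullSingleGiB);
fprintf('full int8     : %6.1f GiB\n', fullInt8GiB);
fprintf('sparse double : %6.1f GiB\n', sparseDoubleGiB);
fprintf('sparse logical: %6.1f GiB\n', sparseLogicalGiB);
```

This prints roughly 163.2, 40.8, 65.3 and 36.7 GiB. At 10% density a sparse double costs about 1.6 bytes per element of the full matrix, more than a full int8 array at 1 byte per element, which is consistent with the observation later in the comments that the sparse version came out larger than int8.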
You might want to read this link carefully:
https://www.mathworks.com/help/matlab/matlab_prog/strategies-for-efficient-use-of-memory.html
In the end, I would suggest that you are trying to process too large an amount of data at once for the capabilities you have, both in terms of your memory management skills and in terms of what your current computer is capable of storing. Just because you were able to process weekly data like this does not mean that you should jump to processing yearly data all at once. Of course, even if that were easily done, you might then decide that to get good accuracy, what you really needed was to process 5 or 10 years of data at a time. This is how things work: "I need MORE DATA" is the common refrain. But can you work more efficiently instead?
So I would strongly suggest that you consider reformulating how you process things.
Perhaps you might generate the array in blocks. For example, you could generate blocks that are 4 weeks in size, saving each one to disk. Save as many such blocks of data as you wish in separate files. Then read them in as you need them, each one replacing the previous block in memory. Yes, this will require fast disk access, so a large SSD will be useful.
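The block approach can be sketched as follows. This is a minimal sketch under stated assumptions: sizes are scaled down, the weekly pattern is an int8 placeholder, the folder comes from tempname, and the file names block_01.mat, block_02.mat, ... are hypothetical. Each 4-week block is built directly from the weekly pattern with mod indexing, so the full year never exists in memory.

```matlab
% Hypothetical, scaled-down sizes; the real problem has nVeh = 5e6.
nVeh          = 1000;
nHours        = 8760;
hoursPerBlock = 4 * 168;                       % 4 weeks = 672 hours
week = zeros(nVeh, 168, 'int8');               % placeholder weekly pattern
week(:, 1:24) = 1;

outDir = tempname;                             % hypothetical output folder
mkdir(outDir);

nBlocks = ceil(nHours / hoursPerBlock);        % 14 blocks; the last holds 24 hours
for b = 1:nBlocks
    t0 = (b - 1) * hoursPerBlock + 1;          % first absolute hour of this block
    t1 = min(b * hoursPerBlock, nHours);       % last absolute hour of this block
    blk = week(:, mod((t0:t1) - 1, 168) + 1);  % expand the weekly pattern to hours t0..t1
    % ... apply the hour-specific updates for t0..t1 to blk here ...
    % In MATLAB, variables of 2 GB or more need save(..., '-v7.3').
    save(fullfile(outDir, sprintf('block_%02d.mat', b)), 'blk', 't0', 't1');
end

% Later: load one block at a time, replacing the previous one in memory.
S = load(fullfile(outDir, 'block_02.mat'));    % fields S.blk, S.t0, S.t1
```

Because t0 and t1 are saved with each block, any absolute hour can be located in its file, and updates for t=169 and t=337 land in different columns. At full scale one such int8 block is 5e6 x 672 bytes, about 3.4 GB, hence the '-v7.3' note.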
Perhaps a better way to store this data would be to use a DATASTORE.
https://www.mathworks.com/help/matlab/datastore.html
https://www.mathworks.com/help/matlab/import_export/what-is-a-datastore.html
This will help MATLAB to do some of the memory management work for you. Again, I don't know what you will do with this array after you create it, as you never told us that.
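Datastores are not limited to text files: fileDatastore (available since R2017a) reads arbitrary files through a custom ReadFcn, one file per read() call. A minimal sketch, assuming the data has already been split into .mat files holding a variable named blk; the files written at the top are small hypothetical placeholders so the example runs on its own.

```matlab
% Write three small placeholder block files so the sketch is self-contained.
outDir = tempname;
mkdir(outDir);
for b = 1:3
    blk = int8(b * ones(100, 672));                     % hypothetical block data
    save(fullfile(outDir, sprintf('block_%02d.mat', b)), 'blk');
end

% fileDatastore reads one file per call to read(), using ReadFcn,
% so only one block needs to be in memory at a time.
ds = fileDatastore(outDir, 'ReadFcn', @load, 'FileExtensions', '.mat');

total = 0;
nRead = 0;
while hasdata(ds)
    S = read(ds);                                       % struct returned by load, e.g. S.blk
    total = total + sum(double(S.blk(:)));              % placeholder per-block processing
    nRead = nRead + 1;
end
```

The per-block processing here is only a sum; whatever the real downstream processing is would replace that line.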
  2 Comments
Walter Roberson on 31 Mar 2018
You might be able to use tall arrays. But not with those repmat calls the way they are written, I suspect.
Chaoyang Jiang on 31 Mar 2018
Edited: Chaoyang Jiang on 31 Mar 2018
Thank you very much for your answer. The way I generate the sparse matrix is:
chargestatusweeknew=sparse(chargestatusweek);
However, the resulting chargestatusweeknew takes more memory than storing the same data as a full int8 matrix without sparse.
To use a datastore or tall arrays, do I still need to generate the chargestatusyear=zeros(5000000,8760,'single') data first? From reading the documentation, it seems that you basically need a .csv (or other text) file before you can use a datastore. So I am wondering how to save the big chargestatusyear.mat file before using the datastore.
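On saving a .mat file that is too large to build in memory: MATLAB's matfile function can write and read parts of a variable in a Version 7.3 MAT-file, so the year can be written block by block. A minimal sketch, assuming scaled-down hypothetical sizes, an int8 placeholder weekly pattern, and 4-week blocks; the variable keeps the name chargestatusyear from the question. Whether the resulting read and write speeds are acceptable at 5e6 vehicles is not tested here.

```matlab
% Scaled-down, hypothetical sizes; the real problem would use nVeh = 5e6.
nVeh       = 1000;
nHours     = 8760;
blockHours = 4 * 168;                       % 4 weeks per block
week = zeros(nVeh, 168, 'int8');            % placeholder weekly pattern
week(:, 1:24) = 1;

fname = [tempname '.mat'];
m = matfile(fname, 'Writable', true);       % a new file is created as a Version 7.3 MAT-file

for t0 = 1:blockHours:nHours
    t1  = min(t0 + blockHours - 1, nHours);
    blk = week(:, mod((t0:t1) - 1, 168) + 1);   % expand weekly pattern to hours t0..t1
    % ... apply the hour-specific updates for t0..t1 to blk here ...
    if t0 == 1
        m.chargestatusyear = blk;               % first block creates the variable
    else
        m.chargestatusyear(:, t0:t1) = blk;     % later blocks extend it on disk
    end
end
clear m

% Later, possibly in another script: read only the columns you need.
m2 = matfile(fname);
sz = size(m2, 'chargestatusyear');          % size without loading the data
hour337 = m2.chargestatusyear(:, 337);      % one hour for all vehicles
```

matfile indexing must use explicit row and column subscripts (no linear indexing), so reading one hour for all vehicles, as above, fits it well.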

