How does ValueStore (Parallel Computing Toolbox) deal with concurrent write access?

Question

Leonie Schicketanz on 14 Oct 2022

Direct link to this question

https://it.mathworks.com/matlabcentral/answers/1826158-how-does-valuestore-parallel-computing-toolbox-deal-with-concurrent-write-access

Commented: Leonie Schicketanz on 11 Nov 2022

Use case: random, parallel read access over a network, where the operating system's read cache is ineffective. We therefore implement our own read cache.

I am working on a script in which a parfor loop loads and stores data, reading through the cache and also filling it.

Thus, each parallel worker has to access - namely read and extend - this cache/shared data storage independently.

After working with persistent variables (+external interprocess library), I found the very new ValueStore [1] functionality which seems to do the job with less hassle.

But I couldn't find in the documentation whether I risk running into race conditions, for instance when two workers extend the same data entry simultaneously, which can theoretically happen, albeit with low probability. Atomic operations, synchronization, and mutexes are not mentioned in the documentation (at least not for write access), which is a little bit scary.

To ask more precisely: does MATLAB perform ValueStore operations atomically (that is, are they thread safe)? And where exactly does ValueStore store its data? The documentation only mentions where it is not stored. Is it kept on local persistent storage, relying on the OS for caching and performance?

[1] https://de.mathworks.com/help/parallel-computing/parallel.valuestore.html

Answer 1

Stuart Moulder on 17 Oct 2022

Direct link to this answer

https://it.mathworks.com/matlabcentral/answers/1826158-how-does-valuestore-parallel-computing-toolbox-deal-with-concurrent-write-access#answer_1077148

ValueStore is designed for workers to incrementally store data during job execution. ValueStore therefore only makes the following guarantees:

Multiple processes may safely read or write to the same entry. It should not be possible to end up with corrupted data.
Since there is no synchronization, concurrent writes to the same entry from multiple processes do not have a well defined order. The final stored value of the entry will be that of the "last" writer as deemed by the implementation of ValueStore.
Since there is no synchronization, concurrent reads during writes from multiple processes may see a stale value.

To answer your question then, ValueStore does not provide the synchronization mechanisms required for multiple processes to coordinate a shared entry. As you point out, this would require the addition of mutexes or atomic read-then-modify operations to do so safely.
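To make the hazard concrete, here is a minimal sketch of the read-then-write pattern that ValueStore does not protect. It assumes a process-based pool on R2022a or later, and the key name "counter" and the helper incrementUnsafely are made up for illustration. The point is only that the read and the write are two separate steps with nothing holding other workers off in between.

```matlab
% Sketch of the UNSUPPORTED pattern: several workers read, modify and
% write back the same entry. Assumes a process-based pool (R2022a+).
pool = gcp();                      % current pool (may start the default one)
clientStore = pool.ValueStore;     % client-side view of the pool's store
clientStore("counter") = 0;        % hypothetical shared entry

parfor i = 1:100
    incrementUnsafely("counter");
end

% Updates can be lost, so this is not guaranteed to be 100.
finalCount = clientStore("counter");
fprintf("Counter after 100 increments: %d\n", finalCount);

% Local function (hypothetical helper); must sit at the end of the script.
function incrementUnsafely(key)
    store = getCurrentValueStore();  % store of the pool this worker belongs to
    current = store(key);            % step 1: read the entry
    store(key) = current + 1;        % step 2: write it back; another worker
                                     % may have written between steps 1 and 2
end
```

Each individual read and write is safe on its own, but the pair is not atomic, so two workers can read the same value and one increment is silently overwritten.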

Data stored in ValueStore uses the JobStorageLocation of the cluster running your parfor. Depending on your cluster setup, this will either be a shared filesystem location or a database. Since this involves file system access, it is also unlikely that ValueStore would be more performant than an operating system read cache.

Leonie Schicketanz on 18 Oct 2022

Thank you for your answer! However, I am still left with a few questions that I hope you can answer as well.

You said that it should not be possible to end up with corrupt data but that concurrent reads during writes may see a stale value. So, in the end, even if the data is not corrupted you can't really rely on what's in your ValueStore, right? Now, what is the use case of this functionality? When would I want to concurrently write and read data into a ValueStore with multiple processes and not care if what I read is correct?

Concerning the storage: out of interest, I put more and more data into the ValueStore to test its limits. At some point, a blue Windows pop-up message came up warning me about low disk space. I had to restart MATLAB, but the data that ValueStore had written to my SSD was never deleted, not even after restarting my PC. Is there a mechanism to clean up the data, or a way to make sure it gets cleaned up even if MATLAB crashes?

Stuart Moulder on 19 Oct 2022

Entries in the ValueStore obey eventual consistency. In most environments this consistency is reached instantly. In others, however, the ValueStore could be using a network file system, which may introduce significant delays depending on the user setup.

ValueStore is intended for one process to write an entry which other processes can see. Intended use cases for this include:

One process writing an entry once. In this case the entry either exists or does not, so all other processes which see the entry are seeing the final result. This could be used for one process to offload a result which another process then post-processes. It can also be used to offload large results out of memory where they can be easily found later by the client MATLAB.
One process repeatedly writing the same entry. This could be used to share progress or some intermediate result. Any process which reads this entry should expect that it may be overwritten and therefore without some external synchronisation mechanism will always have to accept that the result could be stale.
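A minimal sketch of the first use case, with one writer per entry, might look like this. It assumes a process-based pool on R2022a or later. The "result_" key prefix, the helper storeResult and the i^2 placeholder computation are invented for illustration.

```matlab
% Sketch: every parfor iteration writes its own, unique key exactly once,
% so no two processes ever write the same entry.
pool = gcp();                    % current pool (may start the default one)
clientStore = pool.ValueStore;   % client-side view of the pool's store
numItems = 8;

parfor i = 1:numItems
    storeResult(i, i^2);         % i^2 stands in for a real computation
end

% Collect the entries on the client. A key either exists (final value)
% or does not exist yet, so there is nothing to coordinate.
total = 0;
for i = 1:numItems
    key = "result_" + i;
    if isKey(clientStore, key)
        total = total + clientStore(key);
    end
end
fprintf("Sum of stored results: %d\n", total);

% Local function (hypothetical helper); must sit at the end of the script.
function storeResult(i, value)
    store = getCurrentValueStore();   % store of the pool this worker belongs to
    store("result_" + i) = value;     % unique key per iteration
end
```

For a read cache, the same idea would mean deriving the key from the item being cached, so that any entry is only ever written with one well-defined value.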

The use case you describe, where multiple processes coordinate on the value of a shared entry, is not supported. Since there is no locking and there are no atomic read-and-modify operations, any process which reads an entry and then tries to write an updated value to the same entry cannot guarantee that another process hasn't modified the entry in the meantime. If you do need to coordinate actions between workers in a pool, you might wish to try the spmdBarrier, spmdSend, spmdReceive etc. operations, which use MPI to communicate between workers.
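As a rough sketch of that alternative, the workers below send their partial results to worker 1, which is then the only process that writes the shared entry. It assumes a process-based pool and R2022b or later for the spmd* function names. The key "combined" and the placeholder computation are made up. Note that spmd is an alternative to parfor here, not something to call from inside a parfor loop.

```matlab
% Sketch: coordinate through message passing, keep a single writer.
pool = gcp();                          % current pool (may start the default one)

spmd
    partial = spmdIndex * 10;          % placeholder per-worker result

    if spmdIndex == 1
        combined = partial;
        for src = 2:spmdSize
            combined = combined + spmdReceive(src);  % wait for each worker
        end
        store = getCurrentValueStore();
        store("combined") = combined;  % worker 1 is the only writer
    else
        spmdSend(partial, 1);          % hand the result to worker 1
    end

    spmdBarrier;                       % nobody continues until worker 1 is done
end

% The client can now read the entry from the pool's store.
disp(pool.ValueStore("combined"));
```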

As discussed, the data inside ValueStore is stored in the JobStorageLocation of the cluster which is running your pool. To delete the data you must delete the Job. Usually, when you delete a parpool, the backing Job and its associated data are automatically deleted as well. In this case it sounds like too much data was stored and MATLAB exited without running the Job delete hook. To clean up your SSD you can manually delete the Job using the Cluster object:

cluster = parcluster();
jobs = cluster.Jobs;
% Find which Job is the one related to the parpool and set indexOfPoolJob
% (a placeholder, not defined here) to its position in the jobs array
delete(jobs(indexOfPoolJob));

Leonie Schicketanz on 11 Nov 2022

Hello, thank you for your time to answer all my questions (and sorry for the late reaction).

I tried to delete the job as you proposed, but after MATLAB had crashed due to the error I described above, I couldn't access this job to delete the data (so I had to search for the folder and delete the data manually). And even when it works, it is a hassle if it doesn't clean up by itself at some point. Thus, I hope that this behaviour can be improved.

Also, would it be possible to extend the documentation on ValueStore with the explanation you gave me (especially the details in your first answer)? This information was very helpful for using ValueStore correctly, and I imagine others feel the same.
