Read-Only Globals in Parallel Computing

Question

Dan on 17 Jul 2012

https://it.mathworks.com/matlabcentral/answers/43829-read-only-globals-in-parallel-computing

When porting a serial application to parallel, I often run into the problem that the script I'm trying to port uses globals. I know it's costly to implement full read/write globals, but it should be possible to implement a read-only global block for the labs, especially in local, single-workstation environments. It would be possible to establish a detached memory block, set it read-only, and give each lab process access to it. This wouldn't be too hard in a distributed environment either.
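To make the proposal concrete, here is a minimal single-host sketch in Python rather than MATLAB, assuming Python 3.8+ and its standard multiprocessing.shared_memory module. One process publishes a block of float64 constants; each worker attaches by name and is handed only a read-only memoryview. The helper names are hypothetical, and the read-only guarantee comes from the view object, not from OS-level page protection as in the proposal above.

```python
import struct
from multiprocessing import shared_memory


def publish_constants(values):
    """Writer side: create a block (OS-chosen random name) holding float64 constants."""
    payload = struct.pack(f"<{len(values)}d", *values)   # values must be non-empty
    shm = shared_memory.SharedMemory(create=True, size=len(payload))
    shm.buf[:len(payload)] = payload
    return shm  # workers only need shm.name and len(values)


def attach_read_only(name):
    """Worker side: attach by name and hand out only a read-only view.

    The protection is enforced by the memoryview, not by the OS page tables.
    Release the view before calling close() on the handle.
    """
    shm = shared_memory.SharedMemory(name=name)
    return shm, shm.buf.toreadonly()


def read_constants(view, count):
    """Decode the first `count` float64 values from the block."""
    return list(struct.unpack_from(f"<{count}d", view, 0))
```

Each worker receives just the segment name and the element count; the block itself is mapped into every process instead of being serialized to it, and any attempt to write through the view raises TypeError.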

My only alternative is to pass the structures through argument lists, which is very cumbersome. I'm talking about a long list of variables that describe the environment in which the analysis is running. Many functions reference these variables, and passing them through argument lists is a nightmare.

I'd be interested to know if other Parallel Computing Toolbox users would find this useful. I'd also be interested to know if MathWorks would consider implementing this.

Thanks, Dan

Answer 1

Walter Roberson on 17 Jul 2012

https://it.mathworks.com/matlabcentral/answers/43829-read-only-globals-in-parallel-computing#answer_53783

There is a File Exchange (FEX) contribution for shared matrices on a single host. However, it shares only a single matrix at a time, which could get clumsy for your purposes.

Extending shared memory to DCS (MATLAB Distributed Computing Server) would present difficulties. One worker per host would have to be nominated to create the shared matrix on that host, and the "last" worker (or that initial worker) would have to be responsible for removing it. This would require coordination between the workers that is not currently present, especially since new copies of the worker might get scheduled on the host before all of the "first round" had disappeared.
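The "nominate one worker per host" step can be sketched with an exclusive create. In the Python sketch below (an illustration only, not anything MATLAB workers provide), every worker tries to create the named segment; the single caller whose create succeeds becomes the owner, and everyone else attaches. It assumes Python 3.8+ multiprocessing.shared_memory, a hypothetical one-byte ready flag, and that a plain byte write is an adequate signal between processes on the host; a robust version would use a proper semaphore or barrier. It deliberately does not solve removal, because this API exposes no attach count, which is the missing coordination described above.

```python
import time
from multiprocessing import shared_memory

FLAG = 0  # byte 0: hypothetical "initialised" flag, written last by the owner
DATA = 1  # payload starts at byte 1


def open_or_create(name, size):
    """Return (handle, is_owner). Only one caller per host can win creation,
    because create=True raises FileExistsError if the name already exists."""
    try:
        return shared_memory.SharedMemory(name=name, create=True, size=size), True
    except FileExistsError:
        return shared_memory.SharedMemory(name=name), False


def initialise(shm, payload):
    """Owner only: copy the payload in, then raise the flag.
    A freshly created segment is zero-filled, so the flag starts at 0."""
    shm.buf[DATA:DATA + len(payload)] = payload
    shm.buf[FLAG] = 1


def wait_until_ready(shm, timeout=5.0, poll=0.01):
    """Non-owners: poll the flag until the owner has filled the segment.
    A production version would use a real inter-process primitive instead."""
    deadline = time.monotonic() + timeout
    while shm.buf[FLAG] != 1:
        if time.monotonic() > deadline:
            raise TimeoutError("owner never initialised the segment")
        time.sleep(poll)
```

Calling open_or_create with the same name from several processes yields exactly one is_owner == True; the owner calls initialise, and the rest call wait_until_ready before reading.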

The implementation could possibly take advantage of the fact that ipcrm does not actually delete a shared memory segment until the last detach, at least on Linux and OS X; I've never been convinced that MS Windows really implements POSIX to that level.
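The deferred-deletion behavior has a POSIX shared-memory analogue: shm_unlink removes the name immediately, but the pages stay mapped until every process unmaps them (System V's IPC_RMID, which ipcrm uses, likewise destroys the segment only after the last detach). Below is a small Python check of that behavior, assuming Python 3.8+ multiprocessing.shared_memory, whose unlink() calls shm_unlink on POSIX and does nothing on Windows. The function name is hypothetical.

```python
from multiprocessing import shared_memory


def unlink_while_attached():
    """Remove the name while a second handle is still mapped.

    Returns (bytes read through the surviving mapping, whether a late
    attach by name still succeeded).
    """
    owner = shared_memory.SharedMemory(create=True, size=16)
    worker = shared_memory.SharedMemory(name=owner.name)  # a second attachment
    owner.buf[:5] = b"hello"
    owner.unlink()                    # POSIX: shm_unlink; no-op on Windows
    survived = bytes(worker.buf[:5])  # the existing mapping still works
    try:
        # On POSIX the name is gone, so a new attach should fail.
        late = shared_memory.SharedMemory(name=owner.name)
        late.close()
        late_attach_ok = True
    except FileNotFoundError:
        late_attach_ok = False
    worker.close()                    # memory is freed after the last unmap
    owner.close()
    return survived, late_attach_ok
```

So a host-level owner could unlink the segment as soon as every worker has attached, and the memory would be reclaimed automatically when the last worker exits.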

I suspect this might present problems on 32-bit MS Windows systems, namely in attaching the segment at the exact same virtual address in each process. The same virtual address is needed because the memory pointers that MATLAB uses internally are absolute addresses (absolute relative to the virtual address space of the process, as opposed to segment + offset style addresses that can be relocated relatively easily).
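The alternative contrasted with absolute pointers is storing references as offsets from the start of the shared block, so every process can resolve them wherever the block happens to be mapped. The Python sketch below illustrates that general technique with a hypothetical linked-list record layout. It says nothing about how MATLAB stores data internally, and copying the bytes into a second buffer merely stands in for mapping the segment at a different address.

```python
import struct

# Hypothetical record layout: a float64 value plus an int64 link that stores
# the *offset* of the next record from the start of the block (-1 = end).
NODE = struct.Struct("<dq")


def build_list(values):
    """Pack a linked list whose links are block-relative offsets, not addresses."""
    block = bytearray(NODE.size * len(values))
    for i, value in enumerate(values):
        nxt = (i + 1) * NODE.size if i + 1 < len(values) else -1
        NODE.pack_into(block, i * NODE.size, value, nxt)
    return block


def walk(block):
    """Follow the links. Only offsets are stored, so this works no matter
    where in memory (or in which process's address space) the bytes live."""
    out, off = [], (0 if block else -1)
    while off != -1:
        value, off = NODE.unpack_from(block, off)
        out.append(value)
    return out
```

An absolute pointer stored in the same block would only be valid in a process that mapped it at the original address; the offset version needs no fixed address at all.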

As best I can tell, MATLAB is not designed to work with memory pools internally, nor to bundle symbol tables, variable descriptors, and actual memory blocks all within a restricted memory range. Creating variables entirely within a memory segment for later sharing could take a lot of internal work... I don't know.
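For comparison, this is roughly what bundling symbol tables, descriptors, and data within one memory range means in a much simpler setting: a single byte block that carries its own table of names, offsets, and lengths ahead of the numeric data, so it could be copied into a shared segment and read by any process. The Python sketch uses a hypothetical fixed layout (names of at most 16 ASCII bytes, float64 vectors only) and is not a model of MATLAB's variable storage.

```python
import struct

# Hypothetical layout: [count][table entries...][float64 data...]
HEADER = struct.Struct("<I")     # number of variables
ENTRY = struct.Struct("<16sQQ")  # name (NUL-padded), data offset, element count


def pack_globals(variables):
    """Serialise {name: [floats]} into one contiguous block that carries its
    own symbol table, so a reader needs nothing but the bytes."""
    names = list(variables)
    table_end = HEADER.size + ENTRY.size * len(names)
    block = bytearray(table_end + sum(8 * len(variables[n]) for n in names))
    HEADER.pack_into(block, 0, len(names))
    data_off = table_end
    for i, name in enumerate(names):
        raw = name.encode("ascii")
        if len(raw) > 16:
            raise ValueError(f"name too long for this layout: {name!r}")
        vals = variables[name]
        ENTRY.pack_into(block, HEADER.size + i * ENTRY.size, raw, data_off, len(vals))
        struct.pack_into(f"<{len(vals)}d", block, data_off, *vals)
        data_off += 8 * len(vals)
    return bytes(block)


def lookup(block, name):
    """Find a variable by scanning the embedded symbol table."""
    (count,) = HEADER.unpack_from(block, 0)
    for i in range(count):
        raw, off, n = ENTRY.unpack_from(block, HEADER.size + i * ENTRY.size)
        if raw.rstrip(b"\0").decode("ascii") == name:
            return list(struct.unpack_from(f"<{n}d", block, off))
    raise KeyError(name)
```

Even this toy version has to fix the table format, alignment, and type set up front, which hints at why retrofitting the idea onto a general-purpose interpreter is a large job.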

2 Comments

Dan on 17 Jul 2012

Thanks for the quick reply. I did this type of thing under VMS years ago, so I know it can be done with a virtual memory system. Managing the distributed global will be a bit dicey, as you describe.

Walter Roberson on 17 Jul 2012

Is it possible to write programs that can share memory effectively? Yes. Is MATLAB written to handle it? I don't think so.

Answer 2

Mark on 19 Apr 2013

https://it.mathworks.com/matlabcentral/answers/43829-read-only-globals-in-parallel-computing#answer_82817

I'm quite late to the party on this one, but this seems worth a reread and repost. I think many of us are finding that managing the data we work with is not given enough focus. Specifically, we throw around terms like "Big Data" yet have few tools to intelligently mine, manipulate, or transform that data.

Consider one of the fundamental uses of MATLAB and the family of similar solutions: embarrassingly parallel and/or distributed jobs are often parametric sweeps. I would even assert that the majority of such jobs are.

Some component of the classic parametric sweep is most certainly static, and it is likely growing exponentially in size as our notion of Big Data continues to evolve. In the financial services space, this often borders on unwieldy. It was trivial to throw a dataset (matrix or otherwise) around a decade ago, when its size was simply N, but we now find ourselves at N^t, which I posit is no small issue.

I would appeal to those within MathWorks who understand this paradigm to treat it as a high-priority need: provide a mechanism to declare and access a shared memory space of static global constants at the node level, for the duration of a job or jobs. If we need to invoke a process at the onset and close of a job as overhead, as might be the case for a distributed job, this seems a trivial price to pay compared with moving large datasets.
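The job-level lifecycle requested here (set up the shared constants when the job starts, tear them down when it ends) can be sketched as a Python context manager around a multiprocessing.shared_memory block, assuming Python 3.8+. The function names and the float64 payload are hypothetical, and the worker runs in the same process for brevity; in a real sweep, each worker process would receive only the segment name and element count.

```python
import struct
from contextlib import contextmanager
from multiprocessing import shared_memory


@contextmanager
def job_constants(values):
    """Hypothetical job-level hook: publish constants when the job starts and
    unlink them when it ends, even if the job body raises."""
    payload = struct.pack(f"<{len(values)}d", *values)  # values must be non-empty
    shm = shared_memory.SharedMemory(create=True, size=len(payload))
    try:
        shm.buf[:len(payload)] = payload
        yield shm.name, len(values)  # workers get a name and a count, not the data
    finally:
        shm.close()
        shm.unlink()


def worker_sum(name, count):
    """Stand-in for one sweep task: attach, read the constants, detach."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        return sum(struct.unpack_from(f"<{count}d", shm.buf, 0))
    finally:
        shm.close()
```

The setup and teardown cost is paid once per job, while each task only pays for an attach and a detach instead of receiving its own copy of the static dataset.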