Making use of multiple hard drives to avoid I/O bottlenecks?

I am reading in a lot of data (1.5 TB), so I would like to minimize disk I/O.
  • I have 4 NVMe drives (2 TB each)
  • 'a lot' of RAM (okay, a lot = 128 GB, which might in fact not be that much)
  • I have data that I would like to postprocess in MATLAB
  • I am using parfor loops to read the data
Typically, I would put all the data on one drive. Even though NVMe drive I/O is quite fast (~4000 MB/s), my question is:
  • Would it make sense to distribute the data to be postprocessed across all 4 drives, so that MATLAB reads from all of them, in order to minimize I/O bottlenecks?

Accepted Answer

Walter Roberson on 20 Jun 2022
You should ideally distribute the data to different drives and distribute the drives to different controllers.
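As a rough illustration of what spreading the reads over several drives could look like on the MATLAB side, here is a minimal sketch. The mount points, the `*.mat` file pattern, and the helper name `interleaveLists` are assumptions made up for this example and are not from the original question. Note that parfor decides on its own how iterations are assigned to workers, so interleaving the file list only makes it likely, not certain, that concurrent reads hit different drives.

```matlab
% Hypothetical mount points, one per NVMe drive (adjust to your system).
driveRoots = {'/mnt/nvme0/data', '/mnt/nvme1/data', ...
              '/mnt/nvme2/data', '/mnt/nvme3/data'};

% Collect the full paths of the files found on each drive.
perDrive = cell(size(driveRoots));
for d = 1:numel(driveRoots)
    listing = dir(fullfile(driveRoots{d}, '*.mat'));
    perDrive{d} = cellfun(@(f, n) fullfile(f, n), ...
        {listing.folder}, {listing.name}, 'UniformOutput', false);
end

% Interleave the per-drive lists so that neighbouring loop indices
% refer to files on different drives.
files = interleaveLists(perDrive);

results = cell(size(files));
parfor k = 1:numel(files)
    s = load(files{k});                 % each iteration reads one whole file
    results{k} = numel(fieldnames(s));  % placeholder for the real postprocessing
end

function files = interleaveLists(lists)
% Round-robin merge: first file of every list, then the second, and so on.
    maxLen = max(cellfun(@numel, lists));
    files = {};
    for i = 1:maxLen
        for d = 1:numel(lists)
            if i <= numel(lists{d})
                files{end+1} = lists{d}{i}; %#ok<AGROW>
            end
        end
    end
end
```

Whether this actually helps depends on the controller and lane layout, so it is worth timing the loop with the data on one drive and then on all four.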
However, you might be constrained by your architecture. I seem to recall reading about some architectures that could only handle three full-width PCIe links, so that the fourth had to run at half speed. You also need to take into account that the other drives on your system will need some lanes. PCIe cannot allocate (for example) 12 lanes to one device and 2 to each of two other devices for a total of 16: if I recall correctly, you can only allocate powers of 2, so the first device could get 8 and the others 2 each, with the remaining 4 unused.
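The lane arithmetic in that example can be written out as a small sketch. It assumes the power-of-two rule holds exactly as recalled above, and it ignores any platform-specific lane splitting options. The variable names are made up for illustration.

```matlab
totalLanes = 16;          % lanes available in this example
requested  = [12 2 2];    % lanes each device would ideally use

% Round every request down to the nearest power of two.
granted = 2 .^ floor(log2(requested));

% Lanes left over after all devices have been served.
unused = totalLanes - sum(granted);

fprintf('granted: %s, unused: %d\n', mat2str(granted), unused);
```

With these numbers the first device ends up with 8 lanes, the other two with 2 each, and 4 lanes stay unused, matching the figures given above.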
You might be interested in some of the Linus Tech Tips videos; in some of them he shows how difficult it is to max out drives.
The reviews seem to say that in the mass pro market these days (not the very low-volume specialty manufacturers), the Samsung 9x0 drives are close to the best for read rates (though not always for write rates, compared to some of the small manufacturers).
While I am on the topic: anyone using external enclosures who needs high performance should look seriously at some of the Thunderbolt 4 NAS or DAS enclosures. The performance ratings of the well-designed enclosures are sometimes several times what you would get from low-cost mass-market drives.
  3 Comments
Walter Roberson on 21 Jun 2022
If the cluster is a cloud computing service that emulates drives over some internal layer, then getting separate hardware would probably require a specific service agreement.
If the cluster can give you multiple drives, each on a separate controller, you would typically prefer that. If you are using spinning-platter drives, then two drives per controller is commonly the most efficient.

Categories

Find more on Cluster Configuration in Help Center and File Exchange

Release

R2021b
