fwrite and MATLAB for a raid0 disk - Only one lane?

Hello everyone,
I have a RAID 0 NVMe disk (made up of 4 NVMe disks connected through a PCIe adapter card).
The disk works great (up to 12 GB/s OUTSIDE MATLAB, PCIe 3.0), but I cannot reach that speed in MATLAB.
It looks like MATLAB is using a single bus lane (about 3.5 GB/s) to write the data to the disk. A simple example:
data = randn(1024, 1024, 1024, 'double'); %8 GB
fid = fopen('test.bin', 'W');
tic;
fwrite(fid, data(:), 'double');
toc;
fclose(fid);
This takes about 2.3 seconds, which is about 3.5 GB/s, as if only one lane were in use, whereas the RAID 0 uses 4 lanes (4x4 PCIe).
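The throughput figure follows directly from the array size and the elapsed time. A minimal sketch in C that redoes the arithmetic for the 1024 x 1024 x 1024 double array; the helper names `array_bytes` and `throughput_gib_per_s` are made up for illustration and are not part of MATLAB or of the original post:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: size in bytes of an n1 x n2 x n3 array of doubles. */
double array_bytes(double n1, double n2, double n3)
{
    return n1 * n2 * n3 * (double)sizeof(double);
}

/* Hypothetical helper: converts a byte count and an elapsed time in
   seconds into binary gigabytes (GiB) per second. */
double throughput_gib_per_s(double bytes, double seconds)
{
    assert(seconds > 0.0);
    return bytes / seconds / (1024.0 * 1024.0 * 1024.0);
}
```

With the 2.3 s measured above, `throughput_gib_per_s(array_bytes(1024, 1024, 1024), 2.3)` comes out at roughly 3.48 GiB/s (about 3.73 GB/s in decimal units), which is consistent with the 3.5 GB/s quoted.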
I am running out of ideas. This is not related to the disk or the RAID 0 itself: I tested many RAID 0 configurations (BIOS, VROC, Windows RAID), and the issue only occurs in MATLAB. Using HDF5 files does not solve the issue either, so it seems to be related to MATLAB itself.
FYI: I need this speed. In my field/lab we generate about 1 TB of data every 5 minutes, and the bottleneck is always saving the data.
EDIT 1: Removed "b" argument from "fopen"
EDIT 2: Added type "double" to "fwrite"
Thank you very much.

5 Comments

Odd. I would not have expected that any program would be able to do this. RAID does not tend to be transparent.
Jan on 29 Mar 2022
Edited: Jan on 29 Mar 2022
By the way, there has been no 'b' format in fopen() for over 20 years now. It is still accepted (and ignored) to keep backward compatibility. Also, 'double' is not a valid machine format, which is what fopen expects as its 3rd input.
What happens with fopen('w') (a lower case w)?
What do you see for:
data = randn(1024, 1024, 'double'); % 8 MB
fid = fopen('test.bin', 'W');
tic;
for k = 1:1024
    fwrite(fid, data, 'double');
end
toc;
fclose(fid);
Changing 'W' to 'w' does not change anything sadly. I tried it after reading several articles about it.
Getting high speed transfer to disk can require using special system calls. I do not have any information about how it is done in Windows; in Linux apparently there are methods that can avoid round-trips to user mode. It is unlikely that MATLAB implements those methods.
In Windows... I don't know. Is WriteFileEx still used in practice? https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-writefileex That does asynchronous writes, which historically has been an important step in performance improvement. Or perhaps WriteFileGather() https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-writefilegather ?
In a logging situation, you would like to be able to grab a buffer full of input, schedule it to be written, and continue on without waiting for the I/O to complete.
I suspect that MATLAB simply uses C or C++ fwrite() https://www.cplusplus.com/reference/cstdio/fwrite/ which waits for the I/O to complete.
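The "grab a buffer, schedule it to be written, and continue" pattern can be sketched with a background writer thread. This is only an illustration under stated assumptions: it uses POSIX threads and standard `fwrite` rather than the Win32 calls mentioned above (`WriteFileEx`, `WriteFileGather`), all `aw_*` names are made up, and nothing here says it would reach the full RAID 0 bandwidth.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical one-slot asynchronous writer. */
typedef struct {
    FILE *fp;
    pthread_t thread;
    pthread_mutex_t lock;
    pthread_cond_t changed;
    const void *buf;   /* buffer waiting to be written, NULL if the slot is free */
    size_t nbytes;
    int closing;       /* set by aw_finish */
    int failed;        /* set if any fwrite came up short */
} async_writer;

static void *aw_thread(void *arg)
{
    async_writer *aw = arg;
    pthread_mutex_lock(&aw->lock);
    for (;;) {
        while (aw->buf == NULL && !aw->closing)
            pthread_cond_wait(&aw->changed, &aw->lock);
        if (aw->buf == NULL)
            break;                      /* closing and nothing left to write */
        const void *buf = aw->buf;
        size_t n = aw->nbytes;
        pthread_mutex_unlock(&aw->lock);
        int ok = (fwrite(buf, 1, n, aw->fp) == n);   /* slow part, lock released */
        pthread_mutex_lock(&aw->lock);
        if (!ok)
            aw->failed = 1;
        aw->buf = NULL;                 /* slot is free again */
        pthread_cond_broadcast(&aw->changed);
    }
    pthread_mutex_unlock(&aw->lock);
    return NULL;
}

/* Opens path and starts the writer thread. Returns 0 on success, -1 on error. */
int aw_start(async_writer *aw, const char *path)
{
    aw->fp = fopen(path, "wb");
    if (aw->fp == NULL)
        return -1;
    aw->buf = NULL;
    aw->nbytes = 0;
    aw->closing = 0;
    aw->failed = 0;
    pthread_mutex_init(&aw->lock, NULL);
    pthread_cond_init(&aw->changed, NULL);
    if (pthread_create(&aw->thread, NULL, aw_thread, aw) != 0) {
        pthread_cond_destroy(&aw->changed);
        pthread_mutex_destroy(&aw->lock);
        fclose(aw->fp);
        return -1;
    }
    return 0;
}

/* Hands buf to the writer and returns without waiting for that write.
   It blocks only while the previously submitted buffer is still being written. */
void aw_submit(async_writer *aw, const void *buf, size_t nbytes)
{
    assert(buf != NULL);
    pthread_mutex_lock(&aw->lock);
    while (aw->buf != NULL)
        pthread_cond_wait(&aw->changed, &aw->lock);
    aw->buf = buf;
    aw->nbytes = nbytes;
    pthread_cond_broadcast(&aw->changed);
    pthread_mutex_unlock(&aw->lock);
}

/* Waits for the last write, closes the file. Returns 0 if every write succeeded. */
int aw_finish(async_writer *aw)
{
    pthread_mutex_lock(&aw->lock);
    aw->closing = 1;
    pthread_cond_broadcast(&aw->changed);
    pthread_mutex_unlock(&aw->lock);
    pthread_join(aw->thread, NULL);
    int failed = aw->failed;
    if (fclose(aw->fp) != 0)
        failed = 1;
    pthread_cond_destroy(&aw->changed);
    pthread_mutex_destroy(&aw->lock);
    return failed ? -1 : 0;
}
```

The intended use is double buffering: fill buffer A, call `aw_submit` on it, fill buffer B while A is being written, submit B, and so on. Once `aw_submit` has returned for B, the write of A has completed, so A can safely be refilled.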
@Walter Roberson I wrote a MEX file using WriteFile, without success. I will try asynchronous writes with WriteFileEx and also try WriteFileGather.
I have contacted support to get some answers about this.
I tried fwrite/ofstream/WriteFile (MEX files), even in chunks, without any success.
Thanks for taking the time. I will read those links and try those approaches.

Answers (2)

Jan on 29 Mar 2022
Edited: Jan on 29 Mar 2022
What about trying it as C-Mex?
data = randn(1024, 1024, 1024, 'double'); %8 GB
tic
uglyCWrite(data);
toc
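// Short hack, UNTESTED!!!
// uglyCWrite.c
#include "mex.h"
#include <stdio.h>
#include <stdlib.h>

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    double *data;
    size_t n, w;
    FILE *fid;

    if (nrhs != 1) {
        mexErrMsgTxt("One input array required.");
    }
    data = (double *) mxGetData(prhs[0]);
    n = mxGetNumberOfElements(prhs[0]);  // number of elements
    w = mxGetElementSize(prhs[0]);       // bytes per element

    // "wb": binary mode, so Windows does not translate newline bytes
    fid = fopen("test.bin", "wb");
    if (fid == NULL) {
        mexErrMsgTxt("Cannot open test.bin for writing.");
    }
    fwrite(data, w, n, fid);             // (pointer, element size, element count, stream)
    fclose(fid);
}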

2 Comments

Thank you for taking the time to put that piece of code together.
This morning I tested several MEX implementations from this post: https://stackoverflow.com/questions/70126690/write-binary-file-to-disk-super-fast-in-mex
Those are not faster than fwrite in MATLAB:
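#include <cstdint>
#include <cstdio>
#include <fstream>
#include <windows.h>

// Variant 1: C stdio
void writeBinFile(int16_t *data, size_t size)
{
    FILE *fID;
    // "wb", not "W": 'W' is a MATLAB fopen permission, not a valid C mode
    fID = fopen("file_fopen.bin", "wb");
    fwrite(data, sizeof(int16_t), size, fID);
    fclose(fID);
}

// Variant 2: C++ stream
void writeBinFileFast(int16_t *data, size_t size)
{
    std::ofstream file("file_ostream.bin", std::ios::out | std::ios::binary);
    file.write((char *)&data[0], size * sizeof(int16_t));
    file.close();
}

// Variant 3: Win32 WriteFile in 64 MB parts
void writeBinFilePartByPart(int16_t *int_data, size_t size)
{
    size_t part = 64 * 1024 * 1024;
    size = size * sizeof(int16_t);  // from here on, size is in bytes
    char *data = reinterpret_cast<char *> (int_data);
    HANDLE file = CreateFileA (
        "windows_test.bin",
        GENERIC_WRITE,
        0,
        NULL,
        CREATE_ALWAYS,
        FILE_FLAG_SEQUENTIAL_SCAN,
        NULL);
    // Expand file size. SetFilePointerEx takes a 64-bit offset;
    // SetFilePointer with a NULL high part is limited to a 32-bit LONG.
    LARGE_INTEGER fileEnd, fileStart;
    fileEnd.QuadPart = static_cast<LONGLONG> (size);
    fileStart.QuadPart = 0;
    SetFilePointerEx (file, fileEnd, NULL, FILE_BEGIN);
    SetEndOfFile (file);
    SetFilePointerEx (file, fileStart, NULL, FILE_BEGIN);
    DWORD written;
    if (size < part)
    {
        WriteFile (file, data, static_cast<DWORD> (size), &written, NULL);
        CloseHandle (file);
        return;
    }
    size_t rem = size % part;
    for (size_t i = 0; i < size-rem; i += part)
    {
        WriteFile (file, data+i, static_cast<DWORD> (part), &written, NULL);
    }
    if (rem)
        WriteFile (file, data+size-rem, static_cast<DWORD> (rem), &written, NULL);
    CloseHandle (file);
}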
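None of the three variants checks whether a write actually succeeded, which matters when comparing timings, because a failed or short write would look fast. A minimal portable sketch in C of a chunked write that checks every call; `write_in_chunks` and its parameters are made-up names, and standard `fwrite` stands in for `WriteFile` here:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes nbytes from data to fp in pieces of at most
   chunk bytes. Returns the number of bytes actually written, which is
   smaller than nbytes if any fwrite call came up short. */
size_t write_in_chunks(FILE *fp, const void *data, size_t nbytes, size_t chunk)
{
    const char *p = (const char *)data;
    size_t done = 0;

    assert(chunk > 0);
    while (done < nbytes) {
        size_t want = nbytes - done;
        if (want > chunk)
            want = chunk;               /* last piece may be smaller */
        size_t got = fwrite(p + done, 1, want, fp);
        done += got;
        if (got < want)
            break;                      /* short write: stop and report */
    }
    return done;
}
```

A caller would compare the return value with `nbytes` and also check the result of `fclose`, since buffered data may only reach the file when the stream is closed.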

I was playing around with this and found that this is much faster (by a factor of 3 on my machine):
fwrite(fid,data(:),"double");

1 Comment

Vincent Perrot on 29 Mar 2022
Edited: Vincent Perrot on 29 Mar 2022
Thank you.
Sadly, we had already tried it; this is how I got the 3.5 GB/s I mentioned in my first message.
I played around with the code and forgot to put it back in my question, sorry about that.
I have edited my question; we are still at 3.5 GB/s instead of roughly 12 GB/s.

Release: R2021a

Asked: 28 Mar 2022
Edited: 30 Mar 2022
