Is it possible to create a sparse binary (.bin) file on disk?

Question

0 votes

I have a project where I would like to save my results to a binary (.bin) file that is stored on disk. Results need to be saved as they are generated (so that memory can be cleared), but the order in which these results are added to the binary file is not necessarily sequential (e.g., first I write to bytes 1-100, then 1001-1100, then 301-400, etc.).

In order to write non-sequentially to a binary file, I believe that file needs to be pre-allocated on the disk in some form or another. Is it possible to create a "sparse" binary file that has an area on disk set aside but which does not require writing zeros to every byte in the .bin file? I know how many bytes the file will take up when I am done saving to it, so this isn't a problem. Alternatively, is there a way for me to write non-sequentially to a binary file without pre-allocating it first?

Thanks.

Answer 1

Anthony Barone on 25 May 2018

Edited: Anthony Barone on 25 May 2018

1 vote

In case anyone comes across this question looking for the same thing: at some point in the last year I figured out a much better way to do this. Make a system call to one of the following:

fallocate (Linux/UNIX - create or extend file)
fsutil file createnew (Windows - create file)
fsutil file seteof (Windows - extend file)
mkfile -n (macOS - create file)

I haven't figured out how to extend a file on macOS, but since this is a very unusual use case for me, I have it set up to fall back on one of two approaches when a file on macOS needs to be sparse-extended: either zero-write to the end of the file, or read the data, delete the file, allocate a larger file, and re-write the data.

Allocation through these commands is effectively instant, since it is true write-less allocation. For example, as a test I just allocated a 4 GB file in 0.05 seconds.

That said, writing non-sequentially to a file like this can be very slow, so you might be better off adding in zeros and writing data to the end of the file on the fly as needed. Still, write-less allocation is possible to implement from within MATLAB.
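For anyone who wants a starting point, here is a minimal sketch of how the create-file commands listed above might be wrapped in one MATLAB function. The name preallocateFile and its arguments are hypothetical, and I am assuming the usual argument order for each tool (fallocate -l size file, fsutil file createnew file size, mkfile -n size file). It has not been tested on every platform, and fsutil may need elevated privileges on some Windows setups.

```matlab
function preallocateFile(fileName, nBytes)
% PREALLOCATEFILE  Create a file of nBytes bytes without writing its contents.
% Hypothetical helper: it only wraps the OS commands named in this answer.
% nBytes is assumed to be a non-negative integer value.
if ispc
    % Windows: fails if the file already exists
    cmd = sprintf('fsutil file createnew "%s" %d', fileName, nBytes);
elseif ismac
    % macOS is tested before the generic branch because isunix is also true there
    cmd = sprintf('mkfile -n %d "%s"', nBytes, fileName);
else
    % Linux: fallocate creates the file or extends an existing one
    cmd = sprintf('fallocate -l %d "%s"', nBytes, fileName);
end
[status, msg] = system(cmd);
if status ~= 0
    error('preallocateFile:failed', 'Allocation failed: %s', msg);
end
end
```

Calling preallocateFile('results.bin', 4e9) and then checking the bytes field returned by dir('results.bin') is a quick way to confirm that the file has the requested size.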

Answer 2

Jan on 13 Mar 2017

1 vote

You can use this to expand (or shrink) a file efficiently: FEX: FileResize. It is twice as fast as appending zeros with fwrite.

function InsertData(File, Data, Format, Pos)
% Write Data at byte offset Pos, growing the file first if needed
fid = fopen(File, 'r+');
if fid == -1
  error('*** %s: Cannot open file: %s', mfilename, File);
end
fseek(fid, 0, 1);  % Spool to end
Len = ftell(fid);
if Pos > Len
  FileResize(File, Pos);
end
fseek(fid, Pos, -1);  % Move to the target offset before writing
fwrite(fid, Data, Format);
fclose(fid);
end

If multiple workers write to the same file... Hm. I'm not sure what happens when two workers access the same file and one writes into the section that the other is currently expanding.

What about inventing your own "sparse" file format?

function InsertData(File, Data, Format, Pos)
% Append Data as a self-describing block; blocks may arrive in any order
fid = fopen(File, 'a');
if fid == -1
  error('*** %s: Cannot open file: %s', mfilename, File);
end
Header = [Pos, ndims(Data), size(Data)];  % Pos records where the block belongs
fwrite(fid, Header, 'uint64');
fwrite(fid, Data, Format);
fclose(fid);
end

A method for reading such a file, or for creating the full file in a post-processing step, will be equally easy. The file is read or spooled in blocks afterwards, but this will not be dramatically slower.
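As a rough illustration of that post-processing step, the sketch below copies each stored block to its final position in a dense file. It uses its own simplified record layout, which is an assumption and not necessarily the header written above: two uint64 values (byte offset and byte count) followed by that many raw bytes. The name expandRecords is hypothetical, and the dense file is assumed to exist already.

```matlab
function expandRecords(recordFile, denseFile)
% EXPANDRECORDS  Copy every block of an append-only record file to its
% final offset in a dense file. Hypothetical helper.
% Assumed record layout: [offset, nBytes] as two uint64 values, followed by
% nBytes raw bytes. denseFile must already exist.
in = fopen(recordFile, 'r');
if in == -1
    error('expandRecords:open', 'Cannot open file: %s', recordFile);
end
closeIn = onCleanup(@() fclose(in));    % close the input even on error
out = fopen(denseFile, 'r+');
if out == -1
    error('expandRecords:open', 'Cannot open file: %s', denseFile);
end
closeOut = onCleanup(@() fclose(out));  % close the output even on error
while true
    header = fread(in, 2, 'uint64=>double');   % [offset; nBytes]
    if numel(header) < 2
        break                                  % no more records
    end
    payload = fread(in, header(2), 'uint8=>uint8');
    if fseek(out, header(1), 'bof') ~= 0
        error('expandRecords:seek', 'Offset %d is outside %s', header(1), denseFile);
    end
    fwrite(out, payload, 'uint8');
end
end
```

Because the record file is read once from start to end, the only random access happens on the output side.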

1 Comment

Anthony Barone on 13 Mar 2017

Thanks for suggesting FileResize. I will have to experiment if it works correctly with multiple workers.

As far as making my own type of sparse file goes: there is a specific format for the file I am writing that will allow it to be used in other applications (in particular .segy, which stores binary data along with a pre-defined list of header information). Making my own format would just require me to re-format it into the desired format when the code finishes, and as such wouldn't save me any time or trouble.

That said, even if I didn't have a target format, I'm not sure this would be a good idea. The data is being written in such a way that sequential blocks of information are likely to be loaded together when you load part of the data (they represent data from locations that are physically close to each other). Introducing this type of sparse format would help initially, but it seems like it would create significantly more work for accessing data once a significant amount of data has been added to the file, since a reader would have to jump around the file instead of reading sequentially.

Answer 3

Walter Roberson on 10 Mar 2017

0 votes

Unfortunately, no.

The POSIX standard operation that allows for sparse files is to fseek() to a location past end of file and write data there; the file system is then permitted to leave "holes" in the parts where nothing has been written.

Unfortunately, in MATLAB, if you fseek() beyond the end of file, the location "sticks" at the end of file.

Therefore, in MATLAB, if you want to write to a scattered location, the general write procedure is:

fopen() without the 't' (text) attribute (important!), with 'a' access (not 'w' or 'w+' or 'a+' for this purpose)
fseek() to end of file
ftell() to determine the position of the end of file, in bytes
if the current end of file is before the place you need to be, fwrite() 0's to the place you need to be; otherwise fseek() to the place you need to be
fwrite() the data you want

The general read procedure is:

fopen() without the 't' (text) attribute (important!), with 'r' or 'a' or 'a+' access (not 'w' or 'w+') -- it is fine to keep the file open with 'a' access for reading and writing
fseek() to the position you need to be
ftell() to determine the position you ended up in, in bytes
if the current position is before the place you need to be, the data has not been written yet, so act appropriately
otherwise fread() the data, keeping in mind that you might encounter end of file if you were not consistent about the blocksize -- or even if the end of file happened to be exactly at the place you want to start reading

You can modify this procedure to test that the entire block of data is available before you read it.
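The write procedure above could be sketched roughly as follows. The name writeAt and its arguments are hypothetical. The sketch deviates from the list in one respect: it opens the file with 'r+' rather than 'a', on the assumption that append mode sends every write to the end of the file regardless of fseek(). It also assumes the file already exists and that pos is a 0-based byte offset.

```matlab
function writeAt(fileName, data, precision, pos)
% WRITEAT  Write data starting at byte offset pos (0-based), zero-filling
% any gap between the current end of file and pos. Hypothetical helper.
% Assumes the file already exists; 'r+' is an assumption made here so that
% writes before the end of file land at the requested offset.
fid = fopen(fileName, 'r+');
if fid == -1
    error('writeAt:open', 'Cannot open file: %s', fileName);
end
closer = onCleanup(@() fclose(fid));   % close the file even if an error occurs
fseek(fid, 0, 'eof');
len = ftell(fid);                      % current file length in bytes
if pos > len
    % Pad in chunks so a large gap never needs a large array in memory
    chunk = zeros(1, 2^20, 'uint8');
    gap = pos - len;
    while gap > 0
        n = min(gap, numel(chunk));
        fwrite(fid, chunk(1:n), 'uint8');
        gap = gap - n;
    end
else
    fseek(fid, pos, 'bof');            % target lies inside the existing file
end
fwrite(fid, data, precision);
end
```

For example, writeAt('results.bin', single(values), 'single', 4000) would place the samples at byte 4000 and zero-fill everything between the old end of file and that offset.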

3 Comments

Walter Roberson on 13 Mar 2017

For that kind of situation, perhaps memmapfile() would be suitable.

Anthony Barone on 13 Mar 2017

To clarify, when I referred to "accessing the files", the only access that is required is a single access to write the data. After the data is written I won't need to access it again until after the code has finished running and all results from the code have been written to disk.

This makes me think that using memmapfile would just result in unnecessary additions to the virtual memory address space, and wouldn't actually give any benefit since I don't need to access the data again after it is written. Am I correct in thinking this, or do I misunderstand something?
