How can I append to a Parquet file?
Hello,
I have a Parquet file that I wish to append to. I looked at the documentation of parquetwrite, but it doesn't provide any information on appending. It looks like this was possible in the old interface by setting the 'AppendData' option to true.
Accepted Answer
Kevin Gurney
on 10 Sep 2020
The version of parquetwrite introduced in R2019a does not currently support appending to preexisting Parquet files on disk.
The "AppendData" name-value pair that you referenced in the Parquet Support Package does not append to a preexisting file, but rather incrementally writes chunks of data to an open Parquet file output stream.
The Support Package uses a "stateful Writer object" in conjunction with multiple write() calls to achieve this. The Parquet file output stream is closed when a call to finish() is made.
There is currently no equivalent ParquetWriter object shipping in MATLAB.
----------
An alternative to appending chunks of data to a preexisting Parquet file would be to write out new Parquet files and then "emulate" the behavior of having one contiguous Parquet file using parquetDatastore.
If you write multiple Parquet files to disk in sequence (one for each chunk), which have consecutive numeric suffixes (e.g. data_01.parquet, data_02.parquet, ..., data_0N.parquet), you can use parquetDatastore to order these files as though they were one contiguous Parquet file. With this approach, you can call readall(parquetDatastore) to read the entire sequence of Parquet file "chunks" in one function call.
An example:
% Assuming the current directory contains data_01.parquet, data_02.parquet, ..., data_0N.parquet.
>> data = readall(parquetDatastore("data*.parquet"));
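For completeness, here is a minimal sketch of the whole round trip: writing each chunk to its own numbered file with parquetwrite, then reading everything back through parquetDatastore. The output folder, the chunk count and size, and the table variables id and value are made up for illustration; only parquetwrite, parquetDatastore, and readall come from the workflow described above.

```matlab
% Hypothetical output folder (a fresh temporary directory); use your own path.
outDir = tempname;
mkdir(outDir);

numChunks = 3;      % illustrative number of chunks
rowsPerChunk = 4;   % illustrative rows per chunk

for k = 1:numChunks
    % Build one chunk of example data (a stand-in for your real data).
    id = ((k-1)*rowsPerChunk + (1:rowsPerChunk))';
    value = id * 10;
    chunk = table(id, value);

    % Zero-padded suffix so that alphabetical order matches write order.
    fileName = fullfile(outDir, sprintf("data_%02d.parquet", k));
    parquetwrite(fileName, chunk);
end

% Treat the sequence of files as one logical data set and read it all.
pds = parquetDatastore(fullfile(outDir, "data_*.parquet"));
data = readall(pds);
```

Zero-padding the suffix matters once you pass nine files, since without it data_10.parquet would sort before data_2.parquet. If the row order is important, it is worth inspecting pds.Files before calling readall to confirm the files are listed in the order you expect.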