How can I append to a Parquet file?

Asked by 8eodosis on 13 May 2020
Answered: Kevin Gurney on 10 Sep 2020
Hello,
I have a Parquet file that I wish to append to. I looked at the documentation of parquetwrite, but it doesn't provide any information on appending. It looks like this was an option in the old interface, by setting the 'AppendData' option to true:

Accepted Answer

Kevin Gurney on 10 Sep 2020
The version of parquetwrite introduced in R2019a does not currently support appending to preexisting Parquet files on disk.
The "AppendData" name-value pair that you referenced in the Parquet Support Package does not append to a preexisting file, but rather incrementally writes chunks of data to an open Parquet file output stream.
The Support Package uses a "stateful Writer object" in conjunction with multiple write() calls to achieve this. The Parquet file output stream is closed when a call to finish() is made.
There is currently no equivalent ParquetWriter object shipping in MATLAB.
----------
An alternative to appending chunks of data to a preexisting Parquet file would be to write out new Parquet files and then "emulate" a single contiguous Parquet file using parquetDatastore.
If you write multiple Parquet files to disk in sequence (one for each chunk) with consecutive, zero-padded numeric suffixes (e.g., data_01.parquet, data_02.parquet, ..., data_0N.parquet), you can use parquetDatastore to order these files as though they were one contiguous Parquet file. With this approach, you can call readall on the datastore to read the entire sequence of Parquet file "chunks" in one function call.
An example:
% Assuming the current directory contains data_01.parquet, data_02.parquet, ..., data_0N.parquet.
data = readall(parquetDatastore("data*.parquet"));
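The writing side of this workflow might look like the sketch below. It writes each chunk to its own file with parquetwrite instead of appending, then reads everything back with parquetDatastore. The chunk count, chunk size, variable names (ID, Value), and the random stand-in data are hypothetical placeholders for your real tables.

```matlab
% Hypothetical chunked write: each chunk goes to its own Parquet file
% instead of being appended to an existing one.
numChunks = 3;      % hypothetical number of chunks
rowsPerChunk = 10;  % hypothetical chunk size

for k = 1:numChunks
    % Stand-in data for chunk k; replace with your real table.
    ids = ((k-1)*rowsPerChunk + (1:rowsPerChunk))';
    T = table(ids, rand(rowsPerChunk, 1), 'VariableNames', {'ID', 'Value'});

    % Zero-padded suffix (data_01, data_02, ...) so that sorting the
    % file names alphabetically matches the order the chunks were written.
    parquetwrite(sprintf('data_%02d.parquet', k), T);
end

% Treat all chunk files as one logical dataset and read them in one call.
pds = parquetDatastore('data*.parquet');
data = readall(pds);
```

A two-digit suffix keeps alphabetical and numeric order in agreement only up to 99 chunks (data_100 would sort before data_99), so a wider format such as '%04d' is safer if many chunks are expected. All chunks should also share the same variable names and types so that they combine into a single table.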
