Does MATLAB support Parquet partitions?

Jerry Duggan on 30 Dec 2022
Answered: Sudarshan on 2 Jan 2023
I have a large data set written using parquet partitioning. The partition variable is called 'mdRun', and I have 10 parquet files created in 10 directories as follows:
.../events/mdRun=0/events-0.parquet
.../events/mdRun=1/events-0.parquet
and so on. I created these files using pyarrow Hive partitioning.
Using pyarrow, I can read the parquet file corresponding to a single partition using the filter argument, which will read only the parquet file stored in the appropriate directory. As a nice side effect, the mdRun column is not stored in the parquet file, but it is automatically included when I read a partition file(s).
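To make the behaviour described above concrete, here is a minimal standard-library sketch of how a Hive-style layout lets a reader skip whole directories and recover the partition column from the path. It is only an illustration of the idea, not pyarrow's implementation; the helper names `partition_values` and `select_files` and the relative `events/` root are assumptions made for the example.

```python
from pathlib import PurePosixPath


def partition_values(path):
    """Return the Hive-style key=value pairs encoded in a path's directories."""
    values = {}
    for part in PurePosixPath(path).parts[:-1]:  # skip the file name itself
        key, sep, value = part.partition("=")
        if sep and key:
            values[key] = value  # values stay strings in this sketch
    return values


def select_files(paths, key, wanted):
    """Keep only the files whose partition directory matches key=wanted."""
    return [p for p in paths if partition_values(p).get(key) == str(wanted)]


# Hypothetical layout mirroring the question: one file per mdRun directory.
files = [f"events/mdRun={run}/events-0.parquet" for run in range(10)]

selected = select_files(files, "mdRun", 1)
print(selected)                       # ['events/mdRun=1/events-0.parquet']
print(partition_values(selected[0]))  # {'mdRun': '1'}
```

Only the matching file would need to be opened, and the `mdRun` value can be attached to its rows afterwards because it is known from the directory name. A real reader would usually also convert the value to a suitable type rather than leave it as a string.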
Is it possible to read a partitioned Parquet dataset in MATLAB in the same way?
Thank you!

Answers (1)

Sudarshan on 2 Jan 2023
Hi Jerry,
As far as I know, reading a Hive-partitioned Parquet dataset in this way is not supported in MATLAB R2022b. The request has already been forwarded to the relevant team.
However, MATLAB R2022b does support reading and writing Parquet files. I have attached a few documentation links that may help you work with the Parquet functions.
For the various functions that could be useful in your case, refer to the link below:
For detailed documentation of the data type mappings, refer to the link below:
For help with reading Parquet files, refer to the link below:
I hope that this helps!

Release

R2022b
