Iwan Aucamp

04/03/2023, 1:31 PM
Probably not the best place to ask about Parquet, but what would be the “canonical” way of writing Parquet from Go or Python? Is it with Apache Arrow or with purpose-built libraries?

Mike M

04/03/2023, 2:16 PM
This is probably better in #random, but Arrow is largely the current method for both. Note that if you're used to Spark, Arrow splits things into separate components (this mainly matters if you're reading directly from CSV, etc.). You can also use something like DuckDB, which might be easier depending on your source data. I believe there's also a Python "fastparquet" library, but I've not used it.
