This is probably better in #random, but Arrow is largely the current go-to for both. Note that if you're used to Spark, Arrow splits things into separate components (mainly relevant if you're reading directly from CSV etc.). You can also use something like DuckDB, which might be easier depending on your source data. I believe there's also a Python library called "fastparquet", but I've not used it.