Matthew Powers — 06/13/2023, 7:35 PM
Will Jones — 06/13/2023, 7:36 PM
Kees Duvekot — 06/13/2023, 10:55 PM
Matthew Powers — 06/14/2023, 1:43 AM
Kees Duvekot — 06/14/2023, 3:59 AM
Matthew Powers — 06/15/2023, 3:09 AM
Thanks for pointing me in the right direction. The resulting Parquet file had 50,476 row groups 😲 Here's the notebook in case you're interested.
```python
import pyarrow as pa
import pyarrow.csv
import pyarrow.parquet as pq

writer = None
with pyarrow.csv.open_csv(in_path) as reader:
    for next_chunk in reader:
        if next_chunk is None:
            break
        if writer is None:
            # Create the writer lazily so it can reuse the schema
            # inferred from the first CSV chunk.
            writer = pq.ParquetWriter(out_path, next_chunk.schema)
        # Each chunk becomes its own table -- and its own row group.
        next_table = pa.Table.from_batches([next_chunk])
        writer.write_table(next_table)
writer.close()
```