Matthew Powers
06/13/2023, 7:35 PM
Will Jones
06/13/2023, 7:36 PM
Kees Duvekot
06/13/2023, 10:55 PM
Matthew Powers
06/14/2023, 1:43 AM
Kees Duvekot
06/14/2023, 3:59 AM
coalesce(1)
Matthew Powers
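06/15/2023, 3:09 AM
import pyarrow as pa
import pyarrow.csv
import pyarrow.parquet as pq

writer = None
with pyarrow.csv.open_csv(in_path) as reader:
    # Stream the CSV one record batch at a time
    for next_chunk in reader:
        if next_chunk is None:
            break
        if writer is None:
            # Create the writer lazily, using the first batch's schema
            writer = pq.ParquetWriter(out_path, next_chunk.schema)
        next_table = pa.Table.from_batches([next_chunk])
        # Each write_table call starts a new row group
        writer.write_table(next_table)
# The writer is never created if the CSV yielded no batches
if writer is not None:
    writer.close()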
Thanks for pointing me in the right direction. The resulting Parquet file had 50,476 row groups 😲 The notebook in case you're interested.