Martin
01/19/2023, 5:55 PM
df.repartition(F.col("country")).write.partitionBy("country").format("delta").saveAsTable("country_people")
I was wondering: is there a benefit to performing an explicit repartition on the partition column before writing into a partitioned table? Delta does the repartition implicitly anyhow, doesn't it?
Matthew Powers
01/19/2023, 6:11 PM
Nick Karpov
01/19/2023, 8:10 PM
partitionBy doesn't shuffle rows, so without the repartition call you are likely to have multiple tasks writing multiple files to a single partition
Matthew Powers
01/19/2023, 8:25 PM
repartition(F.col("country")) was mainly just to output a single file to make the example cleaner. But I should probably note that in the blog.
Martin
01/19/2023, 8:28 PM
Gerhard Brueckl
01/20/2023, 8:15 AM
repartition()
I guess it can be slightly slower, but it ensures that you write the minimum number of files. It would be interesting to know whether optimizeWrite would consolidate the files in a similar way to an explicit repartition 🤔
Nick Karpov
01/20/2023, 4:00 PM
Gerhard Brueckl
01/20/2023, 5:21 PM
Nick Karpov
01/20/2023, 6:09 PM
Gerhard Brueckl
01/20/2023, 6:17 PM
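
Editor's sketch (not part of the thread): Nick's point that partitionBy doesn't shuffle rows can be illustrated with a small pure-Python simulation. This is a toy model, not Spark or Delta code. It assumes each task writes exactly one file per distinct partition value it holds; real writers may split files further. The helper names `files_per_partition` and `repartition_by` and the sample rows are hypothetical.

```python
import zlib
from collections import Counter


def files_per_partition(tasks, column):
    """Count output files per partition value.

    Toy assumption: each task writes one file for every distinct
    partition value found among its rows (no shuffle at write time).
    """
    counts = Counter()
    for rows in tasks:
        counts.update({row[column] for row in rows})
    return dict(counts)


def repartition_by(tasks, column, num_tasks):
    """Toy hash repartition: rows with the same value land in the same task."""
    shuffled = [[] for _ in range(num_tasks)]
    for rows in tasks:
        for row in rows:
            target = zlib.crc32(row[column].encode("utf-8")) % num_tasks
            shuffled[target].append(row)
    return shuffled


# Hypothetical input: three tasks, each holding rows for a mix of countries.
tasks = [
    [{"name": "a", "country": "US"}, {"name": "b", "country": "DE"}],
    [{"name": "c", "country": "US"}, {"name": "d", "country": "DE"}],
    [{"name": "e", "country": "US"}, {"name": "f", "country": "FR"}],
]

without_repartition = files_per_partition(tasks, "country")
with_repartition = files_per_partition(repartition_by(tasks, "country", 3), "country")

print(sorted(without_repartition.items()))  # [('DE', 2), ('FR', 1), ('US', 3)]
print(sorted(with_repartition.items()))     # [('DE', 1), ('FR', 1), ('US', 1)]
```

In this model the unshuffled write produces six files across three partitions, while repartitioning on the partition column first produces one file per partition, at the cost of the extra shuffle Gerhard mentions.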