
Steve Quan

02/13/2023, 4:07 AM
Hi guys, what's the difference between "optimize compaction bin-packing" and "repartition" in terms of combining small files into fewer big files? Are we always supposed to choose the "optimize" operation over manually repartitioning?

JosephK (exDatabricks)

02/13/2023, 12:16 PM
OPTIMIZE is a Delta table operation. Repartition is a DataFrame method you call before a write. OPTIMIZE is usually the better choice for compaction because it bin-packs small files toward a target file size. Repartition gives you roughly equally sized output files, but you only choose the number of partitions; there is no way to set the file size directly.
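To make the difference concrete, here is a rough pure-Python sketch of the two ideas. It is only a conceptual illustration, not Delta Lake's or Spark's actual code: `bin_pack`, `repartition_sizes`, the 128 MB target, and the sample file sizes are all made up for the example.

```python
from typing import List


def bin_pack(file_sizes_mb: List[int], target_mb: int) -> List[List[int]]:
    """Greedy bin-packing (hypothetical helper): group small files so that
    each group's total size stays at or below target_mb."""
    bins: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for size in sorted(file_sizes_mb):
        # Close the current bin if adding this file would exceed the target.
        if current and current_size + size > target_mb:
            bins.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        bins.append(current)
    return bins


def repartition_sizes(total_mb: float, num_partitions: int) -> List[float]:
    """Idealized repartition (hypothetical helper): data is spread evenly
    over a fixed number of partitions, so the size per file is whatever
    total_mb / num_partitions happens to be."""
    return [total_mb / num_partitions] * num_partitions


# Made-up sizes (in MB) of ten small files, 550 MB in total.
small_files = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

# Bin-packing: you pick the target size, the number of output files follows.
packed = bin_pack(small_files, target_mb=128)
packed_sizes = [sum(group) for group in packed]
print(packed_sizes)

# Repartition: you pick the number of files, the size per file follows.
even_sizes = repartition_sizes(sum(small_files), num_partitions=4)
print(even_sizes)
```

With bin-packing every output stays under the chosen target no matter how much data there is. With repartition, the same `num_partitions=4` gives 137.5 MB files here, and the file size changes whenever the data volume changes.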