Rajath Chandregowda

04/17/2023, 3:30 PM
Hi Team, I'm using OSS Delta 2.3.0 and Spark 3.3.2. I'm using the following configuration to set the max file size to 500 MB (optimize.maxFileSize=500*1024*1024). While playing around with the configuration, I noticed something. If I set the file size to 500 MB, the part files of course adhere to the config, but when I change the max file size to 128 MB (optimize.maxFileSize=128*1024*1024), the compaction does not work. Are there any extra configurations I'm supposed to add, or is a repartition or coalesce the only option as of now? PS: Compaction works the other way round, i.e. from 128 MB to 500 MB.

Ajex

04/17/2023, 3:45 PM
Your configuration is not working; the default file size for compaction is 1 GB. The correct configuration for Delta OSS is
spark.databricks.delta.optimize.maxFileSize=<file_size>
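The value of this setting is a number of bytes (the 1 GB default is 1073741824), so shorthand like 128*1024*1024 has to be evaluated to an integer before it is set. A minimal Python sketch of that conversion; only the config key comes from the thread, and the helper name max_file_size_conf is hypothetical:

```python
# Config key quoted in the thread; its value is a byte count.
MAX_FILE_SIZE_KEY = "spark.databricks.delta.optimize.maxFileSize"

MIB = 1024 * 1024


def max_file_size_conf(size_mib: int) -> tuple[str, str]:
    """Hypothetical helper: (key, value) pair for a target OPTIMIZE file size in MiB."""
    if size_mib <= 0:
        raise ValueError("size_mib must be positive")
    # Spark session configs are set as strings, so the byte count is passed as text.
    return MAX_FILE_SIZE_KEY, str(size_mib * MIB)


for mib in (1024, 500, 128):
    key, value = max_file_size_conf(mib)
    print(f"{key}={value}")
```

In a PySpark session, the pair could be applied with spark.conf.set(*max_file_size_conf(128)) before running OPTIMIZE.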

Rajath Chandregowda

04/17/2023, 3:48 PM
Yeah, that's the same configuration I used. But I'm not able to go down from 1 GB (just an example). I tried the same thing from 500 MB to 128 MB and it didn't work.
@Ajex

Ajex

04/17/2023, 4:00 PM
What do you mean by "the compaction is not working"? After compaction, does it produce larger files, or are there no updates in the _delta_log folder? And how did you check the output file size after compaction: using an hdfs command or ...?
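One way to check file sizes without relying on a raw S3 listing is to replay the add and remove actions in the _delta_log commit files; a listing of the table directory can also show files the current version no longer references but that have not been vacuumed yet. A minimal Python sketch, assuming the commit files (00000000000000000000.json, ...) have already been read into strings in version order; it ignores checkpoints (so it needs every commit from version 0) and skips all other action types. The function name live_file_sizes and the sample commits are hypothetical:

```python
import json
from typing import Dict, Iterable


def live_file_sizes(commits: Iterable[str]) -> Dict[str, int]:
    """Replay Delta commit files (JSON lines, in version order) into {path: size in bytes}."""
    live: Dict[str, int] = {}
    for commit in commits:
        for line in commit.splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                live[action["add"]["path"]] = action["add"]["size"]
            elif "remove" in action:
                # A removed file may still exist on S3 until VACUUM deletes it.
                live.pop(action["remove"]["path"], None)
    return live


# Synthetic commits: version 0 adds two files, version 1 compacts them into one.
commit0 = "\n".join([
    '{"add": {"path": "part-0.parquet", "size": 100, "dataChange": true}}',
    '{"add": {"path": "part-1.parquet", "size": 150, "dataChange": true}}',
])
commit1 = "\n".join([
    '{"remove": {"path": "part-0.parquet", "dataChange": false}}',
    '{"remove": {"path": "part-1.parquet", "dataChange": false}}',
    '{"add": {"path": "part-2.parquet", "size": 250, "dataChange": false}}',
])

for path, size in sorted(live_file_sizes([commit0, commit1]).items()):
    print(path, size)
```

Real add actions carry more fields (partitionValues, modificationTime, stats), which this sketch simply ignores.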

Rajath Chandregowda

04/17/2023, 4:41 PM
@Ajex I checked in S3. Let me explain step by step. Consider a TPC-DS table, store_sales (total table size 5 GB):
1. I set spark.databricks.delta.optimize.maxFileSize=500*1024*1024 in the Spark config.
2. I read the table and wrote it back to Delta (roughly 50 part files).
3. Then I ran the OPTIMIZE and VACUUM commands. I saw only 10-11 part files of 500 MB each, which is what I expected.
4. Next, I changed the Spark session config to spark.databricks.delta.optimize.maxFileSize=128*1024*1024.
5. Then I ran the OPTIMIZE and VACUUM commands again. I expected around 40 part files of 128 MB each. This is what I'm not able to achieve.
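The file counts expected in steps 3 and 5 come from dividing the table size by the target file size and rounding up. A quick check in Python, treating the approximate 5 GB from the message as 5 GiB; the function name expected_file_count is hypothetical:

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def expected_file_count(table_bytes: int, target_bytes: int) -> int:
    """Files needed if the data were cut into pieces of at most target_bytes."""
    return -(-table_bytes // target_bytes)  # integer ceiling division


table_bytes = 5 * GIB  # approximate store_sales size from the thread
print(expected_file_count(table_bytes, 500 * MIB))  # 11 (10.24 rounded up)
print(expected_file_count(table_bytes, 128 * MIB))  # 40
```

If the table is rewritten instead of optimized, a count like this is roughly what could be passed to DataFrame.repartition(n); actual file sizes also depend on how the data compresses.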
We can do it with repartition, but this should have worked with the configs above and OPTIMIZE. I'm not sure what's happening.
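A possible explanation for step 5, under an assumption not confirmed in this thread: bin-packing compaction in OPTIMIZE groups existing files into bins whose combined size stays within maxFileSize and rewrites each multi-file bin as one file, so it merges files but never splits a file that is already larger than the target. The Python model below is a simplified sketch of that behaviour (one partition, no minimum-size filter), not Delta's actual code; plan_compaction_bins is a hypothetical name:

```python
from typing import List


def plan_compaction_bins(file_sizes: List[int], max_file_size: int) -> List[List[int]]:
    """Simplified bin-packing model: greedily group files (smallest first) into bins
    whose total size stays within max_file_size. Each bin would be rewritten as one
    file, so files are only ever merged, never split."""
    bins: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for size in sorted(file_sizes):
        if current and current_size + size > max_file_size:
            bins.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        bins.append(current)
    # A bin holding a single file would just rewrite that file, so it is skipped.
    return [b for b in bins if len(b) > 1]


# Sizes in MiB, loosely following the thread: ~50 files of ~100, then ~11 of ~465.
print(len(plan_compaction_bins([100] * 50, 500)))  # 10 bins of 5 files each
print(plan_compaction_bins([465] * 11, 128))       # [] -> nothing to compact
```

Under this model, step 5 is a no-op because every existing file already exceeds 128 MiB, which would match the observed behaviour; rewriting the data with repartition does not depend on it.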