Hi all,
I wrote about 60 million records to S3 and created a Delta table on top of them.
Initially, I could see all files were created at roughly 128 MB each.
When I ran a MERGE on day 2 (a mix of updates and inserts), all of the files got merged into a single 786 MB file.
If I then run OPTIMIZE tableName ZORDER BY (columnName), the files are written back out at 128 MB.
These are some of the Spark configs I use:
("spark.databricks.delta.retentionDurationCheck.enabled", "false"),
("spark.databricks.delta.autoOptimize.optimizeWrite", "true"),
("spark.databricks.delta.autoOptimize.autoCompact", "auto"),
("spark.databricks.delta.autoCompact.maxFileSize", 134217728),
("spark.databricks.delta.optimize.maxFileSize", 134217728),
("spark.databricks.delta.tuneFileSizesForRewrites", "false"),
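For context, these pairs can be applied to an existing SparkSession via `spark.conf.set`; below is a minimal sketch (the `apply_configs` helper is just for illustration, not part of my actual job):

```python
# The config key/value pairs listed above, as a dict.
configs = {
    "spark.databricks.delta.retentionDurationCheck.enabled": "false",
    "spark.databricks.delta.autoOptimize.optimizeWrite": "true",
    "spark.databricks.delta.autoOptimize.autoCompact": "auto",
    "spark.databricks.delta.autoCompact.maxFileSize": "134217728",  # 128 MB
    "spark.databricks.delta.optimize.maxFileSize": "134217728",     # 128 MB
    "spark.databricks.delta.tuneFileSizesForRewrites": "false",
}

def apply_configs(spark, configs):
    """Set each key on the given SparkSession's runtime config."""
    for key, value in configs.items():
        spark.conf.set(key, str(value))
```
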
I am unable to figure out what is triggering this compaction into a single file.
Can you please give me an idea?