
Sukumar Nataraj

01/18/2023, 6:50 AM
Hi team, we have tried setting both the min and max file size confs to 128GB in the Spark conf, but unfortunately they are not being honoured during the compaction run: https://github.com/delta-io/delta/blob/master/core/src/main/scala/org/apache/spark/sql/delta/commands/OptimizeTableCommand.scala#L168 We have also tried setting it in the table properties:
ALTER TABLE dbx.tab1 SET TBLPROPERTIES ('delta.targetFileSize' = '104857600');
but still no luck. Are we missing something? Any idea on this? We'd appreciate your help.

Omkar

01/18/2023, 10:03 AM
Based on your description, I'm assuming you're using Delta Lake OSS. Unfortunately, the table property `delta.targetFileSize` is available in Databricks Delta and not in Delta Lake OSS right now. For the full list of table properties honoured in Delta Lake OSS, refer to this list: https://docs.delta.io/latest/table-properties.html Databricks Delta seems to support some additional table properties; that list is here: https://docs.databricks.com/delta/table-properties.html

Nick Karpov

01/19/2023, 8:17 PM
> Unfortunately, the table property `delta.targetFileSize` is available in Databricks Delta and not in Delta Lake OSS right now.
This is not correct: this configuration exists and is respected. Further down the code path, you can see that the bin-packing code uses the target file size to decide how to group smaller files into bins: https://github.com/delta-io/delta/blob/master/core/src/main/scala/org/apache/spark/sql/delta/commands/OptimizeTableCommand.scala#L248
@Sukumar Nataraj, please double-check that everything is configured correctly.
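To make the bin-packing step concrete, here is a simplified Python sketch of how OPTIMIZE compaction groups small files, assuming the behaviour of the linked `OptimizeTableCommand` code at the time of this thread: files below a minimum size are the candidates, they are sorted by size, and they are packed greedily into bins whose total stays at or under the maximum file size. A bin holding a single file is dropped, since rewriting one file alone compacts nothing. This is not Delta's code: the function name `group_files_into_bins` is hypothetical, plain integer sizes stand in for the Scala `AddFile` objects, and special cases such as deletion vectors and multi-dimensional clustering are left out.

```python
from typing import List


def group_files_into_bins(
    file_sizes: List[int], min_file_size: int, max_file_size: int
) -> List[List[int]]:
    """Greedy bin packing of small files (hypothetical sketch, not Delta's implementation)."""
    # Only files smaller than min_file_size are candidates for compaction.
    candidates = sorted(size for size in file_sizes if size < min_file_size)

    bins: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for size in candidates:
        # Start a new bin when adding this file would exceed max_file_size.
        if current and current_size + size > max_file_size:
            bins.append(current)
            current = [size]
            current_size = size
        else:
            current.append(size)
            current_size += size
    if current:
        bins.append(current)

    # A bin with one file would just be rewritten unchanged, so skip it.
    return [b for b in bins if len(b) > 1]


MIB = 1024 * 1024
sizes = [s * MIB for s in (10, 20, 30, 60, 200, 1500)]
bins = group_files_into_bins(sizes, min_file_size=1024 * MIB, max_file_size=100 * MIB)
print([[s // MIB for s in b] for b in bins])  # [[10, 20, 30]]
```

With the maximum raised to 128 MiB, the same input yields one bin of 10, 20, 30 and 60 MiB files, which is why the max-size setting directly controls the size of the compacted output files.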

Omkar

01/23/2023, 11:31 AM
@Nick Karpov, it looks like the property you've mentioned is a different one (`optimize.maxFileSize`, not `delta.targetFileSize`); it is passed from here and initialized in the Delta SQL Config here. Please let me know if I'm missing something, thanks. @Sukumar Nataraj, you can check whether the properties are set correctly on your table by running `SHOW TBLPROPERTIES` on the Delta table. Additionally, this GitHub issue describes a problem similar to yours and may also help: https://github.com/delta-io/delta/issues/1139

Nick Karpov

01/23/2023, 10:02 PM
Ah, yes, you're correct. But I believe the OP was actually asking why the configs weren't respected during a run of the existing compaction, i.e. the batch OPTIMIZE command that has been supported in OSS for some time. If that's correct, then the configuration you're looking for, @Sukumar Nataraj, is the one @Omkar shared above: `optimize.maxFileSize`.
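For the batch OPTIMIZE path these are session-level Spark SQL confs expressed in bytes, not table properties. The sketch below assumes the full keys are the `optimize.*` names under DeltaSQLConf's `spark.databricks.delta.` prefix, i.e. `spark.databricks.delta.optimize.minFileSize` and `spark.databricks.delta.optimize.maxFileSize`; the helpers `mib` and `optimize_size_confs` are hypothetical and only build the key/value pairs. 128 MiB is an example value; note that the `104857600` used in the ALTER TABLE statement above is 100 MiB.

```python
# Assumed prefix that DeltaSQLConf prepends to its conf names.
DELTA_CONF_PREFIX = "spark.databricks.delta."


def mib(n: int) -> int:
    """Convert mebibytes to bytes; the OPTIMIZE size confs are byte counts."""
    return n * 1024 * 1024


def optimize_size_confs(min_file_size: int, max_file_size: int) -> dict:
    """Build the session confs read by batch OPTIMIZE (hypothetical helper).

    min_file_size: files smaller than this many bytes are compaction candidates.
    max_file_size: upper bound, in bytes, for each compacted output file.
    """
    if min_file_size < 0 or max_file_size < 0:
        raise ValueError("file sizes must be non-negative byte counts")
    return {
        DELTA_CONF_PREFIX + "optimize.minFileSize": str(min_file_size),
        DELTA_CONF_PREFIX + "optimize.maxFileSize": str(max_file_size),
    }


confs = optimize_size_confs(mib(128), mib(128))
for key, value in confs.items():
    print(f"{key}={value}")
# spark.databricks.delta.optimize.minFileSize=134217728
# spark.databricks.delta.optimize.maxFileSize=134217728
```

In a PySpark session, each pair could then be applied with `spark.conf.set(key, value)` before running `OPTIMIZE dbx.tab1`.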