
Ben Temple

02/01/2023, 6:31 PM
Hi all, I'm getting a job failure due to
Total size of serialized results of 462 tasks (4.0 GiB) is bigger than spark.driver.maxResultSize (4.0 GiB)
. I have checked the usual suspects for this issue, such as collect statements, very high partition counts, and large broadcast joins; none of these occur in the job. Digging further, I believe this could be caused by the Delta writer, as the stage that returns this error shows this line
com.databricks.sql.transaction.tahoe.commands.WriteIntoDeltaCommand.run(WriteIntoDeltaCommand.scala:70)
. I do not have access to the source code, but I wondered whether anybody had insight into why this line could be returning fairly large amounts of data to the driver. I assume it returns metrics and file-location information for updating the Delta log, but each task seems to return a lot of data, and with 2787 tasks in this stage (equal to the current number of partitions) the total would require a much larger result size than is currently configured. I would rather work out whether this result size can be reduced than keep increasing the Spark configuration and driver size. Can anybody give any insight into what information is returned to the driver in the write function mentioned above, and whether any particular types of data could make each result set so large?
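A minimal sketch of the two mitigations discussed in this thread: raising the result-size cap as a stopgap, and reducing the number of write tasks so fewer per-task results are shipped back to the driver. The DataFrame df, the sizes, and the paths are hypothetical placeholders, not taken from the job above.

import org.apache.spark.sql.SparkSession

// spark.driver.maxResultSize cannot be changed at runtime; it must be set
// when the session (or cluster) is created. 8g is an illustrative value.
val spark = SparkSession.builder()
  .config("spark.driver.maxResultSize", "8g")
  .getOrCreate()

// Fewer write tasks means fewer per-task results serialized back to the
// driver, shrinking the total result size for the write stage.
val df = spark.read.format("delta").load("/path/to/source") // placeholder source
df.coalesce(512) // illustrative target, well below the 2787 tasks above
  .write
  .format("delta")
  .mode("append")
  .save("/path/to/table") // placeholder target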

JosephK (exDatabricks)

02/01/2023, 6:32 PM
I think I've seen this happen when you're trying to read half a million files and the driver has trouble scheduling that many file reads/tasks. A bigger driver may help

Ben Temple

02/01/2023, 6:41 PM
Thanks Joseph, I've just checked the number of source files for the job and it's only ~6000 so I don't think that should be an issue

Nick Karpov

02/01/2023, 7:01 PM
That's interesting; I'm also surprised this is happening given the scale in your description... I would open a ticket with the Databricks support team to get a better understanding

Ben Temple

02/07/2023, 4:03 PM
For anybody finding this later: it was solved by manually repartitioning on our partition columns before writing out to the Delta table
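A minimal sketch of that fix, assuming a DataFrame df and a table partitioned on a hypothetical date column. Repartitioning on the table's partition columns before the write means each task writes to far fewer table partitions, producing fewer files per task and therefore less file metadata returned to the driver at commit time.

import org.apache.spark.sql.functions.col

// Repartition on the table's partition columns so each write task covers
// few table partitions and emits fewer files, reducing the per-task
// result sent back to the driver.
df.repartition(col("date"))
  .write
  .format("delta")
  .mode("overwrite")
  .partitionBy("date")
  .save("/path/to/table") // placeholder path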