on my delta tables.
I'm using the Scala API to perform those operations, and I run them for each table in parallel.
I often get an OOM, and I suspect that the jobs run on the driver.
Does anyone know whether this is true and, if so, whether it's possible to run them on the executors?
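One thing that may ease driver memory pressure when looping over many tables is capping how many maintenance calls run at once. Below is a minimal sketch using only the Scala standard library. `BoundedMaintenance`, `runAll`, `maxConcurrent` and the `maintain` callback are hypothetical names for illustration; `maintain` stands in for whatever per-table operation is being run.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration
import scala.util.Try

object BoundedMaintenance {
  // Runs `maintain` once per table, with at most `maxConcurrent` calls in flight.
  // `maintain` is a hypothetical placeholder for the real per-table operation.
  def runAll(tables: Seq[String], maxConcurrent: Int)(
      maintain: String => Unit): Map[String, Try[Unit]] = {
    val pool = Executors.newFixedThreadPool(maxConcurrent)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      // Wrap each call in Try so one failing table does not abort the others.
      val futures = tables.map(t => Future(t -> Try(maintain(t))))
      Await.result(Future.sequence(futures), Duration.Inf).toMap
    } finally {
      pool.shutdown()
    }
  }
}
```

Usage would look like `BoundedMaintenance.runAll(tablePaths, 2)(path => ...)`, and the returned map shows which tables succeeded or failed. A small `maxConcurrent` trades total runtime for a lower peak load on the driver.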
02/28/2023, 10:31 AM
From my experience, the actual deletion of files runs single-threaded on the driver, whereas the file listing (finding the files to delete) runs on the executors.
02/28/2023, 10:43 AM
Thank you for the response!
So in this case we can't distribute the vacuum operation. In my use case, a maintenance job that has to run it on all tables, it's a problem to provide the resources needed to do this in parallel locally on the driver.
Do you know whether the community plans to implement running vacuum on the executors in a future release?
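For what it's worth, the Delta Lake documentation describes a session setting, `spark.databricks.delta.vacuum.parallelDelete.enabled`, that makes vacuum delete files in parallel rather than one by one on the driver. A minimal sketch, assuming a Delta version that supports the setting and a hypothetical table path (check it against the version in use):

```scala
import io.delta.tables.DeltaTable
import org.apache.spark.sql.SparkSession

object ParallelVacuumSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().getOrCreate()

    // Ask vacuum to delete files in parallel instead of sequentially on the driver.
    spark.conf.set("spark.databricks.delta.vacuum.parallelDelete.enabled", "true")

    // Hypothetical path, for illustration only.
    val tablePath = "/path/to/delta/table"

    // vacuum() with no argument uses the table's default retention period.
    DeltaTable.forPath(spark, tablePath).vacuum()
  }
}
```

As far as I know the delete parallelism then follows the number of shuffle partitions, so that setting may need tuning as well.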