dhia Gharsallaoui

02/28/2023, 10:25 AM
👋 Hello, team! I have a cron Spark job that runs compaction and `vacuum` on my Delta tables. I'm using the Scala API to perform those operations, and I run them for each table in parallel. I often get an OOM, and I suspect the jobs run on the driver. Does anyone know whether this is true and, if so, whether it's possible to run them on the executors? Thank you!
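A minimal sketch of the kind of job described above. It assumes Delta Lake 2.0 or later (for `optimize().executeCompaction()`) with the Delta jar on the classpath. The table paths passed on the command line, the pool size of 4, and the 168-hour retention are illustrative assumptions, not the poster's actual setup. Capping how many tables are processed at once bounds how much work the shared driver handles concurrently, and wrapping each table in `Try` keeps one failure from aborting the others.

```scala
import io.delta.tables.DeltaTable
import org.apache.spark.sql.SparkSession

import java.util.concurrent.Executors
import scala.concurrent.duration.Duration
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.util.Try

object DeltaMaintenance {

  // Runs `task` once per table path, at most `parallelism` at a time, and
  // records each outcome so that one failing table does not stop the rest.
  def runBounded(paths: Seq[String], parallelism: Int)(task: String => Unit): Map[String, Try[Unit]] = {
    val pool = Executors.newFixedThreadPool(parallelism)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      val futures = paths.map(p => Future(p -> Try(task(p))))
      Await.result(Future.sequence(futures), Duration.Inf).toMap
    } finally pool.shutdown()
  }

  // Compaction, then VACUUM. All threads share the same SparkSession (and driver).
  def maintain(spark: SparkSession, path: String, retentionHours: Double): Unit = {
    val table = DeltaTable.forPath(spark, path)
    table.optimize().executeCompaction()
    table.vacuum(retentionHours)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("delta-maintenance").getOrCreate()
    val tablePaths = args.toSeq // hypothetical: table locations passed as arguments
    val results = runBounded(tablePaths, parallelism = 4)(p => maintain(spark, p, 168.0))
    results.foreach { case (p, r) => println(s"$p -> $r") }
    spark.stop()
  }
}
```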

Gerhard Brueckl

02/28/2023, 10:31 AM
From my experience, the actual deletion of files by `vacuum` runs single-threaded on the driver, whereas the file listing (finding the files to delete) runs on the executors.
`optimize` runs on the executors.
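A hedged sketch related to the driver-side deletes: Delta Lake has a SQL conf, `spark.databricks.delta.vacuum.parallelDelete.enabled` (default `false`), which, as we understand it, makes VACUUM issue its file deletions from executor tasks instead of from the driver. Confirm that the setting exists and behaves this way in the Delta version you run before relying on it. The table path argument is hypothetical.

```scala
import io.delta.tables.DeltaTable
import org.apache.spark.sql.SparkSession

object ParallelVacuum {
  // Delta Lake SQL conf, assumed available in your version; when "true",
  // VACUUM deletes files in executor tasks rather than on the driver.
  val ParallelDeleteKey = "spark.databricks.delta.vacuum.parallelDelete.enabled"

  def vacuumOnExecutors(spark: SparkSession, path: String): Unit = {
    spark.conf.set(ParallelDeleteKey, "true")
    // With no argument, the table's configured retention applies (7 days by default).
    DeltaTable.forPath(spark, path).vacuum()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("parallel-vacuum").getOrCreate()
    vacuumOnExecutors(spark, args(0)) // hypothetical: table path from the command line
    spark.stop()
  }
}
```

Enabling parallel deletes can hit request-rate limits on some storage backends, so it is worth testing on one table first.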

dhia Gharsallaoui

02/28/2023, 10:43 AM
Thank you for the response! So in this case we can't distribute the `vacuum` operation. My use case is a maintenance job that has to run on all tables, so it's a problem to give the driver enough resources to do this locally in parallel. Do you know whether the community plans to make `vacuum` run on the executors in a future release?

Gerhard Brueckl

02/28/2023, 12:05 PM
no idea, sorry

Martin

02/28/2023, 3:12 PM
@dhia Gharsallaoui please open a Feature Request for this: https://github.com/delta-io/delta/issues

dhia Gharsallaoui

02/28/2023, 4:46 PM