Sajal Singhal

08/31/2023, 5:24 AM
Hi Team Delta, I want to understand what happens when we run VACUUM on a Delta table in the scenario below 🤔 Scenario: I have a 1 TB Delta table, of which only 20 GB is active and the rest is inactive. The current cluster has 1 driver and 2 executors (8 GB memory and 4 cores each). I want to run VACUUM once a day without impacting my job processing performance. What should I do? Should I increase the cluster size, or is that not required?
@Matthew Powers @Dominique Brezinski @Gerhard Brueckl can any of you please help?

JosephK (exDatabricks)

08/31/2023, 11:32 AM
You shouldn't set up a cluster this way. It's better to scale up a single node rather than have 2 tiny nodes.

Sajal Singhal

08/31/2023, 12:49 PM
Hi @JosephK (exDatabricks), spinning up a new cluster for VACUUM is not possible; we have to run it on the same cluster. That is why I want to confirm whether I can allocate a dedicated thread to this process without scaling up the cluster, and whether it will impact my current processing.
Can anyone please help with this?

JosephK (exDatabricks)

08/31/2023, 4:46 PM
My comment had nothing to do with vacuuming. You should never set up a cluster that way for any process.

Gerhard Brueckl

08/31/2023, 6:40 PM
If I understood correctly, you simply want to vacuum a Delta table which is also constantly written to. In that case you can simply run VACUUM from another cluster and it will have no impact on your processing performance.
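For reference, a minimal sketch of what such a separate daily VACUUM job might look like in PySpark. The table name `events`, the helper names `build_vacuum_sql` and `run_vacuum`, and the 168-hour retention (Delta's default of 7 days) are assumptions for illustration, not details from this thread. `run_vacuum` expects a SparkSession that already has Delta Lake configured.

```python
from typing import Optional


def build_vacuum_sql(table: str,
                     retain_hours: Optional[int] = None,
                     dry_run: bool = False) -> str:
    """Build a Delta Lake VACUUM statement (hypothetical helper)."""
    parts = [f"VACUUM {table}"]
    if retain_hours is not None:
        if retain_hours < 0:
            raise ValueError("retain_hours must be non-negative")
        # Omitting RETAIN leaves the table's configured retention in effect.
        parts.append(f"RETAIN {retain_hours} HOURS")
    if dry_run:
        # DRY RUN only reports what would be deleted; nothing is removed.
        parts.append("DRY RUN")
    return " ".join(parts)


def run_vacuum(spark, table: str,
               retain_hours: Optional[int] = None,
               dry_run: bool = False):
    """Run VACUUM through an existing, Delta-enabled SparkSession."""
    return spark.sql(build_vacuum_sql(table, retain_hours, dry_run))


# Preview the statement without touching any cluster.
# "events" is a placeholder table name.
print(build_vacuum_sql("events", retain_hours=168, dry_run=True))
```

Running with `dry_run=True` first is one way to see how many of the inactive files would be removed before scheduling the real job.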

Dominique Brezinski

09/01/2023, 3:13 PM
Why do you have the constraint that it has to run on the same cluster?

JosephK (exDatabricks)

09/01/2023, 8:11 PM
Don't have permission to spin up another cluster 😄