
Filippo Vecchiato

03/21/2023, 9:59 AM
Hi team, I ran an optimize + vacuum operation on an unpartitioned table in Rust. The table approximately doubled in size after the optimize/vacuum. Is this expected behaviour? Thanks!
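A minimal sketch of the optimize + vacuum flow described above, assuming a deltalake crate version that exposes these operations through the DeltaOps builder; method names, return types, and the table path are assumptions and may differ from the version discussed here:

```rust
use deltalake::operations::DeltaOps;

#[tokio::main]
async fn main() -> Result<(), deltalake::DeltaTableError> {
    // Open the existing (unpartitioned) table; the path is a placeholder.
    let table = deltalake::open_table("./data/my_table").await?;

    // Compact small files into larger ones and commit a new version.
    let (table, metrics) = DeltaOps(table).optimize().await?;
    println!("optimize metrics: {:?}", metrics);

    // Remove data files that are no longer referenced by the current version.
    let (_table, vacuum_metrics) = DeltaOps(table)
        .vacuum()
        .with_enforce_retention_duration(false)
        .await?;
    println!("vacuum metrics: {:?}", vacuum_metrics);
    Ok(())
}
```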

kox

03/21/2023, 10:15 AM
I was wondering the same thing. Last week I saw similar behavior while I was experimenting with delta-rs.

Robert

03/21/2023, 10:51 AM
Just to clarify. The data files representing the new version are twice as large as the data files representing the old version?
… in total size across all files.

Filippo Vecchiato

03/21/2023, 10:59 AM
Yes, that is correct.
Even in the log, the files written by the optimize action are about double the size of the files marked for deletion.
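As a rough way to reproduce that check, one could sum the size fields recorded in the add and remove actions of the commit written by the optimize. A sketch, assuming the standard Delta log JSON layout and a serde_json dependency; the commit file name below is a placeholder:

```rust
use std::fs::File;
use std::io::{BufRead, BufReader};

fn main() -> std::io::Result<()> {
    // Commit file written by the optimize; placeholder version number.
    let path = "./data/my_table/_delta_log/00000000000000000002.json";
    let (mut added, mut removed) = (0i64, 0i64);

    // Each line of a commit file is one JSON action (add, remove, commitInfo, ...).
    for line in BufReader::new(File::open(path)?).lines() {
        let action: serde_json::Value =
            serde_json::from_str(&line?).expect("valid JSON action");
        if let Some(size) = action.pointer("/add/size").and_then(|v| v.as_i64()) {
            added += size;
        }
        // `size` is optional on remove actions, so this may undercount.
        if let Some(size) = action.pointer("/remove/size").and_then(|v| v.as_i64()) {
            removed += size;
        }
    }
    println!("added bytes: {added}, removed bytes: {removed}");
    Ok(())
}
```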

Will Jones

03/21/2023, 2:53 PM
I assume the original files were written by something else? I wonder if we are using a different compression algorithm (or maybe forgot to set compression settings?)
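One way to test the compression hypothesis is to read the Parquet footers of a pre-optimize file and a post-optimize file and compare the codec recorded for each column chunk. A sketch using the parquet crate; the file names are placeholders:

```rust
use parquet::file::reader::{FileReader, SerializedFileReader};
use std::fs::File;

// Print the compression codec recorded for each column chunk of a Parquet file.
fn print_codecs(path: &str) -> Result<(), Box<dyn std::error::Error>> {
    let reader = SerializedFileReader::new(File::open(path)?)?;
    for (rg_idx, rg) in reader.metadata().row_groups().iter().enumerate() {
        for (col_idx, col) in rg.columns().iter().enumerate() {
            println!(
                "{path}: row group {rg_idx}, column {col_idx} -> {:?}",
                col.compression()
            );
        }
    }
    Ok(())
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Compare a file from before the optimize with one it wrote (placeholder names).
    print_codecs("./data/my_table/part-original.parquet")?;
    print_codecs("./data/my_table/part-optimized.parquet")?;
    Ok(())
}
```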

Filippo Vecchiato

03/21/2023, 3:19 PM
The original files were written using a RecordBatchWriter; I used the RecordBatch imported from the deltalake crate. I was also wondering whether different compression codecs are supported; I can see Snappy is somewhat hardcoded at the moment.
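For context, a rough sketch of a write path like the one described, using RecordBatchWriter from the deltalake crate. The schema, data, and table path are made up for illustration, the arrow re-export location is an assumption, and the writer API may differ between versions:

```rust
use std::sync::Arc;
use deltalake::arrow::array::{ArrayRef, Int64Array, StringArray};
use deltalake::arrow::datatypes::{DataType, Field, Schema};
use deltalake::arrow::record_batch::RecordBatch;
use deltalake::writer::{DeltaWriter, RecordBatchWriter};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Open the existing table; the path is a placeholder.
    let mut table = deltalake::open_table("./data/my_table").await?;

    // Build a small Arrow batch (illustrative columns only).
    let schema = Arc::new(Schema::new(vec![
        Field::new("id", DataType::Int64, false),
        Field::new("value", DataType::Utf8, true),
    ]));
    let batch = RecordBatch::try_new(
        schema,
        vec![
            Arc::new(Int64Array::from(vec![1, 2, 3])) as ArrayRef,
            Arc::new(StringArray::from(vec!["a", "b", "c"])) as ArrayRef,
        ],
    )?;

    // Write the batch and commit it as a new table version.
    let mut writer = RecordBatchWriter::for_table(&table)?;
    writer.write(batch).await?;
    writer.flush_and_commit(&mut table).await?;
    Ok(())
}
```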

Will Jones

03/21/2023, 3:26 PM
Noted on the compression configuration: https://github.com/delta-io/delta-rs/issues/1235
If you have time to create example code that reproduces it, I would welcome a GitHub issue. Definitely something we’ll want to look into.

Filippo Vecchiato

03/21/2023, 3:34 PM
Sounds good, let me finish some other work and I will open an issue with example code. Thanks!