Hi! About deletion vectors: I just noticed that a Databricks job I expected to be quick thanks to deletion vectors actually ran longer than the configured timeout. It seems that a statement like "DELETE FROM table WHERE id IN (SELECT id FROM another_table)" rewrites the affected parquet files instead of using deletion vectors. I confirmed this with some extra testing on Databricks Runtime 13.2. A statement like "DELETE FROM table WHERE id > 100", however, uses deletion vectors and does not rewrite any parquet files. So: is there a way to make the more complex DELETE operations that filter with "WHERE id IN (...)" use deletion vectors too? Thanks!
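One way to check which path a given DELETE took is to inspect its commit in the table's `_delta_log/` directory. The sketch below rests on a few assumptions. Each commit is a newline-delimited JSON file of actions. Under the Delta transaction protocol, rows soft-deleted through a deletion vector show up as an `add` action that carries a `deletionVector` field, usually for the same parquet path that was just removed. A full rewrite shows up as `add` actions for new parquet files with no such field. The function name `summarize_delete_commit` and the file names in the test data are hypothetical.

```python
import json
from typing import Iterable


def summarize_delete_commit(commit_lines: Iterable[str]) -> dict:
    """Count file actions in one Delta commit (one JSON action per line).

    Assumption (Delta protocol): an "add" action with a non-null
    "deletionVector" marks rows as deleted in an existing parquet file
    instead of writing a replacement file.
    """
    removed = 0
    added_with_dv = 0
    added_plain = 0  # new parquet files written without a deletion vector
    for line in commit_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        action = json.loads(line)
        if "remove" in action:
            removed += 1
        elif "add" in action:
            if action["add"].get("deletionVector"):
                added_with_dv += 1
            else:
                added_plain += 1
        # other actions (commitInfo, metaData, cdc, ...) are ignored
    return {
        "removed": removed,
        "added_with_dv": added_with_dv,
        "added_plain": added_plain,
        # Heuristic: deletion vectors were used if at least one was written
        # and no replacement parquet files were produced.
        "used_deletion_vectors": added_with_dv > 0 and added_plain == 0,
    }


if __name__ == "__main__":
    # Hypothetical commit where the DELETE attached a deletion vector.
    dv_commit = [
        '{"commitInfo": {"operation": "DELETE"}}',
        '{"remove": {"path": "part-0001.parquet", "dataChange": true}}',
        '{"add": {"path": "part-0001.parquet", "dataChange": true, '
        '"deletionVector": {"storageType": "u", "pathOrInlineDv": "hypothetical", '
        '"offset": 1, "sizeInBytes": 36, "cardinality": 2}}}',
    ]
    # Hypothetical commit where the DELETE rewrote the parquet file.
    rewrite_commit = [
        '{"commitInfo": {"operation": "DELETE"}}',
        '{"remove": {"path": "part-0001.parquet", "dataChange": true}}',
        '{"add": {"path": "part-0002.parquet", "dataChange": true}}',
    ]
    print(summarize_delete_commit(dv_commit)["used_deletion_vectors"])  # True
    print(summarize_delete_commit(rewrite_commit)["used_deletion_vectors"])  # False
```

On a real table you would pass the lines of the relevant commit file, for example via `open(...)` on a path you can reach. How you access `_delta_log/` (DBFS, a cloud storage client) depends on your setup. The check is only a heuristic. A DELETE that removes every row of a file emits just a `remove` action, so it is reported as not using deletion vectors, even though nothing was rewritten.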
08/21/2023, 4:08 PM
Interesting, can you please report this through Databricks support? They can take a closer look.
08/22/2023, 5:35 AM
Thanks Nick, I also sent the question to Databricks.