
Ketan Khairnar

03/02/2023, 9:12 AM
Hi Team 👋🏽 While doing cost analysis, I started thinking about the following:
Change Data Feed (CDF) includes the row data along with metadata indicating whether the specified row was inserted, deleted, or updated.
This effectively doubles storage for tables with CDF enabled. Is that correct?

JosephK (exDatabricks)

03/02/2023, 12:07 PM
It depends on how much data changes. If you change the entire table twice, it will triple your storage. CDF isn't some magic process that will give you extra features for free. Storage is fairly inexpensive in the modern world and it's likely not something to focus on.
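The arithmetic behind "change the entire table twice, triple your storage" can be sketched roughly as below. This is a back-of-the-envelope model, not how Delta accounts for storage: the function name `storage_multiple` is made up for illustration, and it assumes each full rewrite adds exactly one table-sized copy of change data, ignoring compression, retained old data files, and cleanup.

```python
def storage_multiple(table_bytes: int, full_rewrites: int) -> float:
    """Rough storage multiple relative to the base table size.

    Assumption (illustrative only): every full rewrite of the table
    stores one extra table-sized copy of change data.
    """
    change_bytes = table_bytes * full_rewrites
    return (table_bytes + change_bytes) / table_bytes


# A 100 GB table changed in full twice: base copy plus two copies of change data.
print(storage_multiple(100, 2))  # 3.0
```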
👍 2

Dominique Brezinski

03/02/2023, 4:23 PM
However, if it is mostly appends with few updates, the storage cost of CDF is negligible. It really depends on the operation types and volume.
👍 1

Ketan Khairnar

03/03/2023, 5:03 AM
Thanks @JosephK (exDatabricks), that makes sense.
😄 1
@Dominique Brezinski, inserts would produce CDF entries too, so could you please explain this in more detail? You wrote:
"However if it is mostly appends with few updates, the storage cost of CDF is negligible. Really depends on the operation types and volume"
I agree it's low when the change rate is low, whether for inserts, updates (2x here), or deletes.

Dominique Brezinski

03/03/2023, 2:39 PM
Nope, if a transaction contains only inserts, no data is replicated in CDF. Under the covers, it just adds the additional metadata to each record as you read it through the CDF interface. This is because the inserts are already isolated in their own files in the transaction. It is only when a transaction also includes updates or deletes that CDF has to replicate the modified data.
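The rule described above can be sketched as a toy per-transaction estimate. This is a simplified model under stated assumptions, not Delta's actual implementation: `Transaction`, `cdf_extra_bytes`, and `update_copies` are hypothetical names, and the default of two copies per updated row (a before and an after image) is an assumption taken from the "2x" remark earlier in the thread. It also assumes that inserted rows in a mixed transaction are not counted as replicated.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    """Bytes touched by one commit, split by operation type (hypothetical model)."""
    inserted_bytes: int = 0
    updated_bytes: int = 0
    deleted_bytes: int = 0


def cdf_extra_bytes(txn: Transaction, update_copies: int = 2) -> int:
    """Estimate the extra bytes CDF stores for one transaction.

    Assumptions (illustrative only):
    - insert-only transactions replicate nothing, since the inserted
      rows already sit in their own data files;
    - each updated row is stored `update_copies` times;
    - each deleted row is stored once.
    """
    if txn.updated_bytes == 0 and txn.deleted_bytes == 0:
        return 0  # insert-only: metadata is added at read time, nothing is copied
    return txn.updated_bytes * update_copies + txn.deleted_bytes


# Append-only commit: no extra CDF storage.
print(cdf_extra_bytes(Transaction(inserted_bytes=1000)))  # 0

# Mixed commit: only the updated and deleted rows are replicated.
print(cdf_extra_bytes(Transaction(inserted_bytes=1000, updated_bytes=100, deleted_bytes=50)))  # 250
```

Summing this over a table's commits gives a rough sense of why append-heavy workloads see negligible CDF overhead while update-heavy ones do not.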