
Ajex

03/10/2023, 8:04 AM
Hello everyone. We recently started receiving a data stream from Kafka that contains several very long text fields. Has anyone found a good way to optimize storage for this kind of data?

JosephK (exDatabricks)

03/10/2023, 11:46 AM
Parquet, which Delta uses, supports compression. Snappy is the default, but you can change this when you save the files or at the cluster level. https://spark.apache.org/docs/latest/sql-data-sources-parquet.html#data-source-option
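For illustration, a minimal PySpark sketch of both approaches (the DataFrame and output paths are placeholders, and zstd is just one of the codecs listed in the linked docs; it generally compresses long text columns better than the default snappy at some extra CPU cost):

```python
from pyspark.sql import SparkSession

# Placeholder session; in practice, use your existing Spark session.
spark = SparkSession.builder.getOrCreate()

# Option 1: set the codec for the whole session (or pass it via --conf
# at the cluster level). This applies to all Parquet files written from
# this session, including the Parquet files backing a Delta table.
spark.conf.set("spark.sql.parquet.compression.codec", "zstd")

# Delta writes now produce zstd-compressed Parquet files.
# `df` and the path below are placeholders for your Kafka-derived data.
df = spark.range(10).withColumnRenamed("id", "example_col")
df.write.format("delta").mode("overwrite").save("/tmp/example_delta_table")

# Option 2: for plain Parquet writes, the codec can also be set per write
# via the "compression" data source option.
df.write.option("compression", "zstd").mode("overwrite").parquet("/tmp/example_parquet")
```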

Ajex

03/13/2023, 6:39 AM
thank you