
Daniel Bariudin

01/15/2023, 12:38 PM
Hey everybody, is there a way to define TBLPROPERTIES with .option()? Something like this:
df.write.format("delta")
 .mode("overwrite")
 .option("TBLPROPERTIES", "key1=value1, key2=value2") <-- HERE
 .save("./")
I know there is a way to update TBLPROPERTIES with the following code:
alter_table_query = f"ALTER TABLE delta.`hdfs:///{table_path}` SET TBLPROPERTIES ('delta.logRetentionDuration' = 'interval 1 days')"

spark.sql(alter_table_query)
but I want to add this as part of streaming write.

Lennart Skogmo

01/15/2023, 2:02 PM
Some properties are reachable through options, at least.
df.write.format("delta") \
  .option("delta.autoOptimize.optimizeWrite", "true") \
  .option("delta.autoOptimize.autoCompact", "true") \
  .saveAsTable(table)

Daniel Bariudin

01/15/2023, 3:44 PM
Do you know where I can find the list of properties that I can initialize with .option? Can I initialize
delta.logRetentionDuration
?

Omkar

01/17/2023, 8:28 AM
Hey Daniel, I think you're looking for the table properties documentation. Hope you'll find the links below useful. If you're using Delta Lake (open source), you can refer to this doc: https://docs.delta.io/latest/table-properties.html If you're using Databricks Delta Lake, you can refer to this doc: https://docs.databricks.com/delta/table-properties.html
✅ 1

Daniel Bariudin

01/18/2023, 9:24 AM
Hey @Omkar, the links you provided are for the list of table properties in general. I'm searching for the list of table properties that I can initialize using
.option()
When I try to initialize the properties like this:
df.write.format("delta") \
  .option("delta.autoOptimize.optimizeWrite", "true") \  <--- This way
  .saveAsTable(table)
the TBLPROPERTIES don't change.

Omkar

01/18/2023, 9:44 AM
@Daniel Bariudin This particular table property
delta.autoOptimize.optimizeWrite
is available in Databricks Delta and not available in Delta Lake (open source). You'll have to check which one you're using and plan accordingly.
@Daniel Bariudin Regarding setting the other table properties (like
delta.logRetentionDuration
for example), you can try setting them in your Spark Session Config as follows:
from pyspark.sql import SparkSession

spark = SparkSession \
  .builder \
  .appName("...") \
  .master("...") \
  .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
  .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog") \
  .config("spark.databricks.delta.properties.defaults.logRetentionDuration", "interval 7 days") \
  .getOrCreate()
The benefit of setting these properties in the Spark session config is that you can set them once and then use them in all Spark DataFrames referring to that Spark session. This should also work with streaming writes, since the properties are set at the Spark session level. For more info and other ways of setting these properties, you can refer to this section: https://docs.delta.io/latest/delta-batch.html#table-properties
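For a streaming write, that same session-level default would apply to any new table the stream creates. A minimal sketch of how those configs fit together; the helper just collects the three configs from the snippet above, and the paths, app name, and stream wiring in the comments are illustrative placeholders, not from the thread:

```python
# Sketch: collect the session-level Delta defaults in one helper.
# Applying them (commented out below) requires pyspark + delta-spark;
# paths and names are placeholders.

def delta_session_conf(retention: str) -> dict:
    """Spark configs that make NEW Delta tables default to `retention`."""
    return {
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        "spark.sql.catalog.spark_catalog":
            "org.apache.spark.sql.delta.catalog.DeltaCatalog",
        # Default applied to tables created in this session:
        "spark.databricks.delta.properties.defaults.logRetentionDuration":
            retention,
    }

# With pyspark installed, it would be applied like this:
# from pyspark.sql import SparkSession
# builder = SparkSession.builder.appName("stream")
# for k, v in delta_session_conf("interval 7 days").items():
#     builder = builder.config(k, v)
# spark = builder.getOrCreate()
# (spark.readStream.format("delta").load("/data/in")
#      .writeStream.format("delta")
#      .option("checkpointLocation", "/data/chk")
#      .start("/data/out"))  # a new table created here inherits the default
```

Note the `defaults.` prefix in the config key: it only influences table creation, not tables that already exist.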

Daniel Bariudin

01/18/2023, 10:25 AM
Hey @Omkar, I tried your solution, and although it compiles, when I run
SHOW TBLPROPERTIES
I don't see the property ☹️ More details of what I have done:
1. Initialize the Spark session the same way you showed above
2. Read an existing Delta table
3. Write the data back to the same table
4. Run SHOW TBLPROPERTIES to see the property; it isn't there
Am I missing something? Any other ideas 🙏?

Omkar

01/18/2023, 10:29 AM
@Daniel Bariudin Can you post the Spark Session initialization config you're using and the output of
SHOW TBLPROPERTIES
here? Let's check it out.

Daniel Bariudin

01/18/2023, 10:39 AM
@Omkar Here are the configs:
spark = SparkSession.builder.appName("test")\
        .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")\
        .config("spark.sql.catalog.spark_catalog","org.apache.spark.sql.delta.catalog.DeltaCatalog")\
        .config("spark.databricks.delta.properties.defaults.logRetentionDuration", "interval 7 days")\
        .getOrCreate()
and after writing to the Delta table the output is:
+----------------------+-----+
|key                   |value|
+----------------------+-----+
|delta.minReaderVersion|1    |
|delta.minWriterVersion|2    |
+----------------------+-----+

Omkar

01/18/2023, 10:45 AM
@Daniel Bariudin do you have an already existing Delta table, or are you creating a new one? Asking because the default table properties only apply to new Delta tables created in the Spark session where you set the properties.
If you already have existing Delta tables, you can try the following:
• For each existing Delta table:
  ◦ Execute the Spark SQL with the table properties, for example:
ALTER TABLE tablename SET TBLPROPERTIES ('delta.logRetentionDuration' = 'interval 7 days')
  ◦ Confirm the property is set with
SHOW TBLPROPERTIES tablename
• Then you can run your streaming writes on the tables.
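The per-table steps above can be sketched as small helpers that build the exact SQL strings. The table name `events` and the interval are placeholders; with a live session each string would be passed to spark.sql():

```python
# Sketch: build the ALTER/SHOW statements for an existing Delta table.
# `tablename` and `interval` are placeholders; run the strings via
# spark.sql(...) on a session configured with the Delta extensions.

def set_retention_sql(tablename: str, interval: str) -> str:
    # Step 1: set delta.logRetentionDuration on the existing table.
    return (f"ALTER TABLE {tablename} SET TBLPROPERTIES "
            f"('delta.logRetentionDuration' = '{interval}')")

def show_properties_sql(tablename: str) -> str:
    # Step 2: confirm the property landed.
    return f"SHOW TBLPROPERTIES {tablename}"

# spark.sql(set_retention_sql("events", "interval 7 days"))
# spark.sql(show_properties_sql("events")).show(truncate=False)
```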

Daniel Bariudin

01/18/2023, 10:57 AM
@Omkar I'm working on an existing Delta table. In this case, if I create a new table and initialize it with the Delta table properties (that I add to the Spark session), can I change the property in the future in the same way, but on the existing table? I assume that on table creation I can add a default property (like 'delta.logRetentionDuration' = 'interval 7 days'), but I want to ensure it can be changed. **The problem with running a streaming write is that I'm getting MetadataChangedException (I make a concurrent write and a table property change at the same time, and I don't want to shut down the writing).

Omkar

01/18/2023, 11:03 AM
Yes, correct @Daniel Bariudin, it's a good idea not to change table properties on tables that are actively being written to in streaming mode, as Delta doesn't allow that (this is mentioned in the Note section of the doc that I shared above). Quoting the Note section here for your reference:
Note
• Modifying a Delta table property is a write operation that will conflict with other concurrent write operations, causing them to fail. We recommend that you modify a table property only when there are no concurrent write operations on the table.
Yes, your assumption is correct: the new tables created will have the table properties that you set in the Spark session config. You can change them at any time in the future with an
ALTER TABLE tablename SET TBLPROPERTIES (...)
query; only do this when no other write operation is being performed on that particular Delta table. Hope this answers all your questions!
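Putting the whole recommendation together, the safe ordering can be sketched as a plan of steps. The table name, property, and the stop/restart calls in the comments are illustrative placeholders, not from the thread:

```python
# Sketch of the safe ordering described above: change a table property only
# while nothing writes to the table, to avoid MetadataChangedException.
# Names and values are placeholders; streaming control calls are comments.

def safe_property_change_plan(tablename: str, prop: str, value: str) -> list:
    """Ordered steps for changing `prop` without conflicting with writers."""
    return [
        "stop streaming query",  # query.stop() on the active StreamingQuery
        f"ALTER TABLE {tablename} SET TBLPROPERTIES ('{prop}' = '{value}')",
        f"SHOW TBLPROPERTIES {tablename}",  # confirm the change took effect
        "restart streaming query",  # df.writeStream...start() again
    ]
```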

Daniel Bariudin

01/18/2023, 11:05 AM
@Omkar Thank you very much!!!!!!!!!!!!!!!!!
šŸ˜ 1
šŸ‘šŸ¼ 1