
Anton Abilov

05/31/2023, 4:24 PM
Is there a way to set maxBytesPerTrigger dynamically? I would like my Spark streaming job to process large batches (e.g. 50 GB) when there is a lot of data to backfill, but once it's caught up it should process smaller batches.
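For context, a minimal sketch of how maxBytesPerTrigger is set today: it is a static read option on the Delta source, so changing it means restarting the query (table and checkpoint paths here are hypothetical):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

query = (
    spark.readStream
    .format("delta")
    .option("maxBytesPerTrigger", "50g")  # soft cap on data read per micro-batch
    .load("/data/events")                 # hypothetical source table
    .writeStream
    .format("delta")
    .option("checkpointLocation", "/checkpoints/events")  # hypothetical path
    .start("/data/events_out")            # hypothetical sink table
)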

JosephK (exDatabricks)

05/31/2023, 5:25 PM
Not a Delta Lake question. Doesn't availableNow address this? https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#triggers
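A hedged sketch of that suggestion: with the availableNow trigger (Spark 3.3+), the query drains everything currently available in rate-limited batches, still honoring maxBytesPerTrigger, and then stops (paths hypothetical, as above):

query = (
    spark.readStream
    .format("delta")
    .option("maxBytesPerTrigger", "50g")   # still respected per batch
    .load("/data/events")
    .writeStream
    .format("delta")
    .option("checkpointLocation", "/checkpoints/events")
    .trigger(availableNow=True)            # process the backlog, then terminate
    .start("/data/events_out")
)
query.awaitTermination()                   # returns once caught up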

Anton Abilov

06/01/2023, 1:21 PM
My bad, I guess it's more of a Spark question, but I was hoping Delta streaming might have a solution for this. From the docstring of availableNow:
availableNow : bool, optional
    if set to True, set a trigger that processes all available data in multiple batches then terminates the query. Only one trigger can be set.
This is a continuous query with new data arriving; we don't want it to terminate.

JosephK (exDatabricks)

06/01/2023, 11:53 PM
For streaming, the only things I've seen are the Delta Live Tables autoscaler or rolling your own.
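One possible "roll your own" sketch, under stated assumptions: run the query in a loop with availableNow, picking maxBytesPerTrigger for each restart based on the current backlog. backlog_bytes() is a hypothetical helper you would implement yourself, e.g. by comparing the checkpoint offset against the table's latest version:

import time

def run_once(spark, max_bytes):
    # Drain whatever is currently available, rate-limited per batch, then stop.
    query = (
        spark.readStream
        .format("delta")
        .option("maxBytesPerTrigger", max_bytes)
        .load("/data/events")                    # hypothetical paths throughout
        .writeStream
        .format("delta")
        .option("checkpointLocation", "/checkpoints/events")
        .trigger(availableNow=True)
        .start("/data/events_out")
    )
    query.awaitTermination()

while True:
    size = backlog_bytes()                       # hypothetical backlog probe
    # Large batches while backfilling, small batches once caught up.
    run_once(spark, "50g" if size > 50 * 1024**3 else "1g")
    time.sleep(60)                               # pause before the next drain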