Sukumar Nataraj

04/28/2023, 4:12 AM
Hi Team, I am facing a weird issue. When I run a MERGE SQL statement inside Spark Structured Streaming foreachBatch, it takes 300+ seconds to complete. But when I run the same statement as standalone SQL in spark-shell, it completes in just 25 seconds. That is more than a 10x slowdown when it runs inside my batch. In both cases I am registering my batch DataFrame and using it as a physical table. Has anyone faced an issue like this before?

Grainne B

04/28/2023, 9:43 AM
Could it be the way the data is being parallelized?

Sukumar Nataraj

05/02/2023, 4:14 PM
How do I parallelize the DataFrame in MERGE SQL? I am creating a temp view in Spark from the current micro-batch and using this temp view against my main table in the MERGE command.
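
For reference, the setup described above (temp view from the micro-batch, then MERGE against the main table) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from this thread: the names `main_table`, `updates`, `id`, and `num_partitions` are hypothetical, the `UPDATE SET *` / `INSERT *` clauses assume a table format that accepts them (Delta Lake, for example), and `batch_df.sparkSession` assumes PySpark 3.3 or later. Repartitioning and persisting the micro-batch are things worth trying, not a confirmed fix for the slowdown.

```python
def build_merge_sql(target, source_view, key_cols):
    """Build a MERGE statement that upserts source_view into target on key_cols."""
    on_clause = " AND ".join(f"t.{c} = s.{c}" for c in key_cols)
    return (
        f"MERGE INTO {target} AS t "
        f"USING {source_view} AS s "
        f"ON {on_clause} "
        # Assumption: the table format accepts these shorthand clauses.
        "WHEN MATCHED THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT *"
    )


def make_upsert(target, key_cols, num_partitions=None):
    """Return a foreachBatch callback that merges each micro-batch into target."""

    def upsert_batch(batch_df, batch_id):
        # Optionally spread the micro-batch over more partitions before the MERGE.
        df = batch_df.repartition(num_partitions) if num_partitions else batch_df
        # Persist so the micro-batch is not recomputed if the MERGE scans it twice.
        df.persist()
        try:
            # "updates" is a hypothetical view name.
            df.createOrReplaceTempView("updates")
            # Run the SQL on the session that owns the micro-batch,
            # so the temp view registered above is visible to it.
            df.sparkSession.sql(build_merge_sql(target, "updates", key_cols))
        finally:
            df.unpersist()

    return upsert_batch


# Hypothetical usage with a streaming DataFrame named stream_df:
# stream_df.writeStream.foreachBatch(make_upsert("main_table", ["id"], 200)).start()
```

Comparing the Spark UI query plans for the foreachBatch run and the spark-shell run (partition counts and join strategy in particular) would show whether parallelism is actually what differs.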