Hi All,
I am processing around 130 GB of delta table in EMR on EKS with fargate(Spark Version 3.3.2 and delta 2.2) and it took around 6 hrs for processing. In processing, I selected only a few columns and wrote into another delta table with partitionby on one column with the help of merge function. I tested this job with 16vCPU and 80 GB memory of driver and executor and the number of executors set to 50. So could you please suggest a few optimization techniques which help me to reduce run time.
Thanks in advance!
#delta-community #delta-oss #deltalake-questions