Caoduy Vonguyen
05/19/2023, 7:27 AM
Yousry Mohamed
05/19/2023, 7:55 AM
One is count, while the other is showString.
Actually,
df = spark.read.format("delta").load("path").count()
means df holds the count, not the DataFrame itself.
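To keep both the DataFrame and its row count, the read and the count can be split into two statements. A minimal sketch, assuming spark is an existing SparkSession; the helper name load_with_count is hypothetical and only used for illustration:

```python
def load_with_count(spark, path):
    """Return (DataFrame, row count) for the Delta table at `path`.

    `spark` is assumed to be an existing SparkSession; `load_with_count`
    is a hypothetical helper name, not part of any library.
    """
    # load() returns a DataFrame; no rows are counted at this point.
    df = spark.read.format("delta").load(path)
    # count() is an action: it runs a Spark job and returns a plain int.
    row_count = df.count()
    return df, row_count
```

With df, n = load_with_count(spark, "path"), df stays a DataFrame that can still be transformed or shown, and n holds the integer count.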
Also, the number of tasks in both screens is really massive. Can you check your Delta folder and see how many files there are and how big they are? It could be a tiny-file problem.
Caoduy Vonguyen
05/19/2023, 8:11 AM
Yousry Mohamed
05/19/2023, 8:21 AM
The job details page shows Associated SQL Query with a hyperlink. Click the hyperlink and it will take you to a page with more details, such as how many files were scanned, the total number of cloud API calls, whether there is a shuffle or not, etc.
Caoduy Vonguyen
05/19/2023, 8:51 AM
Yousry Mohamed
05/19/2023, 9:17 AM
spark.sql(f"DESC HISTORY delta.`{<your path variable or hard code it here>}`").show()
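The same query can be wrapped so the path comes in as a variable. This is only a sketch, assuming spark is an existing SparkSession with Delta Lake available; the function names history_query and show_history are hypothetical, and the selected columns are ones that DESC HISTORY returns for a Delta table:

```python
def history_query(table_path):
    """Build the DESC HISTORY statement for a path-based Delta table."""
    # The path is wrapped in backticks, so a backtick inside it would
    # break the statement; reject it instead of guessing an escape.
    if "`" in table_path:
        raise ValueError("table path must not contain a backtick")
    return f"DESC HISTORY delta.`{table_path}`"


def show_history(spark, table_path, limit=20):
    """Print up to `limit` history rows for the table at `table_path`.

    `spark` is assumed to be an existing SparkSession.
    """
    history = spark.sql(history_query(table_path))
    # Keep the columns that say what each commit did and how much it wrote.
    columns = ("version", "timestamp", "operation", "operationMetrics")
    history.select(*columns).show(limit, truncate=False)
```

Calling show_history(spark, path) prints the full operationMetrics map for each version because truncate=False disables column truncation.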
Caoduy Vonguyen
05/19/2023, 9:42 AM