Omer Ozsakarya
03/13/2023, 6:27 AM
Yousry Mohamed
03/13/2023, 7:36 AM
current_timestamp(), then you will get something different each time an action is executed on the DataFrame (unless it is cached).
Omer Ozsakarya
03/13/2023, 2:11 PM
Dominique Brezinski
03/13/2023, 5:39 PM
concat_ws is row oriented, so A) the split across workers doesn't matter when operating on a single DataFrame (or table), and B) joins are deterministic so long as the join keys are deterministic, so the rows in the resulting DataFrame will be the same when the inputs are the same (see A).
Yousry Mohamed
03/13/2023, 9:26 PM
concat_ws and the inputs don’t change across different invocations, then the function will produce the same result. I took the opportunity of your question to write a post on caching yesterday that may be a bit relevant 🙂 https://yousry.medium.com/back-to-basics-spark-caching-key-ideas-789be2b04ebd
Christopher Grant
03/14/2023, 8:03 PM
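
The behaviour discussed in this thread can be sketched with a small plain-Python toy model. It is not Spark and does not use the Spark API: `LazyFrame`, `fake_now`, and `with_key_and_time` are hypothetical names invented for the illustration, and the local `concat_ws` only imitates the per-row joining that Spark's function performs. The model assumes that every action recomputes the plan from its source unless the result has been cached, and that the timestamp is taken once per action.

```python
import itertools

# Hypothetical clock standing in for current_timestamp(): 1, 2, 3, ...
_ticks = itertools.count(1)


def fake_now():
    return next(_ticks)


def concat_ws(sep, *values):
    """Join the values of ONE row with sep, skipping None (row-oriented)."""
    return sep.join(str(v) for v in values if v is not None)


class LazyFrame:
    """Toy lazily evaluated frame (an assumption-laden model, not Spark)."""

    def __init__(self, rows, build_row):
        self._rows = rows
        self._build_row = build_row  # callable: (row, now) -> output row
        self._use_cache = False
        self._materialized = None

    def cache(self):
        # Only marks the frame; the result is stored by the first action.
        self._use_cache = True
        return self

    def collect(self):
        # An "action": reuse the stored result if there is one,
        # otherwise recompute everything from the source rows.
        if self._materialized is not None:
            return list(self._materialized)
        now = fake_now()  # taken once per action in this model
        result = [self._build_row(row, now) for row in self._rows]
        if self._use_cache:
            self._materialized = result
        return list(result)


def with_key_and_time(row, now):
    # One deterministic column (the key) and one non-deterministic column.
    return (concat_ws("-", *row), now)


rows = [("a", 1), ("b", 2)]

# Without caching, each action recomputes: keys repeat, timestamps differ.
df = LazyFrame(rows, with_key_and_time)
first, second = df.collect(), df.collect()
keys_match = [k for k, _ in first] == [k for k, _ in second]
times_match = [t for _, t in first] == [t for _, t in second]

# With caching, the first action's result is reused by later actions.
cached = LazyFrame(rows, with_key_and_time).cache()
cached_stable = cached.collect() == cached.collect()

# Splitting the rows into "partitions" does not change the per-row keys.
partitions = (rows[:1], rows[1:])
split_keys = [concat_ws("-", *r) for part in partitions for r in part]
```

In this sketch `keys_match` and `cached_stable` come out true while `times_match` comes out false, and `split_keys` equals the keys computed over the unsplit rows. That mirrors the two claims above: a row-oriented function over unchanged inputs is repeatable regardless of how rows are split, while a timestamp column changes between actions unless the result is cached.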