Omer Ozsakarya03/13/2023, 6:27 AM
Yousry Mohamed03/13/2023, 7:36 AM
, then you will get something different each time an action is executed on the DataFrame (unless it is cached).
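To make the point above concrete, here is a minimal plain-Python sketch of the idea. It is an analogy, not Spark code: `LazyFrame`, `with_random_column`, and the method names are hypothetical stand-ins chosen for illustration. It assumes a plan that is rebuilt on every action unless a cached copy has been materialized.

```python
import random


class LazyFrame:
    """Toy stand-in for a lazily evaluated DataFrame (hypothetical, not a Spark API)."""

    def __init__(self, compute):
        self._compute = compute   # function that rebuilds the rows from scratch
        self._use_cache = False   # set by cache(); nothing is materialized yet
        self._rows = None         # filled in by the first action after cache()

    def cache(self):
        # Only marks the frame for caching; the first action materializes it.
        self._use_cache = True
        return self

    def collect(self):
        # An "action": recompute the plan unless a cached copy is available.
        if not self._use_cache:
            return self._compute()
        if self._rows is None:
            self._rows = self._compute()
        return self._rows


def with_random_column():
    # Non-deterministic "transformation": fresh random values on every run.
    return [(i, random.random()) for i in range(5)]


uncached = LazyFrame(with_random_column)
first, second = uncached.collect(), uncached.collect()

cached = LazyFrame(with_random_column).cache()
third, fourth = cached.collect(), cached.collect()

print(first == second)   # expected False: each action recomputed the random column
print(third == fourth)   # True: both actions read the same materialized rows
```

The uncached frame reruns the random expression for every action, while the cached one reuses the rows produced by its first action.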
Omer Ozsakarya03/13/2023, 2:11 PM
Dominique Brezinski03/13/2023, 5:39 PM
if row oriented, so (A) the split across workers doesn't matter when operating on a single DataFrame (or table), and (B) joins are deterministic so long as the join keys are deterministic, so the resulting DataFrame will contain the same rows whenever the inputs are the same (see A).
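A rough way to see why the split across workers should not change a join result is sketched below in plain Python. This is an illustration under assumptions, not how Spark implements joins: `partition` and `partitioned_join` are hypothetical helpers, and hashing the key modulo the partition count stands in for a shuffle.

```python
def partition(rows, n):
    """Split rows into n buckets by hashing the join key (toy stand-in for a shuffle)."""
    buckets = [[] for _ in range(n)]
    for row in rows:
        buckets[hash(row[0]) % n].append(row)
    return buckets


def partitioned_join(left, right, n):
    """Inner join computed independently per partition, then concatenated."""
    out = []
    for lpart, rpart in zip(partition(left, n), partition(right, n)):
        # Build a key -> values lookup for the right side of this partition.
        lookup = {}
        for key, value in rpart:
            lookup.setdefault(key, []).append(value)
        # Probe it with the left side; matching keys always share a partition.
        for key, value in lpart:
            for rvalue in lookup.get(key, []):
                out.append((key, value, rvalue))
    return out


orders = [(1, "book"), (2, "pen"), (1, "lamp"), (3, "desk")]
users = [(1, "ana"), (2, "bo"), (4, "cy")]

two_workers = partitioned_join(orders, users, 2)
five_workers = partitioned_join(orders, users, 5)

# Row order can differ with the partition count, so compare the sorted rows.
print(sorted(two_workers) == sorted(five_workers))  # True
```

The set of joined rows is the same for 2 and 5 partitions; only the order in which rows come back may vary.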
Yousry Mohamed03/13/2023, 9:26 PM
and the inputs don’t change across different invocations, then the function will produce the same result. Your question prompted me to write a post on caching yesterday that may be somewhat relevant 🙂 https://yousry.medium.com/back-to-basics-spark-caching-key-ideas-789be2b04ebd
Christopher Grant03/14/2023, 8:03 PM