Does anyone happen to know any Spark tricks for influencing which workers/in-memory partitions the scan/read stages place individual records on? Even though neither data sources nor DataFrames have hard partitions per se, it would be nice to co-locate records by key columns at read time, to avoid an exchange/shuffle later on when doing joins or using window functions. I suppose `repartition(partitionExprs: Column*)` is the closest thing, but if I'm understanding correctly, even that happens after the data has been read.
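For context, here's a sketch of the pattern I'm trying to improve on (Scala; the paths and column names are made up for illustration):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder.appName("colocate-example").getOrCreate()

// The scans below distribute records across partitions with no regard to the
// join key, so the join would trigger a full shuffle (Exchange) on both sides.
val orders    = spark.read.parquet("/data/orders")    // hypothetical path
val customers = spark.read.parquet("/data/customers") // hypothetical path

// repartition() co-locates rows by key, but only *after* the scan has run,
// i.e. the data has already been read into arbitrary partitions first.
val ordersByKey    = orders.repartition(col("customer_id"))
val customersByKey = customers.repartition(col("customer_id"))

val joined = ordersByKey.join(customersByKey, "customer_id")
```

What I'd like is for the co-location by `customer_id` to happen as part of the read itself, rather than as a separate step afterwards.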