Hi, what is the recommended way to read a subset of data from a Delta table and either display the results in a notebook or use them in a DataFrame for further computation? Ideally, the chosen approach would use metadata statistics and partitioning (where available) for optimal performance.
a) spark.read.table("MyTable").filter("created_date > '2020-07-01'").display()
b) spark.sql("select * from MyTable where created_date > '2020-07-01'").display()
c) DeltaTable.forName(spark, "MyTable").toDF().filter("created_date > '2020-07-01'").display()
Or does it not matter, and behind the scenes, they all result in the same processing logic being executed by Spark?
04/20/2023, 8:54 AM
The core processing will be the same: all three approaches resolve to the same optimized query plan, so Delta's data skipping and partition pruning apply equally.
There might be a slight difference in how the table metadata/path is resolved up front, but that should be negligible.