
Beni

04/20/2023, 8:44 AM
Hi, what is the recommended way to read a subset of data from a Delta table and either display the results in a notebook or use them in a DataFrame for further computation? Ideally, the chosen approach would use metadata statistics and partitioning (where available) to achieve optimal performance.

a) spark.read.table("MyTable").filter("created_date > '2020-07-01'").display()
b) spark.sql("select * from MyTable where created_date > '2020-07-01'").display()
c) DeltaTable.forName(spark, "MyTable").toDF().filter("created_date > '2020-07-01'").display()

Or does it not matter, and behind the scenes they all result in the same processing logic being executed by Spark? Thank you
Gerhard Brueckl

04/20/2023, 8:54 AM
The core processing will be the same. There might be some slight difference when reading the metadata/path, but that should be negligible.
👍 2
Beni

04/21/2023, 10:38 PM
Good to know. Thank you. Much appreciated.