
Shira Bodenstein

02/14/2023, 12:40 PM
Hi all, I'm writing a Java Spark application that writes to an unpartitioned Delta table. Another application updates and/or deletes rows using a condition on one of the fields in the data. I'm using the following Delta API:
DeltaTable.forPath(sparkSession, hdfsPath)
    .alias("oldData")
    .merge(data.alias("newData"), deltaMergeQuery)
    .whenMatched()
    .updateAll()
    .whenNotMatched()
    .insertAll()
    .execute();
Deletes are done in a similar way. My question: since the fields used in the merge condition are not partition columns, does this mean Spark has to open and read every file in the table? Thanks in advance!

JosephK (exDatabricks)

02/14/2023, 1:10 PM
It can use file-level statistics, such as each column's min and max values, to skip some files, but in general it will do a full scan.
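As a rough illustration of the min/max skipping mentioned above, here is a simplified model in plain Java. It assumes a single equality predicate on one column and a hypothetical `FileStats` record holding that column's minimum and maximum per file; it is not Delta Lake's implementation. It shows why skipping helps when values are clustered across files and degrades to a full scan when every file's range covers the key.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of min/max data skipping (not Delta Lake's code).
 * A file can be skipped only if the lookup key falls outside its [min, max] range.
 */
public class MinMaxSkippingSketch {

    // Per-file statistics for one column (hypothetical type for illustration).
    record FileStats(String path, long min, long max) {}

    // Returns the files that might contain rows where column == key.
    static List<String> filesToScan(List<FileStats> files, long key) {
        List<String> candidates = new ArrayList<>();
        for (FileStats f : files) {
            // If the key lies within the file's [min, max] range, the file must be read.
            if (key >= f.min() && key <= f.max()) {
                candidates.add(f.path());
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        // Clustered data: value ranges don't overlap, so most files can be skipped.
        List<FileStats> clustered = List.of(
            new FileStats("part-0", 0, 99),
            new FileStats("part-1", 100, 199),
            new FileStats("part-2", 200, 299));
        // Unclustered data: every file spans almost the whole range, so nothing is skipped.
        List<FileStats> unclustered = List.of(
            new FileStats("part-0", 1, 298),
            new FileStats("part-1", 0, 299),
            new FileStats("part-2", 3, 295));

        System.out.println(filesToScan(clustered, 150));   // prints [part-1]
        System.out.println(filesToScan(unclustered, 150)); // prints [part-0, part-1, part-2]
    }
}
```

In the unclustered case the statistics rule nothing out, which corresponds to the "full scan" behavior described in the reply.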

Shira Bodenstein

02/15/2023, 6:05 AM
Thanks!