Hi, can Delta Lake serve as the backend of an online web system, providing filtering and querying of datasets? What latency can it generally achieve?
rtyler
04/07/2023, 2:23 AM
It really comes down to how large the data is, how many data files there are, and what the access patterns are. At the end of the day, try to remember that Delta Lake is basically Parquet files and .json transaction log files sitting in an object store/file system.
junhui huang
04/07/2023, 3:17 AM
Thank you very much for the reminder.
1. We plan to store 20 million datasets in a Delta Lake table, and the web system needs to filter on the dataset uuid, project_id, or other fields to query dataset details.
2. Another usage pattern is querying image metadata via the dataset uuid.
In this use case, is it appropriate to use Delta Lake directly as the backend storage for the web system? How can we shorten the response latency of a single request against Delta Lake?
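(One common way to cut per-request latency for filters like `project_id` is to partition the table on that column, so a query only touches the files under one partition directory. The paths below are invented, and real engines read partition values from the transaction log's add-action metadata rather than string-matching paths; this stdlib sketch only illustrates the pruning idea.)

```python
# Hive-style partitioned layout: the partition value is encoded in the path.
files = [
    "project_id=alpha/part-0000.parquet",
    "project_id=alpha/part-0001.parquet",
    "project_id=beta/part-0000.parquet",
]

def prune(files, project_id):
    # File-level pruning from the path alone -- no Parquet data is opened,
    # so a request for one project skips every other project's files.
    prefix = f"project_id={project_id}/"
    return [f for f in files if f.startswith(prefix)]

print(prune(files, "beta"))
# → ['project_id=beta/part-0000.parquet']
```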
Jim Hibbard
04/07/2023, 7:58 AM
I'll echo rtyler's "it depends," but depending on access patterns you can do some frontend caching of your data with something like IndexedDB. If access patterns are predictable, you can send a buffer of data to your frontend ahead of time.
JosephK (exDatabricks)
04/07/2023, 11:28 AM
I think it's important for you to know that Delta Lake is a file protocol, not a backend or a query engine.