Simon Thelin

09/15/2023, 12:35 PM
Has anyone here built a real-time database on top of Delta tables? I don’t want to use Apache Druid because it is a bit too heavy and does not have native Delta support.

Dominique Brezinski

09/15/2023, 1:07 PM
You need to quantify “realtime.”

Simon Thelin

09/18/2023, 2:29 PM
Something close to real time: millisecond latency for micro-batch streaming, similar to Apache Druid when it connects to Kafka topics, but with a native Delta connector.
So, for example, we use Trino to query Delta tables, but Trino is not great for millisecond latency. The alternative would be a dual sink: stream out to both a Kafka topic and a Delta table, then handle low-latency queries against the topic rather than the Delta table. It would be nice to consolidate everything in Delta tables if possible, though. I did this in the past with Apache Druid to serve “realtime” dashboards while still keeping the data available in Delta tables for other BI tools and applications, since a topic is not meant for long-term storage.
Apache Pinot, for example, looks interesting as an alternative to Druid.
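
To make the dual-sink idea above concrete, here is a rough, stdlib-only sketch of the fan-out. Everything in it is a hypothetical stand-in: `DualSink`, `write_batch`, and `recent` are made-up names, the `topic` list only models a Kafka topic with bounded retention, and the `table` list only models an append-only Delta table. In a real Spark Structured Streaming job the same per-batch fan-out would probably live inside a `foreachBatch` callback, but that wiring is not shown here.

```python
from dataclasses import dataclass, field
from typing import Any

Record = dict[str, Any]


@dataclass
class DualSink:
    """Toy model of a dual sink (all names here are hypothetical).

    topic: stands in for a Kafka topic, short retention, serves fresh reads.
    table: stands in for a Delta table, append-only, long-term storage.
    """

    topic_retention: int = 1000
    topic: list[Record] = field(default_factory=list)
    table: list[Record] = field(default_factory=list)

    def write_batch(self, batch: list[Record]) -> None:
        # Append to the durable sink first so the long-term copy is complete.
        self.table.extend(batch)
        # Then publish the same records to the low-latency sink.
        self.topic.extend(batch)
        # Drop the oldest records beyond the retention limit, which is why
        # the topic cannot double as long-term storage.
        overflow = len(self.topic) - self.topic_retention
        if overflow > 0:
            del self.topic[:overflow]

    def recent(self, n: int) -> list[Record]:
        # Low-latency read path: only the newest n records still retained.
        return self.topic[-n:] if n > 0 else []


sink = DualSink(topic_retention=3)
sink.write_batch([{"id": 1}, {"id": 2}])
sink.write_batch([{"id": 3}, {"id": 4}])
```

After the two batches, `sink.table` holds all four records while `sink.topic` keeps only the newest three, which mirrors the split between dashboard reads on the topic and BI reads on the table.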