Seungchan Lee 08/01/2023, 2:09 AM
Simon Thelin 08/02/2023, 3:39 PM
pg -> s3 -> delta
Seungchan Lee 08/03/2023, 3:59 PM
When you say pg -> s3 -> delta, do you mean you use Airbyte to connect pg -> s3 and then have some sort of manual script from s3 -> delta? Would you mind elaborating a little on the setup?
Simon Thelin 08/04/2023, 1:22 PM
, so I have to do quite a lot of things to make sure it propagates correctly. Unfortunately I can’t set up CDC atm because the PG I am using was set up manually, so I'm worried about what might happen. Otherwise I would just go full Debezium and then use
which means data I just synced may no longer exist
, and I will move to this at some point. I think Airbyte is quite slow and annoying in general, but it does the job. We are only a team of 3, so it saves me a lot of time right now. I would advise you, @Seungchan Lee, if you can, to use
. This way you will get a highly performant sync directly to Delta, provided your Postgres is properly set up with IaC and you can easily spin it up and tear it down.
spark streaming/batch jobs, combined with multiple compute sources like
lambda architecture (not aws lambda)
. I hope to be able to move to full kappa soon and ditch Airbyte altogether
. I will be featured soon in a full article on Starburst and am happy to explain more about this if it helps you in any way.
potentially use it for salesforce
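To illustrate the point above about synced data that may no longer exist in the source: a minimal sketch, in plain Python, of how a CDC feed lets a target table follow deletes as well as inserts and updates. It assumes Debezium-style change events with `op`, `before` and `after` fields; the `apply_change_events` helper, the `id` key and the sample rows are hypothetical, and a real pipeline would apply the same logic as a merge into the Delta table rather than into a dict.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def apply_change_events(
    table: Dict[Any, Row], events: List[Dict[str, Any]], key: str = "id"
) -> Dict[Any, Row]:
    """Apply Debezium-style change events to an in-memory table keyed by primary key.

    Hypothetical helper for illustration only.
    """
    result = dict(table)  # copy so the caller's table is not mutated
    for event in events:
        op = event["op"]
        if op in ("c", "r", "u"):
            # create, snapshot read, update: upsert the new row image
            row = event["after"]
            result[row[key]] = row
        elif op == "d":
            # delete: the row image is only available in "before"
            row = event["before"]
            result.pop(row[key], None)
        else:
            raise ValueError(f"unsupported op: {op!r}")
    return result


# Hypothetical sample events: two inserts, one update, one delete.
events = [
    {"op": "c", "before": None, "after": {"id": 1, "name": "a"}},
    {"op": "c", "before": None, "after": {"id": 2, "name": "b"}},
    {"op": "u", "before": {"id": 1, "name": "a"}, "after": {"id": 1, "name": "a2"}},
    {"op": "d", "before": {"id": 2, "name": "b"}, "after": None},
]

final_table = apply_change_events({}, events)
```

After these events only row 1 remains, with its updated name. A sync that only copies new or changed rows would still hold row 2 in the target.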
Seungchan Lee 08/04/2023, 3:07 PM
Simon Thelin 08/04/2023, 4:06 PM
but keen to try
, I host everything on my own. The only thing I don’t host is Trino atm, due to not having enough hours in a day haha
and for BI tools to connect correctly; it is very versatile in that sense. Databricks has a SQL engine of their own, but last time I checked it was pretty useless. I don’t mind at all, for sure.
in your cluster, when you later submit a job, each job becomes its own small cluster. I mimic what https://www.datamechanics.co/ is doing, but I just host it myself.
Seungchan Lee 08/04/2023, 11:25 PM