Seungchan Lee
08/01/2023, 2:09 AM
Simon Thelin
08/02/2023, 3:39 PM
pg -> s3 -> delta atm.
Seungchan Lee
08/03/2023, 3:59 PM
pg -> s3 -> delta
Do you mean you use Airbyte to connect pg -> s3 and then have some sort of manual script from s3 -> delta? Would you mind elaborating a little on the setup?
Simon Thelin
08/04/2023, 1:22 PM
which means data I just synced may no longer exist, so I have to do quite a lot of things to make sure it propagates correctly.
Unfortunately I can’t set up CDC atm because the PG I am using was set up manually, so I am worried about what might happen. Otherwise I would just go full Debezium and then use kafka-delta-ingest, and I will move to this at some point.
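For reference, the first hop of the Debezium route mentioned above can be sketched as a connector registration payload for Kafka Connect. This is only a minimal sketch, not the setup of anyone in this thread: the connector name, host, database, user, tables, and topic prefix are all hypothetical, and it assumes Debezium 2.x (which uses topic.prefix) and a Postgres running with wal_level=logical. The kafka-delta-ingest side is not shown.

```python
import json
import os


def debezium_postgres_connector(name, host, dbname, user, password,
                                tables, topic_prefix, slot_name="debezium"):
    """Build the JSON body for Kafka Connect's POST /connectors endpoint.

    All argument values are supplied by the caller; nothing here is taken
    from the conversation.
    """
    return {
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
            # pgoutput is the logical decoding plugin shipped with Postgres 10+
            "plugin.name": "pgoutput",
            "database.hostname": host,
            "database.port": "5432",
            "database.user": user,
            "database.password": password,
            "database.dbname": dbname,
            # Debezium 2.x property; prefixes every topic the connector writes
            "topic.prefix": topic_prefix,
            # Comma-separated schema.table names to capture
            "table.include.list": ",".join(tables),
            # Replication slot names allow only lowercase letters, digits, underscores
            "slot.name": slot_name,
        },
    }


def change_topics(topic_prefix, tables):
    """Debezium names change topics <topic.prefix>.<schema>.<table>."""
    return [f"{topic_prefix}.{table}" for table in tables]


# Hypothetical values, for illustration only
tables = ["public.orders", "public.customers"]
payload = debezium_postgres_connector(
    name="pg-cdc",
    host="postgres.internal",
    dbname="appdb",
    user="debezium",
    password=os.environ.get("PG_PASSWORD", ""),
    tables=tables,
    topic_prefix="appdb",
)
topics = change_topics("appdb", tables)

print(json.dumps(payload, indent=2))
print(topics)
```

The payload would be POSTed to the Kafka Connect REST API at /connectors. Each captured table then gets its own change topic, which is what a downstream consumer such as kafka-delta-ingest would read from.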
I think Airbyte is quite slow and annoying in general. But it does the job, we are only a team of 3 so it saves me a lot of time right now.
I would advise you @Seungchan Lee, if you can, to use Debezium and then kafka-delta-ingest. This way you will get a highly performant sync directly to delta, provided your Postgres is properly set up with IaC and you can easily tear it down and bring it back up.
lambda architecture (not aws lambda)
spark streaming/batch jobs, combined with multiple compute sources like dask, polars.
I hope to be able to move to full kappa soon and ditch Airbyte altogether, or potentially use it just for Salesforce.
I will be featured soon in a full article on Starburst and am happy to explain more about this if it helps you in any way.
Seungchan Lee
08/04/2023, 3:07 PM
Simon Thelin
08/04/2023, 4:06 PM
strimzi but keen to try Redpanda. I host everything on my own; the only thing I don’t host is Trino atm, due to not having enough hours in a day haha
delta connector and for BI tools to connect correctly; it is very versatile in that sense. Databricks has some SQL engine of their own but last time I checked it was pretty useless.
I don’t mind at all for sure.
spark operator in your cluster, when you then later submit a job, each job becomes its own small cluster.
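To make the "each job becomes its own small cluster" point concrete, here is a rough sketch of what a job submission could look like. It assumes the Kubernetes Spark operator with its SparkApplication custom resource (sparkoperator.k8s.io/v1beta2); the thread does not say which operator is used, and the job name, image, file path, service account, and sizes below are hypothetical.

```python
import json


def spark_application(name, image, main_file, executors=2, namespace="default"):
    """Build a SparkApplication manifest for the Kubernetes Spark operator.

    The operator reacts to this resource by starting one driver pod and
    `executors` executor pods that exist only for this job.
    """
    return {
        "apiVersion": "sparkoperator.k8s.io/v1beta2",
        "kind": "SparkApplication",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "type": "Python",
            "pythonVersion": "3",
            "mode": "cluster",
            "image": image,
            "mainApplicationFile": main_file,
            "sparkVersion": "3.4.1",
            # Do not restart the job automatically if it fails
            "restartPolicy": {"type": "Never"},
            "driver": {"cores": 1, "memory": "1g", "serviceAccount": "spark"},
            "executor": {"instances": executors, "cores": 1, "memory": "2g"},
        },
    }


# Hypothetical job, for illustration only
manifest = spark_application(
    name="daily-sync",
    image="registry.example.com/spark-jobs:latest",
    main_file="local:///opt/app/daily_sync.py",
    executors=3,
)

# kubectl accepts JSON manifests as well as YAML
print(json.dumps(manifest, indent=2))
```

Saved to a file, the output could be submitted with kubectl apply -f. The driver and executor pods are created for that one application and go away when it finishes, so no long-running Spark cluster has to be kept alive between jobs.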
I mimic what https://www.datamechanics.co/ are doing but I just host it myself.
Seungchan Lee
08/04/2023, 11:25 PM