03/21/2023, 8:13 PM
Hi everyone, I am used to working with architectures where I use ADF to ingest data (database, SFTP, S3, etc.) and Databricks to process it. Is there an existing architecture where Databricks is used for both ingestion and processing? Can somebody share a link or a picture?

Lennart Skogmo

03/21/2023, 8:19 PM
I think any information about using Spark for ETL more or less applies. Databricks also has Auto Loader on top of that. But when using Spark for ETL, it's probably smart to plan ahead so you can make lightweight use of other ETL tools to produce files to start from in cases where you are unable to connect directly with Spark.
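A rough sketch of that planning idea, assuming each source is described by a small dict: read directly over JDBC when Spark can reach the source, otherwise read files that another tool has landed. The helper names and dict keys (`jdbc_url`, `table`, `file_format`, `landing_path`) are made up for illustration; only `spark.read.format(...).options(...).load(...)` is the actual Spark reader API.

```python
def plan_read(source: dict) -> dict:
    """Decide how to read one source (hypothetical config shape)."""
    if "jdbc_url" in source:
        # Spark can connect directly: use the built-in JDBC reader.
        return {
            "format": "jdbc",
            "options": {"url": source["jdbc_url"], "dbtable": source["table"]},
        }
    # No direct connection: expect another tool to have landed files.
    return {"format": source["file_format"], "path": source["landing_path"]}


def read_source(spark, source: dict):
    """Return a DataFrame for the source, using an existing SparkSession."""
    plan = plan_read(source)
    reader = spark.read.format(plan["format"])
    if plan["format"] == "jdbc":
        return reader.options(**plan["options"]).load()
    return reader.load(plan["path"])
```

Credentials and the JDBC driver are left out here; in practice they would come from a secret store and the cluster configuration.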


03/21/2023, 8:22 PM
Yeah, good one, I did not think about that. ADF probably has more connectors than Spark. Thx.

Dominique Brezinski

03/21/2023, 10:37 PM
We built a multi-protocol frontend that aggregates data to S3, and we use Auto Loader to feed it into our Spark-based ETL pipeline. That is actually the original use case for Auto Loader. The way things are going is more like Kafka ingest: a service layer that gets data from sources and commits directly to Delta Lake tables, followed by Spark-based processing for deeper transformations, relations, and aggregations.
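For anyone who has not used it, the S3-to-Auto-Loader step can look roughly like the sketch below. This is not the pipeline described above, just a minimal example under assumptions: it runs on Databricks (the `cloudFiles` source is only available there), the files are JSON, and the function names, paths, and table name are placeholders.

```python
def autoloader_options(file_format: str, schema_location: str) -> dict:
    """Reader options for the Databricks Auto Loader (cloudFiles) source."""
    return {
        "cloudFiles.format": file_format,
        # Where Auto Loader stores the inferred schema between runs.
        "cloudFiles.schemaLocation": schema_location,
    }


def start_ingest(spark, source_path, target_table, checkpoint_path,
                 file_format="json"):
    """Incrementally load new files from source_path into a Delta table."""
    stream = (
        spark.readStream.format("cloudFiles")
        .options(**autoloader_options(file_format, checkpoint_path))
        .load(source_path)
    )
    return (
        stream.writeStream
        .option("checkpointLocation", checkpoint_path)
        .trigger(availableNow=True)  # process what is there, then stop
        .toTable(target_table)
    )
```

A call would be something like `start_ingest(spark, "s3://my-bucket/landing/", "bronze.events", "s3://my-bucket/_checkpoints/events")`, with all three values being hypothetical.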