
Li Sun

06/13/2023, 9:27 PM
Does anyone have experience persisting data from SQS to Delta in S3? I am considering either SQS -> Kinesis -> Spark Streaming or SQS -> Lambda + Python (delta-rs). Which one do you think is more appropriate? We just need to append to the table, and I am not sure whether delta-rs can handle concurrent writes to a table out of the box.

rtyler

06/13/2023, 9:36 PM
SQS -> Lambda works quite well. As long as you don't have strict ordering requirements, it is fine.
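A minimal sketch of the SQS -> Lambda append path with the `deltalake` Python package, for illustration only. It assumes each SQS message body is a JSON object whose keys match the table columns; the `DELTA_TABLE_URI` environment variable and the example bucket path are hypothetical. Depending on the `deltalake` version, safe concurrent writers on S3 may also need a locking mechanism configured through `storage_options`, which is not shown here.

```python
import json
import os

# Hypothetical table location; set DELTA_TABLE_URI in the Lambda configuration.
TABLE_URI = os.environ.get("DELTA_TABLE_URI", "s3://example-bucket/example-table")


def records_to_rows(event):
    """Parse SQS message bodies (assumed to be JSON objects) into dicts."""
    return [json.loads(record["body"]) for record in event.get("Records", [])]


def handler(event, context):
    """Lambda entry point: append one batch of SQS messages to the Delta table."""
    rows = records_to_rows(event)
    if not rows:
        return {"written": 0}

    # Imported lazily so the parsing logic can be exercised without these packages.
    import pyarrow as pa
    from deltalake import write_deltalake

    # Each invocation produces one append commit; ordering across
    # concurrent invocations is not guaranteed.
    write_deltalake(TABLE_URI, pa.Table.from_pylist(rows), mode="append")
    return {"written": len(rows)}
```

Raising the SQS batch size on the event source mapping means fewer, larger commits, which keeps the number of small files in the table down.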

Li Sun

06/13/2023, 9:45 PM
Thanks, that simplifies a lot of steps. I will do a POC on that.

Matthew Powers

06/13/2023, 10:08 PM
You might find this blog from @Nick Karpov helpful: https://delta.io/blog/2023-04-06-deltalake-aws-lambda-wrangler-pandas/
👀 1

Li Sun

06/13/2023, 10:25 PM
Thanks, I am following this article and found that the deltalake python package jumped from 56MB to 100+MB from version 0.8 to 0.9, so I guess there is no way to use 0.9 if I want to use the lambda layer solution
๐Ÿ˜ 1
๐Ÿ˜ฑ 1
k

Kees Duvekot

06/13/2023, 11:01 PM
I didn't find streaming from Kinesis very handy, because it could only run continuously (last time I tried) ...
We had AWS DMS as a source ... But that can now use Kafka as a target ... So we are now working on that part

Nick Karpov

06/14/2023, 2:09 AM
@Li Sun oof that's crappy!! I'll take a look. I think it may be under the size limit if you do the container approach for Lambda though, if you want to try that for now

Li Sun

06/14/2023, 1:43 PM
Yeah, thanks. I will try the container approach

rtyler

06/14/2023, 1:44 PM
yeah that ballooning in size is interesting to me as well. FWIW writing Rust lambdas is pretty simple too. I ... actually have code which I have open sourced which is exactly what you're looking for 😆
👀 2
The only missing piece, which I haven't decided how to handle, is whether the Lambda should infer the schema from the table and validate incoming SQS messages against it
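One possible shape for that validation step, sketched in Python with the `deltalake` package rather than Rust. This is not rtyler's code. It only checks field names and nullability, not types, and the helper names `load_table_fields` and `validate_message` are hypothetical.

```python
def load_table_fields(table_uri):
    """Return {column name: nullable} read from the Delta table's schema."""
    # Imported lazily so validate_message can be used without the package.
    from deltalake import DeltaTable

    return {f.name: f.nullable for f in DeltaTable(table_uri).schema().fields}


def validate_message(message, fields):
    """Return a list of problems; an empty list means the message fits the schema."""
    problems = []
    # A non-nullable column must be present and not None.
    for name, nullable in fields.items():
        if message.get(name) is None and not nullable:
            problems.append(f"missing required field: {name}")
    # Keys that are not table columns are reported instead of silently dropped.
    for name in message:
        if name not in fields:
            problems.append(f"unexpected field: {name}")
    return problems
```

Loading the schema once per cold start and reusing it across invocations avoids reading the table log for every batch, at the cost of lagging behind schema changes until the function is recycled.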

Li Sun

06/14/2023, 1:55 PM
I have never used Rust before, but I am interested to see your solution. Please share the repo once you finish
👍 1