S Thelin

07/10/2023, 5:28 AM
Has anyone considered using Redis as a lock state DB instead of DynamoDB? I am working on a big personal project where I am building everything in native k8s. I like the DynamoDB solution, but I thought maybe I could achieve the same thing with just Redis? Maybe somebody has already done it. Maybe you have already thought about that, @User.

rtyler

07/10/2023, 2:25 PM
@S Thelin while I haven't considered redis explicitly, @John Darrington has a need for a PostgreSQL lock. Just yesterday I was returning to our work in https://github.com/delta-incubator/locking-object-store to make a quality abstraction here for locking.
👀 1

S Thelin

07/10/2023, 2:29 PM
Ah interesting, would Redis not be a good solution here? I was naively thinking it is pretty lightweight, and I was playing around with writing the state myself from my parallel tasks, so it does not have a direct connection with the delta-rs framework. What you have there looks better than what I am trying to do at the moment.

rtyler

07/10/2023, 3:30 PM
Redis is a good cheap locking mechanism IMHO if you've got it available. The objective for this repository is to kind of hide/abstract the details of locking with the `ObjectStore` trait so that a user can just create the `ObjectStore` and let it handle the locking transparently without needing to deal with a bunch of custom code in delta-rs

S Thelin

07/10/2023, 4:41 PM
Sounds great, I will follow this one closely. I will fiddle around a bit with Redis and see if I can share anything from that.

John Darrington

07/10/2023, 5:06 PM
That would be great. I’ve had the same thought as I’ve worked on the Postgres one when I’ve had the chance

S Thelin

07/11/2023, 7:51 AM
@rtyler @John Darrington @Matthew Powers I have now made a very stupid Python example with
from redis import StrictRedis, lock
With no Redis lock, input:
df1 = DataFrame({'id': [1]})
df2 = DataFrame({'id': [2]})
df3 = DataFrame({'id': [3]})
df4 = DataFrame({'id': [4]})
Fed via `joblib` to try to mimic a multi-process situation. Here you can see it writes the physical files, but there are only 3 commits in the Delta log.
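A toy model of the effect described above, not delta-rs itself: each writer always drops its data file, then reads the latest log version and writes the next one. It assumes a store where a commit can silently overwrite another writer's commit with the same version number, and it swaps joblib for the standard library's `concurrent.futures` so it runs without extra packages. `ToyTable`, `run`, and the barrier that makes every unlocked writer read the same version are all hypothetical; the barrier forces the worst case deterministically, whereas a real run would more likely lose only some of the commits.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ToyTable:
    """Hypothetical stand-in for a table: data files plus a versioned commit log."""

    def __init__(self):
        self.data_files = []
        self.commits = {}  # version number -> id of the row that was committed

    def write(self, row_id, before_commit=lambda: None):
        # The data file always lands (list.append is thread-safe in CPython).
        self.data_files.append(f"part-{row_id}.parquet")
        # Read the latest version, then write the next one. Nothing stops two
        # writers from picking the same version and overwriting each other.
        version = len(self.commits)
        before_commit()
        self.commits[version] = row_id


def run(table, row_ids, lock=None):
    """Write every row from its own thread, optionally serialized by a lock."""
    if lock is None:
        # Assumption for the demo: hold every writer until all have read the
        # version, so the collision happens on every run.
        barrier = threading.Barrier(len(row_ids))

        def task(row_id):
            table.write(row_id, before_commit=barrier.wait)
    else:
        def task(row_id):
            with lock:
                table.write(row_id)

    with ThreadPoolExecutor(max_workers=len(row_ids)) as pool:
        list(pool.map(task, row_ids))
    return table


unlocked = run(ToyTable(), [1, 2, 3, 4])
locked = run(ToyTable(), [1, 2, 3, 4], lock=threading.Lock())

print(len(unlocked.data_files), len(unlocked.commits))  # 4 1
print(len(locked.data_files), len(locked.commits))      # 4 4
```

In both runs all four data files exist; only the locked run ends up with one commit per write.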
With Redis lock: with my lock in place, all the commits appear.
I have written a small Python class with Redis which wraps `write_deltalake`, so it is not as nice as yours @rtyler, but it works from what I can see. I am tempted to make a quick PyPI library where I just expose `write_redis_lock_deltalake` or similar.
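A minimal sketch of what such a wrapper might look like; this is not the class from the message above. The lock factory and the write function are passed in, so the example does not need a running Redis server to be exercised. `redis_lock_factory`, the argument order, and the default lock name are hypothetical; `client.lock(name, timeout=..., blocking_timeout=...)` is redis-py's lock helper, and `write_deltalake` is the writer from the `deltalake` Python package.

```python
def redis_lock_factory(client, timeout=60.0, blocking_timeout=30.0):
    """Return a function that builds a named lock from a redis-py client.

    timeout: seconds after which Redis expires the lock, so that a crashed
             writer cannot hold it forever.
    blocking_timeout: seconds to wait for the lock before giving up.
    """
    def make_lock(name):
        return client.lock(name, timeout=timeout, blocking_timeout=blocking_timeout)

    return make_lock


def write_redis_lock_deltalake(make_lock, write_fn, table_uri, data,
                               lock_name="deltalake-write", **write_kwargs):
    """Call write_fn(table_uri, data, **write_kwargs) while holding the lock."""
    # The object returned by make_lock is used as a context manager: the lock
    # is acquired on entry and released on exit, even if the write raises.
    with make_lock(lock_name):
        return write_fn(table_uri, data, **write_kwargs)


# Usage, assuming a reachable Redis server and the redis, pandas and deltalake
# packages are installed (host, port and table path are placeholders):
#
#   from pandas import DataFrame
#   from redis import Redis
#   from deltalake import write_deltalake
#
#   make_lock = redis_lock_factory(Redis(host="localhost", port=6379))
#   write_redis_lock_deltalake(make_lock, write_deltalake, "./my_table",
#                              DataFrame({"id": [1]}), mode="append")
```

With a single default lock name, every write waits on the same lock regardless of which table it targets.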
It is very stupid right now and just creates one lock for any write rather than a lock per table, which I would need to change as well.
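One way to scope the lock to a table, sketched under assumptions: derive the lock name from the table location so that writers to different tables do not block each other. `lock_name_for_table` and the `deltalake-lock` prefix are hypothetical names chosen for illustration.

```python
def lock_name_for_table(table_uri: str, prefix: str = "deltalake-lock") -> str:
    """Build a per-table lock name from the table's URI or path."""
    # Drop trailing slashes so "s3://bucket/events" and "s3://bucket/events/"
    # map to the same lock.
    normalized = table_uri.rstrip("/")
    return f"{prefix}:{normalized}"


print(lock_name_for_table("s3://bucket/events/"))  # deltalake-lock:s3://bucket/events
print(lock_name_for_table("s3://bucket/orders"))   # deltalake-lock:s3://bucket/orders
```

The resulting string could be used as the `name` argument when creating a Redis lock, so only writers to the same table wait on each other. Different spellings of the same location (for example, a relative and an absolute path) would still produce different names.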
The upper one is without the lock; as you can see, the order is random. In the second one, with the lock, each write is forced to wait. The writes are still not in order, so nothing is forcing the order of the writes themselves, but they all appear.
Again, this is a super stupid test I just did, and I would need to stress test it a bit more, but it looks quite promising at a glance.