
Gurunath

01/30/2023, 5:09 PM
Hi everyone, good day. I recently got a chance to implement the Delta Sharing protocol specification in Python using the amazing deltalake and FastAPI Python packages. The detailed protocol specification documentation, combined with the capabilities of the deltalake Python package, made implementing the Delta Sharing protocol in Python a breeze. Kudos to the delta-sharing community for the wonderful documentation on the REST protocol specification! Looking forward to collaborating on and improving this implementation. 🙂 (I also tried this protocol with the Iceberg table format; that part is in a super alpha state.) Blog Post: https://guruengineering.substack.com/p/lakehouse-sharing Github: https://github.com/rajagurunath/lakehouse-sharing
👀 2
🔥 3
🙌 3

Shane Torgerson

01/30/2023, 7:11 PM
Nice. What would you say are some improvements/differences in your library vs. the reference implementation?
My main complaint with the reference implementation is that you can't assign permissions. From what I've observed, the token gives access to the entire server.

Matthew Powers

01/30/2023, 7:22 PM
Cool, thanks for sharing. Looks like a really cool project.
🙌 1
gratitude thank you 1

Gurunath

01/30/2023, 7:59 PM
Hi @Shane Torgerson, thanks for the A2A. Here are a few differences between this implementation (Python based) and the reference implementation (Scala based):
• You are absolutely right: the reference implementation uses a single token to interact with all the available APIs. This Python-based implementation has authentication (using JWT tokens) and authorization mechanisms (using custom-implemented tables). I am planning to improve the authorization mechanism by integrating with PyCasbin for more advanced RBAC.
• The reference implementation supports CDF (Change Data Feed), i.e. the /changes API, so clients like Spark Streaming can read the changes in every micro-batch and replicate the data in near real time. The /changes API is not supported yet in the Python-based implementation. I need some help from the community on how to get the change data feed with the Python deltalake package.
• The Python-based implementation is in a super alpha state and still needs a lot of testing (currently tested only with AWS S3; I need to test with other cloud providers' storage buckets, add more test cases, etc.).
Hope this helps. Please let me know if you need any further information! Cheers!!
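To make the authentication/authorization split in the message above concrete, here is a minimal standard-library sketch. It is not code from the lakehouse-sharing repository: the secret, the function names (`issue_token`, `verify_token`, `can_read`), and the in-memory `PERMISSIONS` mapping are all hypothetical stand-ins for whatever JWT library and permission tables the project actually uses. It only illustrates the idea of verifying a signed token first and then checking a per-user grant on a (share, schema, table) triple.

```python
import base64
import hashlib
import hmac
import json
import time

# Hypothetical signing secret; a real server would load this from configuration.
SECRET = b"demo-secret"

# Hypothetical stand-in for permission tables: user -> readable (share, schema, table).
PERMISSIONS = {
    "alice": {("sales_share", "public", "orders")},
    "bob": set(),
}


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str) -> str:
    """HMAC-SHA256 signature over 'header.payload', base64url-encoded."""
    digest = hmac.new(SECRET, signing_input.encode(), hashlib.sha256).digest()
    return _b64url(digest)


def issue_token(user: str, ttl_seconds: int = 3600) -> str:
    """Build an HS256-signed token carrying the user name and an expiry."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    claims = {"sub": user, "exp": int(time.time()) + ttl_seconds}
    payload = _b64url(json.dumps(claims).encode())
    return f"{header}.{payload}.{_sign(f'{header}.{payload}')}"


def verify_token(token: str) -> str:
    """Authentication: check signature and expiry, return the user name."""
    parts = token.split(".")
    if len(parts) != 3:
        raise PermissionError("malformed token")
    header, payload, signature = parts
    expected = _sign(f"{header}.{payload}")
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        raise PermissionError("bad signature")
    claims = json.loads(_b64url_decode(payload))
    if claims["exp"] < time.time():
        raise PermissionError("token expired")
    return claims["sub"]


def can_read(token: str, share: str, schema: str, table: str) -> bool:
    """Authorization: is the authenticated user granted this table?"""
    user = verify_token(token)
    return (share, schema, table) in PERMISSIONS.get(user, set())
```

With this shape, a valid token alone is not enough to read everything on the server: `can_read(issue_token("alice"), "sales_share", "public", "orders")` passes, while the same token is refused for any table not listed for that user.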