
rtyler

05/06/2023, 12:50 AM
@Will Jones @Florian Valeye @Matthew Powers while watching ☝️ wait for GitHub Actions to finish, I am feeling more and more like it is time for us to put the Python bindings in their own repository so that they can be developed more independently from the Rust bindings. Curious what y'all think

Matthew Powers

05/06/2023, 12:56 AM
I know @Andy Grove has the Rust Arrow DataFusion code in arrow-datafusion and the Python code in arrow-datafusion-python, so he might be able to weigh in on the pros/cons of this approach. I don’t personally know enough either way to have an opinion.

Will Jones

05/06/2023, 3:27 AM
If it’s CI that’s the concern, we can have fewer of the Python jobs run on changes that only touch Rust
I think having them in the same repo but with separate packages and release cycles is working fine otherwise. But let me know if there are other points of friction that would make this worth it
Idk, I'd mostly just hate to have to go through the work of upgrading the Rust dependency in the Python bindings on my own. I know it’s a little annoying for DataFusion upgrades, though.

Andy Grove

05/06/2023, 12:33 PM
The main motivation for us moving to two repos was that almost none of the contributors have Python/pyo3 skills, so it was frustrating for the Rust devs to make changes that broke the Python API and then could not fix the issue themselves. For us, having two repos and separate release cycles is working well.

Matthew Powers

05/06/2023, 5:40 PM
One thing that would be nice is having a “deltalake” repo with a Python-only README. Having the repo name and PyPI package name in alignment would be more conventional. This would also allow us to de-emphasize the Rust implementation detail of the project (I don’t think most pandas users care that this is written in Rust).
👍 1

Will Jones

05/06/2023, 5:51 PM
I can modify the PyPI GitHub link to point to the Python-specific README. I agree that for end-users the code being Rust is a detail we don’t need to emphasize. But for contributors and folks who want to browse the code (that’s more our GitHub audience), that’s a more important detail 🙂
IMO the most important consideration on whether to split the repo is whether the folks working on each part are a distinct group of people. Right now I think there is a lot of overlap between who works on the Python bindings and who works on the core library. If that eventually changes and we have two distinct groups of devs, then I’m all for splitting. But IMO right now is too early.
👍 2

Robert

05/07/2023, 6:25 AM
I agree with @Will Jones here. Adding to his argument, right now we are moving more and more into higher-level operations, which will potentially involve significant API changes on both the Rust and Python sides and may also involve quite a bit of data movement across language barriers. At least for me it is easier to keep this in sync within the same repo.
👍 1