#dat

Matthew Powers

05/09/2023, 12:25 PM
Here’s my proposal:
• Update basic_partitioned to not include any null partition values
• Update multi_partitioned to not include any null partition values
• Add basic_partitioned_with_null (which will be the existing basic_partitioned, just renamed)
• Add multi_partitioned_with_null (just the existing multi_partitioned, renamed)
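A minimal sketch of the proposed table layout, assuming a simple in-memory registry. The four table names come from the proposal; the `ReferenceTable` class, the `has_null_partition_values` flag, and the `split_by_case` helper are hypothetical names used only for illustration and are not part of DAT.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceTable:
    """Hypothetical record: one reference table and whether it covers the null-partition edge case."""
    name: str
    has_null_partition_values: bool


# Table names are taken from the proposal; the boolean flag is this sketch's own bookkeeping.
PROPOSED_TABLES = [
    ReferenceTable("basic_partitioned", has_null_partition_values=False),
    ReferenceTable("multi_partitioned", has_null_partition_values=False),
    ReferenceTable("basic_partitioned_with_null", has_null_partition_values=True),
    ReferenceTable("multi_partitioned_with_null", has_null_partition_values=True),
]


def split_by_case(tables):
    """Return (standard_case_names, edge_case_names), preserving input order."""
    standard = [t.name for t in tables if not t.has_null_partition_values]
    edge = [t.name for t in tables if t.has_null_partition_values]
    return standard, edge


standard, edge = split_by_case(PROPOSED_TABLES)
print("standard case:", standard)
print("edge cases:", edge)
```

Splitting the tables this way would let a connector report the standard case and the null-partition edge case separately.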

Will Jones

05/09/2023, 1:46 PM
This gets at an interesting question I was asking myself when writing these: Should the DAT tests be like unit tests, where each one tries to test only one thing at a time? Or should we keep them consolidated and emphasize the edge cases (like this one)?
I was initially thinking the second one, since that sounds more like acceptance tests to me, and I didn’t think we necessarily wanted DAT to be a replacement for unit tests. But I’m willing to be wrong on this. I think the only danger, though, is that the unit test route means we will have many more tables.

Matthew Powers

05/09/2023, 1:54 PM
Ah, interesting. From the “connector writer perspective”, I think treating DAT like unit tests is nice. I am working on the Dask connector and it’s going to be so nice to be able to pass all these pre-made Delta tables off to that community. I think they’ll be really happy to see that the Dask connector can read most partitioned Delta tables. It’ll be good to highlight the edge case deficiency, but also good to show them that the “standard case” is working. I am open to your pushback on this one.

Will Jones

05/09/2023, 2:03 PM
My only two hesitations on treating them as unit tests are:
1. I worry it will mean a lot more tables. Mostly that means it might be a little challenging to navigate. But also, generating them with PySpark isn’t super fast. It’s like 0.5 - 2 sec per table IIRC, which adds up after a while.
2. That’s a lot of work to set them all up. From the delta-rs perspective, we have a good enough unit test suite. We are much more interested in integration tests. So I’m not sure how many DAT users (downstream devs) will want to contribute unit-test-like tests. I’d welcome input from the devs on other connectors, though.

Matthew Powers

05/09/2023, 2:09 PM
@Will Jones - Those seem like valid counterpoints and I’ve switched to your line of thinking, haha. Can we just add one “basic partitioned” example? That’ll be the only exception I ask for. I like the idea of generally keeping the reference tables as “challenging edge cases” and preventing an unnecessary proliferation of reference tables.

Will Jones

05/09/2023, 2:12 PM
That’s fair. We can add one without nulls

Matthew Powers

05/09/2023, 2:14 PM
Great, I will add a PR. I will also add a little documentation to the README to explain how DAT reference tables are more like acceptance tests and aren’t meant to be a full replacement for unit tests.