
Matthew Powers

03/24/2023, 11:12 AM
Here’s a draft blog post, "How to create and append to Delta Lake tables with pandas". Feel free to respond in the thread or DM me with your email if you’d like permissions to comment on the doc. This post highlights the ability to overwrite specific partitions of Delta tables using pandas in the new release. I put this feature in a broader context and set the blog post title so that it’ll rank well in searches and get more traffic over time. I plan to publish this one on April 5th!

Gareth Western

03/24/2023, 4:13 PM
Thanks for sharing this Matthew! Just an observation: I notice the library doesn't create checkpoints in the transaction log. Is this a known issue?

Matthew Powers

03/24/2023, 4:57 PM
It should be creating a checkpoint in the transaction log every 10 transactions, right?

Gareth Western

03/24/2023, 5:11 PM
That's correct: https://github.com/delta-io/delta/blob/master/PROTOCOL.md#checkpoints I didn't see any when I ran a simple test that looped over write_deltalake in "append" mode (99 transactions).
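A test like the one described here could look roughly like the sketch below. It is not Gareth's actual script. The helper names append_batches and list_checkpoints are hypothetical, the one-column DataFrame is made up for illustration, and the glob pattern assumes the checkpoint file naming from the Delta protocol (for example 00000000000000000010.checkpoint.parquet).

```python
from pathlib import Path
from typing import List


def append_batches(table_path: str, n_commits: int) -> None:
    """Append n_commits one-row batches, one transaction per call (hypothetical helper)."""
    # Imported lazily so list_checkpoints below works without pandas/deltalake installed.
    import pandas as pd
    from deltalake import write_deltalake

    for i in range(n_commits):
        # Each call commits one new version to the table's transaction log.
        write_deltalake(table_path, pd.DataFrame({"id": [i]}), mode="append")


def list_checkpoints(table_path: str) -> List[str]:
    """Return the names of checkpoint files in the table's _delta_log directory."""
    log_dir = Path(table_path) / "_delta_log"
    if not log_dir.is_dir():
        return []
    # Assumed naming: <version>.checkpoint.parquet, or multi-part
    # <version>.checkpoint.<part>.<parts>.parquet.
    return sorted(p.name for p in log_dir.glob("*.checkpoint*.parquet"))
```

Calling append_batches("tmp/some_table", 99) and then list_checkpoints("tmp/some_table") would reproduce the check: an empty list means no checkpoints were written during the appends.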

Nick Karpov

03/24/2023, 7:52 PM
@Gareth Western Yup, this is known: https://github.com/delta-io/delta-rs/issues/913
It looks like it's undocumented, but it is possible to manually trigger checkpoint creation from the Python bindings: https://github.com/delta-io/delta-rs/blob/main/python/deltalake/table.py#L429-L430
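As a minimal sketch of that manual trigger, assuming the linked lines are the DeltaTable.create_checkpoint() method: the wrapper names checkpoint_table and last_checkpoint_version are hypothetical. The second helper reads the _last_checkpoint pointer file that the Delta protocol describes, as one way to confirm afterwards that a checkpoint exists.

```python
import json
from pathlib import Path
from typing import Optional


def checkpoint_table(table_path: str) -> None:
    """Write a checkpoint for the table's current version (hypothetical wrapper)."""
    # Imported lazily so last_checkpoint_version works without deltalake installed.
    from deltalake import DeltaTable

    DeltaTable(table_path).create_checkpoint()


def last_checkpoint_version(table_path: str) -> Optional[int]:
    """Return the version recorded in _delta_log/_last_checkpoint, or None if absent."""
    pointer = Path(table_path) / "_delta_log" / "_last_checkpoint"
    if not pointer.is_file():
        return None
    # The pointer file is a small JSON document that includes a "version" field.
    return json.loads(pointer.read_text())["version"]
```

In an append loop, one option would be to call checkpoint_table every tenth commit, matching the interval mentioned earlier in the thread.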

Yousry Mohamed

03/25/2023, 2:21 AM
Probably worth mentioning how to get started by
pip install deltalake

Gareth Western

03/25/2023, 3:31 PM
Thanks @Nick Karpov. We'll test it out 🙂