Michael Shtelma
04/17/2023, 2:00 PM
rtyler
04/18/2023, 3:32 PM
arrow and parquet modules in our documentation.
I am finding downstream users importing these themselves and not being able to take on new versions of delta-rs without some manual upgrade work. This is especially problematic for downstream consumers who are 2 orders removed from us. Do y'all have opinions on why somebody should be adding both arrow and deltalake to their Cargo.toml?
swest
04/18/2023, 6:59 PM
write_deltalake(), it seems like we can only specify schemas with pyarrow, which unfortunately doesn't preserve the metadata dictionary in the conversion to a deltalake.schema.
S Thelin
04/19/2023, 12:46 PM
dask to a delta table with delta-rs?
Will it be able to handle the log properly or will it fail silently?
rtyler
04/19/2023, 6:35 PM
target/ directory for building/testing delta-rs is 3.4GB of stuff 🙀
Cole MacKenzie
04/19/2023, 11:22 PM
JsonWriter::buffer_len() always return 0?
邓凯
04/20/2023, 10:51 AM
rtyler
04/20/2023, 5:18 PM
Cargo.toml, e.g. the next API changes should result in 0.10.0.
• Every two weeks we take main and assuming we're stable and there is desire (e.g. "I want to consume this feature that landed") we cut a patch release, e.g. 0.9.1, 0.9.2, 0.9.3.
Will Jones
04/20/2023, 5:27 PM
.devXXXX suffix. This is similar to how DuckDB operates, and it's nice because folks can try out new features and fixes without having to build the packages themselves. https://pypi.org/project/duckdb/#history
This should be completely automated and require no manual intervention (unless there is a CI failure)
Cole MacKenzie
04/20/2023, 11:24 PM
Ian Joiner
04/21/2023, 11:29 PM
David Blajda
04/23/2023, 2:20 AM
input_file_name from Spark would be beneficial since it would help with determining the origin of a record, which can be used to support delete, update, and merge operations.
Maico Timmerman
04/25/2023, 9:33 AM
rtyler
04/25/2023, 4:41 PM
arrow-37 topic branch 🤞
Matthew Powers
04/26/2023, 9:40 PM
SELECT id1, sum(v1) AS v1 from the_table GROUP BY id1. Here are my results:
• DuckDB: query errors out, even on the 1e8 dataset (notebook)
• DataFusion: 9.8 seconds 1e8 / 99 seconds 1e9 (notebook)
• Polars: 7.5 seconds 1e8 / 229 seconds 1e9 (notebook)
• Pandas 2.0 with usecols: 19.3 seconds 1e8 / 204 seconds 1e9 (notebook)
I tried a bunch of tweaks to get DuckDB working, but it only worked on the tiny 1e7 dataset, so not sure what I’m doing wrong.
I was really impressed with DataFusion’s performance on the 1e9 dataset. The Polars performance surprised me on the 1e9 dataset. And Pandas 2.0 just seems way better than what it used to be.
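For anyone who wants to see what the query computes without pulling the big files, here is a minimal sketch using Python's built-in sqlite3 on a made-up five-row table. The column types and sample values are assumptions for illustration only, not the benchmark data, and sqlite3 is not one of the engines timed above.

```python
import sqlite3

# Hypothetical miniature stand-in for the_table; the real datasets are far larger.
rows = [("id001", 1), ("id002", 4), ("id001", 3), ("id003", 2), ("id002", 5)]

conn = sqlite3.connect(":memory:")
# Column types are assumed; only the names id1 and v1 come from the query.
conn.execute("CREATE TABLE the_table (id1 TEXT, v1 INTEGER)")
conn.executemany("INSERT INTO the_table VALUES (?, ?)", rows)

# Same query shape as in the benchmark: one sum of v1 per distinct id1.
query = "SELECT id1, sum(v1) AS v1 FROM the_table GROUP BY id1"
result = dict(conn.execute(query).fetchall())
conn.close()

print(result)
```

Each distinct id1 ends up with a single row holding the total of its v1 values, so the output size depends on the number of groups rather than the number of input rows.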
I am going to use this dataset and structure some queries that show how Delta Lake can allow for faster query runtimes.
rtyler
04/27/2023, 3:52 PM
rtyler
04/28/2023, 3:24 PM
rtyler
04/28/2023, 3:42 PM
Matthew Powers
04/29/2023, 3:59 PM
rtyler
04/29/2023, 7:51 PM
rtyler
05/02/2023, 3:37 PM
HEAD of main and release 0.10.0 of the Rust crate today/tomorrow.
Yay 👍 or Nay 👎
@Florian Valeye @misha @Robert @Will Jones @Christian Williams @Thomas Vollmer (Microsoft) @QP Hou @Denny Lee
rtyler
05/02/2023, 11:36 PM
Denny Lee
05/02/2023, 11:36 PM
rtyler
05/03/2023, 12:12 AM
tests/test_fs.py::test_roundtrip_azure_env keeps timing out for unrelated changes, I'm inclined to just disable the dang thing. It feels like more trouble than it's worth, but then again I'm not using deltalake for Azure 😛
rtyler
05/03/2023, 12:13 AM
Matthew Powers
05/04/2023, 1:44 PM
Ian Joiner
05/05/2023, 2:27 PM
Ian Joiner
05/05/2023, 2:28 PM
Ian Joiner
05/05/2023, 2:29 PM