Kevin Lim
09/18/2023, 7:12 AM
Alex Wilcoxson
09/18/2023, 4:40 PM
Kevin Lim
09/18/2023, 9:03 PM
Ion
09/19/2023, 6:00 PM
Ion
09/19/2023, 6:12 PM
rtyler
09/19/2023, 6:14 PM
Ion
09/19/2023, 6:16 PM
rtyler
09/20/2023, 3:22 AM
rtyler
09/20/2023, 3:41 AM
Eero Lihavainen
09/20/2023, 11:15 AM
Matthew Powers
09/21/2023, 3:28 AM
Hey Matt - resurrecting this thread!
We now have the CI infrastructure set up to allow community members to run the full test suite. Additionally, we’ve taken the Delta Lake PR and rebased it for the author here: https://github.com/dagster-io/dagster/pull/16463
We’re just waiting to hear back from the author as there are a few more things that need to be fixed. If he doesn’t get to it we plan on having our engineers take it over the finish line at the end of October. However, if the author ends up finishing it we will ship it earlier.
Let me know if you still want to do comms on this because we definitely do!
Matthew Powers
09/21/2023, 3:28 AM
Eero Lihavainen
09/21/2023, 11:25 AM
Does list_with_offset with S3 currently push the offset down to the S3 API, or does it fall back to the ObjectStore default implementation that uses a simple list with post-filtering? I'm asking because list_with_offset is not defined here: https://github.com/delta-io/delta-rs/blob/a74589be7c39315360925049c716d1d70b906970/rust/src/storage/s3.rs#L470
rtyler
09/21/2023, 6:46 PM
main and one or two fixes in flight. I'm planning to pick up your modularization work after that has been released.
Tony Wang
09/21/2023, 7:33 PM
Tony Wang
09/21/2023, 7:33 PM
Ion
09/22/2023, 6:23 AM
Ion
09/22/2023, 7:00 PM
John Darrington
09/23/2023, 2:27 AM
Ion
09/23/2023, 1:27 PM
Ion
09/23/2023, 5:18 PM
Matthew Powers
09/23/2023, 11:03 PM
Ion
09/24/2023, 8:27 PM
Ion
09/24/2023, 8:43 PM
Slackbot
09/25/2023, 11:50 AM
rtyler
09/25/2023, 5:41 PM
Eric Ávila
09/26/2023, 10:47 AM
write_deltalake() from delta-rs using Python.
I have set the following env vars:
AWS_S3_LOCKING_PROVIDER=dynamodb
DYNAMO_LOCK_TABLE_NAME=delta_log
AWS_REGION=eu-west-1
AWS_DEFAULT_REGION=eu-west-1
Then I try to write a table:
write_deltalake("s3://my-bucket/tmp/some_delta_table", my_table, mode="overwrite", schema=data_schema)
And I'm getting the following error from dynamodb:
Generic DeltaS3ObjectStore error: Dynamo Error: Get item error: The provided key element does not match the schema (GetItemError(Validation("The provided key element does not match the schema"))).
ERROR MicroBatchExecution: Query [id = f3781630-6c8b-45c3-b494-06af9c2f893a, runId = 35b19a83-a98c-453b-8242-482ad9dfe406] terminated with error
py4j.Py4JException: An exception was raised by the Python Proxy. Return Message: Traceback (most recent call last):
  File "/home/pysparkarrow.py", line 67, in write_parquet
    write_deltalake("s3://my-bucket/tmp/some_delta_table", my_table, mode="overwrite", schema=data_schema)
  File "/home/venv/lib/python3.9/site-packages/deltalake/writer.py", line 324, in write_deltalake
    _write_new_deltalake(
OSError: Generic DeltaS3ObjectStore error: Dynamo Error: Get item error: The provided key element does not match the schema (GetItemError(Validation("The provided key element does not match the schema"))).
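For anyone hitting the same message: "The provided key element does not match the schema" is the validation error DynamoDB returns when the key sent in a GetItem request does not name the same attributes as the table's key schema. Below is a minimal sketch of that comparison, simplified to attribute names only. The helper key_matches_schema, both example key schemas, and the lookup key are hypothetical and not taken from this thread; which key attributes delta-rs expects depends on the version, so check its documentation rather than relying on the names used here.

```python
def key_matches_schema(key_schema, item_key):
    """Return True when item_key names exactly the attributes in key_schema.

    key_schema uses the shape DynamoDB reports for a table:
    a list of {"AttributeName": ..., "KeyType": "HASH" | "RANGE"}.
    item_key uses the GetItem shape: {attribute_name: {type: value}}.
    This is a simplification: it compares names only, not attribute types.
    """
    expected = {element["AttributeName"] for element in key_schema}
    return expected == set(item_key)


# Hypothetical lock table with a single string partition key (assumption).
single_key_schema = [{"AttributeName": "key", "KeyType": "HASH"}]

# Hypothetical table created for a different tool, with a composite key (assumption).
composite_key_schema = [
    {"AttributeName": "tablePath", "KeyType": "HASH"},
    {"AttributeName": "fileName", "KeyType": "RANGE"},
]

# Hypothetical key a lock client might look up (assumption).
lookup_key = {"key": {"S": "s3://my-bucket/tmp/some_delta_table"}}

print(key_matches_schema(single_key_schema, lookup_key))     # True
print(key_matches_schema(composite_key_schema, lookup_key))  # False
```

With boto3, the real key schema of a table can be read from client.describe_table(TableName=...)["Table"]["KeySchema"] and passed in as key_schema. If it differs from what the writer looks up, the table named in DYNAMO_LOCK_TABLE_NAME was probably created with a different layout.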
Ion
09/26/2023, 12:11 PM
Ion
09/26/2023, 7:24 PM
Ion
09/26/2023, 7:52 PM