Will Jones
01/06/2023, 5:07 AM
Denny Lee
01/08/2023, 8:59 PM
Matthew Powers
01/09/2023, 9:02 PM
pip install deltalake?
Matthew Powers
01/09/2023, 9:02 PM
rtyler
01/11/2023, 12:17 AM
rtyler
01/11/2023, 5:21 PM
Jim Hibbard
01/11/2023, 8:24 PM
without_files parameter?
2. What's the best way to get a reference to the filesystem object being used by a DeltaTable if I want to pass it to another method but with the same configuration / credentials?
Matthew Powers
01/12/2023, 1:20 PM
Robert
01/12/2023, 11:37 PM
cargo clippy --fix, and I like it 😄.
Herry
01/13/2023, 6:19 PM
Ryan Johnson
01/13/2023, 7:57 PM
StringArray full of json into an Array or Table of nested data. This surprises me, because pyarrow.json.read_json does exactly the right thing... but only for line-delimited json files. At least, I didn't see anything e.g. in pyarrow.compute, and a Google search came up empty. Am I missing something obvious here?
Will Jones
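One possible workaround for the StringArray question above, as a minimal sketch: since pyarrow.json.read_json already handles line-delimited JSON, the strings can be joined into a newline-delimited buffer and parsed in memory. The helper names to_ndjson and json_strings_to_table are hypothetical, and the sketch assumes every non-null entry is a complete JSON object. Skipping nulls means the output rows no longer line up one-to-one with the input array.

```python
import io
import json


def to_ndjson(docs):
    """Join JSON documents into newline-delimited JSON bytes.

    Each document is re-serialized so that pretty-printed input
    (with embedded newlines) still ends up on exactly one line.
    None entries are skipped.
    """
    lines = [json.dumps(json.loads(d)) for d in docs if d is not None]
    return "\n".join(lines).encode("utf-8")


def json_strings_to_table(strings):
    """Parse a pyarrow StringArray of JSON documents into a nested Table."""
    # Imported lazily so to_ndjson stays usable without pyarrow installed.
    import pyarrow.json as pa_json

    payload = to_ndjson(strings.to_pylist())
    # read_json accepts a file-like object and infers the nested schema.
    return pa_json.read_json(io.BytesIO(payload))
```

Calling json_strings_to_table(pa.array([...])) on an array of JSON objects should give a Table whose nested fields come back as struct and list columns. The round trip through Python's json module costs time on large arrays, so this is a convenience rather than a fast path.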
01/19/2023, 4:39 AM
Ian Joiner
01/19/2023, 10:52 PM
Matthew Powers
01/20/2023, 1:20 PM
Matthew Powers
01/21/2023, 2:11 PM
srikanth sharma boddupalli
01/22/2023, 4:47 AM
Grainne B
01/23/2023, 12:55 AM
{PyDeltaTableError('Failed to read delta log object: Generic DeltaS3ObjectStore error: Atomic rename requires a LockClient for S3 backends. Either configure the LockClient,
or set AWS_S3_ALLOW_UNSAFE_RENAME=true to opt out of support for concurrent writers.')}
However, the fix seems to be simply allowing AWS_S3_ALLOW_UNSAFE_RENAME to be passed as an option.
This can cause issues with concurrent writes, as no lock is acquired before renaming.
I was hoping that the solution would involve being able to set a lock on the bucket and release it after a write was performed.
Does the above solution mean that concurrent writes are not supported/advised with Delta Lake?
Keen to hear how others are approaching this issue 🙏
Grainne B
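The two choices named in the error message (configure a lock client, or opt out with the unsafe rename) can be sketched as storage options. This is a minimal sketch, not a confirmed recipe: AWS_S3_ALLOW_UNSAFE_RENAME comes from the error above, while AWS_S3_LOCKING_PROVIDER and DYNAMO_LOCK_TABLE_NAME are assumed from delta-rs documentation of roughly this period and may differ between releases. The helper names and the lock table name are hypothetical.

```python
def s3_storage_options(lock_table=None):
    """Build delta-rs storage options for writing to S3.

    lock_table: hypothetical name of a DynamoDB table used for locking.
    When it is None, fall back to the unsafe-rename opt-out, which gives
    up support for concurrent writers.
    """
    if lock_table is None:
        return {"AWS_S3_ALLOW_UNSAFE_RENAME": "true"}
    # Assumption: these option names match the deltalake release in use.
    return {
        "AWS_S3_LOCKING_PROVIDER": "dynamodb",
        "DYNAMO_LOCK_TABLE_NAME": lock_table,
    }


def write_table(table_uri, data, lock_table=None):
    """Append data to a Delta table on S3 with the chosen options."""
    # Imported lazily so the option helper works without deltalake installed.
    from deltalake import write_deltalake

    write_deltalake(
        table_uri,
        data,
        mode="append",
        storage_options=s3_storage_options(lock_table),
    )
```

The opt-out path is only reasonable when exactly one writer touches the table. With several writers, the lock-based options are the ones to look into.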
01/23/2023, 6:55 AM
Matthew Powers
01/23/2023, 9:24 PM
Maks Lyzhkov
01/24/2023, 1:22 PM
deltalake.PyDeltaTableError: Failed to read delta log object: Generic GCS error: Error performing copy request test_delta/_delta_log/_commit_0025723a-ff2f-4bb6-9f7f-7c80f95fd5b9.json.tmp: response error
......
<p>POST requests require a <code>Content-length</code> header. <ins>That's all we know.</ins>
to reproduce it:
import os
import pyarrow as pa
import deltalake
storage_options = {"SERVICE_ACCOUNT": os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")}
pylist = [{'n_legs': 2, 'animals': 'Flamingo'}, {'year': 2021, 'animals': 'Centipede'}]
table = pa.Table.from_pylist(pylist)
deltalake.write_deltalake("gs://some_bucket/test_delta", table, storage_options=storage_options)
Will Jones
01/26/2023, 3:04 AM
Matthew Powers
01/26/2023, 11:43 PM
Matthew Powers
01/26/2023, 11:44 PM
Matthew Powers
02/03/2023, 5:28 PM
rtyler
02/03/2023, 11:59 PM
Matthew Powers
02/05/2023, 4:35 PM
Will Jones
02/05/2023, 8:32 PM
Matthew Powers
02/17/2023, 2:09 PM
Matthew Powers
02/17/2023, 2:19 PM
Matthew Powers
02/21/2023, 4:12 PM
DeltaTable("../rust/tests/data/simple_table", version=2)
• delta-spark: DeltaTable.forPath(spark, "/path/to/table")
- no version argument available
Are there any implications of this difference we should think about?
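To make the difference concrete, here is a minimal sketch of reaching a specific version from each side. The deltalake call is the one quoted above. On the Spark side, time travel goes through the DataFrame reader's versionAsOf option rather than DeltaTable.forPath. The function names are hypothetical, and spark is assumed to be an existing SparkSession with Delta Lake configured.

```python
def open_version_delta_rs(path, version):
    """Open a Delta table pinned to a version with the deltalake package."""
    # Imported lazily so this module loads without deltalake installed.
    from deltalake import DeltaTable

    return DeltaTable(path, version=version)


def read_version_spark(spark, path, version):
    """Read a Delta table at a version through the Spark DataFrame reader.

    Returns a DataFrame, not a DeltaTable: DeltaTable.forPath itself
    takes no version argument.
    """
    return (
        spark.read.format("delta")
        .option("versionAsOf", version)
        .load(path)
    )
```

One visible consequence: the deltalake object stays a table handle at the older version, while the Spark route hands back a read-only DataFrame snapshot.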