
rtyler

05/07/2023, 4:29 AM
Looking for a gut check here. I'm working with S3 Bucket Notifications, which come in with a bucket name and the object created (e.g. the full prefix and file name of a put object). Would it be safe to assume that `_delta_log/` belongs in the nearest parent prefix of the object key that is not a partition segment? E.g.
• `some/path/to/a/prefix/alpha.parquet`: the `_delta_log/` should be in `some/path/to/a/prefix/`
• `/some/path/ds=2023-05-05/site=delta.io/beta.parquet`: the `_delta_log/` should be in `some/path/`
Are there instances where this assumption might not hold true?
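The heuristic in the question can be sketched roughly as below. `infer_table_root` is a hypothetical helper, and it assumes partition directories are Hive-style `column=value` segments. It does not handle absolute paths or randomized file prefixes, so treat it as a guess, not a guarantee.

```python
import re

# Assumption: a segment like "ds=2023-05-05" is a Hive-style partition directory.
PARTITION_SEGMENT = re.compile(r"^[^=/]+=[^/]*$")


def infer_table_root(key: str) -> str:
    """Guess the prefix where _delta_log/ would live for a given S3 object key."""
    # Drop any leading slash, then drop the file name (last segment).
    segments = key.strip("/").split("/")[:-1]
    # Walk upward past trailing partition segments only.
    while segments and PARTITION_SEGMENT.match(segments[-1]):
        segments.pop()
    # An empty result means the table root is the bucket root.
    return "/".join(segments) + "/" if segments else ""
```

For the two keys above this yields `some/path/to/a/prefix/` and `some/path/` respectively.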

Robert

05/07/2023, 7:58 AM
I guess in the most general sense, the log and data files could reside in separate buckets. delta-rs does not support this yet, but the protocol allows for absolute paths (full URIs) in add actions. One can also configure random file prefixes for data files, which replace the partition information in the path. Here I am not sure whether this would manifest as a prefix on the file name or as a new segment delimited by the path separator.
👍 1
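The relative-vs-absolute distinction for add action paths can be sketched as below. `resolve_add_path` is a hypothetical helper; it also skips the percent-decoding the protocol expects, since these paths are URI-formatted.

```python
from urllib.parse import urlparse


def resolve_add_path(table_root: str, path: str) -> str:
    """Resolve an add action's `path` against the table root (sketch)."""
    # An absolute URI (it has a scheme such as s3://) is used as-is and may
    # point outside the table's own prefix, or even into another bucket.
    if urlparse(path).scheme:
        return path
    # Otherwise the path is relative to the table root.
    return table_root.rstrip("/") + "/" + path
```

The second case is why a data file's key alone cannot reliably locate `_delta_log/`.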

Will Jones

05/07/2023, 9:07 PM
Not sure of the exact use case, but if you are watching for changes in Delta tables, would it make more sense to watch for keys that contain `_delta_log`?
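Filtering S3 event notifications for `_delta_log` keys might look roughly like this. It assumes the standard notification shape (`Records[].s3.bucket.name`, `Records[].s3.object.key`) already parsed into a dict; `delta_log_keys` is a hypothetical name.

```python
from urllib.parse import unquote_plus


def delta_log_keys(event: dict) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs for records that land inside a _delta_log/ directory."""
    hits = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded.
        key = unquote_plus(record["s3"]["object"]["key"])
        # Match a whole directory segment, not any substring like "not_delta_log".
        if "_delta_log" in key.split("/")[:-1]:
            hits.append((bucket, key))
    return hits
```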

rtyler

05/07/2023, 9:19 PM
@Robert I had missed the "full URL" part of the protocol; I just double-checked the add action definition. In my use case, @Will Jones, there is no `_delta_log` to watch for. Rather, oxbow is intended to take locations with arbitrary `.parquet` files and convert them into Delta tables.
👍 1