Ian Joiner
04/13/2023, 3:45 AM
Ian Joiner
04/13/2023, 3:46 AM
Serge Smertin
04/13/2023, 9:42 AM
rtyler
04/13/2023, 2:51 PM
primitive("timestamp") column in the schema because the ParquetReader gives Timestamp(nanosecond) from the int96 column, but the Arrow schema delta-rs thinks it should be using is Timestamp(microsecond), which I believe to be the correct interpretation here.
My gut feel is that this will be the case for parquet files written by Delta/Spark, but I'm not sure where the best place to introduce the necessary conversion will be. Putting it in our writer feels correct but I think we would have other readers not doing the right thing on this type either.
Ian
04/13/2023, 5:17 PM
try:
    DeltaTable(str(delta_path))
    return True
except PyDeltaTableError:
    raise NotFoundException(f'Delta Table not found: delta_table_path={delta_path}')
    # app_logger.error(f'Delta Table not found: delta_table_path={delta_path}')
    # return False
except Exception as e:
    app_logger.error(e)
    return False
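On the int96 point rtyler raises above (nanoseconds coming out of the reader versus microseconds in the schema), here is a minimal stdlib-only sketch of what that conversion amounts to. It assumes the usual INT96 layout of 8 little-endian bytes of nanoseconds within the day followed by a 4-byte Julian day number. The names `int96_to_micros` and `encode_int96` are hypothetical and are not part of delta-rs or the parquet crate.

```python
import struct

# Julian day number of 1970-01-01, the Unix epoch.
JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588
MICROS_PER_DAY = 86_400 * 1_000_000


def int96_to_micros(raw: bytes) -> int:
    """Decode a 12-byte INT96 timestamp to microseconds since the Unix epoch."""
    if len(raw) != 12:
        raise ValueError("an INT96 timestamp is exactly 12 bytes")
    # Assumed layout: 8 bytes of nanoseconds within the day, then a 4-byte Julian day.
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days_since_epoch = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    # Integer division drops the sub-microsecond digits.
    return days_since_epoch * MICROS_PER_DAY + nanos_of_day // 1_000


def encode_int96(julian_day: int, nanos_of_day: int) -> bytes:
    """Hypothetical helper that builds sample values for the decoder."""
    return struct.pack("<qi", nanos_of_day, julian_day)


# One day after the epoch plus 1,500 ns: the trailing 500 ns do not survive.
sample = encode_int96(JULIAN_DAY_OF_UNIX_EPOCH + 1, 1_500)
micros = int96_to_micros(sample)
```

Wherever the conversion ends up living (writer or reader), the arithmetic is the same. The open question in the thread is only which layer should own it.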
rtyler
04/14/2023, 6:08 AM
main into the branch, there's one minor thing that's broken from my local build and I think we might be close to being mergeable
Ian Joiner
04/14/2023, 1:51 PM
rtyler
04/14/2023, 4:29 PM
Ian Joiner
04/14/2023, 4:50 PM
rtyler
04/14/2023, 4:55 PM
Ian Joiner
04/14/2023, 4:55 PM
Ian Joiner
04/14/2023, 4:56 PM
rtyler
04/14/2023, 4:56 PM
main, based on some comments from @Robert I think there may be some datafusion incompatibilities to fix with arrow 37 and we'd be waiting for datafusion 23 there anyways
rtyler
04/14/2023, 4:56 PM
Ian Joiner
04/14/2023, 4:56 PM
rtyler
04/14/2023, 4:56 PM
Ian Joiner
04/14/2023, 4:57 PM
Ian Joiner
04/14/2023, 5:33 PM
rust-v0.9.0 tag?
rtyler
04/14/2023, 5:34 PM
rtyler
04/14/2023, 5:44 PM
rtyler
04/14/2023, 6:01 PM
Denny Lee
04/14/2023, 6:08 PM
In particular, there are three open and competing table formats that bring an ACID layer to data lakes, one of which will likely become the de-facto industry standard eventually. Besides Delta Lake, there are also Apache Iceberg and Apache Hudi, and while they all offer a variety of advanced features and open-source implementations, we've found that the most mature Rust implementation is that of Delta via the delta-rs project. It also has an active and welcoming community of maintainers, so we finally decided to replace our custom storage layer with delta-rs instead.
rtyler
04/14/2023, 6:09 PM
arrow-37 topic branch to collaborate on those API changes. I'll spend more time with that over the weekend if I'm able (FYI @Robert)
W. Logan Downing
04/14/2023, 6:13 PM
Cole MacKenzie
04/14/2023, 6:30 PM
delta-rs@0.9.0 and I think I found a regression. https://github.com/delta-io/delta-rs/issues/1291
Cole MacKenzie
04/14/2023, 6:31 PM
string as the column type as well. The resulting parquet file also doesn't happen to include that column.
Alex Wilcoxson
04/14/2023, 7:38 PM
S Thelin
04/15/2023, 5:04 PM
0.8.1 with a local k8s MinIO.
I get
thread '<unnamed>' panicked at 'not stream', /Users/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/object_store-0.5.5/src/aws/credential.rs:173:27
File "/Users/simon/Library/Caches/pypoetry/virtualenvs/heureka-MBcEyvac-py3.9/lib/python3.9/site-packages/deltalake/table.py", line 122, in __init__
self._table = RawDeltaTable(
pyo3_runtime.PanicException: not stream
storage_options = {
    "AWS_ACCESS_KEY_ID": os.environ["AWS_ACCESS_KEY_ID"],
    "AWS_SECRET_ACCESS_KEY": os.environ["AWS_SECRET_ACCESS_KEY"],
    "AWS_REGION": aws_region,
    "AWS_ENDPOINT_URL": "localhost:30000",  # I also tried trino-minio-svc.trino:9000 and host.docker.internal:30000
}
I know my delta table in MinIO works fine because I also use it for Trino, I was keen on testing with MinIO here instead of AWS S3.
DeltaTable(
    table_uri=table_uri,
    storage_options=self.storage_options,
    version=version,
)
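One thing that may be worth ruling out with the options above: every endpoint tried is a bare host:port with no URL scheme. It is only an assumption, not a confirmed diagnosis, that this has anything to do with the "not stream" panic. The sketch below shows how the same dict could be built with an explicit scheme. `with_scheme` and `build_storage_options` are hypothetical helper names, and the keys are just the ones from the snippet above.

```python
def with_scheme(endpoint: str, default_scheme: str = "http") -> str:
    """Return the endpoint unchanged if it has an http(s) scheme, else prefix one."""
    if endpoint.startswith(("http://", "https://")):
        return endpoint
    return f"{default_scheme}://{endpoint}"


def build_storage_options(access_key: str, secret_key: str, region: str, endpoint: str) -> dict:
    """Assemble the same keys as the snippet above, with a scheme on the endpoint."""
    return {
        "AWS_ACCESS_KEY_ID": access_key,
        "AWS_SECRET_ACCESS_KEY": secret_key,
        "AWS_REGION": region,
        "AWS_ENDPOINT_URL": with_scheme(endpoint),
    }


# Placeholder credentials for illustration only.
options = build_storage_options("example-key", "example-secret", "us-east-1", "localhost:30000")
```

Whether a plain-http endpoint needs further options in this version is not something this sketch settles.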
Anyone encountered this issue?
I found this open issue: https://github.com/delta-io/delta-rs/issues/809
Seems like an open issue perhaps?
Matthew Powers
04/17/2023, 10:58 AM
delta-rs/python/usage.html#loading-a-delta-table that should be delta-rs/python/usage/loading-a-delta-table). This seems minor, but it's actually a major SEO issue.
• mdBook provides some nice formatting, navigation, and search options out of the box
Interested in what folks think about this option. After we decide on the framework, we can brainstorm the optimal structure of the docs. I have some ideas that should work well from a readability & SEO perspective.
Matthew Powers
04/17/2023, 11:07 AM