
rtyler

07/07/2023, 12:08 AM
@Robert I know you're aware of the dropping of AWS_PROFILE support in the newer object_store crate versions. I'm wondering if most people are using it as a means to use permissions from EC2 instances (e.g. IMDSv2). That's how I have used it, fwiw. If so, then I don't think we need to find a way to support AWS_PROFILE so much as provide sane defaults for using the metadata service for permissioning of storage access 🤔
👀 1

Robert

07/07/2023, 5:12 AM
Good point - I don't have much experience when it comes to AWS. object_store has native support for the metadata endpoints as well as web identities, I believe. Not sure what other means are configured via the AWS profile ... datafusion CLI has some code where they are using the AWS profile, I think. I'll have a look at what they are doing 🙂

rtyler

07/07/2023, 6:52 AM
I'm fairly familiar with these AWS credential quirks. I'm going to see if I can get a coworker convinced to explore using deltalake in native Python for their project, and if I can, then I'll be able to figure this out on company time 😉

Robert

07/07/2023, 5:03 PM
That would of course be excellent! Let me know if it does not work out though, then I may find some time ... 🙂

rtyler

07/07/2023, 11:34 PM
I did some tinkering and I have a code snippet which will negotiate an assumed role with IMDSv2 before initializing a table. I'll need to write this up in a quick blog post, but basically there are no changes needed on our side to support this, just a little more footwork ahead of time by the caller(s)
I wrote up this blog post which demonstrates how to fetch temporary credentials from AWS STS for authentication of both Python and Rust deltalake applications without hard-coding credentials or using things like AWS_PROFILE
👍 3
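
For readers following along, here is a minimal sketch of the kind of "footwork ahead of time" described above. It is not the code from the blog post. It assumes the process runs on an EC2 instance with a role attached, uses the standard IMDSv2 endpoints, and maps the result onto AWS_* option keys of the sort deltalake accepts as storage options. The function names fetch_imds_credentials and to_storage_options are made up for illustration.

```python
import json
import urllib.request

# Link-local metadata address; only reachable from inside an EC2 instance.
IMDS = "http://169.254.169.254/latest"


def fetch_imds_credentials(timeout: float = 2.0) -> dict:
    """Fetch the instance role's temporary credentials via IMDSv2."""
    # IMDSv2 requires a session token, obtained with a PUT request first.
    token_req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "300"},
    )
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()
    headers = {"X-aws-ec2-metadata-token": token}

    def get(path: str) -> str:
        req = urllib.request.Request(f"{IMDS}/meta-data/{path}", headers=headers)
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read().decode()

    # The listing returns the name of the role attached to the instance.
    role = get("iam/security-credentials/").splitlines()[0]
    return json.loads(get(f"iam/security-credentials/{role}"))


def to_storage_options(creds: dict, region: str) -> dict:
    """Map an IMDS credential document onto AWS_* option keys (hypothetical helper)."""
    return {
        "AWS_ACCESS_KEY_ID": creds["AccessKeyId"],
        "AWS_SECRET_ACCESS_KEY": creds["SecretAccessKey"],
        "AWS_SESSION_TOKEN": creds["Token"],
        "AWS_REGION": region,
    }
```

On an instance, the result could then be handed to the table constructor, e.g. DeltaTable("s3://example-bucket/table", storage_options=to_storage_options(fetch_imds_credentials(), "us-east-2")), where the bucket and region are placeholders. Credentials returned by an STS AssumeRole call carry the session token under SessionToken rather than Token, so the mapping would need adjusting for that path. The credentials expire, so long-running callers would have to refresh them.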

Jackson Newhouse

07/21/2023, 9:24 PM
I've recently implemented S3 writes in Arroyo, and the approach I took to get an AmazonS3 with default credentials was to wrap the DefaultCredentialsProvider from rusoto_credential, which you can see here. Could we improve object_store by having it use stronger default behavior?
👋 1
👀 1

rtyler

07/21/2023, 9:33 PM
hiya @Jackson Newhouse! I'm curious what you mean by "stronger" default behavior? Do you mean discouraging IAM keys/secrets being slapped in?

Jackson Newhouse

07/21/2023, 9:41 PM
Currently I have a local ~/.aws/credentials. rusoto is able to pick that up, in the same way that the awscli does. I believe the removed aws_profile feature in object_store also let this happen. Reading through the various associated issues mentioned at https://github.com/apache/arrow-rs/issues/4556#issuecomment-1646241315, it seems like this is still a live issue. The object_store behavior I would like to target is that object_store::parse_url() would, in the case of AWS, provide default access in the absence of other configuration. Without doing that I'd have to inspect the URL, determine that it is an S3 URL, and then override the credentials, which is a pain.
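
To illustrate the workaround Jackson calls a pain, here is a rough sketch of the "inspect the URL, then override the credentials" step. It is written in Python for brevity and is not Arroyo's or object_store's code. The helper names are hypothetical, and only the s3:// and s3a:// schemes are handled; HTTPS-style S3 endpoints are deliberately left out.

```python
from urllib.parse import urlparse

# Schemes treated as S3 in this sketch (an assumption, not an exhaustive list).
S3_SCHEMES = {"s3", "s3a"}


def is_s3_url(url: str) -> bool:
    """Return True when the URL scheme marks it as an S3 location."""
    return urlparse(url).scheme in S3_SCHEMES


def options_for(url: str, aws_options: dict) -> dict:
    """Attach caller-resolved AWS options only when the URL targets S3."""
    return dict(aws_options) if is_s3_url(url) else {}
```

Every caller that wants default credentials has to repeat this branching per storage backend, which is the motivation for having the URL parser supply defaults itself.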

rtyler

07/21/2023, 9:44 PM
Ah, so would it be safe to say that you're seeking a more seamless set of default initialization behaviors for object store?

Jackson Newhouse

07/21/2023, 9:56 PM
Yeah, pretty much. If we used the DefaultCredentialsProvider as the final default rather than the instance credential provider, that'd mean non-EC2 boxes wouldn't need to have credentials passed down to them. The default is currently at https://github.com/apache/arrow-rs/blob/6ee30a57e9935ddd3fb7828062e3dfbfacf574a4/object_store/src/aws/mod.rs#L1021-L1035
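
The fallback behavior being asked for can be sketched as a small provider chain. This is an illustration in Python, not rusoto's or object_store's implementation. The ordering (environment variables, then the shared ~/.aws/credentials file, then whatever the caller appends, such as an instance metadata lookup) is an assumption modeled on how the AWS CLI is commonly understood to behave, and all function names are hypothetical.

```python
import configparser
import os
from pathlib import Path
from typing import Callable, Iterable, Mapping, Optional


def from_env(env: Mapping[str, str] = os.environ) -> Optional[dict]:
    """Read static credentials from the standard AWS_* environment variables."""
    key, secret = env.get("AWS_ACCESS_KEY_ID"), env.get("AWS_SECRET_ACCESS_KEY")
    if not (key and secret):
        return None
    return {"access_key_id": key, "secret_access_key": secret,
            "session_token": env.get("AWS_SESSION_TOKEN")}


def parse_shared_credentials(text: str, profile: str = "default") -> Optional[dict]:
    """Parse the INI text of a shared credentials file for one profile."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section(profile):
        return None
    section = parser[profile]
    key = section.get("aws_access_key_id")
    secret = section.get("aws_secret_access_key")
    if not (key and secret):
        return None
    return {"access_key_id": key, "secret_access_key": secret,
            "session_token": section.get("aws_session_token")}


def from_shared_file(path: Path = Path.home() / ".aws" / "credentials") -> Optional[dict]:
    """Load the default profile from the shared credentials file, if present."""
    return parse_shared_credentials(path.read_text()) if path.is_file() else None


def resolve(providers: Iterable[Callable[[], Optional[dict]]]) -> Optional[dict]:
    """Return the first credentials any provider yields, in order."""
    for provider in providers:
        creds = provider()
        if creds is not None:
            return creds
    return None
```

A caller would use something like resolve([from_env, from_shared_file, instance_lookup]), where instance_lookup is a placeholder for a metadata-service provider. With that ordering, a laptop with a local credentials file and an EC2 instance with an attached role both work without extra configuration.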