martin-traverse opened a new issue, #47353: URL: https://github.com/apache/arrow/issues/47353
### Describe the enhancement requested

Hi - great to see the new Azure FileSystem implementation, which completes the set with AWS and GCP. We're looking at switching our platform to use the Arrow native implementation for Azure, instead of going through fsspec. Running locally, everything works great.

What doesn't work is workload identity credentials. We use that in CI (GitHub Actions with OIDC) and it works perfectly in fsspec / AzureBlobFileSystem with no special configuration. After some digging, the issue appears to be that the fsspec implementation uses the Azure Python SDK to handle credentials, while Arrow goes through the Azure C++ SDK with a lower level of control over the credentials mechanism. I can also see that in fsspec / Python SDK a credentials chain is being used; I'm not sure whether that has been set up in the Arrow implementation.

Is it possible to get workload identity credentials added to the AzureFileSystem implementation? Best I can tell, all the pieces do exist in the Azure C++ SDK, and there seem to be references to workload identity in the Arrow C++ code as well. Is there a way to plug it together and expose it in the Python API?

There was a similar thing a while back with GCP, and it got resolved quite quickly once support was available in the GCP libraries: https://github.com/apache/arrow/issues/34595

In case it is helpful, here are some logs from our CI running fsspec / AzureBlobFileSystem with OIDC from GitHub Actions:

    tracdap.rt._plugins.storage_azure.AzureBlobStorageProvider - Using [default] credentials mechanism
    azure.identity.aio._credentials.environment - No environment configuration found.
    azure.identity.aio._credentials.managed_identity - ManagedIdentityCredential will use IMDS
    azure.identity._credentials.environment - No environment configuration found.
    azure.identity._credentials.managed_identity - ManagedIdentityCredential will use IMDS
    tracdap.rt._impl.core.storage.CommonFileStorage - INIT [tracdap_ci_storage_setup]: Common file storage, fs = [abfs], impl = [fsspec], root = [tracdap-ci-storage/]
    azure.identity.aio._credentials.chained - DefaultAzureCredential acquired a token from AzureCliCredential

And this is what happens with the same setup using Arrow's own AzureFileSystem:

    tracdap.rt._plugins.storage_azure.AzureBlobStorageProvider - Using [default] credentials mechanism
    tracdap.rt._impl.core.storage.CommonFileStorage - INIT [tracdap_ci_storage_setup]: Common file storage, fs = [abfs], impl = [arrow], root = [tracdap-ci-storage/]

    -- snip --

        prior_stat: pa_fs.FileInfo = self._fs.get_file_info(resolved_path)
                                     ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^
      File "pyarrow/_fs.pyx", line 615, in pyarrow._fs.FileSystem.get_file_info
      File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
      File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
    pyarrow.lib.ArrowException: Unknown error: Check for Hierarchical Namespace support on 'https://******.blob.core.windows.net/tracdap-ci-storage' failed: N5Azure4Core11Credentials23AuthenticationExceptionE: Failed to get token from DefaultAzureCredential.
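
For readers unfamiliar with the "credentials chain" idea referred to above, here is a minimal, standard-library-only sketch of the pattern: each credential source is tried in order and the first one that yields a token wins. Everything in it (`Source`, `acquire_token`, `unavailable`, the source names, the fake token) is hypothetical and for illustration only. It is not the Azure SDK's API and makes no claim about which sources the Python or C++ `DefaultAzureCredential` actually includes.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


class CredentialUnavailable(Exception):
    """Raised by a source that cannot produce a token in this environment."""


@dataclass
class Source:
    # Hypothetical stand-in for one credential type in a chain.
    name: str
    get_token: Callable[[], str]


def unavailable(reason: str) -> Callable[[], str]:
    """Build a get_token callable that always reports the source as unavailable."""
    def _raise() -> str:
        raise CredentialUnavailable(reason)
    return _raise


def acquire_token(chain: Sequence[Source]) -> Tuple[str, str]:
    """Try each source in order; return (source name, token) from the first success."""
    errors: List[str] = []
    for source in chain:
        try:
            return source.name, source.get_token()
        except CredentialUnavailable as exc:
            # Remember why this source was skipped, then move on to the next one.
            errors.append(f"{source.name}: {exc}")
    raise CredentialUnavailable(
        "no credential in the chain produced a token (" + "; ".join(errors) + ")"
    )


# Illustrative chain only; real chains are defined by the SDK in use.
demo_chain = [
    Source("environment", unavailable("no environment configuration found")),
    Source("managed_identity", unavailable("metadata endpoint not reachable")),
    Source("cli", lambda: "fake-token"),
]

winner, token = acquire_token(demo_chain)
print(f"acquired a token from {winner}")
```

With this demo chain the first two sources are skipped and the token comes from the `cli` entry. If the chain is cut down to only the first two sources, `acquire_token` raises `CredentialUnavailable`, which is the general shape of failure a chain produces when none of its sources fits the environment.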

This is how the account looks in CI:

    {
      "environmentName": "AzureCloud",
      "homeTenantId": "***",
      "id": "***",
      "isDefault": true,
      "managedByTenants": [],
      "name": "****",
      "state": "Enabled",
      "tenantId": "***",
      "user": {
        "name": "***",
        "type": "servicePrincipal"
      }
    }

And this is what I have locally, which works fine as you would expect:

    {
      "environmentName": "AzureCloud",
      "homeTenantId": "****",
      "id": "****",
      "isDefault": true,
      "managedByTenants": [],
      "name": "****",
      "state": "Enabled",
      "tenantDefaultDomain": "*****.onmicrosoft.com",
      "tenantDisplayName": "Default Directory",
      "tenantId": "****",
      "user": {
        "name": "****",
        "type": "user"
      }
    }

It looks like managed identities are a bit better supported than workload identities, so we're going to test that in our sandbox environment. We can manage with static secrets in CI if need be, but it would be good to get workload identity working.

If there's anything I can do to help, please let me know!

### Component(s)

C++, Python
