jabbera opened a new issue, #46085:
URL: https://github.com/apache/arrow/issues/46085
### Describe the bug, including details regarding any error messages, version, and platform.

TL;DR: Trying to write a dataset using a user-delegated SAS token will always fail with a 403 error.

User-delegated SAS tokens in Azure have no account-level permissions, which prevents them from reading container-level properties. As a result, the existence check here:

https://github.com/apache/arrow/blob/968721b0898457b03d4eebd18a6fdb3156c53423/cpp/src/arrow/filesystem/azurefs.cc#L2223-L2224

always fails with a 403 error. I think the appropriate fix is to assume the container exists, but I'm happy to be pointed at an alternate approach. This is the approach the adlfs fsspec implementation has [taken](https://github.com/fsspec/adlfs/blob/adb9c53b74a0d420625b86dd00fbe615b43201d2/adlfs/spec.py#L178-L183); a rough sketch of that fallback is included after the error below.

I'm motivated to resolve this and have contributed in the past (https://github.com/apache/arrow/pull/45706 and https://github.com/apache/arrow/pull/45759). I'm just looking for some direction before I create the PR.

Raised error:

```
An error occurred using pyarrow AzureFileSystem: GetProperties for 'https://abd1234.blob.core.windows.net/lab?se=2025-04-17T03%3A38%3A20Z&sig=2agAW7B47PRDp%2B%2B1qXmdn7TPQ%2BC9Sj5LYf/FbMMgxxs%3D&ske=2025-04-17T03%3A38%3A20Z&skoid=4d2c13a8-9cee-4848-b4bc-d1c6c2ced41b&sks=b&skt=2025-04-10T03%3A38%3A20Z&sktid=337b9f7b-9e69-4689-9b0d-3417bd3d8566&skv=2025-05-05&sp=racwdlmeop&sr=c&sv=2025-05-05' failed. Azure Error: [AuthorizationFailure] 403 This request is not authorized to perform this operation. This request is not authorized to perform this operation.
```
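For context, here is a minimal standalone sketch of the fallback I have in mind, written against the Azure C++ SDK directly. This is not the actual azurefs.cc or adlfs code, and the account, container, and SAS values are placeholders; it only illustrates the idea that a 403 from the container-level `GetProperties` call is treated as "assume the container exists" rather than as a hard failure.

```
// Sketch only: illustrates the adlfs-style fallback, not the actual Arrow code.
// <account>, <container>, and <sas-token> are placeholders.
#include <azure/storage/blobs.hpp>

#include <iostream>

int main() {
  Azure::Storage::Blobs::BlobContainerClient container_client(
      "https://<account>.blob.core.windows.net/<container>?<sas-token>");
  try {
    container_client.GetProperties();
    std::cout << "Container exists." << std::endl;
  } catch (const Azure::Storage::StorageException& exception) {
    if (exception.StatusCode == Azure::Core::Http::HttpStatusCode::NotFound) {
      std::cout << "Container does not exist." << std::endl;
    } else if (exception.StatusCode ==
               Azure::Core::Http::HttpStatusCode::Forbidden) {
      // A user-delegated SAS token has no account-level permissions, so the
      // container-level GetProperties call is rejected with 403 even when the
      // container exists. Assume it exists and let later blob-level
      // operations fail if it actually does not.
      std::cout << "403 from GetProperties; assuming the container exists."
                << std::endl;
    } else {
      throw;  // Any other failure is a real error.
    }
  }
  return 0;
}
```

With this kind of fallback, a genuinely missing container would still surface as an error later, from the blob-level operations in the write path.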
Reproduction below:

1. Create the HNS-enabled storage account named in STORAGE_ACCOUNT_NAME. (The same issue occurs with a plain blob account; this repro just happens to be written against HNS.)
2. Assign the Storage Blob Data Owner RBAC role on that account to whichever Entra ID account you are using, and wait 10-15 minutes for good measure.
3. Run the script below with `uv run`, or `pip install` the dependencies listed at the top into a venv.

```
# /// script
# dependencies = [
#     "adlfs>=2024.12.0",
#     "pandas>=2.1.0",
#     "pyarrow>=19",
#     "azure-storage-blob>=12.25.1",
#     "azure-storage-file-datalake>=12.20.0",
#     "azure-identity>=1.21.0",
#     "numpy>=1.24.0",
# ]
# ///
from adlfs import AzureBlobFileSystem
import pyarrow as pa
import pyarrow.fs as fs
import pyarrow.dataset as ds
import numpy as np
from datetime import datetime, timedelta, timezone
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import (
    generate_file_system_sas,
    DataLakeServiceClient,
    FileSystemSasPermissions,
)

# Generate Random table to write:
# Define the size of the dataset
rows = 1  # Adjust the number of rows
cols = 10  # Number of columns

# Generate random data
data = {f"col_{i}": np.random.rand(rows) for i in range(cols)}

# Create a PyArrow Table
table = pa.Table.from_pydict(data)

# Generate User Delegated Sas Token
STORAGE_ACCOUNT_NAME = "abd1234"
CONTAINER_NAME = "lab"
FILE_LOCATION = f"{CONTAINER_NAME}/personal/michael.barry/temp/random_dataset/"

dl_client = DataLakeServiceClient(
    account_url=f"https://{STORAGE_ACCOUNT_NAME}.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)

TOKEN_TIME_TO_LIVE = 7 * 24 * 60 * 60
start_time = datetime.now(timezone.utc) - timedelta(
    hours=1
)  # start in the past to avoid any clock skew issues
end_time = start_time + timedelta(seconds=TOKEN_TIME_TO_LIVE)
user_delegation_key = dl_client.get_user_delegation_key(start_time, end_time)

all_permissions = FileSystemSasPermissions(
    read=True,
    write=True,
    delete=True,
    list=True,
    add=True,
    create=True,
    move=True,
    execute=True,
    manage_ownership=True,
    manage_access_control=True,
)

sas_token = generate_file_system_sas(
    STORAGE_ACCOUNT_NAME,
    CONTAINER_NAME,
    credential=user_delegation_key,
    expiry=end_time,
    permission=all_permissions,
)

# write to pyarrow
azure_fs = fs.FileSystem.from_uri(
    f"abfs://{CONTAINER_NAME}@{STORAGE_ACCOUNT_NAME}.dfs.core.windows.net/?{sas_token}"
)[0]

try:
    ds.write_dataset(
        table,
        FILE_LOCATION,
        format="parquet",
        filesystem=azure_fs,
        existing_data_behavior="overwrite_or_ignore",
    )
except Exception as e:
    print(f"An error occurred using pyarrow AzureFileSystem: {e}")
    print("Writing with adlfs instead to show sas token works fine")
    write_fs = AzureBlobFileSystem(
        account_name=STORAGE_ACCOUNT_NAME,
        sas_token=sas_token,
    )
    ds.write_dataset(
        table,
        FILE_LOCATION,
        format="parquet",
        filesystem=write_fs,
        existing_data_behavior="overwrite_or_ignore",
    )

# If the dataset exists, the read works fine so we know the sas_token is okay
print(
    ds.dataset(
        FILE_LOCATION,
        format="parquet",
        filesystem=azure_fs,
    ).to_table().to_pandas().shape
)
print("Read Successful")
```

### Component(s)

C++