NikitaMatskevich opened a new pull request, #2661:
URL: https://github.com/apache/iceberg-python/pull/2661

   <!--
   Thanks for opening a pull request!
   -->
   
   <!-- In the case this PR will resolve an issue, please replace 
${GITHUB_ISSUE_ID} below with the actual Github issue id. -->
   <!-- Closes #${GITHUB_ISSUE_ID} -->
   
   # Rationale for this change
   
   We are using default credential pipeline to get access to Azure (more 
concretely, [managed 
identities](https://learn.microsoft.com/en-us/entra/identity/managed-identities-azure-resources/overview)).
 We found out that fsspec library [only allows it if we set 
anon=False](https://github.com/fsspec/adlfs/blob/main/adlfs/spec.py#L357-L367) 
and specify the account name.
   
   Thus, the anon property is added to pyiceberg config of the file io.
   
   ## Are these changes tested?
   
   We've tested that this works with the following snippet:
   ```
   import os
   from fsspec import AbstractFileSystem
   from pyiceberg.io.fsspec import FsspecFileIO
   from pyiceberg.catalog.rest import RestCatalog
   from typing import Any
   
   ADLS_ANON = "adls.anon"
   ADLS_CONNECTION_STRING = "adls.connection-string"
   ADLS_ACCOUNT_NAME = "adls.account-name"
   ADLS_ACCOUNT_KEY = "adls.account-key"
   ADLS_SAS_TOKEN = "adls.sas-token"
   ADLS_TENANT_ID = "adls.tenant-id"
   ADLS_CLIENT_ID = "adls.client-id"
   ADLS_CLIENT_SECRET = "adls.client-secret"
   ADLS_ACCOUNT_HOST = "adls.account-host"
   
   Properties = dict[str, Any]
   
   def my_adls(properties: Properties) -> AbstractFileSystem:
       from adlfs import AzureBlobFileSystem
   
       for key, sas_token in {
           key.replace(f"{ADLS_SAS_TOKEN}.", ""): value for key, value in 
properties.items() if key.startswith(ADLS_SAS_TOKEN)
       }.items():
           if ADLS_ACCOUNT_NAME not in properties:
               properties[ADLS_ACCOUNT_NAME] = key.split(".")[0]
           if ADLS_SAS_TOKEN not in properties:
               properties[ADLS_SAS_TOKEN] = sas_token
   
       return AzureBlobFileSystem(
           connection_string=properties.get(ADLS_CONNECTION_STRING),
           anon=properties.get(ADLS_ANON),
           account_name=properties.get(ADLS_ACCOUNT_NAME),
           account_key=properties.get(ADLS_ACCOUNT_KEY),
           sas_token=properties.get(ADLS_SAS_TOKEN),
           tenant_id=properties.get(ADLS_TENANT_ID),
           client_id=properties.get(ADLS_CLIENT_ID),
           client_secret=properties.get(ADLS_CLIENT_SECRET),
           account_host=properties.get(ADLS_ACCOUNT_HOST),
       )
   
   injected_file_io = FsspecFileIO(properties={ADLS_ANON: False, 
ADLS_ACCOUNT_NAME: "usagestorageprod"})
   injected_file_io.get_fs = lambda scheme: my_adls(injected_file_io.properties)
   
   CATALOG_URI = "https://lakehouse...";
   
   catalog_config = {
       "uri": CATALOG_URI,
       "properties": {
           "io-impl": "pyiceberg.io.fsspec.FsspecFileIO",
       },
       ...
   }
   
   catalog = RestCatalog("lakehouse", **catalog_config)
   catalog.file_io = injected_file_io
   
   table = catalog.load_table("some_ns.some_table")
   table.io = injected_file_io
   table.scan(snapshot_id=xxx).count()
   ```
   
   ## Are there any user-facing changes?
   
   Zero breaking changes
   
   <!-- In the case of user-facing changes, please add the changelog label. -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to