jiakai-li commented on PR #1453:
URL: https://github.com/apache/iceberg-python/pull/1453#issuecomment-2557425441

   Thank you @kevinjqliu, I'm just trying to clear my head a little bit.
   
   > I think a potential solution might be to omit the "region" property and 
allow the S3FileSystem to determine the proper region using resolve_s3_region. 
This is recommended in the [S3FileSystem 
docs](https://arrow.apache.org/docs/python/generated/pyarrow.fs.S3FileSystem.html)
 for region.
   
   Is the change I made in line with this option? Essentially, what I've done 
is use the `netloc` to determine the bucket region. Only when the region 
cannot be determined for some reason do we fall back to the `properties` 
configuration.
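
   To make the fallback order concrete, here is a minimal sketch rather than the actual code in the PR. The function name `pick_region`, the injected `resolver` callable, and the `"s3.region"` property key are assumptions for illustration only.

```python
from typing import Callable, Mapping, Optional


def pick_region(
    netloc: str,
    properties: Mapping[str, str],
    resolver: Callable[[str], str],
) -> Optional[str]:
    """Prefer the region resolved from the bucket name, else use properties.

    `resolver` is a hypothetical hook: any callable that maps a bucket
    name to a region string and raises if it cannot determine one.
    """
    try:
        region = resolver(netloc)
    except Exception:
        # Deliberately broad for this sketch: any resolution failure
        # triggers the fallback to the configured region.
        region = None
    # "s3.region" is assumed to be the property key holding the region.
    return region or properties.get("s3.region")
```

   In practice the resolver could be `pyarrow.fs.resolve_s3_region`, which takes a bucket name. It is injected here only so the fallback path can be exercised without network access.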
   
   > Another potential issue is the way we cache fs, it assumes that there's 
only one fs per scheme. With the region approach above, we break this 
assumption.
   
   Please correct me if I'm missing something about how the fs cache works, 
but here is my understanding:
   
   I see we use `lru_cache`, so it should cache one fs for each different 
bucket, since each bucket has a different `netloc` and thus a different key in 
the cache. Previously, it looks like we only had one cached fs. This seems 
related to the `netloc` not being used: `netloc` was not connected to the 
`client_kwargs["region"]` configuration. In that case, even if two cache keys 
point to two fs instances, both instances still use the same region 
(the one configured in `properties`).
   
   I think solving the `netloc` issue will also resolve the cache issue, since 
the `lru_cache` key is now tied to the region and the cache will return the 
correct instance.
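
   The cache behaviour described above can be illustrated with a small stand-alone sketch. Everything in it is hypothetical (`FakeFileSystem`, the bucket names, the region table, and both factory functions). It only mimics an `lru_cache` keyed on `(scheme, netloc)` and is not the pyiceberg implementation.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class FakeFileSystem:
    """Stand-in for a real filesystem object; only records its region."""
    scheme: str
    netloc: str
    region: str


# Hypothetical data for illustration only.
BUCKET_REGIONS = {"bucket-east": "us-east-1", "bucket-west": "us-west-2"}
CONFIGURED_REGION = "us-east-1"  # plays the role of the `properties` value


@lru_cache(maxsize=None)
def fs_ignoring_netloc(scheme: str, netloc: str) -> FakeFileSystem:
    # The cache key includes netloc, but the region does not depend on it,
    # so every cached instance ends up with the configured region.
    return FakeFileSystem(scheme, netloc, CONFIGURED_REGION)


@lru_cache(maxsize=None)
def fs_using_netloc(scheme: str, netloc: str) -> FakeFileSystem:
    # The region is derived from netloc, so each cache key maps to an
    # instance with the matching region (falling back to the configured one).
    region = BUCKET_REGIONS.get(netloc, CONFIGURED_REGION)
    return FakeFileSystem(scheme, netloc, region)


old_east = fs_ignoring_netloc("s3", "bucket-east")
old_west = fs_ignoring_netloc("s3", "bucket-west")
new_east = fs_using_netloc("s3", "bucket-east")
new_west = fs_using_netloc("s3", "bucket-west")
```

   Here `old_east` and `old_west` are distinct cached objects that share one region, whereas `new_east` and `new_west` carry different regions, and repeated calls with the same arguments return the same cached instance.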


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

