kevinjqliu commented on PR #1453:
URL: https://github.com/apache/iceberg-python/pull/1453#issuecomment-2557381909

   @jiakai-li Thanks for working on this! And happy holidays :) 
   
   > I noticed the PyArrowFileIO._initialize_fs function doesn't take netloc 
parameter into account when initialize the S3FileSystem
   
   Looking through the usages of `_initialize_fs`, it doesn't look like `netloc` 
is used at all. 
   
   > it always uses the region found in properties
   
   I think that's one of the problems we need to tackle. The current S3 
configuration requires a specific "region" to be set, which assumes that all 
data and metadata files live in that one region. But what if I have some files 
in one region and some in another? 
   
   I think a potential solution might be to omit the "region" property and 
allow the S3FileSystem to determine the proper region using 
`resolve_s3_region`. This is recommended in the [S3FileSystem 
docs](https://arrow.apache.org/docs/python/generated/pyarrow.fs.S3FileSystem.html)
 for `region`. 
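
   A minimal sketch of that idea, not the actual PyIceberg implementation: if no region is configured, look it up from the bucket name in the location. `pyarrow.fs.resolve_s3_region` is the real PyArrow helper; `region_for_location`, the injectable `resolver` argument, and the use of `s3.region` as the property key are assumptions made for illustration.

```python
from typing import Callable, Mapping
from urllib.parse import urlparse


def _default_resolver(bucket: str) -> str:
    # Imported lazily so the sketch can be read and tested without pyarrow.
    # resolve_s3_region performs a network call to look up the bucket's region.
    from pyarrow.fs import resolve_s3_region

    return resolve_s3_region(bucket)


def region_for_location(
    location: str,
    properties: Mapping[str, str],
    resolver: Callable[[str], str] = _default_resolver,
) -> str:
    """Return the S3 region to use for `location` (hypothetical helper).

    Assumption: an explicitly configured "s3.region" still wins; only when it
    is absent do we resolve the region from the bucket (the URI netloc).
    """
    explicit = properties.get("s3.region")
    if explicit:
        return explicit
    bucket = urlparse(location).netloc
    return resolver(bucket)
```

   Whether an explicit region should take precedence over the resolved one is a design choice this sketch does not settle; it only shows where `netloc` would come into play.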
   
   Another potential issue is the way we cache fs: it assumes that there's only 
one fs per scheme. With the region approach above, we break this assumption. 
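
   One possible way around this, sketched under assumptions rather than taken from the codebase, is to key the cache on both the scheme and the `netloc` (the bucket), so buckets in different regions get separate filesystem instances. `FileSystemCache` and its `factory` argument are hypothetical names; the factory stands in for whatever builds the filesystem.

```python
from typing import Any, Callable, Dict, Tuple
from urllib.parse import urlparse


class FileSystemCache:
    """Caches one filesystem object per (scheme, netloc) pair (hypothetical)."""

    def __init__(self, factory: Callable[[str, str], Any]) -> None:
        # factory(scheme, netloc) builds a new filesystem; it is an assumption
        # standing in for a function such as `_initialize_fs`.
        self._factory = factory
        self._cache: Dict[Tuple[str, str], Any] = {}

    def get(self, location: str) -> Any:
        uri = urlparse(location)
        key = (uri.scheme, uri.netloc)
        if key not in self._cache:
            # First request for this scheme/bucket combination: build and store.
            self._cache[key] = self._factory(*key)
        return self._cache[key]
```

   With this keying, two locations in the same bucket share an instance, while the same scheme with a different bucket triggers a new one.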


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

