kevinjqliu commented on code in PR #1453:
URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1896072489


##########
pyiceberg/io/pyarrow.py:
##########
@@ -377,6 +377,12 @@ def _initialize_fs(self, scheme: str, netloc: Optional[str] = None) -> FileSyste
             if force_virtual_addressing := self.properties.get(S3_FORCE_VIRTUAL_ADDRESSING):
                 client_kwargs["force_virtual_addressing"] = property_as_bool(self.properties, force_virtual_addressing, False)
 
+            # Override the default s3.region if netloc(bucket) resolves to a different region

Review Comment:
   nit: what do you think of moving this closer to where `region` is set? That would make it easier to debug in the future.



##########
pyiceberg/io/pyarrow.py:
##########
@@ -1394,7 +1399,6 @@ def __init__(
     ) -> None:
         self._table_metadata = table_metadata
         self._io = io
-        self._fs = _fs_from_file_path(table_metadata.location, io)  # TODO: use different FileSystem per file

Review Comment:
   :)



##########
tests/io/test_pyarrow.py:
##########


Review Comment:
   I saw a way to set up multiple MinIO endpoints and pretend that they are in different regions. This would require us to override the S3 endpoint per "region", i.e. port 9001 is us-east-1 and port 9002 is us-east-2.
   
   I think it's too tedious and doesn't help us much in terms of testing.
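
   For reference, a rough sketch of what the per-"region" override would look like. The `localhost` hosts, the `REGION_ENDPOINTS` mapping, and the `properties_for_region` helper are all hypothetical; only the port-to-region pairing comes from the comment above.

```python
from typing import Dict

# Hypothetical mapping: one local MinIO endpoint per pretend "region".
# The hosts are assumed; the ports follow the example in the comment.
REGION_ENDPOINTS: Dict[str, str] = {
    "us-east-1": "http://localhost:9001",
    "us-east-2": "http://localhost:9002",
}


def properties_for_region(region: str) -> Dict[str, str]:
    """Build the FileIO properties a test would need for one pretend region."""
    return {
        "s3.endpoint": REGION_ENDPOINTS[region],
        "s3.region": region,
    }
```

   Every test touching more than one region would need to thread these overrides through, which is the tedium mentioned above.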



##########
tests/io/test_pyarrow.py:
##########
@@ -360,10 +360,11 @@ def test_pyarrow_s3_session_properties() -> None:
         **UNIFIED_AWS_SESSION_PROPERTIES,
     }
 
-    with patch("pyarrow.fs.S3FileSystem") as mock_s3fs:
+    with patch("pyarrow.fs.S3FileSystem") as mock_s3fs, patch("pyarrow.fs.resolve_s3_region") as mock_s3_region_resolver:

Review Comment:
   nit: maybe if `s3.region` is set in the config, we just use it and don't override the region. What do you think?
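
   A minimal sketch of that precedence, not the PR's actual code. `choose_s3_region` and its `resolve_bucket_region` parameter are hypothetical names; in the real code path the resolver would presumably be `pyarrow.fs.resolve_s3_region`, which the test above patches.

```python
from typing import Callable, Dict, Optional

S3_REGION = "s3.region"  # property key referred to in the comment


def choose_s3_region(
    properties: Dict[str, str],
    bucket: Optional[str],
    resolve_bucket_region: Callable[[str], str],
) -> Optional[str]:
    """Prefer an explicitly configured region; resolve from the bucket only as a fallback."""
    configured = properties.get(S3_REGION)
    if configured:
        # The user set s3.region, so do not override it.
        return configured
    if not bucket:
        return None
    try:
        return resolve_bucket_region(bucket)
    except (OSError, ValueError):
        # Assumption: a failed lookup leaves the region unset instead of raising.
        return None
```

   With this ordering the resolver is only called when no region is configured, so a test can assert that the mock resolver was not invoked when `s3.region` is present.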



##########
tests/io/test_pyarrow.py:
##########
@@ -2074,3 +2076,34 @@ def test__to_requested_schema_timestamps_without_downcast_raises_exception(
         _to_requested_schema(requested_schema, file_schema, batch, downcast_ns_timestamp_to_us=False, include_field_ids=False)
 
     assert "Unsupported schema projection from timestamp[ns] to timestamp[us]" in str(exc_info.value)
+
+
+def test_pyarrow_file_io_fs_by_scheme_cache() -> None:
+    pyarrow_file_io = PyArrowFileIO()
+    us_east_1_region = "us-eas1-1"

Review Comment:
   ```suggestion
       us_east_1_region = "us-east-1"
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

