Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2025-01-06 Thread via GitHub
kevinjqliu commented on PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#issuecomment-2573265511 Thank you @jiakai-li for the contribution and @Fokko for the review :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2025-01-06 Thread via GitHub
kevinjqliu merged PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@i

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2025-01-05 Thread via GitHub
kevinjqliu commented on code in PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1903446304 ## pyiceberg/io/pyarrow.py: ## @@ -351,77 +351,141 @@ def parse_location(location: str) -> Tuple[str, str, str]: return uri.scheme, uri.netloc,

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2025-01-05 Thread via GitHub
jiakai-li commented on code in PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1903376358 ## pyiceberg/io/pyarrow.py: ## @@ -351,76 +344,146 @@ def parse_location(location: str) -> Tuple[str, str, str]: return uri.scheme, uri.netloc, f

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2025-01-05 Thread via GitHub
jiakai-li commented on code in PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1903363474 ## pyiceberg/io/pyarrow.py: ## @@ -351,76 +344,146 @@ def parse_location(location: str) -> Tuple[str, str, str]: return uri.scheme, uri.netloc, f

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2025-01-05 Thread via GitHub
Fokko commented on code in PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1903352592 ## pyiceberg/io/pyarrow.py: ## @@ -190,13 +190,6 @@ T = TypeVar("T") -class PyArrowLocalFileSystem(pyarrow.fs.LocalFileSystem): Review Comment: People cou

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2025-01-05 Thread via GitHub
kevinjqliu commented on PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#issuecomment-2571735953 https://github.com/apache/iceberg-python/pull/1485 to replace `pycln` with ruff linter -- This is an automated message from the Apache Git Service. To respond to the message,

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2025-01-04 Thread via GitHub
jiakai-li commented on PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#issuecomment-2571396659 > somethings going on with github runner, `make lint` works locally https://github.com/apache/iceberg-python/actions/runs/12612627362/job/35150820385?pr=1453 I saw that as

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2025-01-04 Thread via GitHub
kevinjqliu commented on PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#issuecomment-2571394282 somethings going on with github runner, `make lint` works locally https://github.com/apache/iceberg-python/actions/runs/12612627362/job/35150820385?pr=1453 -- This is an au

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2025-01-04 Thread via GitHub
jiakai-li commented on code in PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1903135543 ## pyiceberg/io/pyarrow.py: ## @@ -352,7 +352,7 @@ def parse_location(location: str) -> Tuple[str, str, str]: def _initialize_fs(self, scheme: str, netl

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2025-01-03 Thread via GitHub
kevinjqliu commented on code in PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1902287419 ## pyiceberg/io/pyarrow.py: ## @@ -352,7 +352,7 @@ def parse_location(location: str) -> Tuple[str, str, str]: def _initialize_fs(self, scheme: str, net

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2025-01-03 Thread via GitHub
jiakai-li commented on code in PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1902277899 ## pyiceberg/io/pyarrow.py: ## @@ -352,7 +352,7 @@ def parse_location(location: str) -> Tuple[str, str, str]: def _initialize_fs(self, scheme: str, netl

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2025-01-03 Thread via GitHub
kevinjqliu commented on code in PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1902219038 ## mkdocs/docs/configuration.md: ## @@ -102,21 +102,21 @@ For the FileIO there are several configuration options available: -| Key | E

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2024-12-29 Thread via GitHub
jiakai-li commented on code in PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1899180251 ## pyiceberg/io/pyarrow.py: ## @@ -362,6 +362,12 @@ def _initialize_fs(self, scheme: str, netloc: Optional[str] = None) -> FileSyste "region"

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2024-12-29 Thread via GitHub
jiakai-li commented on code in PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1899180251 ## pyiceberg/io/pyarrow.py: ## @@ -362,6 +362,12 @@ def _initialize_fs(self, scheme: str, netloc: Optional[str] = None) -> FileSyste "region"

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2024-12-29 Thread via GitHub
jiakai-li commented on code in PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1899180251 ## pyiceberg/io/pyarrow.py: ## @@ -362,6 +362,12 @@ def _initialize_fs(self, scheme: str, netloc: Optional[str] = None) -> FileSyste "region"

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2024-12-28 Thread via GitHub
Fokko commented on code in PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1899068498 ## pyiceberg/io/pyarrow.py: ## @@ -1508,7 +1512,7 @@ def _record_batches_from_scan_tasks_and_deletes( if self._limit is not None and total_row_count >

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2024-12-28 Thread via GitHub
Fokko commented on code in PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1899068498 ## pyiceberg/io/pyarrow.py: ## @@ -1508,7 +1512,7 @@ def _record_batches_from_scan_tasks_and_deletes( if self._limit is not None and total_row_count >

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2024-12-28 Thread via GitHub
Fokko commented on code in PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1899068195 ## pyiceberg/io/pyarrow.py: ## @@ -362,6 +362,12 @@ def _initialize_fs(self, scheme: str, netloc: Optional[str] = None) -> FileSyste "region": ge

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2024-12-28 Thread via GitHub
Fokko commented on code in PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1899068165 ## pyiceberg/io/pyarrow.py: ## @@ -362,6 +362,12 @@ def _initialize_fs(self, scheme: str, netloc: Optional[str] = None) -> FileSyste "region": ge

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2024-12-23 Thread via GitHub
jiakai-li commented on PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#issuecomment-2560549414 This PR is ready for review now. Thanks very much and merry christmas! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to G

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2024-12-23 Thread via GitHub
kevinjqliu commented on code in PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1896150261 ## pyiceberg/io/pyarrow.py: ## @@ -377,6 +377,12 @@ def _initialize_fs(self, scheme: str, netloc: Optional[str] = None) -> FileSyste if force_vi

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2024-12-23 Thread via GitHub
jiakai-li commented on code in PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1896132680 ## pyiceberg/io/pyarrow.py: ## @@ -377,6 +377,12 @@ def _initialize_fs(self, scheme: str, netloc: Optional[str] = None) -> FileSyste if force_vir

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2024-12-23 Thread via GitHub
jiakai-li commented on code in PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1896130636 ## tests/io/test_pyarrow.py: ## @@ -360,10 +360,11 @@ def test_pyarrow_s3_session_properties() -> None: **UNIFIED_AWS_SESSION_PROPERTIES, } -

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2024-12-23 Thread via GitHub
jiakai-li commented on code in PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1896126623 ## pyiceberg/io/pyarrow.py: ## @@ -1508,7 +1512,7 @@ def _record_batches_from_scan_tasks_and_deletes( if self._limit is not None and total_row_cou

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2024-12-23 Thread via GitHub
kevinjqliu commented on code in PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1896125712 ## pyiceberg/io/pyarrow.py: ## @@ -1508,7 +1512,7 @@ def _record_batches_from_scan_tasks_and_deletes( if self._limit is not None and total_row_co

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2024-12-23 Thread via GitHub
kevinjqliu commented on code in PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1896124443 ## tests/io/test_pyarrow.py: ## Review Comment: I think testing `PyArrowFileIO.fs_by_scheme` is good enough. in the unit test, maybe mention https://git

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2024-12-23 Thread via GitHub
jiakai-li commented on code in PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1896119481 ## tests/io/test_pyarrow.py: ## Review Comment: Yes, I thought about that as well, essentially I tried to set up another service like `minio-ap-southeast

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2024-12-23 Thread via GitHub
kevinjqliu commented on code in PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1896072489 ## pyiceberg/io/pyarrow.py: ## @@ -377,6 +377,12 @@ def _initialize_fs(self, scheme: str, netloc: Optional[str] = None) -> FileSyste if force_vi

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2024-12-23 Thread via GitHub
kevinjqliu commented on PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#issuecomment-2560234773 > According to what I found here seems fsspec doesn't have the same issue as pyarrow. So I guess we can leave it? wow thats interesting, i didn't know about that. I like t

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2024-12-22 Thread via GitHub
jiakai-li commented on PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#issuecomment-2558702444 > BTW there are 2 FileIO implementations, one for pyarrow, another for fsspec. > > We might want to do the same for fsspec > > https://github.com/apache/iceberg-pyt

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2024-12-22 Thread via GitHub
jiakai-li commented on code in PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1895102443 ## tests/io/test_pyarrow.py: ## Review Comment: I did some more search but I think [this comment](https://github.com/minio/minio/discussions/20683#discus

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2024-12-20 Thread via GitHub
kevinjqliu commented on code in PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1894520472 ## pyiceberg/io/pyarrow.py: ## Review Comment: Make doc changes as well. https://py.iceberg.apache.org/configuration/#s3 ## pyiceberg/io/p

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2024-12-20 Thread via GitHub
kevinjqliu commented on PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#issuecomment-2557893390 BTW there are 2 FileIO implementations, one for pyarrow, another for fsspec. We might want to do the same for fsspec https://github.com/apache/iceberg-python/blob/dbcf65b

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2024-12-20 Thread via GitHub
jiakai-li commented on PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#issuecomment-2557894124 Sweet, I'll go ahead with this approach then. Thanks very much @kevinjqliu ! -- This is an automated message from the Apache Git Service. To respond to the message, please log o

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2024-12-20 Thread via GitHub
kevinjqliu commented on PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#issuecomment-2557893079 > I did some search and seems in terms of s3 scheme, the format is s3:///. The netloc parsed from urlparse (essentially passed to the _initialize_fs call) then points to the buc

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2024-12-20 Thread via GitHub
jiakai-li commented on PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#issuecomment-2557880783 Thank you @kevinjqliu , can I have some more guidance on this please? > Im dont think netloc can be used to determine the region. S3 URI scheme doesn't use netloc, only S3

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2024-12-20 Thread via GitHub
kevinjqliu commented on PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#issuecomment-2557750804 > Can I tackle on this issue as well if there is no one working on it? I don't think anyone's working on it right now, feel free to pick it up. -- This is an automated

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2024-12-20 Thread via GitHub
kevinjqliu commented on PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#issuecomment-2557749844 > Is the change I made in accordance with this option? What I've done essentially is using the netloc to determine the bucket region. Only in case when, for some reason, the reg

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2024-12-20 Thread via GitHub
jiakai-li commented on PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#issuecomment-2557480803 > BTW theres a similar issue in #1041 Can I tackle on this issue as well if there is no one working on it? -- This is an automated message from the Apache Git Service. To

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2024-12-20 Thread via GitHub
jiakai-li commented on PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#issuecomment-2557425441 Thank you @kevinjqliu , just try to clear my head a little bit > I think a potential solution might be to omit the "region" property and allow the S3FileSystem to determine

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2024-12-20 Thread via GitHub
kevinjqliu commented on code in PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1894176293 ## tests/io/test_pyarrow.py: ## @@ -381,10 +382,11 @@ def test_pyarrow_unified_session_properties() -> None: **UNIFIED_AWS_SESSION_PROPERTIES, }

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2024-12-20 Thread via GitHub
kevinjqliu commented on PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#issuecomment-2557381909 @jiakai-li Thanks for working on this! And happy holidays :) > I noticed the PyArrowFileIO._initialize_fs function doesn't take netloc parameter into account when initial

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2024-12-19 Thread via GitHub
jiakai-li commented on PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#issuecomment-2556106456 Hey @kevinjqliu , I hope you are ready for the Christmas time :-) After some investigation, I noticed the `PyArrowFileIO._initialize_fs` function doesn't take `netloc` para