kevinjqliu commented on PR #1453:
URL: https://github.com/apache/iceberg-python/pull/1453#issuecomment-2573265511
Thank you @jiakai-li for the contribution and @Fokko for the review :)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to
kevinjqliu merged PR #1453:
URL: https://github.com/apache/iceberg-python/pull/1453
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@i
kevinjqliu commented on code in PR #1453:
URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1903446304
##
pyiceberg/io/pyarrow.py:
##
@@ -351,77 +351,141 @@ def parse_location(location: str) -> Tuple[str, str,
str]:
return uri.scheme, uri.netloc,
jiakai-li commented on code in PR #1453:
URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1903376358
##
pyiceberg/io/pyarrow.py:
##
@@ -351,76 +344,146 @@ def parse_location(location: str) -> Tuple[str, str,
str]:
return uri.scheme, uri.netloc, f
jiakai-li commented on code in PR #1453:
URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1903363474
##
pyiceberg/io/pyarrow.py:
##
@@ -351,76 +344,146 @@ def parse_location(location: str) -> Tuple[str, str,
str]:
return uri.scheme, uri.netloc, f
Fokko commented on code in PR #1453:
URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1903352592
##
pyiceberg/io/pyarrow.py:
##
@@ -190,13 +190,6 @@
T = TypeVar("T")
-class PyArrowLocalFileSystem(pyarrow.fs.LocalFileSystem):
Review Comment:
People cou
kevinjqliu commented on PR #1453:
URL: https://github.com/apache/iceberg-python/pull/1453#issuecomment-2571735953
https://github.com/apache/iceberg-python/pull/1485 to replace `pycln` with
ruff linter
--
This is an automated message from the Apache Git Service.
To respond to the message,
jiakai-li commented on PR #1453:
URL: https://github.com/apache/iceberg-python/pull/1453#issuecomment-2571396659
> somethings going on with github runner, `make lint` works locally
https://github.com/apache/iceberg-python/actions/runs/12612627362/job/35150820385?pr=1453
I saw that as
kevinjqliu commented on PR #1453:
URL: https://github.com/apache/iceberg-python/pull/1453#issuecomment-2571394282
somethings going on with github runner, `make lint` works locally
https://github.com/apache/iceberg-python/actions/runs/12612627362/job/35150820385?pr=1453
--
This is an au
jiakai-li commented on code in PR #1453:
URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1903135543
##
pyiceberg/io/pyarrow.py:
##
@@ -352,7 +352,7 @@ def parse_location(location: str) -> Tuple[str, str, str]:
def _initialize_fs(self, scheme: str, netl
kevinjqliu commented on code in PR #1453:
URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1902287419
##
pyiceberg/io/pyarrow.py:
##
@@ -352,7 +352,7 @@ def parse_location(location: str) -> Tuple[str, str, str]:
def _initialize_fs(self, scheme: str, net
jiakai-li commented on code in PR #1453:
URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1902277899
##
pyiceberg/io/pyarrow.py:
##
@@ -352,7 +352,7 @@ def parse_location(location: str) -> Tuple[str, str, str]:
def _initialize_fs(self, scheme: str, netl
kevinjqliu commented on code in PR #1453:
URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1902219038
##
mkdocs/docs/configuration.md:
##
@@ -102,21 +102,21 @@ For the FileIO there are several configuration options
available:
-| Key | E
jiakai-li commented on code in PR #1453:
URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1899180251
##
pyiceberg/io/pyarrow.py:
##
@@ -362,6 +362,12 @@ def _initialize_fs(self, scheme: str, netloc:
Optional[str] = None) -> FileSyste
"region"
jiakai-li commented on code in PR #1453:
URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1899180251
##
pyiceberg/io/pyarrow.py:
##
@@ -362,6 +362,12 @@ def _initialize_fs(self, scheme: str, netloc:
Optional[str] = None) -> FileSyste
"region"
jiakai-li commented on code in PR #1453:
URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1899180251
##
pyiceberg/io/pyarrow.py:
##
@@ -362,6 +362,12 @@ def _initialize_fs(self, scheme: str, netloc:
Optional[str] = None) -> FileSyste
"region"
Fokko commented on code in PR #1453:
URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1899068498
##
pyiceberg/io/pyarrow.py:
##
@@ -1508,7 +1512,7 @@ def _record_batches_from_scan_tasks_and_deletes(
if self._limit is not None and total_row_count >
Fokko commented on code in PR #1453:
URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1899068498
##
pyiceberg/io/pyarrow.py:
##
@@ -1508,7 +1512,7 @@ def _record_batches_from_scan_tasks_and_deletes(
if self._limit is not None and total_row_count >
Fokko commented on code in PR #1453:
URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1899068195
##
pyiceberg/io/pyarrow.py:
##
@@ -362,6 +362,12 @@ def _initialize_fs(self, scheme: str, netloc:
Optional[str] = None) -> FileSyste
"region": ge
Fokko commented on code in PR #1453:
URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1899068165
##
pyiceberg/io/pyarrow.py:
##
@@ -362,6 +362,12 @@ def _initialize_fs(self, scheme: str, netloc:
Optional[str] = None) -> FileSyste
"region": ge
jiakai-li commented on PR #1453:
URL: https://github.com/apache/iceberg-python/pull/1453#issuecomment-2560549414
This PR is ready for review now. Thanks very much and merry christmas!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to G
kevinjqliu commented on code in PR #1453:
URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1896150261
##
pyiceberg/io/pyarrow.py:
##
@@ -377,6 +377,12 @@ def _initialize_fs(self, scheme: str, netloc:
Optional[str] = None) -> FileSyste
if force_vi
jiakai-li commented on code in PR #1453:
URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1896132680
##
pyiceberg/io/pyarrow.py:
##
@@ -377,6 +377,12 @@ def _initialize_fs(self, scheme: str, netloc:
Optional[str] = None) -> FileSyste
if force_vir
jiakai-li commented on code in PR #1453:
URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1896130636
##
tests/io/test_pyarrow.py:
##
@@ -360,10 +360,11 @@ def test_pyarrow_s3_session_properties() -> None:
**UNIFIED_AWS_SESSION_PROPERTIES,
}
-
jiakai-li commented on code in PR #1453:
URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1896126623
##
pyiceberg/io/pyarrow.py:
##
@@ -1508,7 +1512,7 @@ def _record_batches_from_scan_tasks_and_deletes(
if self._limit is not None and total_row_cou
kevinjqliu commented on code in PR #1453:
URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1896125712
##
pyiceberg/io/pyarrow.py:
##
@@ -1508,7 +1512,7 @@ def _record_batches_from_scan_tasks_and_deletes(
if self._limit is not None and total_row_co
kevinjqliu commented on code in PR #1453:
URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1896124443
##
tests/io/test_pyarrow.py:
##
Review Comment:
I think testing `PyArrowFileIO.fs_by_scheme` is good enough. in the unit
test, maybe mention https://git
jiakai-li commented on code in PR #1453:
URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1896119481
##
tests/io/test_pyarrow.py:
##
Review Comment:
Yes, I thought about that as well, essentially I tried to set up another
service like `minio-ap-southeast
kevinjqliu commented on code in PR #1453:
URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1896072489
##
pyiceberg/io/pyarrow.py:
##
@@ -377,6 +377,12 @@ def _initialize_fs(self, scheme: str, netloc:
Optional[str] = None) -> FileSyste
if force_vi
kevinjqliu commented on PR #1453:
URL: https://github.com/apache/iceberg-python/pull/1453#issuecomment-2560234773
> According to what I found here seems fsspec doesn't have the same issue as
pyarrow. So I guess we can leave it?
wow thats interesting, i didn't know about that. I like t
jiakai-li commented on PR #1453:
URL: https://github.com/apache/iceberg-python/pull/1453#issuecomment-2558702444
> BTW there are 2 FileIO implementations, one for pyarrow, another for
fsspec.
>
> We might want to do the same for fsspec
>
>
https://github.com/apache/iceberg-pyt
jiakai-li commented on code in PR #1453:
URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1895102443
##
tests/io/test_pyarrow.py:
##
Review Comment:
I did some more search but I think [this
comment](https://github.com/minio/minio/discussions/20683#discus
kevinjqliu commented on code in PR #1453:
URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1894520472
##
pyiceberg/io/pyarrow.py:
##
Review Comment:
Make doc changes as well.
https://py.iceberg.apache.org/configuration/#s3
##
pyiceberg/io/p
kevinjqliu commented on PR #1453:
URL: https://github.com/apache/iceberg-python/pull/1453#issuecomment-2557893390
BTW there are 2 FileIO implementations, one for pyarrow, another for fsspec.
We might want to do the same for fsspec
https://github.com/apache/iceberg-python/blob/dbcf65b
jiakai-li commented on PR #1453:
URL: https://github.com/apache/iceberg-python/pull/1453#issuecomment-2557894124
Sweet, I'll go ahead with this approach then. Thanks very much @kevinjqliu !
--
This is an automated message from the Apache Git Service.
To respond to the message, please log o
kevinjqliu commented on PR #1453:
URL: https://github.com/apache/iceberg-python/pull/1453#issuecomment-2557893079
> I did some search and seems in terms of s3 scheme, the format is
s3:///. The netloc parsed from urlparse (essentially
passed to the _initialize_fs call) then points to the buc
jiakai-li commented on PR #1453:
URL: https://github.com/apache/iceberg-python/pull/1453#issuecomment-2557880783
Thank you @kevinjqliu , can I have some more guidance on this please?
> Im dont think netloc can be used to determine the region. S3 URI scheme
doesn't use netloc, only S3
kevinjqliu commented on PR #1453:
URL: https://github.com/apache/iceberg-python/pull/1453#issuecomment-2557750804
> Can I tackle on this issue as well if there is no one working on it?
I don't think anyone's working on it right now, feel free to pick it up.
--
This is an automated
kevinjqliu commented on PR #1453:
URL: https://github.com/apache/iceberg-python/pull/1453#issuecomment-2557749844
> Is the change I made in accordance with this option? What I've done
essentially is using the netloc to determine the bucket region. Only in case
when, for some reason, the reg
jiakai-li commented on PR #1453:
URL: https://github.com/apache/iceberg-python/pull/1453#issuecomment-2557480803
> BTW theres a similar issue in #1041
Can I tackle on this issue as well if there is no one working on it?
--
This is an automated message from the Apache Git Service.
To
jiakai-li commented on PR #1453:
URL: https://github.com/apache/iceberg-python/pull/1453#issuecomment-2557425441
Thank you @kevinjqliu , just try to clear my head a little bit
> I think a potential solution might be to omit the "region" property and
allow the S3FileSystem to determine
kevinjqliu commented on code in PR #1453:
URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1894176293
##
tests/io/test_pyarrow.py:
##
@@ -381,10 +382,11 @@ def test_pyarrow_unified_session_properties() -> None:
**UNIFIED_AWS_SESSION_PROPERTIES,
}
kevinjqliu commented on PR #1453:
URL: https://github.com/apache/iceberg-python/pull/1453#issuecomment-2557381909
@jiakai-li Thanks for working on this! And happy holidays :)
> I noticed the PyArrowFileIO._initialize_fs function doesn't take netloc
parameter into account when initial
jiakai-li commented on PR #1453:
URL: https://github.com/apache/iceberg-python/pull/1453#issuecomment-2556106456
Hey @kevinjqliu , I hope you are ready for the Christmas time :-)
After some investigation, I noticed the `PyArrowFileIO._initialize_fs`
function doesn't take `netloc` para
44 matches
Mail list logo