[I] Extend Snapshot Metadata Lifecycle [iceberg]

2024-07-05 Thread via GitHub
szehon-ho opened a new issue, #10646: URL: https://github.com/apache/iceberg/issues/10646 ### Proposed Change **Motivation** Currently, a snapshot's lifecycle is handled by 'ExpireSnapshots(long olderThan)'. This operation does the following: - Choose a set of snapshots

Re: [PR] Add interfaces for Action RemoveExpiredFiles [iceberg]

2024-07-05 Thread via GitHub
anuragmantri commented on code in PR #10643: URL: https://github.com/apache/iceberg/pull/10643#discussion_r1667313200 ## api/src/main/java/org/apache/iceberg/actions/ActionsProvider.java: ## @@ -70,4 +70,10 @@ default RewritePositionDeleteFiles rewritePositionDeletes(Table tabl

Re: [PR] Add interfaces for Action RemoveExpiredFiles [iceberg]

2024-07-05 Thread via GitHub
ajantha-bhat commented on code in PR #10643: URL: https://github.com/apache/iceberg/pull/10643#discussion_r1667261838 ## api/src/main/java/org/apache/iceberg/actions/RemoveExpiredFiles.java: ## @@ -0,0 +1,58 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one +

Re: [PR] Add interfaces for Action RemoveExpiredFiles [iceberg]

2024-07-05 Thread via GitHub
ajantha-bhat commented on code in PR #10643: URL: https://github.com/apache/iceberg/pull/10643#discussion_r1667260941 ## api/src/main/java/org/apache/iceberg/actions/ActionsProvider.java: ## @@ -70,4 +70,10 @@ default RewritePositionDeleteFiles rewritePositionDeletes(Table tabl

Re: [I] idea: Refactor the README to be more user-oriented [iceberg-rust]

2024-07-05 Thread via GitHub
Xuanwo commented on issue #429: URL: https://github.com/apache/iceberg-rust/issues/429#issuecomment-2211632617 Hi, @liurenjie1024, could you start a tracking issues for all features that we lack compared to java impl? -- This is an automated message from the Apache Git Service. To respond

Re: [I] idea: Refactor the README to be more user-oriented [iceberg-rust]

2024-07-05 Thread via GitHub
Xuanwo commented on issue #429: URL: https://github.com/apache/iceberg-rust/issues/429#issuecomment-2211629222 Quick Start is not ready yet, I will start another PR to add it.

[PR] Bump certifi from 2024.2.2 to 2024.7.4 [iceberg-python]

2024-07-05 Thread via GitHub
dependabot[bot] opened a new pull request, #899: URL: https://github.com/apache/iceberg-python/pull/899 Bumps [certifi](https://github.com/certifi/python-certifi) from 2024.2.2 to 2024.7.4. Commits https://github.com/certifi/python-certifi/commit/bd8153872e9c6fc98f4023df9c2deaf

Re: [PR] Cast 's', 'ms' and 'ns' PyArrow timestamp to 'us' precision on write [iceberg-python]

2024-07-05 Thread via GitHub
syun64 commented on code in PR #848: URL: https://github.com/apache/iceberg-python/pull/848#discussion_r1667216339 ## pyiceberg/io/pyarrow.py: ## @@ -918,11 +919,24 @@ def primitive(self, primitive: pa.DataType) -> PrimitiveType: return TimeType() elif pa.

[I] REST Spec: Server-side Metadata Tables [iceberg]

2024-07-05 Thread via GitHub
flyrain opened a new issue, #10645: URL: https://github.com/apache/iceberg/issues/10645 ### Feature Request / Improvement ### Proposed Change This proposal introduces table metadata APIs to the Iceberg REST catalog (IRC) specification. One of Iceberg's most advantageous

[PR] Bump tenacity from 8.4.2 to 8.5.0 [iceberg-python]

2024-07-05 Thread via GitHub
dependabot[bot] opened a new pull request, #898: URL: https://github.com/apache/iceberg-python/pull/898 Bumps [tenacity](https://github.com/jd/tenacity) from 8.4.2 to 8.5.0. Release notes Sourced from tenacity's releases (https://github.com/jd/tenacity/releases). 8.5.0 Wha

[PR] Bump deptry from 0.16.1 to 0.16.2 [iceberg-python]

2024-07-05 Thread via GitHub
dependabot[bot] opened a new pull request, #897: URL: https://github.com/apache/iceberg-python/pull/897 Bumps [deptry](https://github.com/fpgmaas/deptry) from 0.16.1 to 0.16.2. Release notes Sourced from deptry's releases (https://github.com/fpgmaas/deptry/releases). 0.16.2

Re: [I] Add `make clean` option [iceberg-python]

2024-07-05 Thread via GitHub
Fokko closed issue #875: Add `make clean` option URL: https://github.com/apache/iceberg-python/issues/875

Re: [PR] Makefile clean utility to remove cython cached objects [iceberg-python]

2024-07-05 Thread via GitHub
Fokko merged PR #881: URL: https://github.com/apache/iceberg-python/pull/881

Re: [PR] [Docs] Add examples for DataFrame branch writes [iceberg]

2024-07-05 Thread via GitHub
anuragmantri commented on PR #10644: URL: https://github.com/apache/iceberg/pull/10644#issuecomment-2211415230 @szehon-ho @flyrain - Could you please take a look?

[PR] [Docs] Add examples for DataFrame branch writes [iceberg]

2024-07-05 Thread via GitHub
anuragmantri opened a new pull request, #10644: URL: https://github.com/apache/iceberg/pull/10644 Moved `Writing to branches` into SQL and data frame writes section and added examples of writing to branches with dataframes. https://github.com/apache/iceberg/assets/13743212/403d745

Re: [PR] Spark 3.3/3.4: support read of partition metadata column when table is over 1k [iceberg]

2024-07-05 Thread via GitHub
dramaticlly closed pull request #10641: Spark 3.3/3.4: support read of partition metadata column when table is over 1k URL: https://github.com/apache/iceberg/pull/10641

[PR] Add interfaces for Action RemoveExpiredFiles [iceberg]

2024-07-05 Thread via GitHub
huaxingao opened a new pull request, #10643: URL: https://github.com/apache/iceberg/pull/10643 Co-authored-by: Yufei Gu [yu...@apache.org](yu...@apache.org) Co-authored-by: Huaxin Gao [huaxin_...@apple.com](mailto:huaxin_...@apple.com) This PR adds interfaces for Action RemoveExpi

[PR] Add interfaces for Action CheckSnapshotIntegrity [iceberg]

2024-07-05 Thread via GitHub
huaxingao opened a new pull request, #10642: URL: https://github.com/apache/iceberg/pull/10642 Co-authored-by: Yufei Gu [yu...@apache.org](yu...@apache.org) Co-authored-by: Huaxin Gao [huaxin_...@apple.com](mailto:huaxin_...@apple.com) This PR adds interfaces for Action CheckSnaps

Re: [PR] Spec: Make NDV blob metadata property required [iceberg]

2024-07-05 Thread via GitHub
amogh-jahagirdar commented on PR #10549: URL: https://github.com/apache/iceberg/pull/10549#issuecomment-2211248500 @findepi I think I largely agree with you that a Puffin V2 is probably too broad for this since we're not even changing the metadata; we can granularly update the theta sketch

Re: [PR] Cast 's', 'ms' and 'ns' PyArrow timestamp to 'us' precision on write [iceberg-python]

2024-07-05 Thread via GitHub
syun64 commented on code in PR #848: URL: https://github.com/apache/iceberg-python/pull/848#discussion_r1667028145 ## pyiceberg/io/pyarrow.py: ## @@ -918,11 +919,24 @@ def primitive(self, primitive: pa.DataType) -> PrimitiveType: return TimeType() elif pa.

Re: [I] Unable to get field from serde: org.apache.iceberg.mr.hive.HiveIcebergSerDe [iceberg]

2024-07-05 Thread via GitHub
pvary commented on issue #10633: URL: https://github.com/apache/iceberg/issues/10633#issuecomment-2211195925 Hive 4 uses its own embedded Iceberg. You don't need to add additional jars to the lib.

Re: [I] Unable to get field from serde: org.apache.iceberg.mr.hive.HiveIcebergSerDe [iceberg]

2024-07-05 Thread via GitHub
pvary commented on issue #10633: URL: https://github.com/apache/iceberg/issues/10633#issuecomment-2211196219 Also, this is most probably a Hive-related question, so it is better asked on the Hive user list.

Re: [I] Unable to get field from serde: org.apache.iceberg.mr.hive.HiveIcebergSerDe [iceberg]

2024-07-05 Thread via GitHub
pvary closed issue #10633: Unable to get field from serde: org.apache.iceberg.mr.hive.HiveIcebergSerDe URL: https://github.com/apache/iceberg/issues/10633

Re: [I] ugi not correct in WORKER_POOL [iceberg]

2024-07-05 Thread via GitHub
pvary commented on issue #10639: URL: https://github.com/apache/iceberg/issues/10639#issuecomment-2211205533 @deniskuzZ: What do you think?

[PR] Spark 3.3/3.4: support read of partition metadata column when table is over 1k [iceberg]

2024-07-05 Thread via GitHub
dramaticlly opened a new pull request, #10641: URL: https://github.com/apache/iceberg/pull/10641 Backport of pull request #10547, but it looks like JUnit 5 and AssertJ are not available there, so I kept the original unit test style.

Re: [I] Rename `data_sequence_number` to `sequence_number` [iceberg-python]

2024-07-05 Thread via GitHub
soumya-ghosh commented on issue #893: URL: https://github.com/apache/iceberg-python/issues/893#issuecomment-2211163215 @Fokko I would like to take a shot at this one.

Re: [I] Rename `data_sequence_number` to `sequence_number` [iceberg-python]

2024-07-05 Thread via GitHub
Fokko commented on issue #893: URL: https://github.com/apache/iceberg-python/issues/893#issuecomment-2211166513 > Is there a way on the Java/spark side to turn metadata information into JSON? With https://github.com/apache/iceberg-python/issues/535, perhaps we can compare the two JSON resul

Re: [PR] Makefile clean utility to remove cython cached objects [iceberg-python]

2024-07-05 Thread via GitHub
Fokko commented on code in PR #881: URL: https://github.com/apache/iceberg-python/pull/881#discussion_r1667000258 ## mkdocs/docs/contributing.md: ## @@ -88,6 +88,16 @@ In contrast to the name suggest, it doesn't run the checks on the commit. If thi You can bump the integrati

Re: [I] Move `_determine_partitions` to `pyarrow.py` [iceberg-python]

2024-07-05 Thread via GitHub
Fokko commented on issue #896: URL: https://github.com/apache/iceberg-python/issues/896#issuecomment-2211166797 @soumya-ghosh Sure thing! 🙌

Re: [PR] Spark: support read of partition metadata column when table is over 1k [iceberg]

2024-07-05 Thread via GitHub
szehon-ho commented on PR #10547: URL: https://github.com/apache/iceberg/pull/10547#issuecomment-2211160855 Merged, thanks @dramaticlly

Re: [I] Move `_determine_partitions` to `pyarrow.py` [iceberg-python]

2024-07-05 Thread via GitHub
soumya-ghosh commented on issue #896: URL: https://github.com/apache/iceberg-python/issues/896#issuecomment-2211161771 @Fokko could I pick this up?

Re: [I] Rename `data_sequence_number` to `sequence_number` [iceberg-python]

2024-07-05 Thread via GitHub
Fokko commented on issue #893: URL: https://github.com/apache/iceberg-python/issues/893#issuecomment-2211165635 @soumya-ghosh Feel free to take a stab at it, let me know if you run into anything

Re: [PR] Spark: support read of partition metadata column when table is over 1k [iceberg]

2024-07-05 Thread via GitHub
szehon-ho merged PR #10547: URL: https://github.com/apache/iceberg/pull/10547

Re: [PR] Spec: Make NDV blob metadata property required [iceberg]

2024-07-05 Thread via GitHub
findepi commented on PR #10549: URL: https://github.com/apache/iceberg/pull/10549#issuecomment-2211167108 The Puffin file format has a place for versioning within the magic, but the Puffin format doesn't change, only its use changes. The Puffin spec is not the authoritative source of information of all

[PR] Core: improve DefaultErrorHandler message for unhandled codes [iceberg]

2024-07-05 Thread via GitHub
devinrsmith opened a new pull request, #10640: URL: https://github.com/apache/iceberg/pull/10640 (no comment)

[I] ugi not correct in WORKER_POOL [iceberg]

2024-07-05 Thread via GitHub
lurnagao-dahua opened a new issue, #10639: URL: https://github.com/apache/iceberg/issues/10639 ### Apache Iceberg version 1.4.3 ### Query engine Hive ### Please describe the bug 🐞 A user executes the query `select * from iceberg_tb`. This is a simple grab will

Re: [I] Read manifest file Permission denied [iceberg]

2024-07-05 Thread via GitHub
lurnagao-dahua closed issue #10634: Read manifest file Permission denied URL: https://github.com/apache/iceberg/issues/10634

Re: [I] Missing Artifacts for Flink v1.19 [iceberg]

2024-07-05 Thread via GitHub
ajantha-bhat commented on issue #10638: URL: https://github.com/apache/iceberg/issues/10638#issuecomment-2211106077 Iceberg doesn't support Flink 1.19 with the last release (1.5.x). Only the upcoming release (1.6.0) will have it. 1.6.0 will be released in a week or two.

Re: [I] Missing Artifacts for Flink v1.19 [iceberg]

2024-07-05 Thread via GitHub
ajantha-bhat closed issue #10638: Missing Artifacts for Flink v1.19 URL: https://github.com/apache/iceberg/issues/10638

Re: [I] Rename IO traits to `IcebergFileRead` or `IcebergRead`? [iceberg-rust]

2024-07-05 Thread via GitHub
Xuanwo commented on issue #368: URL: https://github.com/apache/iceberg-rust/issues/368#issuecomment-2211095592 No actionable task yet, let's close.

Re: [I] api: Metastore Catalog API design [iceberg-rust]

2024-07-05 Thread via GitHub
Xuanwo commented on issue #61: URL: https://github.com/apache/iceberg-rust/issues/61#issuecomment-2211094023 Catalog API has been added.

Re: [I] api: Metastore Catalog API design [iceberg-rust]

2024-07-05 Thread via GitHub
Xuanwo closed issue #61: api: Metastore Catalog API design URL: https://github.com/apache/iceberg-rust/issues/61

Re: [I] Rename IO traits to `IcebergFileRead` or `IcebergRead`? [iceberg-rust]

2024-07-05 Thread via GitHub
Xuanwo closed issue #368: Rename IO traits to `IcebergFileRead` or `IcebergRead`? URL: https://github.com/apache/iceberg-rust/issues/368

Re: [I] Tracking: Reading iceberg tables. [iceberg-rust]

2024-07-05 Thread via GitHub
Xuanwo commented on issue #123: URL: https://github.com/apache/iceberg-rust/issues/123#issuecomment-2211093154 Is this tracking issue still relevant? Should we start a new one to better reflect the current status?

Re: [I] How to release [iceberg-rust]

2024-07-05 Thread via GitHub
Xuanwo commented on issue #81: URL: https://github.com/apache/iceberg-rust/issues/81#issuecomment-2211091671 All work has been done, let's close.

Re: [I] How to release [iceberg-rust]

2024-07-05 Thread via GitHub
Xuanwo closed issue #81: How to release URL: https://github.com/apache/iceberg-rust/issues/81

Re: [I] idea: RestCatalog::new should not be async [iceberg-rust]

2024-07-05 Thread via GitHub
Xuanwo closed issue #422: idea: RestCatalog::new should not be async URL: https://github.com/apache/iceberg-rust/issues/422

Re: [I] idea: RestCatalog::new should not be async [iceberg-rust]

2024-07-05 Thread via GitHub
Xuanwo commented on issue #422: URL: https://github.com/apache/iceberg-rust/issues/422#issuecomment-2211085058 Fixed by https://github.com/apache/iceberg-rust/issues/430

Re: [PR] Spec: Make NDV blob metadata property required [iceberg]

2024-07-05 Thread via GitHub
amogh-jahagirdar commented on PR #10549: URL: https://github.com/apache/iceberg/pull/10549#issuecomment-2211066525 Hey everyone, sorry for the delay, just want to make sure we're following the standard improvement proposal process here: 1.) https://iceberg.apache.org/contribute/#what-i

Re: [PR] Spec: Make NDV blob metadata property required [iceberg]

2024-07-05 Thread via GitHub
amogh-jahagirdar closed pull request #10549: Spec: Make NDV blob metadata property required URL: https://github.com/apache/iceberg/pull/10549

Re: [PR] Support partial deletes [iceberg-python]

2024-07-05 Thread via GitHub
Fokko commented on code in PR #569: URL: https://github.com/apache/iceberg-python/pull/569#discussion_r1666905217 ## pyiceberg/table/__init__.py: ## @@ -498,7 +524,10 @@ def append(self, df: pa.Table, snapshot_properties: Dict[str, str] = EMPTY_DICT) update

Re: [PR] Support partial deletes [iceberg-python]

2024-07-05 Thread via GitHub
Fokko commented on code in PR #569: URL: https://github.com/apache/iceberg-python/pull/569#discussion_r1666901857 ## pyiceberg/table/__init__.py: ## @@ -538,6 +566,74 @@ def overwrite( for data_file in data_files: update_snapshot.append_data

Re: [PR] Support partial deletes [iceberg-python]

2024-07-05 Thread via GitHub
Fokko commented on code in PR #569: URL: https://github.com/apache/iceberg-python/pull/569#discussion_r1666900390 ## pyiceberg/table/__init__.py: ## @@ -517,9 +546,6 @@ def overwrite( if not isinstance(df, pa.Table): raise ValueError(f"Expected PyArrow tabl

Re: [PR] Support create multiple element ns together for nessie [iceberg]

2024-07-05 Thread via GitHub
snazy commented on code in PR #10630: URL: https://github.com/apache/iceberg/pull/10630#discussion_r1666899521 ## nessie/src/main/java/org/apache/iceberg/nessie/NessieCatalog.java: ## @@ -290,6 +291,14 @@ public void renameTable(TableIdentifier from, TableIdentifier to) {

Re: [PR] Build: Downgrade Gradle from 8.8 to 8.7 due to bug with older OSX versions [iceberg]

2024-07-05 Thread via GitHub
snazy commented on PR #10637: URL: https://github.com/apache/iceberg/pull/10637#issuecomment-2211013790 [Looks like macOS 11 is already EOL](https://endoflife.date/macos) and this issue doesn't happen on macOS 12, 13, or 14. Not sure whether it's worth adapting for an EOL operating system.

[PR] fix: Fix build while no-default-features enabled [iceberg-rust]

2024-07-05 Thread via GitHub
Xuanwo opened a new pull request, #442: URL: https://github.com/apache/iceberg-rust/pull/442 This PR adds feature gates for the different storage services. It also fixes some feature-related build errors when `no-default-features` is set.

Re: [PR] Support partial deletes [iceberg-python]

2024-07-05 Thread via GitHub
Fokko commented on code in PR #569: URL: https://github.com/apache/iceberg-python/pull/569#discussion_r1666880488 ## pyiceberg/table/__init__.py: ## @@ -3882,7 +4161,7 @@ def _get_table_partitions( return table_partitions -def _determine_partitions(spec: PartitionSpec,

Re: [PR] Support partial deletes [iceberg-python]

2024-07-05 Thread via GitHub
Fokko commented on code in PR #569: URL: https://github.com/apache/iceberg-python/pull/569#discussion_r1666876690 ## tests/integration/test_deletes.py: ## @@ -0,0 +1,368 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements.

Re: [PR] Support partial deletes [iceberg-python]

2024-07-05 Thread via GitHub
Fokko commented on code in PR #569: URL: https://github.com/apache/iceberg-python/pull/569#discussion_r1666875275 ## pyiceberg/table/__init__.py: ## @@ -237,6 +245,10 @@ class TableProperties: WRITE_PARTITION_SUMMARY_LIMIT = "write.summary.partition-limit" WRITE_PARTIT

Re: [PR] Support partial deletes [iceberg-python]

2024-07-05 Thread via GitHub
Fokko commented on code in PR #569: URL: https://github.com/apache/iceberg-python/pull/569#discussion_r1666868594 ## pyiceberg/table/__init__.py: ## @@ -454,6 +482,74 @@ def overwrite( for data_file in data_files: update_snapshot.append_data

Re: [PR] Add Snowflake catalog [iceberg-python]

2024-07-05 Thread via GitHub
Fokko commented on PR #687: URL: https://github.com/apache/iceberg-python/pull/687#issuecomment-2210949950 @prabodh1194 Thanks for working on this. What are your thoughts on Snowflake adding REST catalog support, would this still be needed? Could you also make sure that you add this c

Re: [PR] Add Snowflake catalog [iceberg-python]

2024-07-05 Thread via GitHub
Fokko commented on code in PR #687: URL: https://github.com/apache/iceberg-python/pull/687#discussion_r1666855752 ## pyiceberg/catalog/snowflake_catalog.py: ## @@ -0,0 +1,289 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreemen

Re: [PR] Add Snowflake catalog [iceberg-python]

2024-07-05 Thread via GitHub
Fokko commented on code in PR #687: URL: https://github.com/apache/iceberg-python/pull/687#discussion_r1666852105 ## pyiceberg/catalog/snowflake_catalog.py: ## @@ -0,0 +1,289 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreemen

Re: [PR] Forward Compatible large_* type support: read as large, write as small [iceberg-python]

2024-07-05 Thread via GitHub
Fokko commented on code in PR #890: URL: https://github.com/apache/iceberg-python/pull/890#discussion_r1666835301 ## pyiceberg/table/__init__.py: ## @@ -1866,7 +1866,7 @@ def plan_files(self) -> Iterable[FileScanTask]: for data_entry in data_entries ] -

Re: [I] discussion: token refresh mechanism for rest client [iceberg-rust]

2024-07-05 Thread via GitHub
TennyZhuang commented on issue #437: URL: https://github.com/apache/iceberg-rust/issues/437#issuecomment-2210912928 It's acceptable, but not what the server expects. When a client requests a token, the server may expect it to be used for several hours.

Re: [I] Discussion: Use unstable rust to run ci tool. [iceberg-rust]

2024-07-05 Thread via GitHub
liurenjie1024 commented on issue #440: URL: https://github.com/apache/iceberg-rust/issues/440#issuecomment-2210906309 > I propose using the exact same nightly toolchain date as the stable Rust release in our `rust-toolchain` file. For example, if Rust 1.79 is released on `2024-06-13`, we wi

Re: [PR] Cast 's', 'ms' and 'ns' PyArrow timestamp to 'us' precision on write [iceberg-python]

2024-07-05 Thread via GitHub
Fokko commented on code in PR #848: URL: https://github.com/apache/iceberg-python/pull/848#discussion_r1666586779 ## mkdocs/docs/configuration.md: ## @@ -299,4 +299,8 @@ PyIceberg uses multiple threads to parallelize operations. The number of workers # Backward Compatibility

Re: [PR] Cast 's', 'ms' and 'ns' PyArrow timestamp to 'us' precision on write [iceberg-python]

2024-07-05 Thread via GitHub
Fokko commented on code in PR #848: URL: https://github.com/apache/iceberg-python/pull/848#discussion_r1666824972 ## pyiceberg/io/pyarrow.py: ## @@ -918,11 +919,24 @@ def primitive(self, primitive: pa.DataType) -> PrimitiveType: return TimeType() elif pa.t

Re: [I] Discussion: Use unstable rust to run ci tool. [iceberg-rust]

2024-07-05 Thread via GitHub
Xuanwo commented on issue #440: URL: https://github.com/apache/iceberg-rust/issues/440#issuecomment-2210900507 I propose using the exact same nightly toolchain date as the stable Rust release in our `rust-toolchain` file. For example, if Rust 1.79 is released on `2024-06-13`, we will use `n

Re: [PR] Cast 's', 'ms' and 'ns' PyArrow timestamp to 'us' precision on write [iceberg-python]

2024-07-05 Thread via GitHub
Fokko commented on code in PR #848: URL: https://github.com/apache/iceberg-python/pull/848#discussion_r1666823874 ## pyiceberg/io/pyarrow.py: ## @@ -918,11 +919,24 @@ def primitive(self, primitive: pa.DataType) -> PrimitiveType: return TimeType() elif pa.t

Re: [I] Add `cargo udeps` to check unused dependencies. [iceberg-rust]

2024-07-05 Thread via GitHub
Xuanwo commented on issue #439: URL: https://github.com/apache/iceberg-rust/issues/439#issuecomment-2210896036 I would suggest using [cargo-machete](https://github.com/bnjbvr/cargo-machete) instead because it is much faster.

[I] Add `cargo udeps` to check unused dependencies. [iceberg-rust]

2024-07-05 Thread via GitHub
liurenjie1024 opened a new issue, #439: URL: https://github.com/apache/iceberg-rust/issues/439 [cargo udeps](https://crates.io/crates/cargo-udeps) is a tool for discovering unused dependencies. It requires unstable Rust to run, but I think it's OK to use unstable Rust in CI.

Re: [PR] refactor(io): Split io into smaller mods [iceberg-rust]

2024-07-05 Thread via GitHub
liurenjie1024 merged PR #438: URL: https://github.com/apache/iceberg-rust/pull/438

Re: [I] discussion: token refresh mechanism for rest client [iceberg-rust]

2024-07-05 Thread via GitHub
liurenjie1024 commented on issue #437: URL: https://github.com/apache/iceberg-rust/issues/437#issuecomment-2210873053 > Then we need https://docs.rs/tokio/latest/tokio/sync/struct.RwLock.html, still depends on tokio :( > > This is rust How about [futures Mutex](https://docs.rs

Re: [PR] reuse docker container to save compute resources [iceberg-rust]

2024-07-05 Thread via GitHub
liurenjie1024 commented on code in PR #428: URL: https://github.com/apache/iceberg-rust/pull/428#discussion_r1666803055 ## Cargo.toml: ## @@ -82,6 +82,7 @@ serde_repr = "0.1.16" serde_with = "3.4.0" tempfile = "3.8" tokio = { version = "1", features = ["macros"] } +tokio-shar

Re: [I] discussion: token refresh mechanism for rest client [iceberg-rust]

2024-07-05 Thread via GitHub
TennyZhuang commented on issue #437: URL: https://github.com/apache/iceberg-rust/issues/437#issuecomment-2210859980 Then we need https://docs.rs/tokio/latest/tokio/sync/struct.RwLock.html, still depends on tokio :( This is rust

Re: [PR] REST: refactor OAuth logic into AuthManager Interface [iceberg]

2024-07-05 Thread via GitHub
adutra commented on code in PR #10621: URL: https://github.com/apache/iceberg/pull/10621#discussion_r1665281650 ## core/src/main/java/org/apache/iceberg/rest/auth/AuthManager.java: ## @@ -0,0 +1,38 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more

Re: [PR] Core: Defer reading Avro metadata until ManifestFile is read [iceberg]

2024-07-05 Thread via GitHub
lurnagao-dahua commented on PR #5206: URL: https://github.com/apache/iceberg/pull/5206#issuecomment-2210826870 > `ManifestFiles.read` setup into the `ParallelIterator` that is used to plan tasks. Hi, may I ask if my question is related to what you said? https://github.com/apache/ic

Re: [I] discussion: token refresh mechanism for rest client [iceberg-rust]

2024-07-05 Thread via GitHub
liurenjie1024 commented on issue #437: URL: https://github.com/apache/iceberg-rust/issues/437#issuecomment-2210796681 > 3. Store `(expired_at, token)` pair in an async `Mutex`, and check whether it's close to expiration when needed. If it's close to expiration, lock the mutex, update the pa
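A minimal sketch of option 3 as quoted in this thread: keep an `(expired_at, token)` pair behind a mutex and fetch a new token only when the cached one is missing or close to expiring. This is a hypothetical illustration, not the iceberg-rust implementation. To stay self-contained it uses `std::sync::Mutex` rather than the async `Mutex` under discussion, and the names `TokenCache`, `CachedToken`, `refresh_margin` and the `fetch` closure (a stand-in for the real token request) are all assumptions.

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Hypothetical cached entry: the (expired_at, token) pair from the thread.
struct CachedToken {
    expires_at: Instant,
    token: String,
}

/// Hypothetical token cache. `fetch` stands in for the real token request and
/// returns the new token together with its lifetime.
struct TokenCache<F: Fn() -> (String, Duration)> {
    state: Mutex<Option<CachedToken>>,
    /// Refresh when no more than this much time remains before expiry.
    refresh_margin: Duration,
    fetch: F,
}

impl<F: Fn() -> (String, Duration)> TokenCache<F> {
    fn new(refresh_margin: Duration, fetch: F) -> Self {
        Self { state: Mutex::new(None), refresh_margin, fetch }
    }

    /// Returns the cached token if it is still comfortably valid; otherwise
    /// fetches a new one while holding the lock, so concurrent callers wait
    /// for a single refresh instead of all requesting new tokens.
    fn token(&self) -> String {
        let mut guard = self.state.lock().unwrap();
        let now = Instant::now();
        if let Some(cached) = &*guard {
            if cached.expires_at.saturating_duration_since(now) > self.refresh_margin {
                return cached.token.clone();
            }
        }
        let (token, ttl) = (self.fetch)();
        *guard = Some(CachedToken { expires_at: now + ttl, token: token.clone() });
        token
    }
}
```

In async code the same shape would hold the lock of an async mutex (such as the `futures` Mutex suggested in this thread) and `.await` the token request while holding it. The refresh margin is what keeps a token from expiring between the check and its use on the server.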

Re: [I] bug: use s3 endpoint fail to read manifest list with s3a location [iceberg-rust]

2024-07-05 Thread via GitHub
chenzl25 commented on issue #434: URL: https://github.com/apache/iceberg-rust/issues/434#issuecomment-2210689394 > Thanks @chenzl25 for reporting this. I've checked the code and think this is expected. iceberg stores absolute path in manifests, and currently it's illegal to mix use s3 and s

Re: [I] bug: use s3 endpoint fail to read manifest list with s3a location [iceberg-rust]

2024-07-05 Thread via GitHub
chenzl25 closed issue #434: bug: use s3 endpoint fail to read manifest list with s3a location URL: https://github.com/apache/iceberg-rust/issues/434

Re: [I] Hive Permission denied [iceberg]

2024-07-05 Thread via GitHub
lurnagao-dahua commented on issue #10634: URL: https://github.com/apache/iceberg/issues/10634#issuecomment-2210672341 > sorry, I confuse the problem. `CachedClientPool` is used for connection to hms, not hdfs file. Hi, I find the ugi with a72f8bf5-5d93-405b-953e-a8fed8bfa6b6-m0.av

Re: [I] Hive Permission denied [iceberg]

2024-07-05 Thread via GitHub
lurnagao-dahua commented on issue #10634: URL: https://github.com/apache/iceberg/issues/10634#issuecomment-2210669085 I found that the ugi for a72f8bf5-5d93-405b-953e-a8fed8bfa6b6-m0.avro is B, while for the other files it is A.

Re: [PR] Cast 's', 'ms' and 'ns' PyArrow timestamp to 'us' precision on write [iceberg-python]

2024-07-05 Thread via GitHub
Fokko commented on code in PR #848: URL: https://github.com/apache/iceberg-python/pull/848#discussion_r144991 ## pyiceberg/io/pyarrow.py: ## @@ -918,11 +919,24 @@ def primitive(self, primitive: pa.DataType) -> PrimitiveType: return TimeType() elif pa.t
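The cast named in this PR title (second, millisecond, and nanosecond timestamps written at microsecond precision) can be illustrated on raw integer epoch values. This is a minimal sketch, not PyIceberg's code. The `TimeUnit` enum and `to_micros` function are hypothetical. This excerpt also does not show how the PR handles nanosecond values that do not divide evenly. Here they are simply truncated toward zero.

```rust
/// Source precision of an integer epoch timestamp (hypothetical enum that
/// mirrors PyArrow's 's', 'ms', 'us', and 'ns' units).
#[derive(Clone, Copy)]
enum TimeUnit {
    Second,
    Milli,
    Micro,
    Nano,
}

/// Converts a timestamp expressed in `unit` since the epoch to microseconds.
/// Coarser units are scaled up, returning `None` if the result overflows i64.
/// Nanoseconds are divided by 1000 with integer division, which truncates
/// toward zero and drops sub-microsecond precision.
fn to_micros(value: i64, unit: TimeUnit) -> Option<i64> {
    match unit {
        TimeUnit::Second => value.checked_mul(1_000_000),
        TimeUnit::Milli => value.checked_mul(1_000),
        TimeUnit::Micro => Some(value),
        TimeUnit::Nano => Some(value / 1_000),
    }
}
```

Returning `Option` makes overflow explicit for the coarse units. A real writer might instead reject lossy nanosecond values rather than truncate them.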

Re: [I] discussion: token refresh mechanism for rest client [iceberg-rust]

2024-07-05 Thread via GitHub
TennyZhuang commented on issue #437: URL: https://github.com/apache/iceberg-rust/issues/437#issuecomment-2210604365 3. Store `(fetched_at, token)` pair in an async `Mutex`, and check whether it's close to expiration when needed. If it's close to expiration, lock the mutex, update the pair,
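Option 3 (store a `(fetched_at, token)` pair and refresh it when it is close to expiration) can be sketched as follows. This is a minimal, synchronous sketch under stated assumptions, not the iceberg-rust implementation. It uses a blocking `std::sync::Mutex` instead of the async `Mutex` proposed in the thread. The names `CachedToken`, `TokenCache`, `fetch`, and `refresh_margin` are hypothetical, and `expires_in` stands in for the value from `TokenResponse::expires_in`.

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// A token with the time it was fetched and its lifetime (hypothetical type
/// mirroring the `(fetched_at, token)` pair from the discussion).
struct CachedToken {
    token: String,
    fetched_at: Instant,
    expires_in: Duration,
}

/// Caches a token and refetches it when it is close to expiration.
/// Uses a blocking std Mutex for simplicity; the thread proposes an async one.
struct TokenCache<F: Fn() -> (String, Duration)> {
    fetch: F,
    // Refresh this long before the token actually expires.
    refresh_margin: Duration,
    state: Mutex<Option<CachedToken>>,
}

impl<F: Fn() -> (String, Duration)> TokenCache<F> {
    fn new(fetch: F, refresh_margin: Duration) -> Self {
        TokenCache { fetch, refresh_margin, state: Mutex::new(None) }
    }

    /// Returns a token, refetching it if none is cached or it is about to expire.
    fn token(&self) -> String {
        // Holding the lock across the fetch makes concurrent callers wait for
        // a single refresh instead of all hitting the token server.
        let mut guard = self.state.lock().unwrap();
        let needs_refresh = match guard.as_ref() {
            None => true,
            Some(cached) => {
                // Time left before expiry; saturates to zero once it has passed.
                let remaining = cached
                    .expires_in
                    .saturating_sub(cached.fetched_at.elapsed());
                remaining <= self.refresh_margin
            }
        };
        if needs_refresh {
            let (token, expires_in) = (self.fetch)();
            *guard = Some(CachedToken { token, fetched_at: Instant::now(), expires_in });
        }
        guard.as_ref().unwrap().token.clone()
    }
}
```

Refreshing ahead of expiry by a margin avoids sending a token that expires while the request is in flight. The trade-off is that every caller takes the lock, even when the cached token is still fresh.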

Re: [PR] Migrate source package in Flink [iceberg]

2024-07-05 Thread via GitHub
nastra merged PR #10632: URL: https://github.com/apache/iceberg/pull/10632

Re: [I] discussion: token refresh mechanism for rest client [iceberg-rust]

2024-07-05 Thread via GitHub
liurenjie1024 commented on issue #437: URL: https://github.com/apache/iceberg-rust/issues/437#issuecomment-2210591962 I also prefer option 2. The Iceberg community may deprecate the current auth approach and introduce a true OAuth client in the future. I prefer to have a simpler solution without

Re: [I] bug: use s3 endpoint fail to read manifest list with s3a location [iceberg-rust]

2024-07-05 Thread via GitHub
liurenjie1024 commented on issue #434: URL: https://github.com/apache/iceberg-rust/issues/434#issuecomment-2210586119 Thanks @chenzl25 for reporting this. I've checked the code and think this is expected. iceberg stores absolute path in manifests, and currently it's illegal to mix use s3 an

Re: [I] discussion: token refresh mechanism for rest client [iceberg-rust]

2024-07-05 Thread via GitHub
Xuanwo commented on issue #437: URL: https://github.com/apache/iceberg-rust/issues/437#issuecomment-2210582035 > 2. Call every method with a retry wrapper. When an unauthorized error occurs, refetch the token and retry the method. Our rest catalog client now has an `authenticate` fu
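Option 2 (wrap each call, and on an unauthorized error refetch the token and retry) can be sketched as a generic wrapper. This is a minimal sketch, not iceberg-rust's API. `CallError` and `with_token_retry` are hypothetical, and the `refetch` closure merely stands in for something like the client's existing `authenticate` function, whose signature is not shown here.

```rust
/// Errors a catalog call might return (hypothetical; not iceberg-rust's error type).
#[derive(Debug, PartialEq)]
enum CallError {
    Unauthorized,
    Other(String),
}

/// Runs `op` with the current token. On `Unauthorized`, refetches the token
/// once, stores it in `token`, and retries; any other result is returned as-is.
fn with_token_retry<T>(
    token: &mut String,
    refetch: impl Fn() -> String,
    op: impl Fn(&str) -> Result<T, CallError>,
) -> Result<T, CallError> {
    match op(token.as_str()) {
        Err(CallError::Unauthorized) => {
            // The token has probably expired: get a new one and retry exactly
            // once, so a token that keeps being rejected cannot loop forever.
            *token = refetch();
            op(token.as_str())
        }
        other => other,
    }
}
```

Compared with tracking expiry times, this approach needs no clock bookkeeping. The cost is one failed request each time the token expires, and every operation must be safe to retry.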

Re: [PR] Migrate source package in Flink [iceberg]

2024-07-05 Thread via GitHub
tomtongue commented on PR #10632: URL: https://github.com/apache/iceberg/pull/10632#issuecomment-2210581746 Thanks for the review @nastra and @ebyhr. Could you review the new changes again? @nastra (if possible, @ebyhr)

[I] discussion: token refresh mechanism for rest client [iceberg-rust]

2024-07-05 Thread via GitHub
TennyZhuang opened a new issue, #437: URL: https://github.com/apache/iceberg-rust/issues/437 Background: #301 The token fetched from the token server may have a TTL (see `TokenResponse::expires_in`); in most cases it is several hours. Our catalog client is a long-lived object,

Re: [PR] Cast 's', 'ms' and 'ns' PyArrow timestamp to 'us' precision on write [iceberg-python]

2024-07-05 Thread via GitHub
Fokko commented on code in PR #848: URL: https://github.com/apache/iceberg-python/pull/848#discussion_r1666586779 ## mkdocs/docs/configuration.md: ## @@ -299,4 +299,8 @@ PyIceberg uses multiple threads to parallelize operations. The number of workers # Backward Compatibility

Re: [PR] Cast 's', 'ms' and 'ns' PyArrow timestamp to 'us' precision on write [iceberg-python]

2024-07-05 Thread via GitHub
Fokko commented on code in PR #848: URL: https://github.com/apache/iceberg-python/pull/848#discussion_r1666586081 ## pyiceberg/io/pyarrow.py: ## @@ -918,11 +919,24 @@ def primitive(self, primitive: pa.DataType) -> PrimitiveType: return TimeType() elif pa.t

Re: [I] Hive Permission denied [iceberg]

2024-07-05 Thread via GitHub
lurnagao-dahua commented on issue #10634: URL: https://github.com/apache/iceberg/issues/10634#issuecomment-2210521048 When I roll back to the snapshot before a72f8bf5-5d93-405b-953e-a8fed8bfa6b6-m0.avro was generated, it can be queried.

Re: [PR] Migrate source package in Flink [iceberg]

2024-07-05 Thread via GitHub
tomtongue commented on code in PR #10632: URL: https://github.com/apache/iceberg/pull/10632#discussion_r1666558981 ## flink/v1.19/flink/src/test/java/org/apache/iceberg/flink/source/assigner/SplitAssignerTestBase.java: ## @@ -101,29 +103,29 @@ private void assertAvailableFuture(

Re: [I] ci: Add macos runner for ci. [iceberg-rust]

2024-07-05 Thread via GitHub
QuakeWang commented on issue #436: URL: https://github.com/apache/iceberg-rust/issues/436#issuecomment-2210503520 > Hi, @QuakeWang I see that it's only enabled for build, I mean to have tests also enabled. Okay, I get it~

Re: [PR] AWS: Retain Glue Catalog column comment after updating Iceberg table [iceberg]

2024-07-05 Thread via GitHub
lawofcycles commented on PR #10276: URL: https://github.com/apache/iceberg/pull/10276#issuecomment-2210497935 Thank you for merging @Fokko !

Re: [PR] AWS: Retain Glue Catalog column comment after updating Iceberg table [iceberg]

2024-07-05 Thread via GitHub
lawofcycles commented on PR #10276: URL: https://github.com/apache/iceberg/pull/10276#issuecomment-2210496283 Thanks for reviewing @amogh-jahagirdar , @aajisaka , @geruh and @rahil-c !!

[I] Create table properties does not support boolean value [iceberg-python]

2024-07-05 Thread via GitHub
tlegrave opened a new issue, #895: URL: https://github.com/apache/iceberg-python/issues/895 ### Apache Iceberg version main (development) ### Please describe the bug 🐞 Hello there, When creating a table, I can pass properties like bloom filters like so: ```p
