Re: [PR] Manifest list encryption [iceberg]

2024-09-16 Thread via GitHub
ggershinsky commented on PR #7770: URL: https://github.com/apache/iceberg/pull/7770#issuecomment-2354704091 Ok. We don't have clear guidelines on key caching in memory (key copies are spread all over the process memory - cache, plug-in KMS client code, an HTTP library in the KMS client code

Re: [PR] OpenAPI: Standardize credentials in loadTable/loadView responses [iceberg]

2024-09-16 Thread via GitHub
nastra commented on code in PR #10722: URL: https://github.com/apache/iceberg/pull/10722#discussion_r1762478329 ## open-api/rest-catalog-open-api.yaml: ## @@ -3103,6 +3103,81 @@ components: uuid: type: string +ADLSCredentials: + type: object +

Re: [PR] Manifest list encryption [iceberg]

2024-09-16 Thread via GitHub
ggershinsky commented on code in PR #7770: URL: https://github.com/apache/iceberg/pull/7770#discussion_r1762461922 ## core/src/main/java/org/apache/iceberg/ManifestListWriter.java: ## @@ -19,26 +19,63 @@ package org.apache.iceberg; import java.io.IOException; +import java.ni

Re: [PR] Manifest list encryption [iceberg]

2024-09-16 Thread via GitHub
ggershinsky commented on code in PR #7770: URL: https://github.com/apache/iceberg/pull/7770#discussion_r1762446655 ## core/src/main/java/org/apache/iceberg/CatalogProperties.java: ## @@ -160,4 +160,10 @@ private CatalogProperties() {} public static final String ENCRYPTION_K

Re: [PR] AWS: Set better defaults for S3 retry behaviour [iceberg]

2024-09-16 Thread via GitHub
nastra commented on code in PR #11052: URL: https://github.com/apache/iceberg/pull/11052#discussion_r1762443878 ## aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIOProperties.java: ## @@ -393,6 +403,21 @@ public class S3FileIOProperties implements Serializable { */ pri

[PR] Use ArrowScan.to_table to replace project_table [iceberg-python]

2024-09-16 Thread via GitHub
JE-Chen opened a new pull request, #1180: URL: https://github.com/apache/iceberg-python/pull/1180 PR #1119 - Use ArrowScan.to_table to replace project_table in these files: - pyiceberg\table\__init__.py - pyiceberg\io\pyarrow.py - pyiceberg\test_pyarrow.py -- This is an

Re: [PR] Manifest list encryption [iceberg]

2024-09-16 Thread via GitHub
ggershinsky commented on code in PR #7770: URL: https://github.com/apache/iceberg/pull/7770#discussion_r1762381570 ## core/src/main/java/org/apache/iceberg/BaseManifestListFile.java: ## @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or mo

Re: [PR] Flink: Maintenance - TableManager + ExpireSnapshots [iceberg]

2024-09-16 Thread via GitHub
pvary commented on code in PR #11144: URL: https://github.com/apache/iceberg/pull/11144#discussion_r1762374234 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/maintenance/stream/ExpireSnapshots.java: ## @@ -0,0 +1,161 @@ +/* + * Licensed to the Apache Software Founda

Re: [PR] Flink: Maintenance - TableManager + ExpireSnapshots [iceberg]

2024-09-16 Thread via GitHub
pvary commented on code in PR #11144: URL: https://github.com/apache/iceberg/pull/11144#discussion_r1762367350 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/maintenance/stream/ExpireSnapshots.java: ## @@ -0,0 +1,161 @@ +/* + * Licensed to the Apache Software Founda

Re: [PR] Flink: Maintenance - TableManager + ExpireSnapshots [iceberg]

2024-09-16 Thread via GitHub
pvary commented on code in PR #11144: URL: https://github.com/apache/iceberg/pull/11144#discussion_r1762366316 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/maintenance/stream/ExpireSnapshots.java: ## @@ -0,0 +1,161 @@ +/* + * Licensed to the Apache Software Founda

Re: [PR] Flink: Maintenance - TableManager + ExpireSnapshots [iceberg]

2024-09-16 Thread via GitHub
pvary commented on code in PR #11144: URL: https://github.com/apache/iceberg/pull/11144#discussion_r1762360714 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/maintenance/stream/MaintenanceTaskBuilder.java: ## @@ -0,0 +1,238 @@ +/* + * Licensed to the Apache Software

Re: [PR] Support changelog scan for table with delete files [iceberg]

2024-09-16 Thread via GitHub
wypoon commented on code in PR #10935: URL: https://github.com/apache/iceberg/pull/10935#discussion_r1762263665 ## core/src/test/java/org/apache/iceberg/TestBaseIncrementalChangelogScan.java: ## @@ -132,6 +131,139 @@ public void testFileDeletes() { assertThat(t1.existingDel

Re: [I] Inconsistent row count across versions [iceberg-python]

2024-09-16 Thread via GitHub
dev-goyal commented on issue #1132: URL: https://github.com/apache/iceberg-python/issues/1132#issuecomment-2354409555 Hi @sungwy absolutely - give me a couple days please, but I will prioritize testing this ASAP. Thank you so much for prioritizing the fix, we much appreciate it! --

Re: [PR] API, AWS: Add RetryableInputStream and use that in S3InputStream [iceberg]

2024-09-16 Thread via GitHub
amogh-jahagirdar commented on code in PR #10433: URL: https://github.com/apache/iceberg/pull/10433#discussion_r1762163236 ## aws/src/main/java/org/apache/iceberg/aws/s3/S3InputStream.java: ## @@ -178,18 +183,23 @@ private void positionStream() throws IOException { } priv

Re: [I] Add ability to pickle a `Table` [iceberg-python]

2024-09-16 Thread via GitHub
github-actions[bot] commented on issue #513: URL: https://github.com/apache/iceberg-python/issues/513#issuecomment-2354248809 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity oc

Re: [I] Ability to pickle the `Catalog` [iceberg-python]

2024-09-16 Thread via GitHub
github-actions[bot] commented on issue #514: URL: https://github.com/apache/iceberg-python/issues/514#issuecomment-2354248782 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity oc

Re: [PR] Core: Avoid NPE when getting updateEvent in FastAppend [iceberg]

2024-09-16 Thread via GitHub
github-actions[bot] commented on PR #8507: URL: https://github.com/apache/iceberg/pull/8507#issuecomment-2354247557 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull

Re: [I] Can sparksql ddl define primary key now? [iceberg]

2024-09-16 Thread via GitHub
github-actions[bot] commented on issue #8508: URL: https://github.com/apache/iceberg/issues/8508#issuecomment-2354247584 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

Re: [PR] Cache filesToImport variable in importSparkPartitions to avoid duplicated compute [iceberg]

2024-09-16 Thread via GitHub
github-actions[bot] commented on PR #8505: URL: https://github.com/apache/iceberg/pull/8505#issuecomment-2354247539 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull

Re: [PR] Add metricsConfig when build writer [iceberg]

2024-09-16 Thread via GitHub
github-actions[bot] commented on PR #8498: URL: https://github.com/apache/iceberg/pull/8498#issuecomment-2354247511 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull

Re: [I] unify streaming and batch, combining FLink and iceberg.In case In pipeline, Is kafka necessary? [iceberg]

2024-09-16 Thread via GitHub
github-actions[bot] commented on issue #8468: URL: https://github.com/apache/iceberg/issues/8468#issuecomment-2354247458 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

Re: [I] While decrypting iceberg table data using aws encyption sdk getting unsupported version error [iceberg]

2024-09-16 Thread via GitHub
github-actions[bot] commented on issue #8497: URL: https://github.com/apache/iceberg/issues/8497#issuecomment-2354247485 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

Re: [PR] Spark 3.4: Remove unused parameters [iceberg]

2024-09-16 Thread via GitHub
github-actions[bot] commented on PR #8463: URL: https://github.com/apache/iceberg/pull/8463#issuecomment-2354247439 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull

Re: [I] IcebergParseException.getMessage does not show the below line [iceberg]

2024-09-16 Thread via GitHub
github-actions[bot] commented on issue #8462: URL: https://github.com/apache/iceberg/issues/8462#issuecomment-2354247424 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

Re: [PR] Added documentation on getting started with GCS [iceberg]

2024-09-16 Thread via GitHub
github-actions[bot] commented on PR #8171: URL: https://github.com/apache/iceberg/pull/8171#issuecomment-2354247142 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If y

Re: [PR] Added documentation on getting started with GCS [iceberg]

2024-09-16 Thread via GitHub
github-actions[bot] closed pull request #8171: Added documentation on getting started with GCS URL: https://github.com/apache/iceberg/pull/8171 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the sp

Re: [PR] Spark: support use-table-distribution-and-ordering in session conf [iceberg]

2024-09-16 Thread via GitHub
github-actions[bot] commented on PR #8164: URL: https://github.com/apache/iceberg/pull/8164#issuecomment-2354247119 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If y

Re: [PR] Spark: support use-table-distribution-and-ordering in session conf [iceberg]

2024-09-16 Thread via GitHub
github-actions[bot] closed pull request #8164: Spark: support use-table-distribution-and-ordering in session conf URL: https://github.com/apache/iceberg/pull/8164 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL ab

Re: [PR] Core: Add reference snapshot ID/timestamps to AllEntriesTable and AllManifestsTable [iceberg]

2024-09-16 Thread via GitHub
szehon-ho commented on code in PR #9335: URL: https://github.com/apache/iceberg/pull/9335#discussion_r1762009524 ## .palantir/revapi.yml: ## @@ -1136,6 +1136,78 @@ acceptedBreaks: new: "method org.apache.iceberg.BaseMetastoreOperations.CommitStatus org.apache.iceberg.Bas

Re: [PR] Core: Add reference snapshot ID/timestamps to AllEntriesTable and AllManifestsTable [iceberg]

2024-09-16 Thread via GitHub
szehon-ho commented on code in PR #9335: URL: https://github.com/apache/iceberg/pull/9335#discussion_r1762097553 ## .palantir/revapi.yml: ## @@ -1136,6 +1136,78 @@ acceptedBreaks: new: "method org.apache.iceberg.BaseMetastoreOperations.CommitStatus org.apache.iceberg.Bas

[PR] Bump sqlalchemy from 2.0.34 to 2.0.35 [iceberg-python]

2024-09-16 Thread via GitHub
dependabot[bot] opened a new pull request, #1179: URL: https://github.com/apache/iceberg-python/pull/1179 Bumps [sqlalchemy](https://github.com/sqlalchemy/sqlalchemy) from 2.0.34 to 2.0.35. Release notes Sourced from https://github.com/sqlalchemy/sqlalchemy/releases sqlalchemy's

Re: [I] javax.net.ssl.SSLException: Connection reset on S3 w/ S3FileIO and Apache HTTP client [iceberg]

2024-09-16 Thread via GitHub
puchengy commented on issue #10340: URL: https://github.com/apache/iceberg/issues/10340#issuecomment-2354141232 @danielcweeks We had a workload where this happened very frequently, and we solved it by using HadoopFileIO instead. Just sharing a data point. -- This is an automated message

Re: [PR] REST: Handle Requests with Page Sizes Exceeding Available Number of Namespaces /Tables/Views [iceberg]

2024-09-16 Thread via GitHub
rcjverhoef commented on code in PR #11143: URL: https://github.com/apache/iceberg/pull/11143#discussion_r1762019167 ## core/src/test/java/org/apache/iceberg/rest/TestRESTCatalog.java: ## @@ -2409,7 +2409,7 @@ public void testPaginationForListTables() { RESTCatalog catalog =

Re: [PR] REST: Handle Requests with Page Sizes Exceeding Available Number of Namespaces /Tables/Views [iceberg]

2024-09-16 Thread via GitHub
rcjverhoef commented on code in PR #11143: URL: https://github.com/apache/iceberg/pull/11143#discussion_r1762020148 ## core/src/test/java/org/apache/iceberg/rest/TestRESTCatalog.java: ## @@ -2409,7 +2409,7 @@ public void testPaginationForListTables() { RESTCatalog catalog =

Re: [PR] Spark 3.5: Don't change table distribution when only altering local order [iceberg]

2024-09-16 Thread via GitHub
szehon-ho commented on code in PR #10774: URL: https://github.com/apache/iceberg/pull/10774#discussion_r1762019985 ## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/parser/extensions/IcebergSqlExtensionsAstBuilder.scala: ## @@ -226,11 +226,13 @@ class I

Re: [PR] REST: Handle Requests with Page Sizes Exceeding Available Number of Namespaces /Tables/Views [iceberg]

2024-09-16 Thread via GitHub
rcjverhoef commented on code in PR #11143: URL: https://github.com/apache/iceberg/pull/11143#discussion_r1762016389 ## core/src/test/java/org/apache/iceberg/rest/TestRESTCatalog.java: ## @@ -2347,7 +2347,7 @@ public void testPaginationForListNamespaces() { RESTCatalog catal

Re: [PR] Spark 3.5: Don't change table distribution when only altering local order [iceberg]

2024-09-16 Thread via GitHub
RussellSpitzer commented on code in PR #10774: URL: https://github.com/apache/iceberg/pull/10774#discussion_r1762011235 ## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/parser/extensions/IcebergSqlExtensionsAstBuilder.scala: ## @@ -226,11 +226,13 @@ cl

Re: [PR] Spark 3.5: Don't change table distribution when only altering local order [iceberg]

2024-09-16 Thread via GitHub
szehon-ho commented on code in PR #10774: URL: https://github.com/apache/iceberg/pull/10774#discussion_r1762006080 ## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/parser/extensions/IcebergSqlExtensionsAstBuilder.scala: ## @@ -226,11 +226,13 @@ class I

Re: [PR] Spark 3.4: Add utility to load table state reliably [iceberg]

2024-09-16 Thread via GitHub
szehon-ho commented on PR #5: URL: https://github.com/apache/iceberg/pull/5#issuecomment-2354083965 Merged, thanks @dramaticlly -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specif

Re: [PR] Spark 3.4: Add utility to load table state reliably [iceberg]

2024-09-16 Thread via GitHub
szehon-ho merged PR #5: URL: https://github.com/apache/iceberg/pull/5 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg

[PR] ci(release): add release scripts and workflow [iceberg-go]

2024-09-16 Thread via GitHub
zeroshade opened a new pull request, #150: URL: https://github.com/apache/iceberg-go/pull/150 Adds release and verify scripts along with a README for how to run them in order to perform releases of the iceberg-go project. -- This is an automated message from the Apache Git Service. To res

Re: [I] javax.net.ssl.SSLException: Connection reset on S3 w/ S3FileIO and Apache HTTP client [iceberg]

2024-09-16 Thread via GitHub
danielcweeks commented on issue #10340: URL: https://github.com/apache/iceberg/issues/10340#issuecomment-2354065673 @SandeepSinghGahir I'm really surprised that you're hitting this issue so frequently. Is there something specific about this workload that you think might be triggering this

Re: [PR] Flink: Avoid metaspace memory leak by not registering ShutdownHook for ExecutorService in Flink [iceberg]

2024-09-16 Thread via GitHub
danielcweeks commented on code in PR #11073: URL: https://github.com/apache/iceberg/pull/11073#discussion_r1761979260 ## core/src/main/java/org/apache/iceberg/util/ThreadPools.java: ## @@ -86,9 +86,18 @@ public static ExecutorService newWorkerPool(String namePrefix) { }

Re: [PR] Add Support for Dynamic Overwrite [iceberg-python]

2024-09-16 Thread via GitHub
sungwy commented on code in PR #931: URL: https://github.com/apache/iceberg-python/pull/931#discussion_r1761801698 ## pyiceberg/table/__init__.py: ## @@ -456,6 +461,85 @@ def append(self, df: pa.Table, snapshot_properties: Dict[str, str] = EMPTY_DICT) for data_

Re: [PR] Add Support for Dynamic Overwrite [iceberg-python]

2024-09-16 Thread via GitHub
sungwy commented on code in PR #931: URL: https://github.com/apache/iceberg-python/pull/931#discussion_r1761797345 ## pyiceberg/table/__init__.py: ## @@ -456,6 +461,85 @@ def append(self, df: pa.Table, snapshot_properties: Dict[str, str] = EMPTY_DICT) for data_

Re: [PR] fix: support MonthTransform for partitioning [iceberg-python]

2024-09-16 Thread via GitHub
kevinjqliu merged PR #1176: URL: https://github.com/apache/iceberg-python/pull/1176 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@i

Re: [PR] API, AWS: Add RetryableInputStream and use that in S3InputStream [iceberg]

2024-09-16 Thread via GitHub
SandeepSinghGahir commented on PR #10433: URL: https://github.com/apache/iceberg/pull/10433#issuecomment-2353901444 > > > > The S3 team (@ookumuso) just published what they have developed internally and is now used in Amazon EMR, Athena, GlueETL distributions of Iceberg: #11052, is there a

Re: [PR] REST: Handle Requests with Page Sizes Exceeding Available Number of Namespaces /Tables/Views [iceberg]

2024-09-16 Thread via GitHub
singhpk234 commented on code in PR #11143: URL: https://github.com/apache/iceberg/pull/11143#discussion_r1761813653 ## core/src/test/java/org/apache/iceberg/rest/TestRESTCatalog.java: ## @@ -2409,7 +2409,7 @@ public void testPaginationForListTables() { RESTCatalog catalog =

Re: [I] javax.net.ssl.SSLException: Connection reset on S3 w/ S3FileIO and Apache HTTP client [iceberg]

2024-09-16 Thread via GitHub
SandeepSinghGahir commented on issue #10340: URL: https://github.com/apache/iceberg/issues/10340#issuecomment-2353801275 > I can't speak for the S3FileIO developers; S3AFS is where I code and while there's a lot of work there for recovery [here](https://github.com/apache/hadoop/blob/trunk/h

Re: [PR] REST: Handle Requests with Page Sizes Exceeding Available Number of Namespaces /Tables/Views [iceberg]

2024-09-16 Thread via GitHub
rahil-c commented on code in PR #11143: URL: https://github.com/apache/iceberg/pull/11143#discussion_r1761783152 ## core/src/test/java/org/apache/iceberg/rest/TestRESTCatalog.java: ## @@ -2409,7 +2409,7 @@ public void testPaginationForListTables() { RESTCatalog catalog =

Re: [PR] REST: Handle Requests with Page Sizes Exceeding Available Number of Namespaces /Tables/Views [iceberg]

2024-09-16 Thread via GitHub
rahil-c commented on code in PR #11143: URL: https://github.com/apache/iceberg/pull/11143#discussion_r1761778931 ## core/src/test/java/org/apache/iceberg/rest/TestRESTCatalog.java: ## @@ -2347,7 +2347,7 @@ public void testPaginationForListNamespaces() { RESTCatalog catalog

Re: [PR] Bump pypa/cibuildwheel from 2.20.0 to 2.21.0 [iceberg-python]

2024-09-16 Thread via GitHub
sungwy merged PR #1175: URL: https://github.com/apache/iceberg-python/pull/1175 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@icebe

Re: [PR] Bump griffe from 1.3.0 to 1.3.1 [iceberg-python]

2024-09-16 Thread via GitHub
sungwy merged PR #1170: URL: https://github.com/apache/iceberg-python/pull/1170 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@icebe

Re: [PR] Remove unnecessary _ensure_tables_exist method [iceberg-python]

2024-09-16 Thread via GitHub
sungwy commented on code in PR #1155: URL: https://github.com/apache/iceberg-python/pull/1155#discussion_r1761710974 ## tests/catalog/test_sql.py: ## @@ -225,6 +237,93 @@ def test_creation_from_impl(catalog_name: str, warehouse: Path) -> None: ) +def confirm_no_tables_

Re: [PR] Kafka Connect: separate CI workflow [iceberg]

2024-09-16 Thread via GitHub
manuzhang commented on code in PR #11075: URL: https://github.com/apache/iceberg/pull/11075#discussion_r1761686615 ## kafka-connect/kafka-connect-runtime/src/integration/java/org/apache/iceberg/connect/TestContext.java: ## @@ -51,6 +52,7 @@ public class TestContext { private

Re: [I] Do not deprecate Botocore Session in upcoming release (0.8) [iceberg-python]

2024-09-16 Thread via GitHub
cshenrik commented on issue #1104: URL: https://github.com/apache/iceberg-python/issues/1104#issuecomment-2353667112 > > The glue catalog picks up the session correctly, but it doesn't use it for accessing S3. > > you can either set glue and s3 credentials separately or use the unifi

Re: [I] DOCS: Report CSS and styling issues on the new site. [iceberg]

2024-09-16 Thread via GitHub
manuzhang commented on issue #9643: URL: https://github.com/apache/iceberg/issues/9643#issuecomment-2353660066 @dyfrgi there is a fix at https://github.com/apache/iceberg/pull/11067 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to Git

Re: [I] Mixed usage of snapshotCreationTs, metadataCommitTs & tableAccessTs when using REST Catalog [iceberg]

2024-09-16 Thread via GitHub
rdblue commented on issue #11103: URL: https://github.com/apache/iceberg/issues/11103#issuecomment-2353618049 > The problem really is: the current RESTCatalog client implementation will **NOT** honor the `last-updated-ms` field passed back by REST server, and instead override the `last-upda

Re: [I] Schema: Allow field name `foo.bar` even if struct foo->bar is present [iceberg-rust]

2024-09-16 Thread via GitHub
rdblue commented on issue #591: URL: https://github.com/apache/iceberg-rust/issues/591#issuecomment-2353612190 This is an implementation detail that is not part of the spec. But I don't think it is worth bothering to enable both `["foo.bar"]` and `["foo", "bar"]` identifiers in the same sch

Re: [I] Modify SQLCatalog initialization so that classes are not always created and update how these classes are created to be more open to tother DB's [iceberg-python]

2024-09-16 Thread via GitHub
isc-patrick commented on issue #1148: URL: https://github.com/apache/iceberg-python/issues/1148#issuecomment-2353610223 Sounds good. I added ensure_tables_exist back to the code along with the new flag and the tests in the branch with the PR. -- This is an automated message from the Apac

Re: [PR] Manifest list encryption [iceberg]

2024-09-16 Thread via GitHub
rdblue commented on PR #7770: URL: https://github.com/apache/iceberg/pull/7770#issuecomment-2353594379 @ggershinsky, I'm not too concerned with the size of the cache. I'm okay with 1 day, but that seems like a long time to have unencrypted key material in memory. I'll defer to your judgemen
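
The trade-off raised in this comment (how long unwrapped key material may sit in a cache) can be illustrated with a small time-bounded cache. This is only a sketch under stated assumptions, not the PR's implementation: `ExpiringKeyCache`, its method names, and the injectable `clock` parameter are all hypothetical. Dropping an entry here only removes the cache's own reference; as noted elsewhere in the thread, other copies of a key may still live in process memory.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ExpiringKeyCache:
    """Hypothetical cache that forgets key material after a fixed TTL."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def put(self, key_id: str, key_bytes: bytes) -> None:
        # Store the key together with the time at which it must be dropped.
        self._entries[key_id] = (self._clock() + self._ttl, key_bytes)

    def get(self, key_id: str) -> Optional[bytes]:
        entry = self._entries.get(key_id)
        if entry is None:
            return None
        expires_at, key_bytes = entry
        if self._clock() >= expires_at:
            # Expired: drop the cache's reference and report a miss.
            del self._entries[key_id]
            return None
        return key_bytes
```

A shorter TTL reduces the window in which the cache holds unencrypted keys, at the cost of more round trips to whatever service unwraps them.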

Re: [I] to_pandas(), to_arrow() fail because case_sensitive doesn't work if column in row_filter doesn't match the case even if case_sensitive is set to False in scan [iceberg-python]

2024-09-16 Thread via GitHub
kevinjqliu commented on issue #1177: URL: https://github.com/apache/iceberg-python/issues/1177#issuecomment-2353577521 thanks for reporting this. it might be due to a bug we recently fixed in #1147. can you try it against the latest main branch? -- This is an automated message fr

Re: [PR] feat: SQL Catalog - Tables [iceberg-rust]

2024-09-16 Thread via GitHub
callum-ryan commented on code in PR #610: URL: https://github.com/apache/iceberg-rust/pull/610#discussion_r1761619195 ## crates/catalog/sql/src/error.rs: ## @@ -32,3 +32,20 @@ pub fn no_such_namespace_err(namespace: &NamespaceIdent) -> Result { format!("No such namespa

Re: [PR] Manifest list encryption [iceberg]

2024-09-16 Thread via GitHub
rdblue commented on code in PR #7770: URL: https://github.com/apache/iceberg/pull/7770#discussion_r1761615316 ## core/src/main/java/org/apache/iceberg/encryption/WrappedEncryptionKey.java: ## @@ -0,0 +1,46 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + *

Re: [PR] Manifest list encryption [iceberg]

2024-09-16 Thread via GitHub
rdblue commented on code in PR #7770: URL: https://github.com/apache/iceberg/pull/7770#discussion_r1761612251 ## core/src/main/java/org/apache/iceberg/encryption/NativeEncryptionKeyMetadata.java: ## @@ -27,4 +27,15 @@ public interface NativeEncryptionKeyMetadata extends Encrypt

Re: [PR] Manifest list encryption [iceberg]

2024-09-16 Thread via GitHub
rdblue commented on code in PR #7770: URL: https://github.com/apache/iceberg/pull/7770#discussion_r1761603137 ## core/src/main/java/org/apache/iceberg/ManifestListWriter.java: ## @@ -19,26 +19,63 @@ package org.apache.iceberg; import java.io.IOException; +import java.nio.Byt

Re: [PR] Manifest list encryption [iceberg]

2024-09-16 Thread via GitHub
rdblue commented on code in PR #7770: URL: https://github.com/apache/iceberg/pull/7770#discussion_r1761602204 ## core/src/main/java/org/apache/iceberg/CatalogProperties.java: ## @@ -160,4 +160,10 @@ private CatalogProperties() {} public static final String ENCRYPTION_KMS_TY

Re: [PR] Docs: Backport fixes for remove_orphan_files procedure [iceberg]

2024-09-16 Thread via GitHub
amogh-jahagirdar merged PR #11133: URL: https://github.com/apache/iceberg/pull/11133 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@

Re: [PR] Manifest list encryption [iceberg]

2024-09-16 Thread via GitHub
rdblue commented on code in PR #7770: URL: https://github.com/apache/iceberg/pull/7770#discussion_r1761600853 ## core/src/main/java/org/apache/iceberg/BaseManifestListFile.java: ## @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more co

Re: [PR] Manifest list encryption [iceberg]

2024-09-16 Thread via GitHub
rdblue commented on code in PR #7770: URL: https://github.com/apache/iceberg/pull/7770#discussion_r1761599946 ## .palantir/revapi.yml: ## @@ -1091,6 +1096,12 @@ acceptedBreaks: - code: "java.class.removed" old: "enum org.apache.iceberg.BaseMetastoreTableOperations.Co

Re: [PR] Core: Optimize MergingSnapshotProducer to use referenced manifests to determine if manifest needs to be rewritten [iceberg]

2024-09-16 Thread via GitHub
amogh-jahagirdar commented on code in PR #11131: URL: https://github.com/apache/iceberg/pull/11131#discussion_r1761598959 ## core/src/main/java/org/apache/iceberg/ManifestFilterManager.java: ## @@ -153,6 +154,12 @@ void caseSensitive(boolean newCaseSensitive) { void delete(F

Re: [I] Modify SQLCatalog initialization so that classes are not always created and update how these classes are created to be more open to tother DB's [iceberg-python]

2024-09-16 Thread via GitHub
sungwy commented on issue #1148: URL: https://github.com/apache/iceberg-python/issues/1148#issuecomment-2353497016 Hi @isc-patrick - thank you very much for taking the time to bring these points to discussion. I think these are important questions for us to answer and take a stance on as

Re: [PR] Spec: Adds Row Lineage [iceberg]

2024-09-16 Thread via GitHub
RussellSpitzer commented on PR #11130: URL: https://github.com/apache/iceberg/pull/11130#issuecomment-2353483475 > Is there a path for upgrading an existing Iceberg table to use row-lineage? Turning on row-lineage would start tracking for all rows added after that point, I'm not sure

Re: [PR] REST: Handle Requests with Page Sizes Exceeding Available Number of Namespaces /Tables/Views [iceberg]

2024-09-16 Thread via GitHub
singhpk234 commented on code in PR #11143: URL: https://github.com/apache/iceberg/pull/11143#discussion_r1761541678 ## core/src/test/java/org/apache/iceberg/rest/TestRESTCatalog.java: ## @@ -2409,7 +2409,7 @@ public void testPaginationForListTables() { RESTCatalog catalog =
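
The case named in this PR's title, a page size larger than the number of remaining items, can be sketched as below. This is an illustration only, not the code under review: `list_page` is a hypothetical helper, and using the next start offset as the page token is an assumption made for brevity.

```python
from typing import List, Optional, Tuple


def list_page(items: List[str], page_size: int,
              page_token: Optional[str] = None) -> Tuple[List[str], Optional[str]]:
    """Return one page of items and the token for the next page, if any."""
    # Assumption: the token is simply the start offset encoded as a string.
    start = int(page_token) if page_token else 0
    # Clamp the end so an oversized page_size returns whatever is left.
    end = min(start + page_size, len(items))
    # No token is returned once the listing is exhausted.
    next_token = str(end) if end < len(items) else None
    return items[start:end], next_token
```

With three items and a page size of ten, the helper returns all three items and no next-page token.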

Re: [PR] Spec: Adds Row Lineage [iceberg]

2024-09-16 Thread via GitHub
dyfrgi commented on PR #11130: URL: https://github.com/apache/iceberg/pull/11130#issuecomment-2353438763 Is there a path for upgrading an existing Iceberg table to use row-lineage? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitH

Re: [PR] Bug Fix: Position Deletes + row_filter yields less data when the DataFile is large [iceberg-python]

2024-09-16 Thread via GitHub
kevinjqliu commented on code in PR #1141: URL: https://github.com/apache/iceberg-python/pull/1141#discussion_r1761505071 ## pyiceberg/io/pyarrow.py: ## @@ -1238,10 +1238,13 @@ def _task_to_record_batches( for batch in batches: next_index = next_index + len(

Re: [I] Do not deprecate Botocore Session in upcoming release (0.8) [iceberg-python]

2024-09-16 Thread via GitHub
kevinjqliu commented on issue #1104: URL: https://github.com/apache/iceberg-python/issues/1104#issuecomment-2353430765 > The Glue catalog picks up the session correctly, but it doesn't use it for accessing S3. You can either set Glue and S3 credentials separately or use the unified A

Re: [I] `project_table` is deprecated, remove references [iceberg-python]

2024-09-16 Thread via GitHub
kevinjqliu commented on issue #1119: URL: https://github.com/apache/iceberg-python/issues/1119#issuecomment-2353428523 You can create an `ArrowScan` instance like so https://github.com/apache/iceberg-python/blob/0dc54080aa287dc8e920da128d7f4b335965f1df/pyiceberg/table/__init__.py#L1398-

Re: [I] Create table format version constants [iceberg-python]

2024-09-16 Thread via GitHub
kevinjqliu commented on issue #851: URL: https://github.com/apache/iceberg-python/issues/851#issuecomment-2353425819 @tanmayrauth Yes! The enum will be easier to work with. There are a lot of raw string comparisons like this one https://github.com/search?q=repo%3Aapache%2Ficeberg-python+pat

Re: [PR] Flink: Increase the number of checkpoints from 4 to 6 to fix flakiness. [iceberg]

2024-09-16 Thread via GitHub
stevenzwu merged PR #11121: URL: https://github.com/apache/iceberg/pull/11121 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg

Re: [PR] Core: Optimize MergingSnapshotProducer to use referenced manifests to determine if manifest needs to be rewritten [iceberg]

2024-09-16 Thread via GitHub
amogh-jahagirdar commented on code in PR #11131: URL: https://github.com/apache/iceberg/pull/11131#discussion_r1761484377 ## core/src/main/java/org/apache/iceberg/ManifestFilterManager.java: ## @@ -308,11 +316,15 @@ private ManifestFile filterManifest(Schema tableSchema, Manife

Re: [PR] Core: Optimize MergingSnapshotProducer to use referenced manifests to determine if manifest needs to be rewritten [iceberg]

2024-09-16 Thread via GitHub
amogh-jahagirdar commented on code in PR #11131: URL: https://github.com/apache/iceberg/pull/11131#discussion_r1761479745 ## core/src/main/java/org/apache/iceberg/ManifestFilterManager.java: ## @@ -308,11 +316,15 @@ private ManifestFile filterManifest(Schema tableSchema, Manife

Re: [PR] Core: Optimize MergingSnapshotProducer to use referenced manifests to determine if manifest needs to be rewritten [iceberg]

2024-09-16 Thread via GitHub
amogh-jahagirdar commented on code in PR #11131: URL: https://github.com/apache/iceberg/pull/11131#discussion_r1761476386 ## core/src/main/java/org/apache/iceberg/ManifestFilterManager.java: ## @@ -421,7 +433,7 @@ private ManifestFile filterManifestWithDeletedFiles(

Re: [PR] AWS: Set better defaults for S3 retry behaviour [iceberg]

2024-09-16 Thread via GitHub
ookumuso commented on code in PR #11052: URL: https://github.com/apache/iceberg/pull/11052#discussion_r1761455914 ## docs/docs/aws.md: ## @@ -378,6 +378,19 @@ However, for the older versions up to 0.12.0, the logic is as follows: For more details, please refer to the [Locati

Re: [PR] OpenAPI: Standardize credentials in loadTable/loadView responses [iceberg]

2024-09-16 Thread via GitHub
flyrain commented on code in PR #10722: URL: https://github.com/apache/iceberg/pull/10722#discussion_r1761451799 ## open-api/rest-catalog-open-api.yaml: ## @@ -3103,6 +3103,81 @@ components: uuid: type: string +ADLSCredentials: + type: object +

Re: [PR] Core: Optimize MergingSnapshotProducer to use referenced manifests to determine if manifest needs to be rewritten [iceberg]

2024-09-16 Thread via GitHub
amogh-jahagirdar commented on code in PR #11131: URL: https://github.com/apache/iceberg/pull/11131#discussion_r1761439874 ## core/src/main/java/org/apache/iceberg/ManifestFilterManager.java: ## @@ -153,6 +154,12 @@ void caseSensitive(boolean newCaseSensitive) { void delete(F

[I] procedure add_files parallelism > 1 -> NotSerializableException [iceberg]

2024-09-16 Thread via GitHub
zzeekk opened a new issue, #11147: URL: https://github.com/apache/iceberg/issues/11147 ### Apache Iceberg version 1.6.1 (latest release) ### Query engine Spark ### Please describe the bug 🐞 Problem: Executing "system.add_files(... parallelism => 2)" resul

Re: [I] Iceberg to configure AWS S3 configuration with the Hadoop and Hive4 setup is hanging without giving any error [iceberg]

2024-09-16 Thread via GitHub
pvary commented on issue #11145: URL: https://github.com/apache/iceberg/issues/11145#issuecomment-2353315522 If this is a Hive4 issue, could you please try to talk to the Hive team, as the Hive4 integration is owned by them. Thanks, Peter -- This is an automated message from the Apa

Re: [I] Modify SQLCatalog initialization so that classes are not always created and update how these classes are created to be more open to other DBs [iceberg-python]

2024-09-16 Thread via GitHub
isc-patrick commented on issue #1148: URL: https://github.com/apache/iceberg-python/issues/1148#issuecomment-2353288492 The problem I see with moving in the direction of creating integration tests for specific DBs (outside of SQLite) is that you are opening up to an enormous amount of po

Re: [I] Table maintenance procedure (expire_snapshots) does not work as expected [iceberg]

2024-09-16 Thread via GitHub
SanjayKhoros commented on issue #10907: URL: https://github.com/apache/iceberg/issues/10907#issuecomment-2353234819 Thanks for the quick reply @RussellSpitzer. Sharing a little more detail: Flink version - 1.20.0 Iceberg version - 1.6.1 ` long cutoffDateMillis = LocalDateTi

Re: [PR] Bug Fix: Position Deletes + row_filter yields less data when the DataFile is large [iceberg-python]

2024-09-16 Thread via GitHub
sungwy commented on code in PR #1141: URL: https://github.com/apache/iceberg-python/pull/1141#discussion_r1761335479 ## pyiceberg/io/pyarrow.py: ## @@ -1251,10 +1253,17 @@ def _task_to_record_batches( arrow_table = arrow_table.filter(pyarrow_filter)

Re: [I] Table maintenance procedure (expire_snapshots) does not work as expected [iceberg]

2024-09-16 Thread via GitHub
RussellSpitzer commented on issue #10907: URL: https://github.com/apache/iceberg/issues/10907#issuecomment-2353182041 My gut feeling there is that your cutoffDateMillis is not what you think it is. That looks correct to me though. -- This is an automated message from the Apache Git Servic

Re: [PR] Bug Fix: Position Deletes + row_filter yields less data when the DataFile is large [iceberg-python]

2024-09-16 Thread via GitHub
sungwy commented on code in PR #1141: URL: https://github.com/apache/iceberg-python/pull/1141#discussion_r1761334749 ## pyiceberg/io/pyarrow.py: ## @@ -1238,10 +1238,13 @@ def _task_to_record_batches( for batch in batches: next_index = next_index + len(batc

Re: [PR] Core: Optimize MergingSnapshotProducer to use referenced manifests to determine if manifest needs to be rewritten [iceberg]

2024-09-16 Thread via GitHub
amogh-jahagirdar commented on code in PR #11131: URL: https://github.com/apache/iceberg/pull/11131#discussion_r1761333739 ## core/src/main/java/org/apache/iceberg/ManifestFilterManager.java: ## @@ -421,7 +433,7 @@ private ManifestFile filterManifestWithDeletedFiles(

Re: [I] DOCS: Report CSS and styling issues on the new site. [iceberg]

2024-09-16 Thread via GitHub
dyfrgi commented on issue #9643: URL: https://github.com/apache/iceberg/issues/9643#issuecomment-2353121293 Fairly recently, the table of contents on the spec page https://iceberg.apache.org/spec/ stopped scrolling. This happens in both Chrome and Firefox. It now cuts off at "File System Op
