Re: [PR] Add clang format [iceberg-cpp]
zhjwpku commented on code in PR #4:
URL: https://github.com/apache/iceberg-cpp/pull/4#discussion_r1879216891

## .clang-format:
```
@@ -0,0 +1,22 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+---
+BasedOnStyle: Google
+ColumnLimit: 90
+DerivePointerAlignment: false
+IncludeBlocks: Preserve
```

Review Comment:

I plan to make the following changes after we reach a consensus.

```
-IncludeBlocks: Preserve
+IncludeBlocks: Regroup
+IncludeCategories:
+- Regex: '^<.*/'   # Like <...>
+  Priority: 3
+- Regex: '^<.*\.'  # Like <...>
+  Priority: 1
+- Regex: '^<'      # Like <...>
+  Priority: 2
+- Regex: '^"'      # Like "util/auth-util.h"
+  Priority: 4
```

Before I push a new commit, I'd like to explain why I think this makes sense. I did an experiment: `Regroup` seems to be the default if we don't specify an `IncludeBlocks` setting; I put it here explicitly because I think it's clearer for readers. With the `IncludeCategories` settings, the following:

```
#include <...>
#include <...>
#include <...>
#include "util/auth-util.h"
#include <...>
#include <...>
#include "demo_table.h"
```

becomes:

```
#include "demo_table.h"
#include <...>
#include <...>
#include <...>
#include <...>
#include <...>
#include "util/auth-util.h"
```

It's slightly different from Impala's setting because I think putting the `Regex` as the first line of each rule is more readable: "if it matches this regex, it goes to priority xxx".

What do you think about this proposal? @gaborkaszab @pitrou @Fokko @wgtmac @raulcd
Re: [PR] Bump Spark 3.5.4 RC1 [iceberg]
pan3793 commented on code in PR #11731:
URL: https://github.com/apache/iceberg/pull/11731#discussion_r1879240394

## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/data/vectorized/IcebergArrowColumnVector.java:
```
@@ -59,6 +59,11 @@
   public void close() {
     accessor.close();
   }

+  // If a column vector is writable or constant, it should override this method and do nothing.
+  // See more details at SPARK-50235, SPARK-50463 (Fixed in Spark 3.5.4)
+  @Override
+  public void closeIfNotWritable() {}
```

Review Comment:

Will rename to `closeIfFreeable` once RC2 is available.
Re: [PR] Spark3.4,3.5: Fix the BUG of iceberg views when resolved "group/order… [iceberg]
Ppei-Wang commented on code in PR #11729:
URL: https://github.com/apache/iceberg/pull/11729#discussion_r1879246100

## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java:
```
@@ -562,6 +562,67 @@ public void readFromViewWithCTE() throws NoSuchTableException {
     assertThat(sql("SELECT * FROM %s", viewName)).hasSize(1).containsExactly(row(10, 1L));
   }

+  @TestTemplate
+  public void readFromViewWithGroupByOrdinal() throws NoSuchTableException {
+    insertRows(10);
+    String viewName = viewName("viewWithGroupByOrdinal");
+    String sql = String.format("SELECT id FROM %s group by 1", tableName);
+
+    ViewCatalog viewCatalog = viewCatalog();
+
+    viewCatalog
+        .buildView(TableIdentifier.of(NAMESPACE, viewName))
+        .withQuery("spark", sql)
+        .withDefaultNamespace(NAMESPACE)
+        .withDefaultCatalog(catalogName)
+        .withSchema(schema(sql))
+        .create();
+
+    List<Object[]> expected =
+        IntStream.rangeClosed(1, 10).mapToObj(this::row).collect(Collectors.toList());
+
+    assertThat(sql("SELECT * FROM %s", viewName))
+        .hasSize(10)
+        .containsExactlyInAnyOrderElementsOf(expected);
+  }
+
+  @TestTemplate
+  public void readFromViewWithOrderByOrdinal() throws NoSuchTableException {
+    insertRows(10);
+    String viewName = viewName("viewWithOrderByOrdinal");
+    String sql = String.format("SELECT id FROM %s order by 1", tableName);
+
+    ViewCatalog viewCatalog = viewCatalog();
+
+    viewCatalog
+        .buildView(TableIdentifier.of(NAMESPACE, viewName))
+        .withQuery("spark", sql)
+        .withDefaultNamespace(NAMESPACE)
+        .withDefaultCatalog(catalogName)
+        .withSchema(schema(sql))
+        .create();
+
+    List<Object[]> expected =
+        IntStream.rangeClosed(1, 10).mapToObj(this::row).collect(Collectors.toList());
+
+    assertThat(sql("SELECT * FROM %s", viewName))
+        .hasSize(10)
+        .containsExactlyInAnyOrderElementsOf(expected);
+  }
+
+  @TestTemplate
+  public void createViewWithGroupByOrdinal() throws NoSuchTableException {
+    insertRows(10);
+    String viewName = viewName("createViewWithGroupByOrdinal");
+    sql("CREATE VIEW %s AS SELECT id FROM %s GROUP BY 1", viewName, tableName);
+    List<Object[]> expected =
+        IntStream.rangeClosed(1, 10).mapToObj(this::row).collect(Collectors.toList());
+
+    assertThat(sql("SELECT * FROM %s", viewName))
+        .hasSize(10)
+        .containsExactlyInAnyOrderElementsOf(expected);
```

Review Comment:

@ebyhr Very good advice; modified. Could you please also help approve the workflow runs? Thanks.
Re: [I] Add s3tables catalog support [iceberg-rust]
liurenjie1024 commented on issue #754:
URL: https://github.com/apache/iceberg-rust/issues/754#issuecomment-2533507061

> please assign this to me if it's not a feature that needs to be released soon. I can work on it part-time, but I cannot guarantee the completion time

Done, thanks!
Re: [PR] Spark 3.5: Add ignore-invalid-options to RewriteDataFilesSparkAction and RewritePositionDeleteFilesSparkAction [iceberg]
huaxingao commented on code in PR #11737:
URL: https://github.com/apache/iceberg/pull/11737#discussion_r1879274285

## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java:
```
@@ -1214,6 +1214,15 @@ public void testInvalidOptions() {
         .hasMessageContaining("requires enabling Iceberg Spark session extensions");
   }

+  @TestTemplate
+  public void testIgnoreInvalidOptions() {
+    Table table = createTable(20);
+    basicRewrite(table)
+        .option("ignore-invalid-options", "true")
+        .option("foobarity", "-5")
+        .execute();
```

Review Comment:

Should we also test the default behavior of `ignore-invalid-options`, similar to what you did in `TestRewritePositionDeleteFilesAction.testIgnoreInvalidOptions`?
Re: [PR] Bump Spark 3.5.4 RC1 [iceberg]
pan3793 commented on code in PR #11731:
URL: https://github.com/apache/iceberg/pull/11731#discussion_r1879272199

## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/data/vectorized/IcebergArrowColumnVector.java:
```
@@ -59,6 +59,11 @@
   public void close() {
     accessor.close();
   }

+  // If a column vector is writable or constant, it should override this method and do nothing.
+  // See more details at SPARK-50235, SPARK-50463 (Fixed in Spark 3.5.4)
+  @Override
+  public void closeIfNotWritable() {}
```

Review Comment:

I checked the codebase; the other vectors override `void close() {}`, so they are not affected.
Re: [I] Partition Spec Evolution API [iceberg-rust]
liurenjie1024 commented on issue #732:
URL: https://github.com/apache/iceberg-rust/issues/732#issuecomment-2533556499

Thanks to @c-thiel for raising this. I agree that we should focus on v2 first and add v1 support later.
Re: [PR] Bump Spark 3.5.4 RC1 [iceberg]
singhpk234 commented on code in PR #11731:
URL: https://github.com/apache/iceberg/pull/11731#discussion_r1879265358

## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/data/vectorized/IcebergArrowColumnVector.java:
```
@@ -59,6 +59,11 @@
   public void close() {
     accessor.close();
   }

+  // If a column vector is writable or constant, it should override this method and do nothing.
+  // See more details at SPARK-50235, SPARK-50463 (Fixed in Spark 3.5.4)
+  @Override
+  public void closeIfNotWritable() {}
```

Review Comment:

> If a column vector is writable or constant, it should override this method and do nothing.

We have `ConstantColumnVector` as well; should we do this only for the vector classes that absolutely require it?
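For readers following along, a minimal sketch of the override pattern being discussed is shown below. It assumes Spark 3.5.4's `ColumnVector` API, where `closeIfNotWritable()` (later renamed `closeIfFreeable()`, per the comment above) is invoked to free per-batch resources; the class name and comments are illustrative, not Iceberg's actual code.

```java
import org.apache.spark.sql.types.DataType;
import org.apache.spark.sql.vectorized.ColumnVector;

// Illustrative only: a vector that is reused across batches opts out of
// eager per-batch cleanup by overriding closeIfNotWritable() as a no-op.
abstract class ReusableColumnVector extends ColumnVector {

  protected ReusableColumnVector(DataType type) {
    super(type);
  }

  @Override
  public void closeIfNotWritable() {
    // No-op: this vector's buffers are reused, so they must not be freed
    // after each batch; close() still releases them at end of life.
  }
}
```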
Re: [PR] Spark 3.5: Add ignore-invalid-options to RewriteDataFilesSparkAction and RewritePositionDeleteFilesSparkAction [iceberg]
manuzhang commented on code in PR #11737:
URL: https://github.com/apache/iceberg/pull/11737#discussion_r1879291362

## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java:
```
@@ -1214,6 +1214,15 @@ public void testInvalidOptions() {
         .hasMessageContaining("requires enabling Iceberg Spark session extensions");
   }

+  @TestTemplate
+  public void testIgnoreInvalidOptions() {
+    Table table = createTable(20);
+    basicRewrite(table)
+        .option("ignore-invalid-options", "true")
+        .option("foobarity", "-5")
+        .execute();
```

Review Comment:

It's covered by `testInvalidOptions()` above.
Re: [PR] Bump Spark 3.5.4 RC1 [iceberg]
pan3793 commented on PR #11731:
URL: https://github.com/apache/iceberg/pull/11731#issuecomment-2533597688

FYI @LuciferYang @viirya @dongjoon-hyun @wForget: CI is green now, and I think the current Spark `branch-3.5` is in good shape for Iceberg.
Re: [PR] Core: Fix failure when reading files table with branch [iceberg]
dramaticlly commented on PR #11719:
URL: https://github.com/apache/iceberg/pull/11719#issuecomment-2533650691

> This makes sense to me. How about other metadata tables? @dramaticlly do you also want to take a look?

Happy to help with other metadata tables that hit a similar problem, if it's not addressed in this PR.
[I] Use apache/spark image in the quickstart [iceberg]
ajantha-bhat opened a new issue, #11746:
URL: https://github.com/apache/iceberg/issues/11746

### Feature Request / Improvement

More information: https://lists.apache.org/thread/4kknk8mvnffbmhdt63z8t4ps0mt1jbf4

### Query engine

None

### Willingness to contribute

- [ ] I can contribute this improvement/feature independently
- [ ] I would be willing to contribute this improvement/feature with guidance from the Iceberg community
- [ ] I cannot contribute this improvement/feature at this time
Re: [I] Use apache/spark image in the quickstart [iceberg]
ajantha-bhat commented on issue #11746:
URL: https://github.com/apache/iceberg/issues/11746#issuecomment-2533652976

Will keep it open if anyone wants to take it up.
Re: [I] Bug in `PartialEq` for `Struct` [iceberg-rust]
sungwy commented on issue #706:
URL: https://github.com/apache/iceberg-rust/issues/706#issuecomment-2531897325

Hi @Sl1mb0 - Thank you for raising this issue! I took a look at it while resolving the issues blocking the 0.4.0 release. I put up PR #772 to replicate your issue, and I was able to get the exact error using the code you shared above.

I did notice that you had an unpartitioned PartitionSpec, which doesn't align with the partition value you are inserting, since you are expecting one partition value to be persisted into your DataFile entry. In order to preserve that value, you will need to use a PartitionSpec that aligns with your values. Please take a look at this PR to see that the partition value is recorded properly when the PartitionSpec has a field: https://github.com/apache/iceberg-rust/pull/772/files#diff-8389535350ef7cddc266dfd18d589a978643da0334c23e16646e62e8d6a0892eR2115

```
// Create a partition spec with a single partition field.
let partition_spec = BoundPartitionSpec::builder(schema.clone())
    .with_spec_id(0)
    .add_partition_field("field_0", "field_0", Transform::Identity)
    .unwrap()
    .build()
    .unwrap();
```
Re: [PR] Core: Fix loading a table in CachingCatalog with metadata table name [iceberg]
gaborkaszab commented on code in PR #11738:
URL: https://github.com/apache/iceberg/pull/11738#discussion_r1878337162

## core/src/main/java/org/apache/iceberg/CachingCatalog.java:
```
@@ -144,14 +144,16 @@ public Table loadTable(TableIdentifier ident) {
       return cached;
     }

-    if (MetadataTableUtils.hasMetadataTableName(canonicalized)) {
+    Table table = tableCache.get(canonicalized, catalog::loadTable);
+
+    if (table instanceof BaseMetadataTable) {
+      // Cache underlying table
       TableIdentifier originTableIdentifier =
           TableIdentifier.of(canonicalized.namespace().levels());
       Table originTable = tableCache.get(originTableIdentifier, catalog::loadTable);

-      // share TableOperations instance of origin table for all metadata tables, so that metadata
-      // table instances are
-      // also refreshed as well when origin table instance is refreshed.
+      // Share TableOperations instance of origin table for all metadata tables, so that metadata
```

Review Comment:

It's needed to share the underlying TableOperations between the origin table and the metadata table. We basically create another Table object for the metadata table, this time on top of the origin table's TableOperations, and then update the cache with that object.
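To make the flow concrete, here is a hedged sketch of the mechanism described in that comment. It is a simplification, not CachingCatalog's exact code: the helper names come from Iceberg core, but the wiring is illustrative.

```java
import com.github.benmanes.caffeine.cache.Cache;
import org.apache.iceberg.MetadataTableType;
import org.apache.iceberg.MetadataTableUtils;
import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.Catalog;
import org.apache.iceberg.catalog.TableIdentifier;

class MetadataTableCachingSketch {
  // Rebuild the metadata table on top of the cached origin table so both
  // share one TableOperations instance: refreshing the origin table then
  // refreshes the cached metadata table as well.
  static Table cacheMetadataTable(
      Cache<TableIdentifier, Table> tableCache,
      Catalog catalog,
      TableIdentifier canonicalized,
      MetadataTableType type) {
    TableIdentifier origin = TableIdentifier.of(canonicalized.namespace().levels());
    Table originTable = tableCache.get(origin, catalog::loadTable);
    // createMetadataTableInstance derives the metadata table from the origin
    // table (and therefore from its TableOperations).
    Table metadataTable = MetadataTableUtils.createMetadataTableInstance(originTable, type);
    tableCache.put(canonicalized, metadataTable);
    return metadataTable;
  }
}
```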
Re: [PR] GCP: Implement SupportsRecoveryOperations for GCSFileIO [iceberg]
amogh-jahagirdar commented on code in PR #11565:
URL: https://github.com/apache/iceberg/pull/11565#discussion_r1878222555

## gcp/src/main/java/org/apache/iceberg/gcp/gcs/GCSFileIO.java:
```
@@ -242,4 +248,106 @@ private void internalDeleteFiles(Stream<BlobId> blobIdsToDelete) {
     Streams.stream(
             Iterators.partition(blobIdsToDelete.iterator(), gcpProperties.deleteBatchSize()))
         .forEach(batch -> client().delete(batch));
   }

+  @Override
+  public boolean recoverFile(String path) {
+    Preconditions.checkArgument(
+        !Strings.isNullOrEmpty(path), "Cannot recover file: path must not be null or empty");
+
+    try {
+      BlobId blobId = BlobId.fromGsUtilUri(path);
+
+      // first attempt to restore with soft-delete
+      if (recoverSoftDeletedObject(blobId)) {
+        return true;
+      }
+
+      // fallback to restoring by copying the latest version
+      if (recoverLatestVersion(blobId)) {
+        return true;
+      }
+
+    } catch (IllegalArgumentException e) {
+      LOG.warn("Invalid GCS path format: {}", path, e);
+    }
```

Review Comment:

Actually, just noticed this: I'm not sure it makes sense to swallow the IllegalArgumentException when the path format is incorrect. I consider this just like the null-or-empty case, where `recoverFile` can just throw. The best effort really applies to the calls that attempt recovery, not to the rest of the validation, imo.
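A hedged sketch of the variant the reviewer is suggesting, reusing the helper names from the diff above (`recoverSoftDeletedObject`, `recoverLatestVersion`). This illustrates the proposed control flow only; it is not code from the PR.

```java
@Override
public boolean recoverFile(String path) {
  Preconditions.checkArgument(
      !Strings.isNullOrEmpty(path), "Cannot recover file: path must not be null or empty");

  // A malformed GCS path now fails loudly here instead of being logged and
  // swallowed: BlobId.fromGsUtilUri throws IllegalArgumentException for it.
  BlobId blobId = BlobId.fromGsUtilUri(path);

  // Best effort applies only to the recovery attempts: first try the
  // soft-delete restore, then fall back to copying the latest version.
  return recoverSoftDeletedObject(blobId) || recoverLatestVersion(blobId);
}
```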
Re: [PR] Add basic CI to build iceberg and example [iceberg-cpp]
raulcd commented on code in PR #7:
URL: https://github.com/apache/iceberg-cpp/pull/7#discussion_r1877940138

## .github/workflows/test.yml:
```
@@ -0,0 +1,95 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+name: Test
+
+on:
+  push:
+    branches:
+      - '**'
+      - '!dependabot/**'
+    tags:
+      - '**'
+  pull_request:
+
+concurrency:
+  group: ${{ github.repository }}-${{ github.head_ref || github.sha }}-${{ github.workflow }}
+  cancel-in-progress: true
+
+permissions:
+  contents: read
+
+env:
+  ICEBERG_HOME: /tmp/iceberg
+
+jobs:
+  ubuntu:
+    name: AMD64 Ubuntu latest
+    runs-on: ubuntu-latest
```

Review Comment:

I'm in two minds on this one, but I think I prefer `latest` so that if a new OS (GitHub runner) is released, we find out early whether there are issues with it. The other approach might leave us with silent failures on newer systems if we don't update them periodically. Both points are valid; I'm OK with either approach.
[I] DataScan doesn't respect casesensitive argument [iceberg-python]
jiakai-li opened a new issue, #1421:
URL: https://github.com/apache/iceberg-python/issues/1421

### Apache Iceberg version

None

### Please describe the bug 🐞

While reading the code, I noticed that the `DataScan` class currently doesn't respect `case_sensitive` when it plans files. Here is the [reference](https://github.com/apache/iceberg-python/blob/ede363bd63640f3db6c3b3d6904a5765dafd2600/pyiceberg/table/__init__.py#L1314).

This means we can't do this:

```python
# case_sensitive is not taking effect
data_scan = table.scan(row_filter=row_filter, case_sensitive=False)
```

### Willingness to contribute

- [X] I can contribute a fix for this bug independently
- [ ] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time
Re: [PR] Bump Spark 3.5.4 RC1 [iceberg]
viirya commented on PR #11731:
URL: https://github.com/apache/iceberg/pull/11731#issuecomment-2532346946

Please check the discussion at https://github.com/apache/spark/pull/49131#issuecomment-2532341673.
Re: [I] DataScan doesn't respect casesensitive argument [iceberg-python]
jiakai-li commented on issue #1421:
URL: https://github.com/apache/iceberg-python/issues/1421#issuecomment-2532341957

Can I work on this issue?
Re: [PR] Add basic CI to build iceberg and example [iceberg-cpp]
raulcd commented on code in PR #7:
URL: https://github.com/apache/iceberg-cpp/pull/7#discussion_r1878524143

## .github/workflows/test.yml:
```
@@ -0,0 +1,95 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+name: Test
+
+on:
+  push:
+    branches:
+      - '**'
+      - '!dependabot/**'
+    tags:
+      - '**'
+  pull_request:
+
+concurrency:
+  group: ${{ github.repository }}-${{ github.head_ref || github.sha }}-${{ github.workflow }}
+  cancel-in-progress: true
+
+permissions:
+  contents: read
+
+env:
+  ICEBERG_HOME: /tmp/iceberg
+
+jobs:
+  ubuntu:
+    name: AMD64 Ubuntu latest
+    runs-on: ubuntu-latest
```

Review Comment:

`ubuntu-latest` is about to be migrated to `24.04`, and it was migrated from `20.04` to `22.04` around two years ago; see: https://github.com/actions/runner-images/commit/9c544f47011f3fb7f13ab122927f15939d950b49

`macos-latest` has had a higher rotation, around once a year (it went from macos-11, to macos-12, and then to macos-14).
Re: [PR] [Views] Update view spec with table identifier requirements [iceberg]
wmoustafa commented on code in PR #11365:
URL: https://github.com/apache/iceberg/pull/11365#discussion_r1878540920

## format/view-spec.md:
```
@@ -97,7 +97,10 @@ Summary is a string to string map of metadata about a view version. Common metad

 View definitions can be represented in multiple ways. Representations are documented ways to express a view definition.

-A view version can have more than one representation. All representations for a version must express the same underlying definition. Engines are free to choose the representation to use.
+A view version can have more than one representation. All representations for a version must express the same underlying definition. Engines are free to choose the representation to use. For the table identifiers in the view definitions to be interoperable across engines, the following must be true:
+* All engines must resolve a fully specified SQL identifier to the same table in the same catalog.
```

Review Comment:

> then we should not change the resolution of table references. Changing the table reference resolution would also be a breaking change to the view spec.

@JanKaul Could you clarify what you meant by this, preferably through an example?
[PR] Core, Flink, Spark, KafkaConnect: Remove remaining usage of deprecated path API [iceberg]
amogh-jahagirdar opened a new pull request, #11744:
URL: https://github.com/apache/iceberg/pull/11744

Same as https://github.com/apache/iceberg/pull/11563: removing the remaining usages of the path API.
Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]
pvary merged PR #11662:
URL: https://github.com/apache/iceberg/pull/11662
Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]
pvary commented on PR #11662:
URL: https://github.com/apache/iceberg/pull/11662#issuecomment-2532940822

Thanks @Guosmilesmile for the PR, and @mxm and @stevenzwu for the review!

@Guosmilesmile, please prepare the backport commits for the other Flink versions.

Thanks, Peter
Re: [PR] API, Core: Add scan planning apis to REST Catalog [iceberg]
rahil-c commented on PR #11180:
URL: https://github.com/apache/iceberg/pull/11180#issuecomment-2532789316

@rdblue @danielcweeks Was wondering if you could take a look whenever you get a chance?
Re: [PR] Hive: Optimize tableExists API in hive catalog [iceberg]
dramaticlly commented on PR #11597:
URL: https://github.com/apache/iceberg/pull/11597#issuecomment-2533061166

@rdblue @danielcweeks Can you help take another look to see if we can move forward with this PR?
Re: [PR] Spark3.4,3.5: Fix the BUG of iceberg views when resolved "group/order… [iceberg]
ebyhr commented on code in PR #11729:
URL: https://github.com/apache/iceberg/pull/11729#discussion_r1879148802

## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java:
```
@@ -562,6 +562,67 @@ public void readFromViewWithCTE() throws NoSuchTableException {
     assertThat(sql("SELECT * FROM %s", viewName)).hasSize(1).containsExactly(row(10, 1L));
   }

+  @TestTemplate
+  public void readFromViewWithGroupByOrdinal() throws NoSuchTableException {
+    insertRows(10);
+    String viewName = viewName("viewWithGroupByOrdinal");
+    String sql = String.format("SELECT id FROM %s group by 1", tableName);
+
+    ViewCatalog viewCatalog = viewCatalog();
+
+    viewCatalog
+        .buildView(TableIdentifier.of(NAMESPACE, viewName))
+        .withQuery("spark", sql)
+        .withDefaultNamespace(NAMESPACE)
+        .withDefaultCatalog(catalogName)
+        .withSchema(schema(sql))
+        .create();
+
+    List<Object[]> expected =
+        IntStream.rangeClosed(1, 10).mapToObj(this::row).collect(Collectors.toList());
+
+    assertThat(sql("SELECT * FROM %s", viewName))
+        .hasSize(10)
+        .containsExactlyInAnyOrderElementsOf(expected);
+  }
+
+  @TestTemplate
+  public void readFromViewWithOrderByOrdinal() throws NoSuchTableException {
+    insertRows(10);
+    String viewName = viewName("viewWithOrderByOrdinal");
+    String sql = String.format("SELECT id FROM %s order by 1", tableName);
```

Review Comment:

nit: Uppercase `order by`.

## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java:
```
@@ -562,6 +562,67 @@ public void readFromViewWithCTE() throws NoSuchTableException {
     assertThat(sql("SELECT * FROM %s", viewName)).hasSize(1).containsExactly(row(10, 1L));
   }

+  @TestTemplate
+  public void readFromViewWithGroupByOrdinal() throws NoSuchTableException {
+    insertRows(10);
+    String viewName = viewName("viewWithGroupByOrdinal");
+    String sql = String.format("SELECT id FROM %s group by 1", tableName);
```

Review Comment:

nit: Uppercase `group by`.

## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java:
```
@@ -562,6 +562,67 @@ public void readFromViewWithCTE() throws NoSuchTableException {
     assertThat(sql("SELECT * FROM %s", viewName)).hasSize(1).containsExactly(row(10, 1L));
   }

+  @TestTemplate
+  public void readFromViewWithGroupByOrdinal() throws NoSuchTableException {
+    insertRows(10);
+    String viewName = viewName("viewWithGroupByOrdinal");
+    String sql = String.format("SELECT id FROM %s group by 1", tableName);
+
+    ViewCatalog viewCatalog = viewCatalog();
+
+    viewCatalog
+        .buildView(TableIdentifier.of(NAMESPACE, viewName))
+        .withQuery("spark", sql)
+        .withDefaultNamespace(NAMESPACE)
+        .withDefaultCatalog(catalogName)
+        .withSchema(schema(sql))
+        .create();
+
+    List<Object[]> expected =
+        IntStream.rangeClosed(1, 10).mapToObj(this::row).collect(Collectors.toList());
+
+    assertThat(sql("SELECT * FROM %s", viewName))
+        .hasSize(10)
+        .containsExactlyInAnyOrderElementsOf(expected);
+  }
+
+  @TestTemplate
+  public void readFromViewWithOrderByOrdinal() throws NoSuchTableException {
+    insertRows(10);
+    String viewName = viewName("viewWithOrderByOrdinal");
+    String sql = String.format("SELECT id FROM %s order by 1", tableName);
+
+    ViewCatalog viewCatalog = viewCatalog();
+
+    viewCatalog
+        .buildView(TableIdentifier.of(NAMESPACE, viewName))
+        .withQuery("spark", sql)
+        .withDefaultNamespace(NAMESPACE)
+        .withDefaultCatalog(catalogName)
+        .withSchema(schema(sql))
+        .create();
+
+    List<Object[]> expected =
+        IntStream.rangeClosed(1, 10).mapToObj(this::row).collect(Collectors.toList());
+
+    assertThat(sql("SELECT * FROM %s", viewName))
+        .hasSize(10)
+        .containsExactlyInAnyOrderElementsOf(expected);
+  }
+
+  @TestTemplate
+  public void createViewWithGroupByOrdinal() throws NoSuchTableException {
+    insertRows(10);
+    String viewName = viewName("createViewWithGroupByOrdinal");
+    sql("CREATE VIEW %s AS SELECT id FROM %s GROUP BY 1", viewName, tableName);
+    List<Object[]> expected =
+        IntStream.rangeClosed(1, 10).mapToObj(this::row).collect(Collectors.toList());
+
+    assertThat(sql("SELECT * FROM %s", viewName))
+        .hasSize(10)
+        .containsExactlyInAnyOrderElementsOf(expected);
```

Review Comment:

This test doesn't ensure whether `GROUP BY` is effective or not because it specifies the id column. Ideally, it should specify a non-id column and verify that duplicate values are eliminated.
Re: [I] Rest Catalog: spark catalog api fails to work with rest based catalog [iceberg]
dramaticlly commented on issue #11741:
URL: https://github.com/apache/iceberg/issues/11741#issuecomment-2533365254

I think with this we can reproduce the problem in a unit test from `org.apache.iceberg.spark.sql.TestCreateTable`, as it supports all of the hive/spark/hadoop/rest catalogs. I think rest and hadoop catalogs will fail while hive and spark will pass:

```java
@TestTemplate
public void testCreateTable() {
  assumeThat(catalogName).isEqualTo(SparkCatalogConfig.REST.catalogName());

  assertThat(validationCatalog.tableExists(tableIdent))
      .as("Table should not already exist")
      .isFalse();

  sql("CREATE TABLE %s (id BIGINT NOT NULL, data STRING) USING iceberg", tableName);

  Table table = validationCatalog.loadTable(tableIdent);
  assertThat(table).as("Should load the new table").isNotNull();

  StructType expectedSchema =
      StructType.of(
          NestedField.required(1, "id", Types.LongType.get()),
          NestedField.optional(2, "data", Types.StringType.get()));
  assertThat(table.schema().asStruct())
      .as("Should have the expected schema")
      .isEqualTo(expectedSchema);
  assertThat(table.spec().fields()).as("Should not be partitioned").hasSize(0);
  assertThat(table.properties().get(TableProperties.DEFAULT_FILE_FORMAT))
      .as("Should not have the default format set")
      .isNull();

  spark.sessionState().catalogManager().setCurrentCatalog(catalogName);
  assertThat(spark.catalog().tableExists(tableIdent.toString())).isTrue(); // success
  assertThat(spark.catalog().tableExists(tableIdent.namespace().toString(), tableIdent.name()))
      .isTrue(); // failure
}
```

I am wondering if anyone has run into this when using a REST-based catalog? CC @RussellSpitzer @flyrain
Re: [PR] Hive: Optimize tableExists API in hive catalog [iceberg]
danielcweeks commented on code in PR #11597:
URL: https://github.com/apache/iceberg/pull/11597#discussion_r1879074770

## hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java:
```
@@ -412,6 +412,43 @@ private void validateTableIsIcebergTableOrView(
     }
   }

+  /**
+   * Check whether table or metadata table exists.
+   *
+   * Note: If a hive table with the same identifier exists in catalog, this method will return
+   * {@code false}.
+   *
+   * @param identifier a table identifier
+   * @return true if the table exists, false otherwise
+   */
+  @Override
+  public boolean tableExists(TableIdentifier identifier) {
+    TableIdentifier baseTableIdentifier = identifier;
+    if (!isValidIdentifier(identifier)) {
+      if (!isValidMetadataIdentifier(identifier)) {
+        return false;
```

Review Comment:

Minor behavior issue here. Currently, if it's an invalid identifier, it will throw as opposed to returning false:

```
throw new NoSuchTableException("Invalid table identifier: %s", identifier);
```
Re: [PR] Hive: Optimize tableExists API in hive catalog [iceberg]
danielcweeks commented on PR #11597:
URL: https://github.com/apache/iceberg/pull/11597#issuecomment-2533223559

@dramaticlly Minor comment, but other than that LGTM.
Re: [PR] Hive: Optimize tableExists API in hive catalog [iceberg]
dramaticlly commented on code in PR #11597:
URL: https://github.com/apache/iceberg/pull/11597#discussion_r1879086639

## hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java:
```
@@ -412,6 +412,43 @@ private void validateTableIsIcebergTableOrView(
     }
   }

+  /**
+   * Check whether table or metadata table exists.
+   *
+   * Note: If a hive table with the same identifier exists in catalog, this method will return
+   * {@code false}.
+   *
+   * @param identifier a table identifier
+   * @return true if the table exists, false otherwise
+   */
+  @Override
+  public boolean tableExists(TableIdentifier identifier) {
+    TableIdentifier baseTableIdentifier = identifier;
+    if (!isValidIdentifier(identifier)) {
+      if (!isValidMetadataIdentifier(identifier)) {
+        return false;
```

Review Comment:

Thank you @danielcweeks. I think you might be referring to [this line in loadTable()](https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/BaseMetastoreCatalog.java#L66), which throws an exception; the [`tableExists` API defined in the Catalog interface](https://github.com/apache/iceberg/blob/main/api/src/main/java/org/apache/iceberg/catalog/Catalog.java#L283-L285) actually catches this NoSuchTableException and returns false.

I also added an explicit step to the unit test for an invalid identifier, to ensure the behavior is the same as before.
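For reference, the default implementation on Iceberg's `Catalog` interface behaves essentially as sketched below (paraphrased from the lines linked above; check the linked source for the authoritative version). This is why an invalid identifier surfaces as `false` to `tableExists` callers rather than as an exception.

```java
default boolean tableExists(TableIdentifier identifier) {
  try {
    loadTable(identifier);
    return true;
  } catch (NoSuchTableException e) {
    // An invalid or missing identifier surfaces as "does not exist"
    // rather than propagating the exception to tableExists callers.
    return false;
  }
}
```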
Re: [PR] Spec: Support geo type [iceberg]
jiayuasu commented on code in PR #10981: URL: https://github.com/apache/iceberg/pull/10981#discussion_r1879088058 ## format/spec.md: ## @@ -1633,3 +1656,15 @@ might indicate different snapshot IDs for a specific timestamp. The discrepancie When processing point in time queries implementations should use "snapshot-log" metadata to lookup the table state at the given point in time. This ensures time-travel queries reflect the state of the table at the provided timestamp. For example a SQL query like `SELECT * FROM prod.db.table TIMESTAMP AS OF '1986-10-26 01:21:00Z';` would find the snapshot of the Iceberg table just prior to '1986-10-26 01:21:00 UTC' in the snapshot logs and use the metadata from that snapshot to perform the scan of the table. If no snapshot exists prior to the timestamp given or "snapshot-log" is not populated (it is an optional field), then systems should raise an informative error message about the missing metadata. +## Appendix G: Geospatial Notes + +The Geometry class hierarchy and its WKT and WKB serializations (ISO supporting XY, XYZ, XYM, XYZM) are defined by [OpenGIS Implementation Specification for Geographic information – Simple feature access – Part 1: Common architecture](https://portal.ogc.org/files/?artifact_id=25355), from [OGC (Open Geospatial Consortium)](https://www.ogc.org/standard/sfa/). + +The version of the OGC standard first used here is 1.2.1, but future versions may also used if the WKB representation remains wire-compatible. + +Possible values for edge-interpolation algorithm (A) for `geography` types are: +* `spherical`: edges are interpolated as geodesics on a sphere. The radius of the underlying sphere is the mean radius of the spheroid defined by the CRS, defined as (2 * major_axis_length + minor_axis_length / 3). +* `vincenty`: [https://en.wikipedia.org/wiki/Vincenty%27s_formulae](https://en.wikipedia.org/wiki/Vincenty%27s_formulae) +* `andoyer-lambert-thomas` +* `andoyer-lambert` +* `karney` Review Comment: ```suggestion * `karney`: [Karney, Charles FF. "Algorithms for geodesics." Journal of Geodesy 87 (2013): 43-55](https://link.springer.com/content/pdf/10.1007/s00190-012-0578-z.pdf), and [GeographicLib](https://geographiclib.sourceforge.io/) ``` ## format/spec.md: ## @@ -1633,3 +1656,15 @@ might indicate different snapshot IDs for a specific timestamp. The discrepancie When processing point in time queries implementations should use "snapshot-log" metadata to lookup the table state at the given point in time. This ensures time-travel queries reflect the state of the table at the provided timestamp. For example a SQL query like `SELECT * FROM prod.db.table TIMESTAMP AS OF '1986-10-26 01:21:00Z';` would find the snapshot of the Iceberg table just prior to '1986-10-26 01:21:00 UTC' in the snapshot logs and use the metadata from that snapshot to perform the scan of the table. If no snapshot exists prior to the timestamp given or "snapshot-log" is not populated (it is an optional field), then systems should raise an informative error message about the missing metadata. +## Appendix G: Geospatial Notes + +The Geometry class hierarchy and its WKT and WKB serializations (ISO supporting XY, XYZ, XYM, XYZM) are defined by [OpenGIS Implementation Specification for Geographic information – Simple feature access – Part 1: Common architecture](https://portal.ogc.org/files/?artifact_id=25355), from [OGC (Open Geospatial Consortium)](https://www.ogc.org/standard/sfa/). 
+ +The version of the OGC standard first used here is 1.2.1, but future versions may also used if the WKB representation remains wire-compatible. + +Possible values for edge-interpolation algorithm (A) for `geography` types are: +* `spherical`: edges are interpolated as geodesics on a sphere. The radius of the underlying sphere is the mean radius of the spheroid defined by the CRS, defined as (2 * major_axis_length + minor_axis_length / 3). +* `vincenty`: [https://en.wikipedia.org/wiki/Vincenty%27s_formulae](https://en.wikipedia.org/wiki/Vincenty%27s_formulae) +* `andoyer-lambert-thomas` Review Comment: ```suggestion * `andoyer-lambert-thomas`: Thomas, Paul D. Spheroidal geodesics, reference systems, & local geometry. US Naval Oceanographic Office, 1970. ``` ## format/spec.md: ## @@ -1633,3 +1656,15 @@ might indicate different snapshot IDs for a specific timestamp. The discrepancie When processing point in time queries implementations should use "snapshot-log" metadata to lookup the table state at the given point in time. This ensures time-travel queries reflect the state of the table at the provided timestamp. For example a SQL query like `SELECT * FROM prod.db.table TIMESTAMP AS OF '1986-10-26 01:21:00Z';` would find the snapshot of the Iceberg table just prior to '1986-10-26 01:21:00 UTC' in the snapshot logs and use the metadata from that snapshot to perform the sca
Re: [PR] Spec: Support geo type [iceberg]
jiayuasu commented on code in PR #10981:
URL: https://github.com/apache/iceberg/pull/10981#discussion_r1879080772

## format/spec.md:
```
@@ -1633,3 +1656,15 @@ might indicate different snapshot IDs for a specific timestamp. The discrepancie

 When processing point in time queries implementations should use "snapshot-log" metadata to lookup the table state at the given point in time. This ensures time-travel queries reflect the state of the table at the provided timestamp. For example a SQL query like `SELECT * FROM prod.db.table TIMESTAMP AS OF '1986-10-26 01:21:00Z';` would find the snapshot of the Iceberg table just prior to '1986-10-26 01:21:00 UTC' in the snapshot logs and use the metadata from that snapshot to perform the scan of the table. If no snapshot exists prior to the timestamp given or "snapshot-log" is not populated (it is an optional field), then systems should raise an informative error message about the missing metadata.
+
+## Appendix G: Geospatial Notes
+
+The Geometry class hierarchy and its WKT and WKB serializations (ISO supporting XY, XYZ, XYM, XYZM) are defined by [OpenGIS Implementation Specification for Geographic information – Simple feature access – Part 1: Common architecture](https://portal.ogc.org/files/?artifact_id=25355), from [OGC (Open Geospatial Consortium)](https://www.ogc.org/standard/sfa/).
+
+The version of the OGC standard first used here is 1.2.1, but future versions may also be used if the WKB representation remains wire-compatible.
+
```

Review Comment:

I suggest we put this back, as we all agreed during the emails and meetings. This will make things much easier; e.g., readers of the geometry data can filter on bbox min/max directly without needing to know the lon/lat order of the CRS.
Re: [PR] Spec: Support geo type [iceberg]
jiayuasu commented on code in PR #10981:
URL: https://github.com/apache/iceberg/pull/10981#discussion_r187907

## format/spec.md:
```
@@ -1633,3 +1656,15 @@ might indicate different snapshot IDs for a specific timestamp. The discrepancie

 When processing point in time queries implementations should use "snapshot-log" metadata to lookup the table state at the given point in time. This ensures time-travel queries reflect the state of the table at the provided timestamp. For example a SQL query like `SELECT * FROM prod.db.table TIMESTAMP AS OF '1986-10-26 01:21:00Z';` would find the snapshot of the Iceberg table just prior to '1986-10-26 01:21:00 UTC' in the snapshot logs and use the metadata from that snapshot to perform the scan of the table. If no snapshot exists prior to the timestamp given or "snapshot-log" is not populated (it is an optional field), then systems should raise an informative error message about the missing metadata.
+
+## Appendix G: Geospatial Notes
+
+The Geometry class hierarchy and its WKT and WKB serializations (ISO supporting XY, XYZ, XYM, XYZM) are defined by [OpenGIS Implementation Specification for Geographic information – Simple feature access – Part 1: Common architecture](https://portal.ogc.org/files/?artifact_id=25355), from [OGC (Open Geospatial Consortium)](https://www.ogc.org/standard/sfa/).
+
+The version of the OGC standard first used here is 1.2.1, but future versions may also be used if the WKB representation remains wire-compatible.
+
```

Review Comment:
```suggestion
Coordinate axis order is always (x, y) where x is easting or longitude, and y is northing or latitude. This ordering explicitly overrides the axis order specified in the CRS.
```
[PR] Fix `Table.scan` to enable case sensitivity argument [iceberg-python]
jiakai-li opened a new pull request, #1423:
URL: https://github.com/apache/iceberg-python/pull/1423

**This pull request fixes the issue below:**

- #1421

The change modifies the `DataScan._build_partition_projection` method to pass the `case_sensitive` argument when calling `inclusive_projection`, so that the `Table.scan` method respects this configuration. Related tests are also added.
Re: [I] Support for loading different hive-metastore versions at Runtime [iceberg]
github-actions[bot] commented on issue #10401: URL: https://github.com/apache/iceberg/issues/10401#issuecomment-2533325043 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] Can't add column with the same name as a deleted and previously partitioned key column [iceberg]
github-actions[bot] commented on issue #10487: URL: https://github.com/apache/iceberg/issues/10487#issuecomment-2533325230 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] Support for loading different hive-metastore versions at Runtime [iceberg]
github-actions[bot] closed issue #10401: Support for loading different hive-metastore versions at Runtime URL: https://github.com/apache/iceberg/issues/10401 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] [Flink] the precondition judgment is incorrect in the FlinkSink class. [iceberg]
github-actions[bot] commented on PR #10669: URL: https://github.com/apache/iceberg/pull/10669#issuecomment-2533325348 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the d...@iceberg.apache.org list. Thank you for your contributions. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] core: support move a column with same name after rename column [iceberg]
github-actions[bot] commented on PR #10862: URL: https://github.com/apache/iceberg/pull/10862#issuecomment-2533325429 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the d...@iceberg.apache.org list. Thank you for your contributions. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Core, Flink, Spark, KafkaConnect: Remove remaining usage of deprecated path API [iceberg]
ebyhr commented on code in PR #11744: URL: https://github.com/apache/iceberg/pull/11744#discussion_r1879109181 ## flink/v1.19/flink/src/test/java/org/apache/iceberg/flink/TestHelpers.java: ## @@ -602,7 +602,7 @@ public static void assertEquals(ContentFile expected, ContentFile actual) assertThat(actual).isNotNull(); assertThat(actual.specId()).as("SpecId").isEqualTo(expected.specId()); assertThat(actual.content()).as("Content").isEqualTo(expected.content()); -assertThat(actual.path()).as("Path").isEqualTo(expected.path()); +assertThat(actual.location()).as("Path").isEqualTo(expected.location()); Review Comment: ```suggestion assertThat(actual.location()).as("Location").isEqualTo(expected.location()); ``` ## spark/v3.3/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestExpireSnapshotsProcedure.java: ## @@ -300,7 +300,7 @@ public void testExpireDeleteFiles() throws Exception { Assert.assertEquals("Should have 1 delete file", 1, TestHelpers.deleteFiles(table).size()); Path deleteManifestPath = new Path(TestHelpers.deleteManifests(table).iterator().next().path()); DeleteFile deleteFile = TestHelpers.deleteFiles(table).iterator().next(); -Path deleteFilePath = new Path(String.valueOf(deleteFile.path())); +Path deleteFilePath = new Path(String.valueOf(deleteFile.location())); Review Comment: Why do we need `String.valueOf` even though `deleteFile.location()` is String? ## spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkCleanupUtil.java: ## @@ -81,7 +81,7 @@ private static String taskInfo() { * @param files a list of files to delete */ public static void deleteFiles(String context, FileIO io, List> files) { -List paths = Lists.transform(files, file -> file.path().toString()); +List paths = Lists.transform(files, file -> file.location()); Review Comment: Spark v3.5 uses method reference here. ## spark/v3.3/spark/src/test/java/org/apache/iceberg/ValidationHelpers.java: ## @@ -42,7 +42,7 @@ public static List snapshotIds(Long... ids) { } public static List files(ContentFile... files) { -return Arrays.stream(files).map(file -> file.path().toString()).collect(Collectors.toList()); +return Arrays.stream(files).map(file -> file.location()).collect(Collectors.toList()); Review Comment: nit: We could use method reference. Feel free to ignore this comment. I know Spark v3.5 uses this style. ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkCleanupUtil.java: ## @@ -81,7 +81,7 @@ private static String taskInfo() { * @param files a list of files to delete */ public static void deleteFiles(String context, FileIO io, List> files) { -List paths = Lists.transform(files, file -> file.path().toString()); +List paths = Lists.transform(files, file -> file.location()); Review Comment: Spark v3.5 uses method reference here. ## spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/SparkTableUtil.java: ## @@ -635,7 +635,7 @@ private static void importUnpartitionedSparkTable( if (checkDuplicateFiles) { Dataset importedFiles = spark -.createDataset(Lists.transform(files, f -> f.path().toString()), Encoders.STRING()) +.createDataset(Lists.transform(files, f -> f.location()), Encoders.STRING()) Review Comment: Spark v3.5 uses method reference here. 
## spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/BaseReader.java: ## @@ -153,7 +153,7 @@ public boolean next() throws IOException { if (currentTask != null && !currentTask.isDataTask()) { String filePaths = referencedFiles(currentTask) -.map(file -> file.path().toString()) +.map(file -> file.location()) Review Comment: Spark v3.5 uses method reference here. ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/BaseReader.java: ## @@ -148,7 +148,7 @@ public boolean next() throws IOException { if (currentTask != null && !currentTask.isDataTask()) { String filePaths = referencedFiles(currentTask) -.map(file -> file.path().toString()) +.map(file -> file.location()) Review Comment: Spark v3.5 uses method reference here. Feel free to ignore this and other comments about method reference if it's intentional. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
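The style nit above, shown standalone — a hedged sketch with a simplified `ContentFile` stand-in (the real Iceberg interface is generic, which is elided here):

```java
import java.util.List;
import java.util.stream.Collectors;

public class MethodReferenceStyle {
  interface ContentFile {
    String location();
  }

  // Lambda form flagged in the review:
  static List<String> withLambda(List<ContentFile> files) {
    return files.stream().map(file -> file.location()).collect(Collectors.toList());
  }

  // Equivalent method-reference form used in Spark v3.5:
  static List<String> withMethodRef(List<ContentFile> files) {
    return files.stream().map(ContentFile::location).collect(Collectors.toList());
  }
}
```

Both compile to the same behavior; the method reference simply avoids naming a throwaway parameter.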
Re: [PR] Add Strict projection [iceberg-python]
Fokko commented on code in PR #539: URL: https://github.com/apache/iceberg-python/pull/539#discussion_r1879010689 ## pyiceberg/transforms.py: ## @@ -766,6 +858,47 @@ def _truncate_number( return None +def _truncate_number_strict( +name: str, pred: BoundLiteralPredicate[L], transform: Callable[[Optional[L]], Optional[L]] +) -> Optional[UnboundPredicate[Any]]: +boundary = pred.literal + +if not isinstance(boundary, (LongLiteral, DecimalLiteral, DateLiteral, TimestampLiteral)): +raise ValueError(f"Expected a numeric literal, got: {type(boundary)}") + +if isinstance(pred, BoundLessThan): +return LessThan(Reference(name), _transform_literal(transform, boundary)) +elif isinstance(pred, BoundLessThanOrEqual): +return LessThan(Reference(name), _transform_literal(transform, boundary.increment())) # type: ignore +elif isinstance(pred, BoundGreaterThan): +return GreaterThan(Reference(name), _transform_literal(transform, boundary)) +elif isinstance(pred, BoundGreaterThanOrEqual): +return GreaterThan(Reference(name), _transform_literal(transform, boundary.decrement())) # type: ignore Review Comment: Hey @ZENOTME Again, thanks for pointing this out. It turns out that most of these oddities are caused by a bug that was part of Iceberg `≤0.10.0`. We did not port this to PyIceberg: https://github.com/apache/iceberg/blob/ac6509a4e469f808bebb8b713a5c4213f98ff4a5/api/src/main/java/org/apache/iceberg/transforms/ProjectionUtil.java#L275 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
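The `increment()`/`decrement()` calls in the quoted PyIceberg code rest on a standard identity for integral boundaries. A small sketch of the idea — not the PyIceberg implementation itself:

```java
public class StrictBoundarySketch {
  public static void main(String[] args) {
    int v = 10;
    for (int x : new int[] {9, 10, 11}) {
      // For integral types, x <= v  <=>  x < v + 1, and x >= v  <=>  x > v - 1.
      // Rewriting <=/>= into strict </> this way lets a strict projection reuse
      // one code path, which is what boundary.increment()/decrement() do above.
      System.out.printf("x=%d: (x<=v)=%b (x<v+1)=%b (x>=v)=%b (x>v-1)=%b%n",
          x, x <= v, x < v + 1, x >= v, x > v - 1);
    }
  }
}
```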
Re: [PR] Spec: Support geo type [iceberg]
szehon-ho commented on PR #10981: URL: https://github.com/apache/iceberg/pull/10981#issuecomment-2533103710 Update: there was a sync with @jiayuasu @flyrain @dmitrykoval @paleolimbot @rdblue and Menelaos; the following was decided (meeting notes):

My summary is that we decided to have two types:

* geometry(crs_id) always uses linear edges but can have a geographic CRS
* geography(crs_id, algorithm) always uses geodesic edges defined by a geographic CRS
* A geography’s algorithm approximates the edges and must be used consistently. A spherical approximation is considered a different algorithm.
* The crs_id is opaque, but could be srid: to select a specific SRID, or projjson: to select a JSON CRS in a table property
* Neither Parquet nor Iceberg is responsible for providing CRS definitions, but may include them for convenience (if they can, considering copyright or other legal considerations)

Here are the specific points I think we decided on:

* Planar/linear edges are always associated with the geometry type. Geometry should always use linear edges.
* Parquet and Iceberg should have a geometry type because users already expect the linear behavior
* Geometry needs to support geographic CRS
* Geometry needs a CRS parameter, but not an edge parameter
* Geography never uses linear edges
* Geography edges are always interpreted as edges on the spheroid defined by the geographic CRS (geodesics)
* An exception here is that if the algorithm specified is spherical, then we are talking about geodesics (great circle arcs) on a sphere. I think it is important to notice (and specify/require) that if the algorithm is spherical, then the radius of the underlying sphere is assumed/expected to be the mean radius of the spheroid specified by the CRS, where the mean radius is always defined as (2 * major_axis_length + minor_axis_length) / 3.
* Geography bounding boxes must include the northmost/southmost points on edges
* Geography edge calculations use a particular algorithm, which may introduce either approximation errors (for instance, Vincenty) or may simplify the problem and introduce representation errors (i.e. Spherical)
* The edge calculation algorithm must be a parameter of the geography type (i.e. spherical, andoyer, vincenty, etc.)
* The algorithm is set by what the writer creating the table can produce (vs having a default in the format)
* Writers must not write if they cannot produce bounding boxes using the correct algorithm
* Engines should reject non-geographic CRS for geography columns
* We decided that coordinates should be limited to [-180, 180] and [-90, 90] for geography.

Updating the PR based on the same. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
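The one concrete formula in these notes is the mean-radius definition. A small worked sketch, using WGS84 axis lengths purely as assumed example inputs (the notes do not name a specific spheroid):

```java
public class MeanRadius {
  // Mean radius as defined in the meeting notes:
  // (2 * major_axis_length + minor_axis_length) / 3.
  static double meanRadius(double majorAxis, double minorAxis) {
    return (2.0 * majorAxis + minorAxis) / 3.0;
  }

  public static void main(String[] args) {
    // WGS84 axes in meters, used here only for illustration.
    double a = 6378137.0;      // semi-major axis
    double b = 6356752.314245; // semi-minor axis
    System.out.printf("mean radius = %.3f m%n", meanRadius(a, b)); // ≈ 6371008.771 m
  }
}
```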
Re: [PR] Add Strict projection [iceberg-python]
Fokko commented on code in PR #539: URL: https://github.com/apache/iceberg-python/pull/539#discussion_r1879028369 ## pyiceberg/transforms.py: ## @@ -766,6 +858,47 @@ def _truncate_number( return None +def _truncate_number_strict( +name: str, pred: BoundLiteralPredicate[L], transform: Callable[[Optional[L]], Optional[L]] +) -> Optional[UnboundPredicate[Any]]: +boundary = pred.literal + +if not isinstance(boundary, (LongLiteral, DecimalLiteral, DateLiteral, TimestampLiteral)): +raise ValueError(f"Expected a numeric literal, got: {type(boundary)}") + +if isinstance(pred, BoundLessThan): +return LessThan(Reference(name), _transform_literal(transform, boundary)) +elif isinstance(pred, BoundLessThanOrEqual): +return LessThan(Reference(name), _transform_literal(transform, boundary.increment())) # type: ignore +elif isinstance(pred, BoundGreaterThan): +return GreaterThan(Reference(name), _transform_literal(transform, boundary)) +elif isinstance(pred, BoundGreaterThanOrEqual): +return GreaterThan(Reference(name), _transform_literal(transform, boundary.decrement())) # type: ignore Review Comment: Created a PR to refactor the tests: https://github.com/apache/iceberg-python/pull/1422 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Add CMake format [iceberg-cpp]
Fokko commented on code in PR #5: URL: https://github.com/apache/iceberg-cpp/pull/5#discussion_r1879041334 ## cmake-format.py: ## @@ -0,0 +1,74 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +# cmake-format configuration file + +# - +# Options affecting formatting. +# - +with section("format"): +# How wide to allow formatted cmake files +line_width = 90 Review Comment: `iceberg-python` is at 130: https://github.com/apache/iceberg-python/blob/ede363bd63640f3db6c3b3d6904a5765dafd2600/pyproject.toml#L916 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Bump Spark 3.5.4 RC1 [iceberg]
pan3793 commented on PR #11731: URL: https://github.com/apache/iceberg/pull/11731#issuecomment-2533468089 @viirya @LuciferYang thank you for the guidance, let me try -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] AWS: Enable RetryMode for AWS KMS client [iceberg]
amogh-jahagirdar merged PR #11420: URL: https://github.com/apache/iceberg/pull/11420 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] AWS: Enable Adaptive Retries for AWS KMS client [iceberg]
hsiang-c commented on PR #11420: URL: https://github.com/apache/iceberg/pull/11420#issuecomment-2533470077 Thank you @amogh-jahagirdar -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[PR] doc: add RisingWave to users [iceberg-rust]
xxchan opened a new pull request, #775: URL: https://github.com/apache/iceberg-rust/pull/775 Signed-off-by: xxchan -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[I] Comment and assertion mismatch in PartitionedWritesTestBase [iceberg]
wzx140 opened a new issue, #11747: URL: https://github.com/apache/iceberg/issues/11747

### Apache Iceberg version

main (development)

### Query engine

Spark

### Please describe the bug 🐞

There is a mismatch between the test comment and the assertion in the following unit test, which might lead to confusion or logical errors. Here is the relevant code snippet:

```java
@TestTemplate
public void testInsertAppend() {
  assertThat(scalarSql("SELECT count(*) FROM %s", selectTarget()))
      .as("Should have 5 rows after insert")
      .isEqualTo(3L);
```

- The comment "Should have 5 rows after insert" indicates the expectation of 5 rows in the table.
- However, the actual assertion isEqualTo(3L) checks for 3 rows instead.

### Willingness to contribute

- [X] I can contribute a fix for this bug independently
- [X] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
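One way to reconcile the two, assuming the assertion (3 rows) reflects the intended behavior and only the description string is stale; if 5 rows is actually intended, change the literal instead:

```java
// Hedged sketch of the reconciled test: keep the 3L assertion, fix the message.
@TestTemplate
public void testInsertAppend() {
  assertThat(scalarSql("SELECT count(*) FROM %s", selectTarget()))
      .as("Should have 3 rows after insert")
      .isEqualTo(3L);
}
```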
Re: [PR] Table Scan Delete File Handling: Positional and Equality Delete Support [iceberg-rust]
sdd commented on code in PR #652: URL: https://github.com/apache/iceberg-rust/pull/652#discussion_r1879516829 ## crates/iceberg/src/scan.rs: ## @@ -951,6 +1077,82 @@ impl FileScanTask { } } +type DeleteFileManagerResult = Result>>>; + +/// Manages async retrieval of all the delete files from FileIO that are +/// applicable to the scan. Provides references to them for inclusion within FileScanTasks +#[derive(Debug, Clone)] +struct DeleteFileManager { Review Comment: Refactored to a separate module and rebased back on latest `main` as it was getting a bit stale. I'll be working to update this PR to bring it closer to `DeleteFileIndex`, ideally in a way that allows me to split this into smaller PRs as well ## crates/iceberg/src/spec/delete_file.rs: ## @@ -0,0 +1,780 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +use std::collections::HashMap; + +use arrow_array::{Int64Array, RecordBatch, StringArray}; +use arrow_schema::SchemaRef; +use futures::StreamExt; +use parquet::arrow::PARQUET_FIELD_ID_META_KEY; + +use crate::scan::ArrowRecordBatchStream; +use crate::{Error, ErrorKind, Result}; + +pub(crate) const FIELD_ID_DELETE_FILE_PATH: i32 = i32::MAX - 101; +pub(crate) const FIELD_ID_DELETE_POS: i32 = i32::MAX - 102; + +// Represents a parsed Delete file that can be safely stored +// in the Object Cache. +pub(crate) enum Deletes { +// Positional delete files are parsed into a map of +// filename to a sorted list of row indices. +// TODO: Ignoring the stored rows that are present in +// positional deletes for now. I think they only used for statistics? +Positional(HashMap>), + +// Equality delete files are initially parsed solely as an +// unprocessed list of `RecordBatch`es from the equality +// delete files. +// I don't think we can do better than this by +// storing a Predicate (because the equality deletes use the +// field_id rather than the field name, so if we keep this as +// a Predicate then a field name change would break it). +// Similarly, I don't think we can store this as a BoundPredicate +// as the column order could be different across different data +// files and so the accessor in the bound predicate could be invalid). 
+Equality(Vec), +} + +enum PosDelSchema { +WithRow, +WithoutRow, +} + +fn validate_schema(schema: SchemaRef) -> Result { +let fields = schema.flattened_fields(); +match fields.len() { +2 | 3 => { +let Some(path_field_id_raw) = fields[0].metadata().get(PARQUET_FIELD_ID_META_KEY) +else { +return Err(Error::new( +ErrorKind::DataInvalid, +"Positional Delete file did not have the expected schema (missing field ID for field 0)", +)); +}; + +let Some(pos_field_id_raw) = fields[1].metadata().get(PARQUET_FIELD_ID_META_KEY) else { +return Err(Error::new( +ErrorKind::DataInvalid, +"Positional Delete file did not have the expected schema (missing field ID for field 1)", +)); +}; + +let Ok(path_field_id_val) = path_field_id_raw.parse::() else { +return Err(Error::new( +ErrorKind::DataInvalid, +"Positional Delete file did not have the expected schema (field ID 0 not parseable as an int)", +)); +}; + +let Ok(pos_field_id_val) = pos_field_id_raw.parse::() else { +return Err(Error::new( +ErrorKind::DataInvalid, +"Positional Delete file did not have the expected schema (field ID 1 not parseable as an int)", +)); +}; + +if path_field_id_val != FIELD_ID_DELETE_FILE_PATH +|| pos_field_id_val != FIELD_ID_DELETE_POS +{ +Err(Error::new( +ErrorKind::DataInvalid, +"Positional Delete file did not have the expected schema (unexpected field ID)", +)) +} else if fields.len() == 2 { +
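A sketch of the positional-delete structure described in the quoted comment — data-file path mapped to sorted deleted-row positions — written here in Java with illustrative names (not the PR's Rust API). Sorting the positions is what makes a membership check a binary search:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class PositionalDeleteIndex {
  // data-file path -> sorted positions of deleted rows in that file
  private final Map<String, long[]> deletesByPath = new HashMap<>();

  void add(String dataFile, long[] sortedPositions) {
    deletesByPath.put(dataFile, sortedPositions);
  }

  boolean isDeleted(String dataFile, long rowPosition) {
    long[] positions = deletesByPath.get(dataFile);
    // Sorted input lets binary search answer membership in O(log n) per row.
    return positions != null && Arrays.binarySearch(positions, rowPosition) >= 0;
  }

  public static void main(String[] args) {
    PositionalDeleteIndex index = new PositionalDeleteIndex();
    index.add("s3://bucket/data/00000.parquet", new long[] {3, 17, 42});
    System.out.println(index.isDeleted("s3://bucket/data/00000.parquet", 17)); // true
    System.out.println(index.isDeleted("s3://bucket/data/00000.parquet", 18)); // false
  }
}
```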
Re: [PR] WIP proposal for fixing issue #11742 [iceberg]
jkolash commented on PR #11743: URL: https://github.com/apache/iceberg/pull/11743#issuecomment-2532509896 Closing this as I now see I can just set the spark.sql.catalog.default_iceberg property. Will add more detail to the issue. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] org.apache.iceberg.spark.source.IcebergSource.setupDefaultSparkCatalogs shouldn't default to hive catalog [iceberg]
jkolash closed issue #11742: org.apache.iceberg.spark.source.IcebergSource.setupDefaultSparkCatalogs shouldn't default to hive catalog URL: https://github.com/apache/iceberg/issues/11742 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] [Views] Update view spec with table identifier requirements [iceberg]
JanKaul commented on code in PR #11365: URL: https://github.com/apache/iceberg/pull/11365#discussion_r1878651163 ## format/view-spec.md: ## @@ -97,7 +97,10 @@ Summary is a string to string map of metadata about a view version. Common metad View definitions can be represented in multiple ways. Representations are documented ways to express a view definition. -A view version can have more than one representation. All representations for a version must express the same underlying definition. Engines are free to choose the representation to use. +A view version can have more than one representation. All representations for a version must express the same underlying definition. Engines are free to choose the representation to use. For the table identifiers in the view definitions to be interoperable across engines, the following must be true: +* All engines must resolve a fully specified SQL identifier to the same table in the same catalog. Review Comment: I would assume the resolution of the identifiers to work something like this:

0. Get the list of available catalogs.
1. Set the table-name equal to the last element.
2. Check if the second-to-last element exists.
   - If it does, set the namespace equal to it.
   - If it doesn't, use the default-namespace.
3. Check if the next element in reverse direction exists.
   - If the element is equal to an available catalog, set catalog-name to it.
   - If the element exists and is not an available catalog, prepend it to the namespace and continue with step 3.
   - If the element doesn't exist, use the default-catalog.
4. Check if the resolved catalog, namespace, and table exist.

The important thing is that if the namespace or catalog are not present, the default-namespace and default-catalog from the view version are used. The resolution currently doesn't fall back to using the view catalog. If we were now to opt to use the view catalog instead of the default-catalog when the catalog is not present, the behavior would be different compared to the current view spec. This is what I mean by a breaking change. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
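A sketch of the resolution order described above, with illustrative names and with the existence checks (steps 2–4) elided for brevity; this is one interpretation of the comment, not an engine API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class ViewIdentifierResolution {
  // Walks a dotted identifier right-to-left: last segment is the table name, then
  // namespace segments, then (optionally) a known catalog; defaults fill the gaps.
  static String[] resolve(
      String[] parts, Set<String> catalogs, String defaultCatalog, String defaultNamespace) {
    String table = parts[parts.length - 1];
    List<String> namespace = new ArrayList<>();
    String catalog = defaultCatalog;
    for (int i = parts.length - 2; i >= 0; i--) {
      if (i == 0 && catalogs.contains(parts[i])) {
        catalog = parts[i]; // leading segment names a known catalog
      } else {
        namespace.add(0, parts[i]); // otherwise it is part of the namespace
      }
    }
    String ns = namespace.isEmpty() ? defaultNamespace : String.join(".", namespace);
    return new String[] {catalog, ns, table};
  }

  public static void main(String[] args) {
    Set<String> catalogs = Set.of("prod");
    // "tbl" -> defaults fill in both catalog and namespace
    System.out.println(String.join("/", resolve(new String[] {"tbl"}, catalogs, "prod", "db")));
    // "prod.db2.tbl" -> explicit catalog and namespace
    System.out.println(
        String.join("/", resolve(new String[] {"prod", "db2", "tbl"}, catalogs, "prod", "db")));
  }
}
```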
Re: [PR] Core, Flink, Spark: Drop deprecated APIs scheduled for removal in 1.8.0 [iceberg]
nastra merged PR #11721: URL: https://github.com/apache/iceberg/pull/11721 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] Make field-id of name-mapping optional [iceberg-python]
barronw commented on issue #1420: URL: https://github.com/apache/iceberg-python/issues/1420#issuecomment-2531604338 Can I pick this up? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] REST: AuthManager API [iceberg]
adutra commented on code in PR #10753: URL: https://github.com/apache/iceberg/pull/10753#discussion_r1878163433 ## core/src/main/java/org/apache/iceberg/rest/auth/AuthSession.java: ## @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.rest.auth; + +import org.apache.iceberg.rest.HTTPRequest; + +/** + * An authentication session that can be used to authenticate outgoing HTTP requests. + * + * Authentication sessions are usually immutable, but may hold resources that need to be released + * when the session is no longer needed. Implementations should override {@link #close()} to release + * any resources. + */ +public interface AuthSession extends AutoCloseable { Review Comment: Do you mind if we do the rename in an immediate follow-up PR? I am a bit concerned that as soon as I rename to `SupportsAuthentication`, someone else would chime in and suggest something different. Naming is always a bit controversial :-) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
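A purely hypothetical sketch of a trivial session, to illustrate the javadoc's contract; the `authenticate` method name and the `String` request stand-in are assumptions, not the actual `AuthSession`/`HTTPRequest` API under review:

```java
// Hypothetical only: the real interface's request-decorating method may differ.
public class StaticTokenSession implements AutoCloseable {
  private final String token;

  public StaticTokenSession(String token) {
    this.token = token;
  }

  // Assumed shape: return the request augmented with an Authorization header.
  public String authenticate(String request) {
    return request + "\nAuthorization: Bearer " + token;
  }

  @Override
  public void close() {
    // A static token holds no resources, so close() is a no-op — matching the
    // javadoc's note that implementations override close() to release resources.
  }
}
```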
Re: [PR] Core: Add Variant implementation to read serialized objects [iceberg]
aihuaxu commented on code in PR #11415: URL: https://github.com/apache/iceberg/pull/11415#discussion_r1878440484 ## core/src/main/java/org/apache/iceberg/variants/ShreddedObject.java: ## @@ -0,0 +1,211 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.variants; + +import java.nio.ByteBuffer; +import java.nio.ByteOrder; +import java.util.Map; +import org.apache.iceberg.relocated.com.google.common.base.Preconditions; +import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap; +import org.apache.iceberg.relocated.com.google.common.collect.Maps; +import org.apache.iceberg.util.Pair; +import org.apache.iceberg.util.SortedMerge; + +/** + * A variant Object that handles full or partial shredding. + * + * Metadata stored for an object must be the same regardless of whether the object is shredded. + * This class assumes that the metadata from the unshredded object can be used for the shredded + * fields. This also does not allow updating or replacing the metadata for the unshredded object, + * which could require recursively rewriting field IDs. + */ +class ShreddedObject implements VariantObject { Review Comment: Not an issue. I was thinking that we could review the original change (without shredding) faster, so we can merge quickly. I haven't reviewed this ShreddedObject yet. Let me check it out then. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Core: Add Variant implementation to read serialized objects [iceberg]
aihuaxu commented on code in PR #11415: URL: https://github.com/apache/iceberg/pull/11415#discussion_r1878469751 ## core/src/main/java/org/apache/iceberg/variants/PrimitiveWrapper.java: ## @@ -0,0 +1,206 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.variants; + +import java.math.BigDecimal; +import java.nio.ByteBuffer; +import java.nio.ByteOrder; +import java.nio.charset.StandardCharsets; +import org.apache.iceberg.relocated.com.google.common.base.Preconditions; +import org.apache.iceberg.variants.Variants.Primitives; + +class PrimitiveWrapper implements VariantPrimitive { + private static final byte NULL_HEADER = VariantUtil.primitiveHeader(Primitives.TYPE_NULL); + private static final byte TRUE_HEADER = VariantUtil.primitiveHeader(Primitives.TYPE_TRUE); + private static final byte FALSE_HEADER = VariantUtil.primitiveHeader(Primitives.TYPE_FALSE); + private static final byte INT8_HEADER = VariantUtil.primitiveHeader(Primitives.TYPE_INT8); + private static final byte INT16_HEADER = VariantUtil.primitiveHeader(Primitives.TYPE_INT16); + private static final byte INT32_HEADER = VariantUtil.primitiveHeader(Primitives.TYPE_INT32); + private static final byte INT64_HEADER = VariantUtil.primitiveHeader(Primitives.TYPE_INT64); + private static final byte FLOAT_HEADER = VariantUtil.primitiveHeader(Primitives.TYPE_FLOAT); + private static final byte DOUBLE_HEADER = VariantUtil.primitiveHeader(Primitives.TYPE_DOUBLE); + private static final byte DATE_HEADER = VariantUtil.primitiveHeader(Primitives.TYPE_DATE); + private static final byte TIMESTAMPTZ_HEADER = + VariantUtil.primitiveHeader(Primitives.TYPE_TIMESTAMPTZ); + private static final byte TIMESTAMPNTZ_HEADER = + VariantUtil.primitiveHeader(Primitives.TYPE_TIMESTAMPNTZ); + private static final byte DECIMAL4_HEADER = VariantUtil.primitiveHeader(Primitives.TYPE_DECIMAL4); + private static final byte DECIMAL8_HEADER = VariantUtil.primitiveHeader(Primitives.TYPE_DECIMAL8); + private static final byte DECIMAL16_HEADER = + VariantUtil.primitiveHeader(Primitives.TYPE_DECIMAL16); + private static final byte BINARY_HEADER = VariantUtil.primitiveHeader(Primitives.TYPE_BINARY); + private static final byte STRING_HEADER = VariantUtil.primitiveHeader(Primitives.TYPE_STRING); + + private final Variants.PhysicalType type; + private final T value; + private ByteBuffer buffer = null; + + PrimitiveWrapper(Variants.PhysicalType type, T value) { +this.type = type; +this.value = value; + } + + @Override + public Variants.PhysicalType type() { +return type; + } + + @Override + public T get() { +return value; + } + + @Override + public int sizeInBytes() { +switch (type()) { + case NULL: + case BOOLEAN_TRUE: + case BOOLEAN_FALSE: +return 1; // 1 header only + case INT8: +return 2; 
// 1 header + 1 value + case INT16: +return 3; // 1 header + 2 value + case INT32: + case DATE: + case FLOAT: +return 5; // 1 header + 4 value + case INT64: + case DOUBLE: + case TIMESTAMPTZ: + case TIMESTAMPNTZ: +return 9; // 1 header + 8 value + case DECIMAL4: +return 6; // 1 header + 1 scale + 4 unscaled value + case DECIMAL8: +return 10; // 1 header + 1 scale + 8 unscaled value + case DECIMAL16: +return 18; // 1 header + 1 scale + 16 unscaled value + case BINARY: +return 5 + ((ByteBuffer) value).remaining(); // 1 header + 4 length + value length Review Comment: Let's say we have a PrimitiveWrapper of a binary `b = new PrimitiveWrapper( binary, 0x1234)`, would we always expect b.sizeInBytes() to be 5 + 2, and that we shouldn't change it for this method contract? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
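The arithmetic behind the "5 + 2" in that question, worked through for the BINARY branch quoted above:

```java
import java.nio.ByteBuffer;

public class BinarySizeArithmetic {
  public static void main(String[] args) {
    // BINARY case: 1 header byte + 4 length bytes + the value bytes themselves.
    ByteBuffer value = ByteBuffer.wrap(new byte[] {0x12, 0x34}); // the 0x1234 example
    int sizeInBytes = 1 + 4 + value.remaining();
    System.out.println(sizeInBytes); // 7, i.e. the "5 + 2" from the review question
  }
}
```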
Re: [PR] GCP: Implement SupportsRecoveryOperations for GCSFileIO [iceberg]
amogh-jahagirdar commented on code in PR #11565: URL: https://github.com/apache/iceberg/pull/11565#discussion_r1878209991 ## gcp/src/main/java/org/apache/iceberg/gcp/gcs/GCSFileIO.java: ## @@ -242,4 +248,106 @@ private void internalDeleteFiles(Stream blobIdsToDelete) { Streams.stream(Iterators.partition(blobIdsToDelete.iterator(), gcpProperties.deleteBatchSize())) .forEach(batch -> client().delete(batch)); } + + @Override + public boolean recoverFile(String path) { +Preconditions.checkArgument( +!Strings.isNullOrEmpty(path), "Cannot recover file: path must not be null or empty"); + +try { + BlobId blobId = BlobId.fromGsUtilUri(path); + + // first attempt to restore with soft-delete + if (recoverSoftDeletedObject(blobId)) { +return true; + } + + // fallback to restoring by copying the latest version + if (recoverLatestVersion(blobId)) { +return true; + } + +} catch (IllegalArgumentException e) { + LOG.warn("Invalid GCS path format: {}", path, e); +} + +return false; + } + + /** + * Attempts to restore a soft-deleted object. + * + * Requires {@code storage.objects.restore} permission + * + * See <a href="https://cloud.google.com/storage/docs/use-soft-deleted-objects#restore">docs</a> + * + * @param blobId the blob identifier + * @return {@code true} if blob was recovered, {@code false} if not + */ + protected boolean recoverSoftDeletedObject(BlobId blobId) { +try { + Optional latestSoftDeletedBlob = + client() + .list( + blobId.getBucket(), + Storage.BlobListOption.prefix(blobId.getName()), + Storage.BlobListOption.softDeleted(true)) + .streamAll() + .filter(blob -> blob.getName().equals(blobId.getName())) + .max(Comparator.comparing(Blob::getSoftDeleteTime)); + + if (latestSoftDeletedBlob.isPresent()) { +client().restore(latestSoftDeletedBlob.get().getBlobId()); +LOG.info("Soft delete object restored file {}", blobId); +return true; + } + LOG.warn("No soft deleted object was found"); + +} catch (StorageException e) { + LOG.warn("Failed to restore", e); +} + +return false; + } + + /** + * Attempts to restore the latest deleted object version.
+ * + * See <a href="https://cloud.google.com/storage/docs/using-versioned-objects#restore">docs</a> + * + * @param blobId the blob identifier + * @return {@code true} if blob was recovered, {@code false} if not + */ + protected boolean recoverLatestVersion(BlobId blobId) { +try { + Optional latestDeletedVersion = + client() + .list( + blobId.getBucket(), + Storage.BlobListOption.prefix(blobId.getName()), + Storage.BlobListOption.versions(true)) + .streamAll() + .filter(blob -> blob.getName().equals(blobId.getName())) + .max(Comparator.comparing(Blob::getUpdateTimeOffsetDateTime)) + .filter(blob -> blob.getDeleteTimeOffsetDateTime() != null); + + if (latestDeletedVersion.isPresent()) { +Storage.CopyRequest copyRequest = +Storage.CopyRequest.newBuilder() +.setSource(latestDeletedVersion.get().getBlobId()) +.setTarget(blobId) +.build(); +Blob blob = client().copy(copyRequest).getResult(); +LOG.info("Latest deleted version was restored for {}", blob.getBlobId()); +return true; + } + LOG.warn("No latest deleted version was found"); + +} catch (StorageException e) { + LOG.warn("Failed to restore latest deleted version", e); Review Comment: Same as above -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
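Caller-side usage would presumably go through the existing `SupportsRecoveryOperations` mixin named in the PR title; a sketch, with an illustrative path and empty properties, noting that recovery still depends on the bucket having soft delete or object versioning enabled:

```java
import java.util.Map;
import org.apache.iceberg.gcp.gcs.GCSFileIO;
import org.apache.iceberg.io.FileIO;
import org.apache.iceberg.io.SupportsRecoveryOperations;

public class RecoverExample {
  // Attempt recovery only when the FileIO implementation supports it.
  static boolean tryRecover(FileIO io, String path) {
    if (io instanceof SupportsRecoveryOperations) {
      return ((SupportsRecoveryOperations) io).recoverFile(path);
    }
    return false; // this FileIO offers no recovery support
  }

  public static void main(String[] args) {
    try (GCSFileIO io = new GCSFileIO()) {
      io.initialize(Map.of()); // real use would pass GCP properties/credentials here
      System.out.println(tryRecover(io, "gs://bucket/metadata/v1.metadata.json"));
    }
  }
}
```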
Re: [I] Make field-id of name-mapping optional [iceberg-python]
Fokko commented on issue #1420: URL: https://github.com/apache/iceberg-python/issues/1420#issuecomment-2531944719 @barronw certainly, thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[I] org.apache.iceberg.spark.source.IcebergSource.setupDefaultSparkCatalogs shouldn't default to hive catalog [iceberg]
jkolash opened a new issue, #11742: URL: https://github.com/apache/iceberg/issues/11742

### Feature Request / Improvement

https://github.com/apache/iceberg/blob/main/spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/IcebergSource.java#L239

It should default to another catalog implementation, as hive is a bigger dependency than other catalog implementations. HadoopCatalog does work when I manually, in the debugger, set:

spark.sql.catalog.default_iceberg.type=hadoop
spark.sql.catalog.default_iceberg.warehouse=/tmp/testing/warehouse/

I encountered this issue because I would like the ability to load a metadata.json directly via spark

```scala
spark.read.format("iceberg").load("s3://bucket/path/metadata/metadata.json")
```

and encounter missing hive dependencies.

### Query engine

Spark

### Willingness to contribute

- [X] I can contribute this improvement/feature independently
- [X] I would be willing to contribute this improvement/feature with guidance from the Iceberg community
- [ ] I cannot contribute this improvement/feature at this time

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
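The workaround discussed in the linked PR, as a hedged Java sketch: point the implicit default_iceberg catalog at a Hadoop catalog so no Hive dependencies are needed. The catalog class is the standard Iceberg Spark catalog; the warehouse path and metadata location are illustrative:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class DefaultIcebergCatalogWorkaround {
  public static void main(String[] args) {
    SparkSession spark =
        SparkSession.builder()
            .master("local[*]")
            // Configure the default_iceberg catalog explicitly instead of letting it
            // fall back to Hive.
            .config("spark.sql.catalog.default_iceberg", "org.apache.iceberg.spark.SparkCatalog")
            .config("spark.sql.catalog.default_iceberg.type", "hadoop")
            .config("spark.sql.catalog.default_iceberg.warehouse", "/tmp/testing/warehouse/")
            .getOrCreate();

    // Load a table directly from a metadata.json location (path is illustrative).
    Dataset<Row> df =
        spark.read().format("iceberg").load("/tmp/testing/warehouse/db/tbl/metadata/v1.metadata.json");
    df.show();
  }
}
```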
Re: [PR] Bump Spark 3.5.4 RC1 [iceberg]
viirya commented on PR #11731: URL: https://github.com/apache/iceberg/pull/11731#issuecomment-2532312814 Does Iceberg implement a ColumnVector interface that is similar to a writable or constant column vector? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] [Views] Update view spec with table identifier requirements [iceberg]
bennychow commented on code in PR #11365: URL: https://github.com/apache/iceberg/pull/11365#discussion_r1878496657 ## format/view-spec.md: ## @@ -97,7 +97,10 @@ Summary is a string to string map of metadata about a view version. Common metad View definitions can be represented in multiple ways. Representations are documented ways to express a view definition. -A view version can have more than one representation. All representations for a version must express the same underlying definition. Engines are free to choose the representation to use. +A view version can have more than one representation. All representations for a version must express the same underlying definition. Engines are free to choose the representation to use. For the table identifiers in the view definitions to be interoperable across engines, the following must be true: +* All engines must resolve a fully specified SQL identifier to the same table in the same catalog. Review Comment: So, I was imagining that table resolution in the query engine would work like this:

1. Assume the SQL identifier is already fully qualified and so the first segment is the catalog name. The query engine would see if this catalog exists and can resolve the remaining segments into a table/view.
2. Prepend the view's catalog and default-namespace (optional) to the SQL identifier. Now, try to resolve with the query engine's registered catalogs.
3. If default-catalog is set, prepend the default-catalog and default-namespace (optional) to the SQL identifier. Again, try to resolve with the query engine's registered catalogs.

I don't think there is anything new with items 1 and 3, but item 2 is interesting because it allows two different engines to reference the same table using different catalog names. This works only when the table and view are stored in the same catalog. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] org.apache.iceberg.spark.source.IcebergSource.setupDefaultSparkCatalogs shouldn't default to hive catalog [iceberg]
jkolash commented on issue #11742: URL: https://github.com/apache/iceberg/issues/11742#issuecomment-2532329867 This is probably the simplest fix: https://github.com/apache/iceberg/pull/11743 allows users to set these properties directly. A bigger fix may be to use another catalog implementation, but given that HadoopCatalog and HiveCatalog may eventually be moved outside of this project, it isn't clear which one should be chosen. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] AWS: adding s3 tables catalog [iceberg]
stubz151 closed pull request #11739: AWS: adding s3 tables catalog URL: https://github.com/apache/iceberg/pull/11739 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Add basic CI to build iceberg and example [iceberg-cpp]
raulcd commented on code in PR #7: URL: https://github.com/apache/iceberg-cpp/pull/7#discussion_r1878388213

## .github/workflows/test.yml: ##
```yaml
name: Test

on:
  push:
    branches:
      - '**'
      - '!dependabot/**'
    tags:
      - '**'
  pull_request:

concurrency:
  group: ${{ github.repository }}-${{ github.head_ref || github.sha }}-${{ github.workflow }}
  cancel-in-progress: true

permissions:
  contents: read

env:
  ICEBERG_HOME: /tmp/iceberg

jobs:
  ubuntu:
    name: AMD64 Ubuntu latest
    runs-on: ubuntu-latest
```

Review Comment: Do we want to pin macOS too?
Re: [PR] Add basic CI to build iceberg and example [iceberg-cpp]
pitrou commented on code in PR #7: URL: https://github.com/apache/iceberg-cpp/pull/7#discussion_r1878396289

## .github/workflows/test.yml: ##
```yaml
jobs:
  ubuntu:
    name: AMD64 Ubuntu latest
    runs-on: ubuntu-latest
```

Review Comment: It also depends on how often these images change versions. If it's once every 2 years, it's less contentious to use "latest" than if GH bumps their versions every month.
Re: [PR] Spark : Derive Stats From Manifest on the Fly [iceberg]
saitharun15 commented on PR #11615: URL: https://github.com/apache/iceberg/pull/11615#issuecomment-2532200924

Hi @RussellSpitzer, @huaxingao, could you please review the PR? Thanks!
[I] Rest Catalog: spark catalog api fails working rest based catalog is used [iceberg]
sunny1154 opened a new issue, #11741: URL: https://github.com/apache/iceberg/issues/11741

### Apache Iceberg version

1.5.0

### Query engine

Spark

### Please describe the bug 🐞

Hi, I am observing issues when working with a REST-based catalog. My Spark session has a default catalog defined which is backed by a REST catalog. The SparkSession.catalog API fails to work with the REST-based catalog. Tested with Spark 3.4.

```
${SPARK_HOME}/bin/spark-shell --master local[*] \
  --driver-memory 2g \
  --conf spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider \
  --conf spark.sql.catalog.iceberg.uri=https://xx.xxx..domain.com \
  --conf spark.sql.warehouse.dir=$SQL_WAREHOUSE_DIR \
  --conf spark.sql.defaultCatalog=iceberg \
  --conf spark.sql.catalog.iceberg=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.iceberg.catalog-impl=org.apache.iceberg.rest.RESTCatalog
```

```
scala> spark.catalog.currentCatalog
res1: String = iceberg

scala> spark.sql("select * from restDb.restTable").show
+---+----------+
| id|      data|
+---+----------+
|  1|some_value|
+---+----------+

scala> spark.catalog.tableExists("restDb.restTable")
res3: Boolean = true

scala> spark.catalog.tableExists("restDb", "restTable")
res4: Boolean = false
```

Other APIs also fail:

spark.catalog.getTable("restDb", "restTable") -- fails with database not found
spark.catalog.getTable("restDb.restTable") -- returns table object
spark.catalog.tableExists("restDb", "restTable") -- returns false (even though table exists)
spark.catalog.tableExists("restDb.restTable") -- returns true (if table exists and is registered with the REST catalog)
spark.catalog.listColumns("restDb", "restTable") -- fails with database not found
spark.catalog.listColumns("restDb.restTable") -- returns list of columns

### Willingness to contribute

- [ ] I can contribute a fix for this bug independently
- [X] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time
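For readers comparing the two call shapes, here is a small Java sketch of the same checks. The catalog and table names are the reporter's examples, the behavior in the comments is taken from the report, and it assumes a session configured like the spark-shell invocation above:

```java
import org.apache.spark.sql.SparkSession;

public class RestCatalogApiCheck {
  public static void main(String[] args) {
    // Assumes the REST catalog confs from the report are already set on the session.
    SparkSession spark = SparkSession.builder().appName("rest-catalog-check").getOrCreate();

    // Single-string, fully qualified identifiers resolve via the current (REST) catalog:
    boolean singleArg = spark.catalog().tableExists("restDb.restTable"); // true in the report

    // The two-argument (dbName, tableName) overloads go through the session catalog
    // and do not see REST-catalog tables, per the report:
    boolean twoArg = spark.catalog().tableExists("restDb", "restTable"); // false in the report

    System.out.println(singleArg + " vs " + twoArg);
    spark.stop();
  }
}
```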
Re: [PR] Bump Spark 3.5.4 RC1 [iceberg]
pan3793 commented on PR #11731: URL: https://github.com/apache/iceberg/pull/11731#issuecomment-2532122346

Update: the failure is related to [SPARK-50235](https://github.com/apache/spark/pull/48767), and the test passed on Spark 3.5.4 RC0 after reverting that patch.
Re: [PR] Core: Fix caching table with metadata table names [iceberg]
manuzhang commented on code in PR #11123: URL: https://github.com/apache/iceberg/pull/11123#discussion_r1877751060

## core/src/main/java/org/apache/iceberg/CachingCatalog.java: ##
```java
@@ -145,22 +146,26 @@ public Table loadTable(TableIdentifier ident) {
    }

    if (MetadataTableUtils.hasMetadataTableName(canonicalized)) {
```

Review Comment: Please go ahead.
Re: [PR] Core, Spark3.5: Fix tests failure due to timeout [iceberg]
manuzhang commented on code in PR #11654: URL: https://github.com/apache/iceberg/pull/11654#discussion_r185553

## core/src/test/java/org/apache/iceberg/hadoop/TestHadoopCommits.java: ##
```java
@@ -421,7 +423,13 @@ public void testConcurrentFastAppends(@TempDir File dir) throws Exception {
     TABLES.create(
         SCHEMA,
         SPEC,
-        ImmutableMap.of(COMMIT_NUM_RETRIES, String.valueOf(threadsCount)),
+        ImmutableMap.of(
+            COMMIT_NUM_RETRIES,
+            String.valueOf(threadsCount),
+            COMMIT_MIN_RETRY_WAIT_MS,
+            "10",
+            LOCK_ACQUIRE_TIMEOUT_MS,
```

Review Comment: Disable the extra retry on lock acquire failure since the commit will fail anyway.
Re: [PR] [Views] Update view spec with table identifier requirements [iceberg]
JanKaul commented on code in PR #11365: URL: https://github.com/apache/iceberg/pull/11365#discussion_r186853

## format/view-spec.md: ##
```
+A view version can have more than one representation. All representations for a version must express the same underlying definition. Engines are free to choose the representation to use. For the table identifiers in the view definitions to be interoperable across engines, the following must be true:
+* All engines must resolve a fully specified SQL identifier to the same table in the same catalog.
```

Review Comment:
There are two parts to consider:

1. Resolution of table references
2. Portability constraints

I think @bennychow's suggestion conflates the two. In addition to the portability constraint, it describes how a query engine should resolve a partial table reference using either the default-catalog or the view catalog. This conflicts with how table references are currently resolved in the view spec: currently only the default-catalog is used.

If the main point of this PR is to define the portability constraints, then we should not change the resolution of table references. Changing the table reference resolution would also be a breaking change to the view spec.
Re: [PR] WIP proposal for fixing issue #11742 [iceberg]
jkolash closed pull request #11743: WIP proposal for fixing issue #11742 URL: https://github.com/apache/iceberg/pull/11743
Re: [I] org.apache.iceberg.spark.source.IcebergSource.setupDefaultSparkCatalogs shouldn't default to hive catalog [iceberg]
jkolash commented on issue #11742: URL: https://github.com/apache/iceberg/issues/11742#issuecomment-2532512695

OK, setting `spark.sql.catalog.default_iceberg=org.apache.iceberg.spark.SparkCatalog` allows you to override the default values.
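A sketch of what that might look like when building the session programmatically; the `type` and `warehouse` values here are placeholders for whatever catalog configuration you actually want the fallback to use:

```java
import org.apache.spark.sql.SparkSession;

public class DefaultIcebergCatalogOverride {
  public static void main(String[] args) {
    // Override the implicitly registered default_iceberg catalog so it no longer
    // falls back to the Hive catalog; type/warehouse values are placeholders.
    SparkSession spark = SparkSession.builder()
        .appName("default-iceberg-override")
        .config("spark.sql.catalog.default_iceberg", "org.apache.iceberg.spark.SparkCatalog")
        .config("spark.sql.catalog.default_iceberg.type", "hadoop")
        .config("spark.sql.catalog.default_iceberg.warehouse", "file:///tmp/warehouse")
        .getOrCreate();

    spark.stop();
  }
}
```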
Re: [PR] Bump Spark 3.5.4 RC1 [iceberg]
LuciferYang commented on PR #11731: URL: https://github.com/apache/iceberg/pull/11731#issuecomment-2532512457

> Please check for some discussion there [apache/spark#49131 (comment)](https://github.com/apache/spark/pull/49131#issuecomment-2532341673)

Yes, I tested it locally and found that after adding a no-op `closeIfFreeable` method to `IcebergArrowColumnVector.java` in version 3.5, all the test cases in `TestRewriteDataFilesAction` were able to pass. We can test further after the release of 3.5.4 RC2.
Re: [PR] Spark3.4,3.5: Fix the BUG of iceberg views when resolved "group/order… [iceberg]
Ppei-Wang commented on code in PR #11729: URL: https://github.com/apache/iceberg/pull/11729#discussion_r1877165202

## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java: ##
```java
@@ -562,6 +562,58 @@ public void readFromViewWithCTE() throws NoSuchTableException {
     assertThat(sql("SELECT * FROM %s", viewName)).hasSize(1).containsExactly(row(10, 1L));
   }

+  @TestTemplate
+  public void readFromViewWithGroupByOrdinal() throws NoSuchTableException {
```

Review Comment: @nastra Thanks for your suggestion; modified.
Re: [PR] Materialized View Spec [iceberg]
JanKaul commented on code in PR #11041: URL: https://github.com/apache/iceberg/pull/11041#discussion_r1877723130

## format/view-spec.md: ##
```
@@ -160,6 +179,56 @@ Each entry in `version-log` is a struct with the following fields:
 | _required_ | `timestamp-ms` | Timestamp when the view's `current-version-id` was updated (ms from epoch) |
 | _required_ | `version-id`   | ID that `current-version-id` was set to |

+#### Full identifier
+
+The full identifier holds a reference, containing a namespace and a name, of a table or view in the catalog.
+
+| Requirement | Field name | Description |
+|-------------|------------|-------------|
+| _optional_  | `catalog`  | A string specifying the name of the catalog. If set to `null`, the catalog is the same as the views' catalog |
```

Review Comment: Thank you!

## format/view-spec.md: ##
```
@@ -42,12 +42,28 @@ An atomic swap of one view metadata file for another provides the basis for maki
 Writers create view metadata files optimistically, assuming that the current metadata location will not be changed before the writer's commit. Once a writer has created an update, it commits by swapping the view's metadata file pointer from the base location to the new location.

+### Materialized Views
+
+Materialized views are a type of view that precompute the data from the view query.
+When queried, engines may return the precomputed data for the materialized views, shifting the cost of query execution to the precomputation step.
+
+Iceberg materialized views are implemented as a combination of an Iceberg view and an underlying Iceberg table, known as the storage table, which stores the precomputed data.
+The metadata for a materialized view extends the common view metadata, adding a pointer to the precomputed data and refresh information to determine if the data is still fresh.
+The refresh information is composed of data about the so-called "source tables", which are the tables referenced in the query definition of the materialized view.
+The storage table can be in the states of "fresh", "stale" or "invalid", which are determined from the following situations:
+* **fresh** -- The `snapshot_id`'s of the last refresh operation match the current `snapshot_id`'s of the source tables.
```

Review Comment: Thank you!
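A compact Java sketch of the freshness determination the quoted spec text describes; the method and field names are invented for illustration, and the `INVALID` condition here (a changed view version) is an assumption rather than spec text:

```java
import java.util.Map;
import java.util.Objects;

public class StorageTableFreshness {
  public enum State { FRESH, STALE, INVALID }

  // refreshSnapshots: snapshot_id per source table recorded at the last refresh.
  // currentSnapshots: current snapshot_id per source table.
  // viewVersionChanged: assumed trigger for INVALID; not taken from the spec text.
  public static State determine(
      Map<String, Long> refreshSnapshots,
      Map<String, Long> currentSnapshots,
      boolean viewVersionChanged) {
    if (viewVersionChanged) {
      return State.INVALID;
    }
    // Fresh only if every source table still sits on the snapshot seen at refresh time.
    boolean allMatch =
        refreshSnapshots.entrySet().stream()
            .allMatch(e -> Objects.equals(e.getValue(), currentSnapshots.get(e.getKey())));
    return allMatch ? State.FRESH : State.STALE;
  }
}
```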
Re: [PR] Spark3.4,3.5,Api,Hive: Fix using NullType in View. [iceberg]
Fokko commented on code in PR #11728: URL: https://github.com/apache/iceberg/pull/11728#discussion_r1877861454

## api/src/main/java/org/apache/iceberg/types/Types.java: ##
```java
@@ -412,6 +413,24 @@ public String toString() {
     }
   }

+  public static class NullType extends PrimitiveType {
```

Review Comment: @Ppei-Wang this would entail extending the [Iceberg table specification](https://iceberg.apache.org/spec/) by adding a new NULL type. Hive might have a NullType, but other query engines might not, so we need to make sure that it is adopted correctly. Changes to the specification need a discussion in the community and should go through an [Iceberg improvement proposal](https://iceberg.apache.org/contribute/#apache-iceberg-improvement-proposals).
Re: [PR] feat: add `DataFileWriter` tests for schema and partition [iceberg-rust]
Fokko commented on PR #768: URL: https://github.com/apache/iceberg-rust/pull/768#issuecomment-2531216814

@jonathanc-n Could you rebase the PR? I'll merge it right away after that 👍
Re: [PR] Simplify partition structures [iceberg-rust]
Fokko closed pull request #763: Simplify partition structures URL: https://github.com/apache/iceberg-rust/pull/763
Re: [PR] Simplify partition structures [iceberg-rust]
Fokko commented on PR #763: URL: https://github.com/apache/iceberg-rust/pull/763#issuecomment-2531222156

Closed in favor of https://github.com/apache/iceberg-rust/pull/771
Re: [PR] Core: Fix loading a table in CachingCatalog with metadata table name [iceberg]
manuzhang commented on code in PR #11738: URL: https://github.com/apache/iceberg/pull/11738#discussion_r1878295947

## core/src/main/java/org/apache/iceberg/CachingCatalog.java: ##
```java
@@ -144,14 +144,16 @@ public Table loadTable(TableIdentifier ident) {
       return cached;
     }

-    if (MetadataTableUtils.hasMetadataTableName(canonicalized)) {
+    Table table = tableCache.get(canonicalized, catalog::loadTable);
+
+    if (table instanceof BaseMetadataTable) {
+      // Cache underlying table
       TableIdentifier originTableIdentifier =
           TableIdentifier.of(canonicalized.namespace().levels());
       Table originTable = tableCache.get(originTableIdentifier, catalog::loadTable);

-      // share TableOperations instance of origin table for all metadata tables, so that metadata
-      // table instances are
-      // also refreshed as well when origin table instance is refreshed.
+      // Share TableOperations instance of origin table for all metadata tables, so that metadata
```

Review Comment: Do we still need the code below when we already cache the metadata table above?
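In other words, the proposed flow reads roughly like the sketch below (a simplification, not the actual `CachingCatalog` code; `tableCache` and `catalog` stand in for the corresponding fields). The question above is whether the origin-table lookup and `TableOperations` sharing are still needed once the metadata table itself is cached:

```java
// Simplified sketch of the proposed loadTable flow.
Table loadTableSketch(TableIdentifier canonicalized) {
  // Load-or-cache the identifier first, whatever kind of table it names.
  Table table = tableCache.get(canonicalized, catalog::loadTable);
  if (table instanceof BaseMetadataTable) {
    // It names a metadata table: make sure the origin table is cached as well.
    TableIdentifier origin = TableIdentifier.of(canonicalized.namespace().levels());
    tableCache.get(origin, catalog::loadTable);
  }
  return table;
}
```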
Re: [I] Compatibility Issue with pydantic and annotated-types in pyiceberg 0.8.1 [iceberg-python]
Fokko commented on issue #1418: URL: https://github.com/apache/iceberg-python/issues/1418#issuecomment-2531608924

@djouallah do you know which version of Pydantic you're using? You can easily check it:

```
Python 3.10.14 (main, Mar 19 2024, 21:46:16) [Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pydantic
>>> pydantic.__version__
'2.10.3'
```
Re: [PR] Bump mkdocs-material from 9.5.47 to 9.5.48 [iceberg-python]
Fokko merged PR #1419: URL: https://github.com/apache/iceberg-python/pull/1419
Re: [PR] Add basic CI to build iceberg and example [iceberg-cpp]
gaborkaszab commented on code in PR #7: URL: https://github.com/apache/iceberg-cpp/pull/7#discussion_r1878384745

## .github/workflows/test.yml: ##
```yaml
jobs:
  ubuntu:
    name: AMD64 Ubuntu latest
    runs-on: ubuntu-latest
```

Review Comment: By pinning to a specific version we could document that as the supported version, and moving to a later version could be done in a more controlled fashion. With `latest` we might see surprise breakages even without changing code in this project, and those wouldn't be easy to investigate. Merging code would also be blocked until we fix the incompatibility with a new Ubuntu version, even though we could still deliver code using the previous release. I'm +1 on pinning this to a specific version.
Re: [PR] REST: AuthManager API [iceberg]
nastra commented on code in PR #10753: URL: https://github.com/apache/iceberg/pull/10753#discussion_r1878738170

## core/src/main/java/org/apache/iceberg/rest/auth/AuthSession.java: ##
```java
package org.apache.iceberg.rest.auth;

import org.apache.iceberg.rest.HTTPRequest;

/**
 * An authentication session that can be used to authenticate outgoing HTTP requests.
 *
 * Authentication sessions are usually immutable, but may hold resources that need to be released
 * when the session is no longer needed. Implementations should override {@link #close()} to
 * release any resources.
 */
public interface AuthSession extends AutoCloseable {
```

Review Comment: I would prefer to do the rename here. @danielcweeks do you have any suggestions in terms of a better name for this interface so that we don't clash with the `AuthSession` in `OAuth2Util`?
Re: [PR] Add clang format [iceberg-cpp]
raulcd commented on code in PR #4: URL: https://github.com/apache/iceberg-cpp/pull/4#discussion_r1877791436

## .clang-format: ##
```
---
BasedOnStyle: Google
ColumnLimit: 90
```

Review Comment: From my side, both 90 and 100 sound reasonable; I don't have a strong preference. With current screen widths and resolutions, I don't see 100 being a problem.
Re: [PR] Add clang format [iceberg-cpp]
pitrou commented on code in PR #4: URL: https://github.com/apache/iceberg-cpp/pull/4#discussion_r1877805402

## .clang-format: ##
```
---
BasedOnStyle: Google
ColumnLimit: 90
```

Review Comment:
I still think we should be mindful of diverse situations:

* people with a smaller screen (such as a laptop)
* side-by-side diffs
* various window arrangements
* people who need a larger font because of eyesight issues

There is also the issue that the longer the line, the farther the eye has to scan horizontally. It's not always very comfortable to do so; this is why newspapers typically use several narrow columns side by side.

In the end, of course, the difference between 90 and 100 will not be fundamental.
Re: [PR] Add clang format [iceberg-cpp]
gaborkaszab commented on code in PR #4: URL: https://github.com/apache/iceberg-cpp/pull/4#discussion_r1877769090

## .clang-format: ##
```
---
BasedOnStyle: Google
ColumnLimit: 90
```

Review Comment:
> I just checked impala's .clang-format, it has ColumnLimit: 90 for both Cpp and Java.

Yes, and I'm involved with the Impala project and can tell you that everyone I talked to in the community hates the 90-char limit and would prefer to raise it to 100.

> We don't. It's in the case where we decrease the value that adjustments are needed.

I don't think this statement is correct. If you increase the line length, there are lines that were broken in two that will now fit on one line. If you have auto-formatting, those will be rewritten, losing the git blame info.

I don't want to block this PR, don't get me wrong. I just want to be more careful than simply rushing commits like this, as they will have an effect on the project forever.

> someone does not agree with others

This gives me the impression that you're saying it's me against everyone else :) Fokko also politely mentioned the char limits of other projects. I see that there is support for the 90-char limit from people coming from the Arrow community, so let's simply give other people a bit more time to chime in, and then we can resolve this comment if nothing changes.
Re: [PR] AWS: Enable RetryMode for AWS KMS client [iceberg]
hsiang-c commented on code in PR #11420: URL: https://github.com/apache/iceberg/pull/11420#discussion_r1877559085

## aws/src/test/java/org/apache/iceberg/aws/kms/TestKmsClientProperties.java: ##
```java
package org.apache.iceberg.aws.kms;

import static org.assertj.core.api.Assertions.assertThat;

import org.apache.iceberg.aws.AwsClientProperties;
import org.junit.jupiter.api.Test;
import software.amazon.awssdk.core.retry.RetryMode;
import software.amazon.awssdk.services.kms.KmsClient;
import software.amazon.awssdk.services.kms.KmsClientBuilder;

public class TestKmsClientProperties {
  @Test
  public void testApplyRetryConfiguration() {
    AwsClientProperties awsClientProperties = new AwsClientProperties();

    KmsClientBuilder builder = KmsClient.builder();
    awsClientProperties.applyRetryConfigurations(builder);
    RetryMode retryPolicy = builder.overrideConfiguration().retryMode().get();

    assertThat(retryPolicy).as("retry mode was not set").isEqualTo(RetryMode.ADAPTIVE_V2);
```

Review Comment: Sorry I missed this one.
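For context on what the test asserts, here is a minimal sketch of applying a retry mode to an AWS SDK v2 KMS client through its override configuration. This is generic SDK usage under stated assumptions (the region is a placeholder, and credentials configuration is omitted), not necessarily the exact logic inside `AwsClientProperties.applyRetryConfigurations`:

```java
import software.amazon.awssdk.core.client.config.ClientOverrideConfiguration;
import software.amazon.awssdk.core.retry.RetryMode;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.kms.KmsClient;

public class KmsRetryModeExample {
  public static void main(String[] args) {
    // Apply a retry mode through the client's override configuration.
    // Region is a placeholder; a real setup would also configure credentials.
    try (KmsClient kms = KmsClient.builder()
        .region(Region.US_EAST_1)
        .overrideConfiguration(
            ClientOverrideConfiguration.builder()
                .retryPolicy(RetryMode.ADAPTIVE_V2)
                .build())
        .build()) {
      // use the client ...
    }
  }
}
```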
Re: [PR] Core, Flink, Spark: Drop deprecated APIs scheduled for removal in 1.8.0 [iceberg]
findepi commented on code in PR #11721: URL: https://github.com/apache/iceberg/pull/11721#discussion_r1877655052

## flink/v1.20/flink/src/test/java/org/apache/iceberg/flink/data/TestRowProjection.java: ##
```java
@@ -369,19 +346,7 @@ public void testMapProjection() throws IOException {
     assertThat(projected.getMap(0)).isEqualTo(properties);
   }

-  private Map toStringMap(Map map) {
```

Review Comment: The IDE didn't flag it as unused there... turns out it was too lazy. Thanks for catching!
Re: [PR] feat!: Remove `BoundPartitionSpec` (WIP) [iceberg-rust]
Fokko commented on code in PR #771: URL: https://github.com/apache/iceberg-rust/pull/771#discussion_r1877888621

## crates/iceberg/src/spec/table_metadata.rs: ##
```rust
@@ -119,9 +119,11 @@ pub struct TableMetadata {
     /// ID of the table's current schema.
     pub(crate) current_schema_id: i32,
     /// A list of partition specs, stored as full partition spec objects.
     pub(crate) partition_specs: HashMap<i32, PartitionSpecRef>,
     /// ID of the "current" spec that writers should use by default.
-    pub(crate) default_spec: BoundPartitionSpecRef,
+    pub(crate) default_spec: PartitionSpecRef,
+    /// Partition type of the default partition spec.
+    pub(crate) default_partition_type: StructType,
```

Review Comment: I had to think about this one, but I think I like it 👍
Re: [PR] feat!: Remove `BoundPartitionSpec` (WIP) [iceberg-rust]
Fokko commented on code in PR #771: URL: https://github.com/apache/iceberg-rust/pull/771#discussion_r1877890713

## crates/iceberg/src/spec/table_metadata.rs: ##
```rust
@@ -776,17 +782,12 @@ pub(super) mod _serde {
                 .map(|x| (x.spec_id(), Arc::new(x))),
         );
         let default_spec_id = value.default_spec_id;
-        let default_spec = partition_specs
+        let default_spec: PartitionSpecRef = partition_specs
             .get(&value.default_spec_id)
-            .map(|schemaless_spec| {
-                (*schemaless_spec.clone())
-                    .clone()
-                    .bind(current_schema.clone())
-            })
-            .transpose()?
+            .map(|schemaless_spec| (**schemaless_spec).clone())
```

Review Comment:
```suggestion
            .map(|spec| (**spec).clone())
```