Re: [PR] Add clang format [iceberg-cpp]
zhjwpku commented on code in PR #4:
URL: https://github.com/apache/iceberg-cpp/pull/4#discussion_r1879216891

## .clang-format:
```
@@ -0,0 +1,22 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+---
+BasedOnStyle: Google
+ColumnLimit: 90
+DerivePointerAlignment: false
+IncludeBlocks: Preserve
```

Review Comment:

I plan to make the following changes after we reach a consensus.

```
-IncludeBlocks: Preserve
+IncludeBlocks: Regroup
+IncludeCategories:
+- Regex: '^<.*/'   # Like <...>
+  Priority: 3
+- Regex: '^<.*\.'  # Like <...>
+  Priority: 1
+- Regex: '^<'      # Like <...>
+  Priority: 2
+- Regex: '^"'      # Like "util/auth-util.h"
+  Priority: 4
```

Before I push a new commit, I'd like to explain why I think this makes sense. I did an experiment: `Regroup` seems to be the default if we don't specify an `IncludeBlocks` setting; I put it here explicitly because I think it's clearer for readers. With the `IncludeCategories` settings, the following:

```
#include <...>
#include <...>
#include <...>
#include "util/auth-util.h"
#include <...>
#include <...>
#include "demo_table.h"
```

becomes:

```
#include "demo_table.h"
#include <...>
#include <...>
#include <...>
#include <...>
#include <...>
#include "util/auth-util.h"
```

It's slightly different from Impala's setting because I think putting the `Regex` as the first line of each rule is more readable: "if it matches this regex, it goes to priority xxx".

What do you think about this proposal? @gaborkaszab @pitrou @Fokko @wgtmac @raulcd
Re: [PR] Bump Spark 3.5.4 RC1 [iceberg]
pan3793 commented on code in PR #11731:
URL: https://github.com/apache/iceberg/pull/11731#discussion_r1879240394

## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/data/vectorized/IcebergArrowColumnVector.java:
```
@@ -59,6 +59,11 @@
   public void close() {
     accessor.close();
   }

+  // If a column vector is writable or constant, it should override this method and do nothing.
+  // See more details at SPARK-50235, SPARK-50463 (Fixed in Spark 3.5.4)
+  @Override
+  public void closeIfNotWritable() {}
```

Review Comment:

Will rename to `closeIfFreeable` once RC2 is available.
Re: [PR] Spark3.4,3.5: Fix the BUG of iceberg views when resolved "group/order… [iceberg]
Ppei-Wang commented on code in PR #11729:
URL: https://github.com/apache/iceberg/pull/11729#discussion_r1879246100

## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java:
```
@@ -562,6 +562,67 @@ public void readFromViewWithCTE() throws NoSuchTableException {
     assertThat(sql("SELECT * FROM %s", viewName)).hasSize(1).containsExactly(row(10, 1L));
   }

+  @TestTemplate
+  public void readFromViewWithGroupByOrdinal() throws NoSuchTableException {
+    insertRows(10);
+    String viewName = viewName("viewWithGroupByOrdinal");
+    String sql = String.format("SELECT id FROM %s group by 1", tableName);
+
+    ViewCatalog viewCatalog = viewCatalog();
+
+    viewCatalog
+        .buildView(TableIdentifier.of(NAMESPACE, viewName))
+        .withQuery("spark", sql)
+        .withDefaultNamespace(NAMESPACE)
+        .withDefaultCatalog(catalogName)
+        .withSchema(schema(sql))
+        .create();
+
+    List<Object[]> expected =
+        IntStream.rangeClosed(1, 10).mapToObj(this::row).collect(Collectors.toList());
+
+    assertThat(sql("SELECT * FROM %s", viewName))
+        .hasSize(10)
+        .containsExactlyInAnyOrderElementsOf(expected);
+  }
+
+  @TestTemplate
+  public void readFromViewWithOrderByOrdinal() throws NoSuchTableException {
+    insertRows(10);
+    String viewName = viewName("viewWithOrderByOrdinal");
+    String sql = String.format("SELECT id FROM %s order by 1", tableName);
+
+    ViewCatalog viewCatalog = viewCatalog();
+
+    viewCatalog
+        .buildView(TableIdentifier.of(NAMESPACE, viewName))
+        .withQuery("spark", sql)
+        .withDefaultNamespace(NAMESPACE)
+        .withDefaultCatalog(catalogName)
+        .withSchema(schema(sql))
+        .create();
+
+    List<Object[]> expected =
+        IntStream.rangeClosed(1, 10).mapToObj(this::row).collect(Collectors.toList());
+
+    assertThat(sql("SELECT * FROM %s", viewName))
+        .hasSize(10)
+        .containsExactlyInAnyOrderElementsOf(expected);
+  }
+
+  @TestTemplate
+  public void createViewWithGroupByOrdinal() throws NoSuchTableException {
+    insertRows(10);
+    String viewName = viewName("createViewWithGroupByOrdinal");
+    sql("CREATE VIEW %s AS SELECT id FROM %s GROUP BY 1", viewName, tableName);
+    List<Object[]> expected =
+        IntStream.rangeClosed(1, 10).mapToObj(this::row).collect(Collectors.toList());
+
+    assertThat(sql("SELECT * FROM %s", viewName))
+        .hasSize(10)
+        .containsExactlyInAnyOrderElementsOf(expected);
```

Review Comment:

@ebyhr Very good advice; modified. Could you please also help approve the workflow runs? Thanks.
Re: [I] Add s3tables catalog support [iceberg-rust]
liurenjie1024 commented on issue #754:
URL: https://github.com/apache/iceberg-rust/issues/754#issuecomment-2533507061

> please assign this to me if it's not a feature that needs to be released soon. I can work on it part-time, but I cannot guarantee the completion time

Done, thanks!
Re: [PR] Spark 3.5: Add ignore-invalid-options to RewriteDataFilesSparkAction and RewritePositionDeleteFilesSparkAction [iceberg]
huaxingao commented on code in PR #11737:
URL: https://github.com/apache/iceberg/pull/11737#discussion_r1879274285

## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java:
```
@@ -1214,6 +1214,15 @@ public void testInvalidOptions() {
         .hasMessageContaining("requires enabling Iceberg Spark session extensions");
   }

+  @TestTemplate
+  public void testIgnoreInvalidOptions() {
+    Table table = createTable(20);
+    basicRewrite(table)
+        .option("ignore-invalid-options", "true")
+        .option("foobarity", "-5")
+        .execute();
```

Review Comment:

Should we also test the default behavior of `ignore-invalid-options`, similar to what you did in `TestRewritePositionDeleteFilesAction.testIgnoreInvalidOptions`?
Re: [PR] Bump Spark 3.5.4 RC1 [iceberg]
pan3793 commented on code in PR #11731:
URL: https://github.com/apache/iceberg/pull/11731#discussion_r1879272199

## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/data/vectorized/IcebergArrowColumnVector.java:
```
@@ -59,6 +59,11 @@
   public void close() {
     accessor.close();
   }

+  // If a column vector is writable or constant, it should override this method and do nothing.
+  // See more details at SPARK-50235, SPARK-50463 (Fixed in Spark 3.5.4)
+  @Override
+  public void closeIfNotWritable() {}
```

Review Comment:

I checked the codebase; the other vectors override `void close() {}`, so they are not affected.
Re: [I] Partition Spec Evolution API [iceberg-rust]
liurenjie1024 commented on issue #732:
URL: https://github.com/apache/iceberg-rust/issues/732#issuecomment-2533556499

Thanks to @c-thiel for raising this. I agree that we should focus on v2 first and add v1 support later.
Re: [PR] Bump Spark 3.5.4 RC1 [iceberg]
singhpk234 commented on code in PR #11731:
URL: https://github.com/apache/iceberg/pull/11731#discussion_r1879265358

## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/data/vectorized/IcebergArrowColumnVector.java:
```
@@ -59,6 +59,11 @@
   public void close() {
     accessor.close();
   }

+  // If a column vector is writable or constant, it should override this method and do nothing.
+  // See more details at SPARK-50235, SPARK-50463 (Fixed in Spark 3.5.4)
+  @Override
+  public void closeIfNotWritable() {}
```

Review Comment:

> If a column vector is writable or constant, it should override this method and do nothing.

We have `ConstantColumnVector` as well; should we do this only for the vector classes that absolutely require it?
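For readers following along, a minimal sketch of the override pattern being discussed is shown below. It assumes Spark 3.5.4's `ColumnVector` API, where `closeIfNotWritable()` (later renamed `closeIfFreeable()`, per the comment above) is invoked to free per-batch resources; the class name and comments are illustrative, not Iceberg's actual code.

```java
import org.apache.spark.sql.types.DataType;
import org.apache.spark.sql.vectorized.ColumnVector;

// Illustrative only: a vector that is reused across batches opts out of
// eager per-batch cleanup by overriding closeIfNotWritable() as a no-op.
abstract class ReusableColumnVector extends ColumnVector {

  protected ReusableColumnVector(DataType type) {
    super(type);
  }

  @Override
  public void closeIfNotWritable() {
    // No-op: this vector's buffers are reused, so they must not be freed
    // after each batch; close() still releases them at end of life.
  }
}
```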
Re: [PR] Spark 3.5: Add ignore-invalid-options to RewriteDataFilesSparkAction and RewritePositionDeleteFilesSparkAction [iceberg]
manuzhang commented on code in PR #11737:
URL: https://github.com/apache/iceberg/pull/11737#discussion_r1879291362

## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java:
```
@@ -1214,6 +1214,15 @@ public void testInvalidOptions() {
         .hasMessageContaining("requires enabling Iceberg Spark session extensions");
   }

+  @TestTemplate
+  public void testIgnoreInvalidOptions() {
+    Table table = createTable(20);
+    basicRewrite(table)
+        .option("ignore-invalid-options", "true")
+        .option("foobarity", "-5")
+        .execute();
```

Review Comment:

It's covered by `testInvalidOptions()` above.
Re: [PR] Bump Spark 3.5.4 RC1 [iceberg]
pan3793 commented on PR #11731:
URL: https://github.com/apache/iceberg/pull/11731#issuecomment-2533597688

FYI @LuciferYang @viirya @dongjoon-hyun @wForget: CI is green now, and I think the current Spark `branch-3.5` is in good shape for Iceberg.
Re: [PR] Core: Fix failure when reading files table with branch [iceberg]
dramaticlly commented on PR #11719:
URL: https://github.com/apache/iceberg/pull/11719#issuecomment-2533650691

> This makes sense to me. How about other metadata tables? @dramaticlly do you also want to take a look?

Happy to help with other metadata tables that hit a similar problem, if it's not addressed in this PR.
[I] Use apache/spark image in the quickstart [iceberg]
ajantha-bhat opened a new issue, #11746:
URL: https://github.com/apache/iceberg/issues/11746

### Feature Request / Improvement

More information: https://lists.apache.org/thread/4kknk8mvnffbmhdt63z8t4ps0mt1jbf4

### Query engine

None

### Willingness to contribute

- [ ] I can contribute this improvement/feature independently
- [ ] I would be willing to contribute this improvement/feature with guidance from the Iceberg community
- [ ] I cannot contribute this improvement/feature at this time
Re: [I] Use apache/spark image in the quickstart [iceberg]
ajantha-bhat commented on issue #11746:
URL: https://github.com/apache/iceberg/issues/11746#issuecomment-2533652976

Will keep it open if anyone wants to take it up.
Re: [I] Bug in `PartialEq` for `Struct` [iceberg-rust]
sungwy commented on issue #706:
URL: https://github.com/apache/iceberg-rust/issues/706#issuecomment-2531897325

Hi @Sl1mb0 - Thank you for raising this issue! I took a look at it while resolving the issues blocking the 0.4.0 release. I put up PR #772 to replicate your issue, and I was able to get the exact error using the code you shared above.

I did notice that you had an unpartitioned PartitionSpec, which doesn't align with the partition value you are inserting, since you are expecting one partition value to be persisted into your DataFile entry. In order to preserve that value, you will need to use a PartitionSpec that aligns with your values. Please take a look at this PR to see that the partition value is recorded properly when the PartitionSpec has a field: https://github.com/apache/iceberg-rust/pull/772/files#diff-8389535350ef7cddc266dfd18d589a978643da0334c23e16646e62e8d6a0892eR2115

```
// Create a partition spec with a single partition field.
let partition_spec = BoundPartitionSpec::builder(schema.clone())
    .with_spec_id(0)
    .add_partition_field("field_0", "field_0", Transform::Identity)
    .unwrap()
    .build()
    .unwrap();
```
Re: [PR] Core: Fix loading a table in CachingCatalog with metadata table name [iceberg]
gaborkaszab commented on code in PR #11738:
URL: https://github.com/apache/iceberg/pull/11738#discussion_r1878337162

## core/src/main/java/org/apache/iceberg/CachingCatalog.java:
```
@@ -144,14 +144,16 @@ public Table loadTable(TableIdentifier ident) {
       return cached;
     }

-    if (MetadataTableUtils.hasMetadataTableName(canonicalized)) {
+    Table table = tableCache.get(canonicalized, catalog::loadTable);
+
+    if (table instanceof BaseMetadataTable) {
+      // Cache underlying table
       TableIdentifier originTableIdentifier =
           TableIdentifier.of(canonicalized.namespace().levels());
       Table originTable = tableCache.get(originTableIdentifier, catalog::loadTable);

-      // share TableOperations instance of origin table for all metadata tables, so that metadata
-      // table instances are
-      // also refreshed as well when origin table instance is refreshed.
+      // Share TableOperations instance of origin table for all metadata tables, so that metadata
```

Review Comment:

It's needed to share the underlying TableOperations between the origin table and the metadata table. We basically create another Table object for the metadata table, this time on top of the origin table's TableOperations, and then update the cache with that object.
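To make the flow concrete, here is a hedged sketch of the mechanism described in that comment. It is a simplification, not CachingCatalog's exact code: the helper names come from Iceberg core, but the wiring is illustrative.

```java
import com.github.benmanes.caffeine.cache.Cache;
import org.apache.iceberg.MetadataTableType;
import org.apache.iceberg.MetadataTableUtils;
import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.Catalog;
import org.apache.iceberg.catalog.TableIdentifier;

class MetadataTableCachingSketch {
  // Rebuild the metadata table on top of the cached origin table so both
  // share one TableOperations instance: refreshing the origin table then
  // refreshes the cached metadata table as well.
  static Table cacheMetadataTable(
      Cache<TableIdentifier, Table> tableCache,
      Catalog catalog,
      TableIdentifier canonicalized,
      MetadataTableType type) {
    TableIdentifier origin = TableIdentifier.of(canonicalized.namespace().levels());
    Table originTable = tableCache.get(origin, catalog::loadTable);
    // createMetadataTableInstance derives the metadata table from the origin
    // table (and therefore from its TableOperations).
    Table metadataTable = MetadataTableUtils.createMetadataTableInstance(originTable, type);
    tableCache.put(canonicalized, metadataTable);
    return metadataTable;
  }
}
```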
Re: [PR] GCP: Implement SupportsRecoveryOperations for GCSFileIO [iceberg]
amogh-jahagirdar commented on code in PR #11565:
URL: https://github.com/apache/iceberg/pull/11565#discussion_r1878222555

## gcp/src/main/java/org/apache/iceberg/gcp/gcs/GCSFileIO.java:
```
@@ -242,4 +248,106 @@ private void internalDeleteFiles(Stream<BlobId> blobIdsToDelete) {
     Streams.stream(
             Iterators.partition(blobIdsToDelete.iterator(), gcpProperties.deleteBatchSize()))
         .forEach(batch -> client().delete(batch));
   }

+  @Override
+  public boolean recoverFile(String path) {
+    Preconditions.checkArgument(
+        !Strings.isNullOrEmpty(path), "Cannot recover file: path must not be null or empty");
+
+    try {
+      BlobId blobId = BlobId.fromGsUtilUri(path);
+
+      // first attempt to restore with soft-delete
+      if (recoverSoftDeletedObject(blobId)) {
+        return true;
+      }
+
+      // fallback to restoring by copying the latest version
+      if (recoverLatestVersion(blobId)) {
+        return true;
+      }
+
+    } catch (IllegalArgumentException e) {
+      LOG.warn("Invalid GCS path format: {}", path, e);
+    }
```

Review Comment:

Actually, just noticed this: I'm not sure it makes sense to swallow the IllegalArgumentException when the path format is incorrect. I consider this just like the null-or-empty case, where `recoverFile` can just throw. The best effort really applies to the calls that attempt recovery, not to the rest of the validation, imo.
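A hedged sketch of the variant the reviewer is suggesting, reusing the helper names from the diff above (`recoverSoftDeletedObject`, `recoverLatestVersion`). This illustrates the proposed control flow only; it is not code from the PR.

```java
@Override
public boolean recoverFile(String path) {
  Preconditions.checkArgument(
      !Strings.isNullOrEmpty(path), "Cannot recover file: path must not be null or empty");

  // A malformed GCS path now fails loudly here instead of being logged and
  // swallowed: BlobId.fromGsUtilUri throws IllegalArgumentException for it.
  BlobId blobId = BlobId.fromGsUtilUri(path);

  // Best effort applies only to the recovery attempts: first try the
  // soft-delete restore, then fall back to copying the latest version.
  return recoverSoftDeletedObject(blobId) || recoverLatestVersion(blobId);
}
```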
Re: [PR] Add basic CI to build iceberg and example [iceberg-cpp]
raulcd commented on code in PR #7:
URL: https://github.com/apache/iceberg-cpp/pull/7#discussion_r1877940138

## .github/workflows/test.yml:
```
@@ -0,0 +1,95 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+name: Test
+
+on:
+  push:
+    branches:
+      - '**'
+      - '!dependabot/**'
+    tags:
+      - '**'
+  pull_request:
+
+concurrency:
+  group: ${{ github.repository }}-${{ github.head_ref || github.sha }}-${{ github.workflow }}
+  cancel-in-progress: true
+
+permissions:
+  contents: read
+
+env:
+  ICEBERG_HOME: /tmp/iceberg
+
+jobs:
+  ubuntu:
+    name: AMD64 Ubuntu latest
+    runs-on: ubuntu-latest
```

Review Comment:

I'm in two minds on this one, but I think I prefer `latest` so that if a new OS (GitHub runner) is released, we find out early whether there are issues with it. The other approach might leave us with silent failures on newer systems if we don't update them periodically. Both points are valid; I'm OK with either approach.
[I] DataScan doesn't respect casesensitive argument [iceberg-python]
jiakai-li opened a new issue, #1421:
URL: https://github.com/apache/iceberg-python/issues/1421

### Apache Iceberg version

None

### Please describe the bug 🐞

While reading the code, I noticed that the `DataScan` class currently doesn't respect `case_sensitive` when it plans files. Here is the [reference](https://github.com/apache/iceberg-python/blob/ede363bd63640f3db6c3b3d6904a5765dafd2600/pyiceberg/table/__init__.py#L1314).

This means we can't do this:

```python
# case_sensitive is not taking effect
data_scan = table.scan(row_filter=row_filter, case_sensitive=False)
```

### Willingness to contribute

- [X] I can contribute a fix for this bug independently
- [ ] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time
Re: [PR] Bump Spark 3.5.4 RC1 [iceberg]
viirya commented on PR #11731:
URL: https://github.com/apache/iceberg/pull/11731#issuecomment-2532346946

Please check the discussion at https://github.com/apache/spark/pull/49131#issuecomment-2532341673.
Re: [I] DataScan doesn't respect casesensitive argument [iceberg-python]
jiakai-li commented on issue #1421:
URL: https://github.com/apache/iceberg-python/issues/1421#issuecomment-2532341957

Can I work on this issue?
Re: [PR] Add basic CI to build iceberg and example [iceberg-cpp]
raulcd commented on code in PR #7:
URL: https://github.com/apache/iceberg-cpp/pull/7#discussion_r1878524143

## .github/workflows/test.yml:
```
@@ -0,0 +1,95 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+name: Test
+
+on:
+  push:
+    branches:
+      - '**'
+      - '!dependabot/**'
+    tags:
+      - '**'
+  pull_request:
+
+concurrency:
+  group: ${{ github.repository }}-${{ github.head_ref || github.sha }}-${{ github.workflow }}
+  cancel-in-progress: true
+
+permissions:
+  contents: read
+
+env:
+  ICEBERG_HOME: /tmp/iceberg
+
+jobs:
+  ubuntu:
+    name: AMD64 Ubuntu latest
+    runs-on: ubuntu-latest
```

Review Comment:

`ubuntu-latest` is about to be migrated to `24.04`, and it was migrated from `20.04` to `22.04` around two years ago; see: https://github.com/actions/runner-images/commit/9c544f47011f3fb7f13ab122927f15939d950b49

`macos-latest` has had a higher rotation, around once a year (it went from macos-11, to macos-12, and then to macos-14).
Re: [PR] [Views] Update view spec with table identifier requirements [iceberg]
wmoustafa commented on code in PR #11365:
URL: https://github.com/apache/iceberg/pull/11365#discussion_r1878540920

## format/view-spec.md:
```
@@ -97,7 +97,10 @@ Summary is a string to string map of metadata about a view version. Common metad

 View definitions can be represented in multiple ways. Representations are documented ways to express a view definition.

-A view version can have more than one representation. All representations for a version must express the same underlying definition. Engines are free to choose the representation to use.
+A view version can have more than one representation. All representations for a version must express the same underlying definition. Engines are free to choose the representation to use. For the table identifiers in the view definitions to be interoperable across engines, the following must be true:
+* All engines must resolve a fully specified SQL identifier to the same table in the same catalog.
```

Review Comment:

> then we should not change the resolution of table references. Changing the table reference resolution would also be a breaking change to the view spec.

@JanKaul Could you clarify what you meant by this, preferably through an example?
[PR] Core, Flink, Spark, KafkaConnect: Remove remaining usage of deprecated path API [iceberg]
amogh-jahagirdar opened a new pull request, #11744:
URL: https://github.com/apache/iceberg/pull/11744

Same as https://github.com/apache/iceberg/pull/11563: removing the remaining usages of the path API.
Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]
pvary merged PR #11662:
URL: https://github.com/apache/iceberg/pull/11662
Re: [PR] Flink: Fix range distribution npe when value is null [iceberg]
pvary commented on PR #11662:
URL: https://github.com/apache/iceberg/pull/11662#issuecomment-2532940822

Thanks @Guosmilesmile for the PR, and @mxm and @stevenzwu for the review!

@Guosmilesmile, please prepare the backport commits for the other Flink versions.

Thanks, Peter
Re: [PR] API, Core: Add scan planning apis to REST Catalog [iceberg]
rahil-c commented on PR #11180:
URL: https://github.com/apache/iceberg/pull/11180#issuecomment-2532789316

@rdblue @danielcweeks Was wondering if you could take a look whenever you get a chance?
Re: [PR] Hive: Optimize tableExists API in hive catalog [iceberg]
dramaticlly commented on PR #11597:
URL: https://github.com/apache/iceberg/pull/11597#issuecomment-2533061166

@rdblue @danielcweeks Can you help take another look to see if we can move forward with this PR?
Re: [PR] Spark3.4,3.5: Fix the BUG of iceberg views when resolved "group/order… [iceberg]
ebyhr commented on code in PR #11729:
URL: https://github.com/apache/iceberg/pull/11729#discussion_r1879148802

## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java:
```
@@ -562,6 +562,67 @@ public void readFromViewWithCTE() throws NoSuchTableException {
     assertThat(sql("SELECT * FROM %s", viewName)).hasSize(1).containsExactly(row(10, 1L));
   }

+  @TestTemplate
+  public void readFromViewWithGroupByOrdinal() throws NoSuchTableException {
+    insertRows(10);
+    String viewName = viewName("viewWithGroupByOrdinal");
+    String sql = String.format("SELECT id FROM %s group by 1", tableName);
+
+    ViewCatalog viewCatalog = viewCatalog();
+
+    viewCatalog
+        .buildView(TableIdentifier.of(NAMESPACE, viewName))
+        .withQuery("spark", sql)
+        .withDefaultNamespace(NAMESPACE)
+        .withDefaultCatalog(catalogName)
+        .withSchema(schema(sql))
+        .create();
+
+    List<Object[]> expected =
+        IntStream.rangeClosed(1, 10).mapToObj(this::row).collect(Collectors.toList());
+
+    assertThat(sql("SELECT * FROM %s", viewName))
+        .hasSize(10)
+        .containsExactlyInAnyOrderElementsOf(expected);
+  }
+
+  @TestTemplate
+  public void readFromViewWithOrderByOrdinal() throws NoSuchTableException {
+    insertRows(10);
+    String viewName = viewName("viewWithOrderByOrdinal");
+    String sql = String.format("SELECT id FROM %s order by 1", tableName);
```

Review Comment:

nit: Uppercase `order by`.

## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java:
```
@@ -562,6 +562,67 @@ public void readFromViewWithCTE() throws NoSuchTableException {
     assertThat(sql("SELECT * FROM %s", viewName)).hasSize(1).containsExactly(row(10, 1L));
   }

+  @TestTemplate
+  public void readFromViewWithGroupByOrdinal() throws NoSuchTableException {
+    insertRows(10);
+    String viewName = viewName("viewWithGroupByOrdinal");
+    String sql = String.format("SELECT id FROM %s group by 1", tableName);
```

Review Comment:

nit: Uppercase `group by`.

## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java:
```
@@ -562,6 +562,67 @@ public void readFromViewWithCTE() throws NoSuchTableException {
     assertThat(sql("SELECT * FROM %s", viewName)).hasSize(1).containsExactly(row(10, 1L));
   }

+  @TestTemplate
+  public void readFromViewWithGroupByOrdinal() throws NoSuchTableException {
+    insertRows(10);
+    String viewName = viewName("viewWithGroupByOrdinal");
+    String sql = String.format("SELECT id FROM %s group by 1", tableName);
+
+    ViewCatalog viewCatalog = viewCatalog();
+
+    viewCatalog
+        .buildView(TableIdentifier.of(NAMESPACE, viewName))
+        .withQuery("spark", sql)
+        .withDefaultNamespace(NAMESPACE)
+        .withDefaultCatalog(catalogName)
+        .withSchema(schema(sql))
+        .create();
+
+    List<Object[]> expected =
+        IntStream.rangeClosed(1, 10).mapToObj(this::row).collect(Collectors.toList());
+
+    assertThat(sql("SELECT * FROM %s", viewName))
+        .hasSize(10)
+        .containsExactlyInAnyOrderElementsOf(expected);
+  }
+
+  @TestTemplate
+  public void readFromViewWithOrderByOrdinal() throws NoSuchTableException {
+    insertRows(10);
+    String viewName = viewName("viewWithOrderByOrdinal");
+    String sql = String.format("SELECT id FROM %s order by 1", tableName);
+
+    ViewCatalog viewCatalog = viewCatalog();
+
+    viewCatalog
+        .buildView(TableIdentifier.of(NAMESPACE, viewName))
+        .withQuery("spark", sql)
+        .withDefaultNamespace(NAMESPACE)
+        .withDefaultCatalog(catalogName)
+        .withSchema(schema(sql))
+        .create();
+
+    List<Object[]> expected =
+        IntStream.rangeClosed(1, 10).mapToObj(this::row).collect(Collectors.toList());
+
+    assertThat(sql("SELECT * FROM %s", viewName))
+        .hasSize(10)
+        .containsExactlyInAnyOrderElementsOf(expected);
+  }
+
+  @TestTemplate
+  public void createViewWithGroupByOrdinal() throws NoSuchTableException {
+    insertRows(10);
+    String viewName = viewName("createViewWithGroupByOrdinal");
+    sql("CREATE VIEW %s AS SELECT id FROM %s GROUP BY 1", viewName, tableName);
+    List<Object[]> expected =
+        IntStream.rangeClosed(1, 10).mapToObj(this::row).collect(Collectors.toList());
+
+    assertThat(sql("SELECT * FROM %s", viewName))
+        .hasSize(10)
+        .containsExactlyInAnyOrderElementsOf(expected);
```

Review Comment:

This test doesn't ensure whether `GROUP BY` is effective or not because it specifies the id column. Ideally, it should specify a non-id column and verify that duplicate values are eliminated.
Re: [I] Rest Catalog: spark catalog api fails to work with rest based catalog [iceberg]
dramaticlly commented on issue #11741:
URL: https://github.com/apache/iceberg/issues/11741#issuecomment-2533365254

I think with this we can reproduce the problem in a unit test from `org.apache.iceberg.spark.sql.TestCreateTable`, as it supports all of the hive/spark/hadoop/rest catalogs. I think rest and hadoop catalogs will fail while hive and spark will pass:

```java
@TestTemplate
public void testCreateTable() {
  assumeThat(catalogName).isEqualTo(SparkCatalogConfig.REST.catalogName());

  assertThat(validationCatalog.tableExists(tableIdent))
      .as("Table should not already exist")
      .isFalse();

  sql("CREATE TABLE %s (id BIGINT NOT NULL, data STRING) USING iceberg", tableName);

  Table table = validationCatalog.loadTable(tableIdent);
  assertThat(table).as("Should load the new table").isNotNull();

  StructType expectedSchema =
      StructType.of(
          NestedField.required(1, "id", Types.LongType.get()),
          NestedField.optional(2, "data", Types.StringType.get()));
  assertThat(table.schema().asStruct())
      .as("Should have the expected schema")
      .isEqualTo(expectedSchema);
  assertThat(table.spec().fields()).as("Should not be partitioned").hasSize(0);
  assertThat(table.properties().get(TableProperties.DEFAULT_FILE_FORMAT))
      .as("Should not have the default format set")
      .isNull();

  spark.sessionState().catalogManager().setCurrentCatalog(catalogName);
  assertThat(spark.catalog().tableExists(tableIdent.toString())).isTrue(); // success
  assertThat(spark.catalog().tableExists(tableIdent.namespace().toString(), tableIdent.name()))
      .isTrue(); // failure
}
```

I am wondering if anyone has run into this when using a REST-based catalog? CC @RussellSpitzer @flyrain
Re: [PR] Hive: Optimize tableExists API in hive catalog [iceberg]
danielcweeks commented on code in PR #11597:
URL: https://github.com/apache/iceberg/pull/11597#discussion_r1879074770

## hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java:
```
@@ -412,6 +412,43 @@ private void validateTableIsIcebergTableOrView(
     }
   }

+  /**
+   * Check whether table or metadata table exists.
+   *
+   * Note: If a hive table with the same identifier exists in catalog, this method will return
+   * {@code false}.
+   *
+   * @param identifier a table identifier
+   * @return true if the table exists, false otherwise
+   */
+  @Override
+  public boolean tableExists(TableIdentifier identifier) {
+    TableIdentifier baseTableIdentifier = identifier;
+    if (!isValidIdentifier(identifier)) {
+      if (!isValidMetadataIdentifier(identifier)) {
+        return false;
```

Review Comment:

Minor behavior issue here. Currently, if it's an invalid identifier, it will throw as opposed to returning false:

```
throw new NoSuchTableException("Invalid table identifier: %s", identifier);
```
Re: [PR] Hive: Optimize tableExists API in hive catalog [iceberg]
danielcweeks commented on PR #11597:
URL: https://github.com/apache/iceberg/pull/11597#issuecomment-2533223559

@dramaticlly Minor comment, but other than that LGTM.
Re: [PR] Hive: Optimize tableExists API in hive catalog [iceberg]
dramaticlly commented on code in PR #11597:
URL: https://github.com/apache/iceberg/pull/11597#discussion_r1879086639

## hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java:
```
@@ -412,6 +412,43 @@ private void validateTableIsIcebergTableOrView(
     }
   }

+  /**
+   * Check whether table or metadata table exists.
+   *
+   * Note: If a hive table with the same identifier exists in catalog, this method will return
+   * {@code false}.
+   *
+   * @param identifier a table identifier
+   * @return true if the table exists, false otherwise
+   */
+  @Override
+  public boolean tableExists(TableIdentifier identifier) {
+    TableIdentifier baseTableIdentifier = identifier;
+    if (!isValidIdentifier(identifier)) {
+      if (!isValidMetadataIdentifier(identifier)) {
+        return false;
```

Review Comment:

Thank you @danielcweeks. I think you might be referring to [this line in loadTable()](https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/BaseMetastoreCatalog.java#L66), which throws an exception; the [`tableExists` API defined in the Catalog interface](https://github.com/apache/iceberg/blob/main/api/src/main/java/org/apache/iceberg/catalog/Catalog.java#L283-L285) actually catches this NoSuchTableException and returns false.

I also added an explicit step to the unit test for an invalid identifier, to ensure the behavior is the same as before.
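For reference, the default implementation on Iceberg's `Catalog` interface behaves essentially as sketched below (paraphrased from the lines linked above; check the linked source for the authoritative version). This is why an invalid identifier surfaces as `false` to `tableExists` callers rather than as an exception.

```java
default boolean tableExists(TableIdentifier identifier) {
  try {
    loadTable(identifier);
    return true;
  } catch (NoSuchTableException e) {
    // An invalid or missing identifier surfaces as "does not exist"
    // rather than propagating the exception to tableExists callers.
    return false;
  }
}
```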
Re: [PR] Spec: Support geo type [iceberg]
jiayuasu commented on code in PR #10981: URL: https://github.com/apache/iceberg/pull/10981#discussion_r1879088058 ## format/spec.md: ## @@ -1633,3 +1656,15 @@ might indicate different snapshot IDs for a specific timestamp. The discrepancie When processing point in time queries implementations should use "snapshot-log" metadata to lookup the table state at the given point in time. This ensures time-travel queries reflect the state of the table at the provided timestamp. For example a SQL query like `SELECT * FROM prod.db.table TIMESTAMP AS OF '1986-10-26 01:21:00Z';` would find the snapshot of the Iceberg table just prior to '1986-10-26 01:21:00 UTC' in the snapshot logs and use the metadata from that snapshot to perform the scan of the table. If no snapshot exists prior to the timestamp given or "snapshot-log" is not populated (it is an optional field), then systems should raise an informative error message about the missing metadata. +## Appendix G: Geospatial Notes + +The Geometry class hierarchy and its WKT and WKB serializations (ISO supporting XY, XYZ, XYM, XYZM) are defined by [OpenGIS Implementation Specification for Geographic information – Simple feature access – Part 1: Common architecture](https://portal.ogc.org/files/?artifact_id=25355), from [OGC (Open Geospatial Consortium)](https://www.ogc.org/standard/sfa/). + +The version of the OGC standard first used here is 1.2.1, but future versions may also used if the WKB representation remains wire-compatible. + +Possible values for edge-interpolation algorithm (A) for `geography` types are: +* `spherical`: edges are interpolated as geodesics on a sphere. The radius of the underlying sphere is the mean radius of the spheroid defined by the CRS, defined as (2 * major_axis_length + minor_axis_length / 3). +* `vincenty`: [https://en.wikipedia.org/wiki/Vincenty%27s_formulae](https://en.wikipedia.org/wiki/Vincenty%27s_formulae) +* `andoyer-lambert-thomas` +* `andoyer-lambert` +* `karney` Review Comment: ```suggestion * `karney`: [Karney, Charles FF. "Algorithms for geodesics." Journal of Geodesy 87 (2013): 43-55](https://link.springer.com/content/pdf/10.1007/s00190-012-0578-z.pdf), and [GeographicLib](https://geographiclib.sourceforge.io/) ``` ## format/spec.md: ## @@ -1633,3 +1656,15 @@ might indicate different snapshot IDs for a specific timestamp. The discrepancie When processing point in time queries implementations should use "snapshot-log" metadata to lookup the table state at the given point in time. This ensures time-travel queries reflect the state of the table at the provided timestamp. For example a SQL query like `SELECT * FROM prod.db.table TIMESTAMP AS OF '1986-10-26 01:21:00Z';` would find the snapshot of the Iceberg table just prior to '1986-10-26 01:21:00 UTC' in the snapshot logs and use the metadata from that snapshot to perform the scan of the table. If no snapshot exists prior to the timestamp given or "snapshot-log" is not populated (it is an optional field), then systems should raise an informative error message about the missing metadata. +## Appendix G: Geospatial Notes + +The Geometry class hierarchy and its WKT and WKB serializations (ISO supporting XY, XYZ, XYM, XYZM) are defined by [OpenGIS Implementation Specification for Geographic information – Simple feature access – Part 1: Common architecture](https://portal.ogc.org/files/?artifact_id=25355), from [OGC (Open Geospatial Consortium)](https://www.ogc.org/standard/sfa/). 
+ +The version of the OGC standard first used here is 1.2.1, but future versions may also used if the WKB representation remains wire-compatible. + +Possible values for edge-interpolation algorithm (A) for `geography` types are: +* `spherical`: edges are interpolated as geodesics on a sphere. The radius of the underlying sphere is the mean radius of the spheroid defined by the CRS, defined as (2 * major_axis_length + minor_axis_length / 3). +* `vincenty`: [https://en.wikipedia.org/wiki/Vincenty%27s_formulae](https://en.wikipedia.org/wiki/Vincenty%27s_formulae) +* `andoyer-lambert-thomas` Review Comment: ```suggestion * `andoyer-lambert-thomas`: Thomas, Paul D. Spheroidal geodesics, reference systems, & local geometry. US Naval Oceanographic Office, 1970. ``` ## format/spec.md: ## @@ -1633,3 +1656,15 @@ might indicate different snapshot IDs for a specific timestamp. The discrepancie When processing point in time queries implementations should use "snapshot-log" metadata to lookup the table state at the given point in time. This ensures time-travel queries reflect the state of the table at the provided timestamp. For example a SQL query like `SELECT * FROM prod.db.table TIMESTAMP AS OF '1986-10-26 01:21:00Z';` would find the snapshot of the Iceberg table just prior to '1986-10-26 01:21:00 UTC' in the snapshot logs and use the metadata from that snapshot to perform the sca
Re: [PR] Spec: Support geo type [iceberg]
jiayuasu commented on code in PR #10981:
URL: https://github.com/apache/iceberg/pull/10981#discussion_r1879080772

## format/spec.md:
```
@@ -1633,3 +1656,15 @@ might indicate different snapshot IDs for a specific timestamp. The discrepancie

 When processing point in time queries implementations should use "snapshot-log" metadata to lookup the table state at the given point in time. This ensures time-travel queries reflect the state of the table at the provided timestamp. For example a SQL query like `SELECT * FROM prod.db.table TIMESTAMP AS OF '1986-10-26 01:21:00Z';` would find the snapshot of the Iceberg table just prior to '1986-10-26 01:21:00 UTC' in the snapshot logs and use the metadata from that snapshot to perform the scan of the table. If no snapshot exists prior to the timestamp given or "snapshot-log" is not populated (it is an optional field), then systems should raise an informative error message about the missing metadata.
+
+## Appendix G: Geospatial Notes
+
+The Geometry class hierarchy and its WKT and WKB serializations (ISO supporting XY, XYZ, XYM, XYZM) are defined by [OpenGIS Implementation Specification for Geographic information – Simple feature access – Part 1: Common architecture](https://portal.ogc.org/files/?artifact_id=25355), from [OGC (Open Geospatial Consortium)](https://www.ogc.org/standard/sfa/).
+
+The version of the OGC standard first used here is 1.2.1, but future versions may also be used if the WKB representation remains wire-compatible.
+
```

Review Comment:

I suggest we put this back, as we all agreed during the emails and meetings. This will make things much easier; e.g., readers of the geometry data can filter on bbox min/max directly without needing to know the lon/lat order of the CRS.
Re: [PR] Spec: Support geo type [iceberg]
jiayuasu commented on code in PR #10981:
URL: https://github.com/apache/iceberg/pull/10981#discussion_r187907

## format/spec.md:
```
@@ -1633,3 +1656,15 @@ might indicate different snapshot IDs for a specific timestamp. The discrepancie

 When processing point in time queries implementations should use "snapshot-log" metadata to lookup the table state at the given point in time. This ensures time-travel queries reflect the state of the table at the provided timestamp. For example a SQL query like `SELECT * FROM prod.db.table TIMESTAMP AS OF '1986-10-26 01:21:00Z';` would find the snapshot of the Iceberg table just prior to '1986-10-26 01:21:00 UTC' in the snapshot logs and use the metadata from that snapshot to perform the scan of the table. If no snapshot exists prior to the timestamp given or "snapshot-log" is not populated (it is an optional field), then systems should raise an informative error message about the missing metadata.
+
+## Appendix G: Geospatial Notes
+
+The Geometry class hierarchy and its WKT and WKB serializations (ISO supporting XY, XYZ, XYM, XYZM) are defined by [OpenGIS Implementation Specification for Geographic information – Simple feature access – Part 1: Common architecture](https://portal.ogc.org/files/?artifact_id=25355), from [OGC (Open Geospatial Consortium)](https://www.ogc.org/standard/sfa/).
+
+The version of the OGC standard first used here is 1.2.1, but future versions may also be used if the WKB representation remains wire-compatible.
+
```

Review Comment:
```suggestion
Coordinate axis order is always (x, y) where x is easting or longitude, and y is northing or latitude. This ordering explicitly overrides the axis order specified in the CRS.
```
[PR] Fix `Table.scan` to enable case sensitivity argument [iceberg-python]
jiakai-li opened a new pull request, #1423:
URL: https://github.com/apache/iceberg-python/pull/1423

**This pull request fixes the issue below:**

- #1421

The change modifies the `DataScan._build_partition_projection` method to pass the `case_sensitive` argument when calling `inclusive_projection`, so that the `Table.scan` method respects this configuration. Related tests are also added.
Re: [I] Support for loading different hive-metastore versions at Runtime [iceberg]
github-actions[bot] commented on issue #10401: URL: https://github.com/apache/iceberg/issues/10401#issuecomment-2533325043 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] Can't add column with the same name as a deleted and previously partitioned key column [iceberg]
github-actions[bot] commented on issue #10487: URL: https://github.com/apache/iceberg/issues/10487#issuecomment-2533325230 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] Support for loading different hive-metastore versions at Runtime [iceberg]
github-actions[bot] closed issue #10401: Support for loading different hive-metastore versions at Runtime URL: https://github.com/apache/iceberg/issues/10401 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] [Flink] the precondition judgment is incorrect in the FlinkSink class. [iceberg]
github-actions[bot] commented on PR #10669: URL: https://github.com/apache/iceberg/pull/10669#issuecomment-2533325348 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the d...@iceberg.apache.org list. Thank you for your contributions. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] core: support move a column with same name after rename column [iceberg]
github-actions[bot] commented on PR #10862: URL: https://github.com/apache/iceberg/pull/10862#issuecomment-2533325429 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the d...@iceberg.apache.org list. Thank you for your contributions. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Core, Flink, Spark, KafkaConnect: Remove remaining usage of deprecated path API [iceberg]
ebyhr commented on code in PR #11744: URL: https://github.com/apache/iceberg/pull/11744#discussion_r1879109181 ## flink/v1.19/flink/src/test/java/org/apache/iceberg/flink/TestHelpers.java: ## @@ -602,7 +602,7 @@ public static void assertEquals(ContentFile expected, ContentFile actual) assertThat(actual).isNotNull(); assertThat(actual.specId()).as("SpecId").isEqualTo(expected.specId()); assertThat(actual.content()).as("Content").isEqualTo(expected.content()); -assertThat(actual.path()).as("Path").isEqualTo(expected.path()); +assertThat(actual.location()).as("Path").isEqualTo(expected.location()); Review Comment: ```suggestion assertThat(actual.location()).as("Location").isEqualTo(expected.location()); ``` ## spark/v3.3/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestExpireSnapshotsProcedure.java: ## @@ -300,7 +300,7 @@ public void testExpireDeleteFiles() throws Exception { Assert.assertEquals("Should have 1 delete file", 1, TestHelpers.deleteFiles(table).size()); Path deleteManifestPath = new Path(TestHelpers.deleteManifests(table).iterator().next().path()); DeleteFile deleteFile = TestHelpers.deleteFiles(table).iterator().next(); -Path deleteFilePath = new Path(String.valueOf(deleteFile.path())); +Path deleteFilePath = new Path(String.valueOf(deleteFile.location())); Review Comment: Why do we need `String.valueOf` even though `deleteFile.location()` is String? ## spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkCleanupUtil.java: ## @@ -81,7 +81,7 @@ private static String taskInfo() { * @param files a list of files to delete */ public static void deleteFiles(String context, FileIO io, List> files) { -List paths = Lists.transform(files, file -> file.path().toString()); +List paths = Lists.transform(files, file -> file.location()); Review Comment: Spark v3.5 uses method reference here. ## spark/v3.3/spark/src/test/java/org/apache/iceberg/ValidationHelpers.java: ## @@ -42,7 +42,7 @@ public static List snapshotIds(Long... ids) { } public static List files(ContentFile... files) { -return Arrays.stream(files).map(file -> file.path().toString()).collect(Collectors.toList()); +return Arrays.stream(files).map(file -> file.location()).collect(Collectors.toList()); Review Comment: nit: We could use method reference. Feel free to ignore this comment. I know Spark v3.5 uses this style. ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkCleanupUtil.java: ## @@ -81,7 +81,7 @@ private static String taskInfo() { * @param files a list of files to delete */ public static void deleteFiles(String context, FileIO io, List> files) { -List paths = Lists.transform(files, file -> file.path().toString()); +List paths = Lists.transform(files, file -> file.location()); Review Comment: Spark v3.5 uses method reference here. ## spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/SparkTableUtil.java: ## @@ -635,7 +635,7 @@ private static void importUnpartitionedSparkTable( if (checkDuplicateFiles) { Dataset importedFiles = spark -.createDataset(Lists.transform(files, f -> f.path().toString()), Encoders.STRING()) +.createDataset(Lists.transform(files, f -> f.location()), Encoders.STRING()) Review Comment: Spark v3.5 uses method reference here. 
## spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/BaseReader.java: ## @@ -153,7 +153,7 @@ public boolean next() throws IOException { if (currentTask != null && !currentTask.isDataTask()) { String filePaths = referencedFiles(currentTask) -.map(file -> file.path().toString()) +.map(file -> file.location()) Review Comment: Spark v3.5 uses method reference here. ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/BaseReader.java: ## @@ -148,7 +148,7 @@ public boolean next() throws IOException { if (currentTask != null && !currentTask.isDataTask()) { String filePaths = referencedFiles(currentTask) -.map(file -> file.path().toString()) +.map(file -> file.location()) Review Comment: Spark v3.5 uses method reference here. Feel free to ignore this and other comments about method reference if it's intentional. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
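The style nit above, shown standalone — a hedged sketch with a simplified `ContentFile` stand-in (the real Iceberg interface is generic, which is elided here):

```java
import java.util.List;
import java.util.stream.Collectors;

public class MethodReferenceStyle {
  interface ContentFile {
    String location();
  }

  // Lambda form flagged in the review:
  static List<String> withLambda(List<ContentFile> files) {
    return files.stream().map(file -> file.location()).collect(Collectors.toList());
  }

  // Equivalent method-reference form used in Spark v3.5:
  static List<String> withMethodRef(List<ContentFile> files) {
    return files.stream().map(ContentFile::location).collect(Collectors.toList());
  }
}
```

Both compile to the same behavior; the method reference simply avoids naming a throwaway parameter.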
Re: [PR] Add Strict projection [iceberg-python]
Fokko commented on code in PR #539: URL: https://github.com/apache/iceberg-python/pull/539#discussion_r1879010689 ## pyiceberg/transforms.py: ## @@ -766,6 +858,47 @@ def _truncate_number( return None +def _truncate_number_strict( +name: str, pred: BoundLiteralPredicate[L], transform: Callable[[Optional[L]], Optional[L]] +) -> Optional[UnboundPredicate[Any]]: +boundary = pred.literal + +if not isinstance(boundary, (LongLiteral, DecimalLiteral, DateLiteral, TimestampLiteral)): +raise ValueError(f"Expected a numeric literal, got: {type(boundary)}") + +if isinstance(pred, BoundLessThan): +return LessThan(Reference(name), _transform_literal(transform, boundary)) +elif isinstance(pred, BoundLessThanOrEqual): +return LessThan(Reference(name), _transform_literal(transform, boundary.increment())) # type: ignore +elif isinstance(pred, BoundGreaterThan): +return GreaterThan(Reference(name), _transform_literal(transform, boundary)) +elif isinstance(pred, BoundGreaterThanOrEqual): +return GreaterThan(Reference(name), _transform_literal(transform, boundary.decrement())) # type: ignore Review Comment: Hey @ZENOTME Again, thanks for pointing this out. It turns out that most of these oddities are caused by a bug that was part of Iceberg `≤0.10.0`. We did not port this to PyIceberg: https://github.com/apache/iceberg/blob/ac6509a4e469f808bebb8b713a5c4213f98ff4a5/api/src/main/java/org/apache/iceberg/transforms/ProjectionUtil.java#L275 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
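The `increment()`/`decrement()` calls in the quoted PyIceberg code rest on a standard identity for integral boundaries. A small sketch of the idea — not the PyIceberg implementation itself:

```java
public class StrictBoundarySketch {
  public static void main(String[] args) {
    int v = 10;
    for (int x : new int[] {9, 10, 11}) {
      // For integral types, x <= v  <=>  x < v + 1, and x >= v  <=>  x > v - 1.
      // Rewriting <=/>= into strict </> this way lets a strict projection reuse
      // one code path, which is what boundary.increment()/decrement() do above.
      System.out.printf("x=%d: (x<=v)=%b (x<v+1)=%b (x>=v)=%b (x>v-1)=%b%n",
          x, x <= v, x < v + 1, x >= v, x > v - 1);
    }
  }
}
```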
Re: [PR] Spec: Support geo type [iceberg]
szehon-ho commented on PR #10981: URL: https://github.com/apache/iceberg/pull/10981#issuecomment-2533103710 Update: there was a sync with @jiayuasu @flyrain @dmitrykoval @paleolimbot @rdblue and Menelaos; the following was decided (meeting notes):

My summary is that we decided to have two types:

* geometry(crs_id) always uses linear edges but can have a geographic CRS
* geography(crs_id, algorithm) always uses geodesic edges defined by a geographic CRS
* A geography’s algorithm approximates the edges and must be used consistently. A spherical approximation is considered a different algorithm.
* The crs_id is opaque, but could be srid: to select a specific SRID, or projjson: to select a JSON CRS in a table property
* Neither Parquet nor Iceberg is responsible for providing CRS definitions, but may include them for convenience (if they can, considering copyright or other legal considerations)

Here are the specific points I think we decided on:

* Planar/linear edges are always associated with the geometry type. Geometry should always use linear edges.
* Parquet and Iceberg should have a geometry type because users already expect the linear behavior
* Geometry needs to support geographic CRS
* Geometry needs a CRS parameter, but not an edge parameter
* Geography never uses linear edges
* Geography edges are always interpreted as edges on the spheroid defined by the geographic CRS (geodesics)
* An exception here is that if the algorithm specified is spherical, then we are talking about geodesics (great circle arcs) on a sphere. I think it is important to notice (and specify/require) that if the algorithm is spherical, then the radius of the underlying sphere is assumed/expected to be the mean radius of the spheroid specified by the CRS, where the mean radius is always defined as (2 * major_axis_length + minor_axis_length) / 3.
* Geography bounding boxes must include the northmost/southmost points on edges
* Geography edge calculations use a particular algorithm, which may introduce either approximation errors (for instance, Vincenty) or may simplify the problem and introduce representation errors (i.e. Spherical)
* The edge calculation algorithm must be a parameter of the geography type (i.e. spherical, andoyer, vincenty, etc.)
* The algorithm is set by what the writer creating the table can produce (vs having a default in the format)
* Writers must not write if they cannot produce bounding boxes using the correct algorithm
* Engines should reject non-geographic CRS for geography columns
* We decided that coordinates should be limited to [-180, 180] and [-90, 90] for geography.

Updating the PR based on the same. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
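The one concrete formula in these notes is the mean-radius definition. A small worked sketch, using WGS84 axis lengths purely as assumed example inputs (the notes do not name a specific spheroid):

```java
public class MeanRadius {
  // Mean radius as defined in the meeting notes:
  // (2 * major_axis_length + minor_axis_length) / 3.
  static double meanRadius(double majorAxis, double minorAxis) {
    return (2.0 * majorAxis + minorAxis) / 3.0;
  }

  public static void main(String[] args) {
    // WGS84 axes in meters, used here only for illustration.
    double a = 6378137.0;      // semi-major axis
    double b = 6356752.314245; // semi-minor axis
    System.out.printf("mean radius = %.3f m%n", meanRadius(a, b)); // ≈ 6371008.771 m
  }
}
```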
Re: [PR] Add Strict projection [iceberg-python]
Fokko commented on code in PR #539: URL: https://github.com/apache/iceberg-python/pull/539#discussion_r1879028369 ## pyiceberg/transforms.py: ## @@ -766,6 +858,47 @@ def _truncate_number( return None +def _truncate_number_strict( +name: str, pred: BoundLiteralPredicate[L], transform: Callable[[Optional[L]], Optional[L]] +) -> Optional[UnboundPredicate[Any]]: +boundary = pred.literal + +if not isinstance(boundary, (LongLiteral, DecimalLiteral, DateLiteral, TimestampLiteral)): +raise ValueError(f"Expected a numeric literal, got: {type(boundary)}") + +if isinstance(pred, BoundLessThan): +return LessThan(Reference(name), _transform_literal(transform, boundary)) +elif isinstance(pred, BoundLessThanOrEqual): +return LessThan(Reference(name), _transform_literal(transform, boundary.increment())) # type: ignore +elif isinstance(pred, BoundGreaterThan): +return GreaterThan(Reference(name), _transform_literal(transform, boundary)) +elif isinstance(pred, BoundGreaterThanOrEqual): +return GreaterThan(Reference(name), _transform_literal(transform, boundary.decrement())) # type: ignore Review Comment: Created a PR to refactor the tests: https://github.com/apache/iceberg-python/pull/1422 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Add CMake format [iceberg-cpp]
Fokko commented on code in PR #5: URL: https://github.com/apache/iceberg-cpp/pull/5#discussion_r1879041334 ## cmake-format.py: ## @@ -0,0 +1,74 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +# cmake-format configuration file + +# - +# Options affecting formatting. +# - +with section("format"): +# How wide to allow formatted cmake files +line_width = 90 Review Comment: `iceberg-python` is at 130: https://github.com/apache/iceberg-python/blob/ede363bd63640f3db6c3b3d6904a5765dafd2600/pyproject.toml#L916 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Bump Spark 3.5.4 RC1 [iceberg]
pan3793 commented on PR #11731: URL: https://github.com/apache/iceberg/pull/11731#issuecomment-2533468089 @viirya @LuciferYang thank you for the guidance, let me try -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] AWS: Enable RetryMode for AWS KMS client [iceberg]
amogh-jahagirdar merged PR #11420: URL: https://github.com/apache/iceberg/pull/11420 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] AWS: Enable Adaptive Retries for AWS KMS client [iceberg]
hsiang-c commented on PR #11420: URL: https://github.com/apache/iceberg/pull/11420#issuecomment-2533470077 Thank you @amogh-jahagirdar -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[PR] doc: add RisingWave to users [iceberg-rust]
xxchan opened a new pull request, #775: URL: https://github.com/apache/iceberg-rust/pull/775 Signed-off-by: xxchan -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[I] Comment and assertion mismatch in PartitionedWritesTestBase [iceberg]
wzx140 opened a new issue, #11747: URL: https://github.com/apache/iceberg/issues/11747

### Apache Iceberg version

main (development)

### Query engine

Spark

### Please describe the bug 🐞

There is a mismatch between the test comment and the assertion in the following unit test, which might lead to confusion or logical errors. Here is the relevant code snippet:

```java
@TestTemplate
public void testInsertAppend() {
  assertThat(scalarSql("SELECT count(*) FROM %s", selectTarget()))
      .as("Should have 5 rows after insert")
      .isEqualTo(3L);
```

- The comment "Should have 5 rows after insert" indicates the expectation of 5 rows in the table.
- However, the actual assertion isEqualTo(3L) checks for 3 rows instead.

### Willingness to contribute

- [X] I can contribute a fix for this bug independently
- [X] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
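One way to reconcile the two, assuming the assertion (3 rows) reflects the intended behavior and only the description string is stale; if 5 rows is actually intended, change the literal instead:

```java
// Hedged sketch of the reconciled test: keep the 3L assertion, fix the message.
@TestTemplate
public void testInsertAppend() {
  assertThat(scalarSql("SELECT count(*) FROM %s", selectTarget()))
      .as("Should have 3 rows after insert")
      .isEqualTo(3L);
}
```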
Re: [PR] Table Scan Delete File Handling: Positional and Equality Delete Support [iceberg-rust]
sdd commented on code in PR #652: URL: https://github.com/apache/iceberg-rust/pull/652#discussion_r1879516829 ## crates/iceberg/src/scan.rs: ## @@ -951,6 +1077,82 @@ impl FileScanTask { } } +type DeleteFileManagerResult = Result>>>; + +/// Manages async retrieval of all the delete files from FileIO that are +/// applicable to the scan. Provides references to them for inclusion within FileScanTasks +#[derive(Debug, Clone)] +struct DeleteFileManager { Review Comment: Refactored to a separate module and rebased back on latest `main` as it was getting a bit stale. I'll be working to update this PR to bring it closer to `DeleteFileIndex`, ideally in a way that allows me to split this into smaller PRs as well ## crates/iceberg/src/spec/delete_file.rs: ## @@ -0,0 +1,780 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +use std::collections::HashMap; + +use arrow_array::{Int64Array, RecordBatch, StringArray}; +use arrow_schema::SchemaRef; +use futures::StreamExt; +use parquet::arrow::PARQUET_FIELD_ID_META_KEY; + +use crate::scan::ArrowRecordBatchStream; +use crate::{Error, ErrorKind, Result}; + +pub(crate) const FIELD_ID_DELETE_FILE_PATH: i32 = i32::MAX - 101; +pub(crate) const FIELD_ID_DELETE_POS: i32 = i32::MAX - 102; + +// Represents a parsed Delete file that can be safely stored +// in the Object Cache. +pub(crate) enum Deletes { +// Positional delete files are parsed into a map of +// filename to a sorted list of row indices. +// TODO: Ignoring the stored rows that are present in +// positional deletes for now. I think they only used for statistics? +Positional(HashMap>), + +// Equality delete files are initially parsed solely as an +// unprocessed list of `RecordBatch`es from the equality +// delete files. +// I don't think we can do better than this by +// storing a Predicate (because the equality deletes use the +// field_id rather than the field name, so if we keep this as +// a Predicate then a field name change would break it). +// Similarly, I don't think we can store this as a BoundPredicate +// as the column order could be different across different data +// files and so the accessor in the bound predicate could be invalid). 
+Equality(Vec), +} + +enum PosDelSchema { +WithRow, +WithoutRow, +} + +fn validate_schema(schema: SchemaRef) -> Result { +let fields = schema.flattened_fields(); +match fields.len() { +2 | 3 => { +let Some(path_field_id_raw) = fields[0].metadata().get(PARQUET_FIELD_ID_META_KEY) +else { +return Err(Error::new( +ErrorKind::DataInvalid, +"Positional Delete file did not have the expected schema (missing field ID for field 0)", +)); +}; + +let Some(pos_field_id_raw) = fields[1].metadata().get(PARQUET_FIELD_ID_META_KEY) else { +return Err(Error::new( +ErrorKind::DataInvalid, +"Positional Delete file did not have the expected schema (missing field ID for field 1)", +)); +}; + +let Ok(path_field_id_val) = path_field_id_raw.parse::() else { +return Err(Error::new( +ErrorKind::DataInvalid, +"Positional Delete file did not have the expected schema (field ID 0 not parseable as an int)", +)); +}; + +let Ok(pos_field_id_val) = pos_field_id_raw.parse::() else { +return Err(Error::new( +ErrorKind::DataInvalid, +"Positional Delete file did not have the expected schema (field ID 1 not parseable as an int)", +)); +}; + +if path_field_id_val != FIELD_ID_DELETE_FILE_PATH +|| pos_field_id_val != FIELD_ID_DELETE_POS +{ +Err(Error::new( +ErrorKind::DataInvalid, +"Positional Delete file did not have the expected schema (unexpected field ID)", +)) +} else if fields.len() == 2 { +
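A sketch of the positional-delete structure described in the quoted comment — data-file path mapped to sorted deleted-row positions — written here in Java with illustrative names (not the PR's Rust API). Sorting the positions is what makes a membership check a binary search:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class PositionalDeleteIndex {
  // data-file path -> sorted positions of deleted rows in that file
  private final Map<String, long[]> deletesByPath = new HashMap<>();

  void add(String dataFile, long[] sortedPositions) {
    deletesByPath.put(dataFile, sortedPositions);
  }

  boolean isDeleted(String dataFile, long rowPosition) {
    long[] positions = deletesByPath.get(dataFile);
    // Sorted input lets binary search answer membership in O(log n) per row.
    return positions != null && Arrays.binarySearch(positions, rowPosition) >= 0;
  }

  public static void main(String[] args) {
    PositionalDeleteIndex index = new PositionalDeleteIndex();
    index.add("s3://bucket/data/00000.parquet", new long[] {3, 17, 42});
    System.out.println(index.isDeleted("s3://bucket/data/00000.parquet", 17)); // true
    System.out.println(index.isDeleted("s3://bucket/data/00000.parquet", 18)); // false
  }
}
```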
Re: [PR] WIP proposal for fixing issue #11742 [iceberg]
jkolash commented on PR #11743: URL: https://github.com/apache/iceberg/pull/11743#issuecomment-2532509896 Closing this as I now see I can just set the spark.sql.catalog.default_iceberg property. Will add more detail to the issue. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] org.apache.iceberg.spark.source.IcebergSource.setupDefaultSparkCatalogs shouldn't default to hive catalog [iceberg]
jkolash closed issue #11742: org.apache.iceberg.spark.source.IcebergSource.setupDefaultSparkCatalogs shouldn't default to hive catalog URL: https://github.com/apache/iceberg/issues/11742 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] [Views] Update view spec with table identifier requirements [iceberg]
JanKaul commented on code in PR #11365: URL: https://github.com/apache/iceberg/pull/11365#discussion_r1878651163 ## format/view-spec.md: ## @@ -97,7 +97,10 @@ Summary is a string to string map of metadata about a view version. Common metad View definitions can be represented in multiple ways. Representations are documented ways to express a view definition. -A view version can have more than one representation. All representations for a version must express the same underlying definition. Engines are free to choose the representation to use. +A view version can have more than one representation. All representations for a version must express the same underlying definition. Engines are free to choose the representation to use. For the table identifiers in the view definitions to be interoperable across engines, the following must be true: +* All engines must resolve a fully specified SQL identifier to the same table in the same catalog. Review Comment: I would assume the resolution of the identifiers to work something like this:

0. Get the list of available catalogs.
1. Set the table-name equal to the last element.
2. Check if the second-to-last element exists.
   - If it does, set the namespace equal to it.
   - If it doesn't, use the default-namespace.
3. Check if the next element in reverse direction exists.
   - If the element is equal to an available catalog, set catalog-name to it.
   - If the element exists and is not an available catalog, prepend it to the namespace and continue with step 3.
   - If the element doesn't exist, use the default-catalog.
4. Check if the resolved catalog, namespace, and table exist.

The important thing is that if the namespace or catalog are not present, the default-namespace and default-catalog from the view version are used. The resolution currently doesn't fall back to using the view catalog. If we were now to opt to use the view catalog instead of the default-catalog when the catalog is not present, the behavior would be different compared to the current view spec. This is what I mean by a breaking change. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
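A sketch of the resolution order described above, with illustrative names and with the existence checks (steps 2–4) elided for brevity; this is one interpretation of the comment, not an engine API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class ViewIdentifierResolution {
  // Walks a dotted identifier right-to-left: last segment is the table name, then
  // namespace segments, then (optionally) a known catalog; defaults fill the gaps.
  static String[] resolve(
      String[] parts, Set<String> catalogs, String defaultCatalog, String defaultNamespace) {
    String table = parts[parts.length - 1];
    List<String> namespace = new ArrayList<>();
    String catalog = defaultCatalog;
    for (int i = parts.length - 2; i >= 0; i--) {
      if (i == 0 && catalogs.contains(parts[i])) {
        catalog = parts[i]; // leading segment names a known catalog
      } else {
        namespace.add(0, parts[i]); // otherwise it is part of the namespace
      }
    }
    String ns = namespace.isEmpty() ? defaultNamespace : String.join(".", namespace);
    return new String[] {catalog, ns, table};
  }

  public static void main(String[] args) {
    Set<String> catalogs = Set.of("prod");
    // "tbl" -> defaults fill in both catalog and namespace
    System.out.println(String.join("/", resolve(new String[] {"tbl"}, catalogs, "prod", "db")));
    // "prod.db2.tbl" -> explicit catalog and namespace
    System.out.println(
        String.join("/", resolve(new String[] {"prod", "db2", "tbl"}, catalogs, "prod", "db")));
  }
}
```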
Re: [PR] Core, Flink, Spark: Drop deprecated APIs scheduled for removal in 1.8.0 [iceberg]
nastra merged PR #11721: URL: https://github.com/apache/iceberg/pull/11721 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] Make field-id of name-mapping optional [iceberg-python]
barronw commented on issue #1420: URL: https://github.com/apache/iceberg-python/issues/1420#issuecomment-2531604338 Can I pick this up? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] REST: AuthManager API [iceberg]
adutra commented on code in PR #10753: URL: https://github.com/apache/iceberg/pull/10753#discussion_r1878163433 ## core/src/main/java/org/apache/iceberg/rest/auth/AuthSession.java: ## @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.rest.auth; + +import org.apache.iceberg.rest.HTTPRequest; + +/** + * An authentication session that can be used to authenticate outgoing HTTP requests. + * + * Authentication sessions are usually immutable, but may hold resources that need to be released + * when the session is no longer needed. Implementations should override {@link #close()} to release + * any resources. + */ +public interface AuthSession extends AutoCloseable { Review Comment: Do you mind if we do the rename in an immediate follow-up PR? I am a bit concerned that as soon as I rename to `SupportsAuthentication`, someone else would chime in and suggest something different. Naming is always a bit controversial :-) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
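A purely hypothetical sketch of a trivial session, to illustrate the javadoc's contract; the `authenticate` method name and the `String` request stand-in are assumptions, not the actual `AuthSession`/`HTTPRequest` API under review:

```java
// Hypothetical only: the real interface's request-decorating method may differ.
public class StaticTokenSession implements AutoCloseable {
  private final String token;

  public StaticTokenSession(String token) {
    this.token = token;
  }

  // Assumed shape: return the request augmented with an Authorization header.
  public String authenticate(String request) {
    return request + "\nAuthorization: Bearer " + token;
  }

  @Override
  public void close() {
    // A static token holds no resources, so close() is a no-op — matching the
    // javadoc's note that implementations override close() to release resources.
  }
}
```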
Re: [PR] Core: Add Variant implementation to read serialized objects [iceberg]
aihuaxu commented on code in PR #11415: URL: https://github.com/apache/iceberg/pull/11415#discussion_r1878440484 ## core/src/main/java/org/apache/iceberg/variants/ShreddedObject.java: ## @@ -0,0 +1,211 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.variants; + +import java.nio.ByteBuffer; +import java.nio.ByteOrder; +import java.util.Map; +import org.apache.iceberg.relocated.com.google.common.base.Preconditions; +import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap; +import org.apache.iceberg.relocated.com.google.common.collect.Maps; +import org.apache.iceberg.util.Pair; +import org.apache.iceberg.util.SortedMerge; + +/** + * A variant Object that handles full or partial shredding. + * + * Metadata stored for an object must be the same regardless of whether the object is shredded. + * This class assumes that the metadata from the unshredded object can be used for the shredded + * fields. This also does not allow updating or replacing the metadata for the unshredded object, + * which could require recursively rewriting field IDs. + */ +class ShreddedObject implements VariantObject { Review Comment: Not an issue. I was thinking that we could review the original change (without shredding) faster, so we can merge quickly. I haven't reviewed this ShreddedObject yet. Let me check it out then. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Core: Add Variant implementation to read serialized objects [iceberg]
aihuaxu commented on code in PR #11415: URL: https://github.com/apache/iceberg/pull/11415#discussion_r1878469751 ## core/src/main/java/org/apache/iceberg/variants/PrimitiveWrapper.java: ## @@ -0,0 +1,206 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.variants; + +import java.math.BigDecimal; +import java.nio.ByteBuffer; +import java.nio.ByteOrder; +import java.nio.charset.StandardCharsets; +import org.apache.iceberg.relocated.com.google.common.base.Preconditions; +import org.apache.iceberg.variants.Variants.Primitives; + +class PrimitiveWrapper implements VariantPrimitive { + private static final byte NULL_HEADER = VariantUtil.primitiveHeader(Primitives.TYPE_NULL); + private static final byte TRUE_HEADER = VariantUtil.primitiveHeader(Primitives.TYPE_TRUE); + private static final byte FALSE_HEADER = VariantUtil.primitiveHeader(Primitives.TYPE_FALSE); + private static final byte INT8_HEADER = VariantUtil.primitiveHeader(Primitives.TYPE_INT8); + private static final byte INT16_HEADER = VariantUtil.primitiveHeader(Primitives.TYPE_INT16); + private static final byte INT32_HEADER = VariantUtil.primitiveHeader(Primitives.TYPE_INT32); + private static final byte INT64_HEADER = VariantUtil.primitiveHeader(Primitives.TYPE_INT64); + private static final byte FLOAT_HEADER = VariantUtil.primitiveHeader(Primitives.TYPE_FLOAT); + private static final byte DOUBLE_HEADER = VariantUtil.primitiveHeader(Primitives.TYPE_DOUBLE); + private static final byte DATE_HEADER = VariantUtil.primitiveHeader(Primitives.TYPE_DATE); + private static final byte TIMESTAMPTZ_HEADER = + VariantUtil.primitiveHeader(Primitives.TYPE_TIMESTAMPTZ); + private static final byte TIMESTAMPNTZ_HEADER = + VariantUtil.primitiveHeader(Primitives.TYPE_TIMESTAMPNTZ); + private static final byte DECIMAL4_HEADER = VariantUtil.primitiveHeader(Primitives.TYPE_DECIMAL4); + private static final byte DECIMAL8_HEADER = VariantUtil.primitiveHeader(Primitives.TYPE_DECIMAL8); + private static final byte DECIMAL16_HEADER = + VariantUtil.primitiveHeader(Primitives.TYPE_DECIMAL16); + private static final byte BINARY_HEADER = VariantUtil.primitiveHeader(Primitives.TYPE_BINARY); + private static final byte STRING_HEADER = VariantUtil.primitiveHeader(Primitives.TYPE_STRING); + + private final Variants.PhysicalType type; + private final T value; + private ByteBuffer buffer = null; + + PrimitiveWrapper(Variants.PhysicalType type, T value) { +this.type = type; +this.value = value; + } + + @Override + public Variants.PhysicalType type() { +return type; + } + + @Override + public T get() { +return value; + } + + @Override + public int sizeInBytes() { +switch (type()) { + case NULL: + case BOOLEAN_TRUE: + case BOOLEAN_FALSE: +return 1; // 1 header only + case INT8: +return 2; 
// 1 header + 1 value + case INT16: +return 3; // 1 header + 2 value + case INT32: + case DATE: + case FLOAT: +return 5; // 1 header + 4 value + case INT64: + case DOUBLE: + case TIMESTAMPTZ: + case TIMESTAMPNTZ: +return 9; // 1 header + 8 value + case DECIMAL4: +return 6; // 1 header + 1 scale + 4 unscaled value + case DECIMAL8: +return 10; // 1 header + 1 scale + 8 unscaled value + case DECIMAL16: +return 18; // 1 header + 1 scale + 16 unscaled value + case BINARY: +return 5 + ((ByteBuffer) value).remaining(); // 1 header + 4 length + value length Review Comment: Let's say we have a PrimitiveWrapper of a binary `b = new PrimitiveWrapper( binary, 0x1234)`, would we always expect b.sizeInBytes() to be 5 + 2, and that we shouldn't change it for this method contract? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
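The arithmetic behind the "5 + 2" in that question, worked through for the BINARY branch quoted above:

```java
import java.nio.ByteBuffer;

public class BinarySizeArithmetic {
  public static void main(String[] args) {
    // BINARY case: 1 header byte + 4 length bytes + the value bytes themselves.
    ByteBuffer value = ByteBuffer.wrap(new byte[] {0x12, 0x34}); // the 0x1234 example
    int sizeInBytes = 1 + 4 + value.remaining();
    System.out.println(sizeInBytes); // 7, i.e. the "5 + 2" from the review question
  }
}
```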
Re: [PR] GCP: Implement SupportsRecoveryOperations for GCSFileIO [iceberg]
amogh-jahagirdar commented on code in PR #11565: URL: https://github.com/apache/iceberg/pull/11565#discussion_r1878209991 ## gcp/src/main/java/org/apache/iceberg/gcp/gcs/GCSFileIO.java: ## @@ -242,4 +248,106 @@ private void internalDeleteFiles(Stream blobIdsToDelete) { Streams.stream(Iterators.partition(blobIdsToDelete.iterator(), gcpProperties.deleteBatchSize())) .forEach(batch -> client().delete(batch)); } + + @Override + public boolean recoverFile(String path) { +Preconditions.checkArgument( +!Strings.isNullOrEmpty(path), "Cannot recover file: path must not be null or empty"); + +try { + BlobId blobId = BlobId.fromGsUtilUri(path); + + // first attempt to restore with soft-delete + if (recoverSoftDeletedObject(blobId)) { +return true; + } + + // fallback to restoring by copying the latest version + if (recoverLatestVersion(blobId)) { +return true; + } + +} catch (IllegalArgumentException e) { + LOG.warn("Invalid GCS path format: {}", path, e); +} + +return false; + } + + /** + * Attempts to restore a soft-deleted object. + * + * Requires {@code storage.objects.restore} permission + * + * See <a href="https://cloud.google.com/storage/docs/use-soft-deleted-objects#restore">docs</a> + * + * @param blobId the blob identifier + * @return {@code true} if blob was recovered, {@code false} if not + */ + protected boolean recoverSoftDeletedObject(BlobId blobId) { +try { + Optional latestSoftDeletedBlob = + client() + .list( + blobId.getBucket(), + Storage.BlobListOption.prefix(blobId.getName()), + Storage.BlobListOption.softDeleted(true)) + .streamAll() + .filter(blob -> blob.getName().equals(blobId.getName())) + .max(Comparator.comparing(Blob::getSoftDeleteTime)); + + if (latestSoftDeletedBlob.isPresent()) { +client().restore(latestSoftDeletedBlob.get().getBlobId()); +LOG.info("Soft delete object restored file {}", blobId); +return true; + } + LOG.warn("No soft deleted object was found"); + +} catch (StorageException e) { + LOG.warn("Failed to restore", e); +} + +return false; + } + + /** + * Attempts to restore the latest deleted object version.
+ * + * See <a href="https://cloud.google.com/storage/docs/using-versioned-objects#restore">docs</a> + * + * @param blobId the blob identifier + * @return {@code true} if blob was recovered, {@code false} if not + */ + protected boolean recoverLatestVersion(BlobId blobId) { +try { + Optional latestDeletedVersion = + client() + .list( + blobId.getBucket(), + Storage.BlobListOption.prefix(blobId.getName()), + Storage.BlobListOption.versions(true)) + .streamAll() + .filter(blob -> blob.getName().equals(blobId.getName())) + .max(Comparator.comparing(Blob::getUpdateTimeOffsetDateTime)) + .filter(blob -> blob.getDeleteTimeOffsetDateTime() != null); + + if (latestDeletedVersion.isPresent()) { +Storage.CopyRequest copyRequest = +Storage.CopyRequest.newBuilder() +.setSource(latestDeletedVersion.get().getBlobId()) +.setTarget(blobId) +.build(); +Blob blob = client().copy(copyRequest).getResult(); +LOG.info("Latest deleted version was restored for {}", blob.getBlobId()); +return true; + } + LOG.warn("No latest deleted version was found"); + +} catch (StorageException e) { + LOG.warn("Failed to restore latest deleted version", e); Review Comment: Same as above -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
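Caller-side usage would presumably go through the existing `SupportsRecoveryOperations` mixin named in the PR title; a sketch, with an illustrative path and empty properties, noting that recovery still depends on the bucket having soft delete or object versioning enabled:

```java
import java.util.Map;
import org.apache.iceberg.gcp.gcs.GCSFileIO;
import org.apache.iceberg.io.FileIO;
import org.apache.iceberg.io.SupportsRecoveryOperations;

public class RecoverExample {
  // Attempt recovery only when the FileIO implementation supports it.
  static boolean tryRecover(FileIO io, String path) {
    if (io instanceof SupportsRecoveryOperations) {
      return ((SupportsRecoveryOperations) io).recoverFile(path);
    }
    return false; // this FileIO offers no recovery support
  }

  public static void main(String[] args) {
    try (GCSFileIO io = new GCSFileIO()) {
      io.initialize(Map.of()); // real use would pass GCP properties/credentials here
      System.out.println(tryRecover(io, "gs://bucket/metadata/v1.metadata.json"));
    }
  }
}
```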
Re: [I] Make field-id of name-mapping optional [iceberg-python]
Fokko commented on issue #1420: URL: https://github.com/apache/iceberg-python/issues/1420#issuecomment-2531944719 @barronw certainly, thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[I] org.apache.iceberg.spark.source.IcebergSource.setupDefaultSparkCatalogs shouldn't default to hive catalog [iceberg]
jkolash opened a new issue, #11742: URL: https://github.com/apache/iceberg/issues/11742

### Feature Request / Improvement

https://github.com/apache/iceberg/blob/main/spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/IcebergSource.java#L239

It should default to another catalog implementation, as hive is a bigger dependency than other catalog implementations. HadoopCatalog does work when I manually, in the debugger, set:

spark.sql.catalog.default_iceberg.type=hadoop
spark.sql.catalog.default_iceberg.warehouse=/tmp/testing/warehouse/

I encountered this issue because I would like the ability to load a metadata.json directly via spark

```scala
spark.read.format("iceberg").load("s3://bucket/path/metadata/metadata.json")
```

and encounter missing hive dependencies.

### Query engine

Spark

### Willingness to contribute

- [X] I can contribute this improvement/feature independently
- [X] I would be willing to contribute this improvement/feature with guidance from the Iceberg community
- [ ] I cannot contribute this improvement/feature at this time

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
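The workaround discussed in the linked PR, as a hedged Java sketch: point the implicit default_iceberg catalog at a Hadoop catalog so no Hive dependencies are needed. The catalog class is the standard Iceberg Spark catalog; the warehouse path and metadata location are illustrative:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class DefaultIcebergCatalogWorkaround {
  public static void main(String[] args) {
    SparkSession spark =
        SparkSession.builder()
            .master("local[*]")
            // Configure the default_iceberg catalog explicitly instead of letting it
            // fall back to Hive.
            .config("spark.sql.catalog.default_iceberg", "org.apache.iceberg.spark.SparkCatalog")
            .config("spark.sql.catalog.default_iceberg.type", "hadoop")
            .config("spark.sql.catalog.default_iceberg.warehouse", "/tmp/testing/warehouse/")
            .getOrCreate();

    // Load a table directly from a metadata.json location (path is illustrative).
    Dataset<Row> df =
        spark.read().format("iceberg").load("/tmp/testing/warehouse/db/tbl/metadata/v1.metadata.json");
    df.show();
  }
}
```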
Re: [PR] Bump Spark 3.5.4 RC1 [iceberg]
viirya commented on PR #11731: URL: https://github.com/apache/iceberg/pull/11731#issuecomment-2532312814 Does Iceberg implement a ColumnVector interface that is similar to a writable or constant column vector? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] [Views] Update view spec with table identifier requirements [iceberg]
bennychow commented on code in PR #11365: URL: https://github.com/apache/iceberg/pull/11365#discussion_r1878496657 ## format/view-spec.md: ## @@ -97,7 +97,10 @@ Summary is a string to string map of metadata about a view version. Common metad View definitions can be represented in multiple ways. Representations are documented ways to express a view definition. -A view version can have more than one representation. All representations for a version must express the same underlying definition. Engines are free to choose the representation to use. +A view version can have more than one representation. All representations for a version must express the same underlying definition. Engines are free to choose the representation to use. For the table identifiers in the view definitions to be interoperable across engines, the following must be true: +* All engines must resolve a fully specified SQL identifier to the same table in the same catalog. Review Comment: So, I was imagining that table resolution in the query engine would work like this:

1. Assume the SQL identifier is already fully qualified and so the first segment is the catalog name. The query engine would see if this catalog exists and can resolve the remaining segments into a table/view.
2. Prepend the view's catalog and default-namespace (optional) to the SQL identifier. Now, try to resolve with the query engine's registered catalogs.
3. If default-catalog is set, prepend the default-catalog and default-namespace (optional) to the SQL identifier. Again, try to resolve with the query engine's registered catalogs.

I don't think there is anything new with items 1 and 3, but item 2 is interesting because it allows two different engines to reference the same table using different catalog names. This works only when the table and view are stored in the same catalog. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] org.apache.iceberg.spark.source.IcebergSource.setupDefaultSparkCatalogs shouldn't default to hive catalog [iceberg]
jkolash commented on issue #11742: URL: https://github.com/apache/iceberg/issues/11742#issuecomment-2532329867 This is probably the simplest fix: https://github.com/apache/iceberg/pull/11743 allows users to set these properties directly. A bigger fix may be to use another catalog implementation, but given that HadoopCatalog and HiveCatalog may eventually be moved outside of this project, it isn't clear which one should be chosen. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] AWS: adding s3 tables catalog [iceberg]
stubz151 closed pull request #11739: AWS: adding s3 tables catalog URL: https://github.com/apache/iceberg/pull/11739 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Add basic CI to build iceberg and example [iceberg-cpp]
raulcd commented on code in PR #7: URL: https://github.com/apache/iceberg-cpp/pull/7#discussion_r1878388213

## .github/workflows/test.yml: ##
```yaml
name: Test

on:
  push:
    branches:
      - '**'
      - '!dependabot/**'
    tags:
      - '**'
  pull_request:

concurrency:
  group: ${{ github.repository }}-${{ github.head_ref || github.sha }}-${{ github.workflow }}
  cancel-in-progress: true

permissions:
  contents: read

env:
  ICEBERG_HOME: /tmp/iceberg

jobs:
  ubuntu:
    name: AMD64 Ubuntu latest
    runs-on: ubuntu-latest
```

Review Comment: Do we want to pin macOS too?
Re: [PR] Add basic CI to build iceberg and example [iceberg-cpp]
pitrou commented on code in PR #7: URL: https://github.com/apache/iceberg-cpp/pull/7#discussion_r1878396289

## .github/workflows/test.yml: ##
```yaml
jobs:
  ubuntu:
    name: AMD64 Ubuntu latest
    runs-on: ubuntu-latest
```

Review Comment: It also depends on how often these images change versions. If it's once every 2 years, it's less contentious to use "latest" than if GH bumps their versions every month.
Re: [PR] Spark : Derive Stats From Manifest on the Fly [iceberg]
saitharun15 commented on PR #11615: URL: https://github.com/apache/iceberg/pull/11615#issuecomment-2532200924

Hi @RussellSpitzer, @huaxingao, could you please review the PR? Thanks!
[I] Rest Catalog: spark catalog api fails working rest based catalog is used [iceberg]
sunny1154 opened a new issue, #11741: URL: https://github.com/apache/iceberg/issues/11741

### Apache Iceberg version

1.5.0

### Query engine

Spark

### Please describe the bug 🐞

Hi, I am observing issues when working with a REST-based catalog. My Spark session has a default catalog defined which is backed by a REST catalog. The SparkSession.catalog API fails to work with the REST-based catalog. Tested with Spark 3.4.

```
${SPARK_HOME}/bin/spark-shell --master local[*] \
  --driver-memory 2g \
  --conf spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider \
  --conf spark.sql.catalog.iceberg.uri=https://xx.xxx..domain.com \
  --conf spark.sql.warehouse.dir=$SQL_WAREHOUSE_DIR \
  --conf spark.sql.defaultCatalog=iceberg \
  --conf spark.sql.catalog.iceberg=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.iceberg.catalog-impl=org.apache.iceberg.rest.RESTCatalog
```

```
scala> spark.catalog.currentCatalog
res1: String = iceberg

scala> spark.sql("select * from restDb.restTable").show
+---+----------+
| id|      data|
+---+----------+
|  1|some_value|
+---+----------+

scala> spark.catalog.tableExists("restDb.restTable")
res3: Boolean = true

scala> spark.catalog.tableExists("restDb", "restTable")
res4: Boolean = false
```

Other APIs also fail:

spark.catalog.getTable("restDb", "restTable") -- fails with database not found
spark.catalog.getTable("restDb.restTable") -- returns table object
spark.catalog.tableExists("restDb", "restTable") -- returns false (even though table exists)
spark.catalog.tableExists("restDb.restTable") -- returns true (if table exists and is registered with the REST catalog)
spark.catalog.listColumns("restDb", "restTable") -- fails with database not found
spark.catalog.listColumns("restDb.restTable") -- returns list of columns

### Willingness to contribute

- [ ] I can contribute a fix for this bug independently
- [X] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time
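For readers comparing the two call shapes, here is a small Java sketch of the same checks. The catalog and table names are the reporter's examples, the behavior in the comments is taken from the report, and it assumes a session configured like the spark-shell invocation above:

```java
import org.apache.spark.sql.SparkSession;

public class RestCatalogApiCheck {
  public static void main(String[] args) {
    // Assumes the REST catalog confs from the report are already set on the session.
    SparkSession spark = SparkSession.builder().appName("rest-catalog-check").getOrCreate();

    // Single-string, fully qualified identifiers resolve via the current (REST) catalog:
    boolean singleArg = spark.catalog().tableExists("restDb.restTable"); // true in the report

    // The two-argument (dbName, tableName) overloads go through the session catalog
    // and do not see REST-catalog tables, per the report:
    boolean twoArg = spark.catalog().tableExists("restDb", "restTable"); // false in the report

    System.out.println(singleArg + " vs " + twoArg);
    spark.stop();
  }
}
```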
Re: [PR] Bump Spark 3.5.4 RC1 [iceberg]
pan3793 commented on PR #11731: URL: https://github.com/apache/iceberg/pull/11731#issuecomment-2532122346

Update: the failure is related to [SPARK-50235](https://github.com/apache/spark/pull/48767), and the test passed on Spark 3.5.4 RC0 after reverting that patch.
Re: [PR] Core: Fix caching table with metadata table names [iceberg]
manuzhang commented on code in PR #11123: URL: https://github.com/apache/iceberg/pull/11123#discussion_r1877751060

## core/src/main/java/org/apache/iceberg/CachingCatalog.java: ##
```java
@@ -145,22 +146,26 @@ public Table loadTable(TableIdentifier ident) {
    }

    if (MetadataTableUtils.hasMetadataTableName(canonicalized)) {
```

Review Comment: Please go ahead.
Re: [PR] Core, Spark3.5: Fix tests failure due to timeout [iceberg]
manuzhang commented on code in PR #11654: URL: https://github.com/apache/iceberg/pull/11654#discussion_r185553

## core/src/test/java/org/apache/iceberg/hadoop/TestHadoopCommits.java: ##
```java
@@ -421,7 +423,13 @@ public void testConcurrentFastAppends(@TempDir File dir) throws Exception {
     TABLES.create(
         SCHEMA,
         SPEC,
-        ImmutableMap.of(COMMIT_NUM_RETRIES, String.valueOf(threadsCount)),
+        ImmutableMap.of(
+            COMMIT_NUM_RETRIES,
+            String.valueOf(threadsCount),
+            COMMIT_MIN_RETRY_WAIT_MS,
+            "10",
+            LOCK_ACQUIRE_TIMEOUT_MS,
```

Review Comment: Disable the extra retry on lock acquire failure since the commit will fail anyway.
Re: [PR] [Views] Update view spec with table identifier requirements [iceberg]
JanKaul commented on code in PR #11365: URL: https://github.com/apache/iceberg/pull/11365#discussion_r186853

## format/view-spec.md: ##
```
+A view version can have more than one representation. All representations for a version must express the same underlying definition. Engines are free to choose the representation to use. For the table identifiers in the view definitions to be interoperable across engines, the following must be true:
+* All engines must resolve a fully specified SQL identifier to the same table in the same catalog.
```

Review Comment:
There are two parts to consider:

1. Resolution of table references
2. Portability constraints

I think @bennychow's suggestion conflates the two. In addition to the portability constraint, it describes how a query engine should resolve a partial table reference using either the default-catalog or the view catalog. This conflicts with how table references are currently resolved in the view spec: currently only the default-catalog is used.

If the main point of this PR is to define the portability constraints, then we should not change the resolution of table references. Changing the table reference resolution would also be a breaking change to the view spec.
Re: [PR] WIP proposal for fixing issue #11742 [iceberg]
jkolash closed pull request #11743: WIP proposal for fixing issue #11742 URL: https://github.com/apache/iceberg/pull/11743
Re: [I] org.apache.iceberg.spark.source.IcebergSource.setupDefaultSparkCatalogs shouldn't default to hive catalog [iceberg]
jkolash commented on issue #11742: URL: https://github.com/apache/iceberg/issues/11742#issuecomment-2532512695

OK, setting `spark.sql.catalog.default_iceberg=org.apache.iceberg.spark.SparkCatalog` allows you to override the default values.
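A sketch of what that might look like when building the session programmatically; the `type` and `warehouse` values here are placeholders for whatever catalog configuration you actually want the fallback to use:

```java
import org.apache.spark.sql.SparkSession;

public class DefaultIcebergCatalogOverride {
  public static void main(String[] args) {
    // Override the implicitly registered default_iceberg catalog so it no longer
    // falls back to the Hive catalog; type/warehouse values are placeholders.
    SparkSession spark = SparkSession.builder()
        .appName("default-iceberg-override")
        .config("spark.sql.catalog.default_iceberg", "org.apache.iceberg.spark.SparkCatalog")
        .config("spark.sql.catalog.default_iceberg.type", "hadoop")
        .config("spark.sql.catalog.default_iceberg.warehouse", "file:///tmp/warehouse")
        .getOrCreate();

    spark.stop();
  }
}
```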
Re: [PR] Bump Spark 3.5.4 RC1 [iceberg]
LuciferYang commented on PR #11731: URL: https://github.com/apache/iceberg/pull/11731#issuecomment-2532512457

> Please check for some discussion there [apache/spark#49131 (comment)](https://github.com/apache/spark/pull/49131#issuecomment-2532341673)

Yes, I tested it locally and found that after adding a no-op `closeIfFreeable` method to `IcebergArrowColumnVector.java` in version 3.5, all the test cases in `TestRewriteDataFilesAction` were able to pass. We can test further after the release of 3.5.4 RC2.
Re: [PR] Spark3.4,3.5: Fix the BUG of iceberg views when resolved "group/order… [iceberg]
Ppei-Wang commented on code in PR #11729: URL: https://github.com/apache/iceberg/pull/11729#discussion_r1877165202

## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java: ##
```java
@@ -562,6 +562,58 @@ public void readFromViewWithCTE() throws NoSuchTableException {
     assertThat(sql("SELECT * FROM %s", viewName)).hasSize(1).containsExactly(row(10, 1L));
   }

+  @TestTemplate
+  public void readFromViewWithGroupByOrdinal() throws NoSuchTableException {
```

Review Comment: @nastra Thanks for your suggestion; modified.
Re: [PR] Materialized View Spec [iceberg]
JanKaul commented on code in PR #11041: URL: https://github.com/apache/iceberg/pull/11041#discussion_r1877723130

## format/view-spec.md: ##
```
@@ -160,6 +179,56 @@ Each entry in `version-log` is a struct with the following fields:
 | _required_ | `timestamp-ms` | Timestamp when the view's `current-version-id` was updated (ms from epoch) |
 | _required_ | `version-id`   | ID that `current-version-id` was set to |

+#### Full identifier
+
+The full identifier holds a reference, containing a namespace and a name, of a table or view in the catalog.
+
+| Requirement | Field name | Description |
+|-------------|------------|-------------|
+| _optional_  | `catalog`  | A string specifying the name of the catalog. If set to `null`, the catalog is the same as the views' catalog |
```

Review Comment: Thank you!

## format/view-spec.md: ##
```
@@ -42,12 +42,28 @@ An atomic swap of one view metadata file for another provides the basis for maki
 Writers create view metadata files optimistically, assuming that the current metadata location will not be changed before the writer's commit. Once a writer has created an update, it commits by swapping the view's metadata file pointer from the base location to the new location.

+### Materialized Views
+
+Materialized views are a type of view that precompute the data from the view query.
+When queried, engines may return the precomputed data for the materialized views, shifting the cost of query execution to the precomputation step.
+
+Iceberg materialized views are implemented as a combination of an Iceberg view and an underlying Iceberg table, known as the storage table, which stores the precomputed data.
+The metadata for a materialized view extends the common view metadata, adding a pointer to the precomputed data and refresh information to determine if the data is still fresh.
+The refresh information is composed of data about the so-called "source tables", which are the tables referenced in the query definition of the materialized view.
+The storage table can be in the states of "fresh", "stale" or "invalid", which are determined from the following situations:
+* **fresh** -- The `snapshot_id`'s of the last refresh operation match the current `snapshot_id`'s of the source tables.
```

Review Comment: Thank you!
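A compact Java sketch of the freshness determination the quoted spec text describes; the method and field names are invented for illustration, and the `INVALID` condition here (a changed view version) is an assumption rather than spec text:

```java
import java.util.Map;
import java.util.Objects;

public class StorageTableFreshness {
  public enum State { FRESH, STALE, INVALID }

  // refreshSnapshots: snapshot_id per source table recorded at the last refresh.
  // currentSnapshots: current snapshot_id per source table.
  // viewVersionChanged: assumed trigger for INVALID; not taken from the spec text.
  public static State determine(
      Map<String, Long> refreshSnapshots,
      Map<String, Long> currentSnapshots,
      boolean viewVersionChanged) {
    if (viewVersionChanged) {
      return State.INVALID;
    }
    // Fresh only if every source table still sits on the snapshot seen at refresh time.
    boolean allMatch =
        refreshSnapshots.entrySet().stream()
            .allMatch(e -> Objects.equals(e.getValue(), currentSnapshots.get(e.getKey())));
    return allMatch ? State.FRESH : State.STALE;
  }
}
```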
Re: [PR] Spark3.4,3.5,Api,Hive: Fix using NullType in View. [iceberg]
Fokko commented on code in PR #11728: URL: https://github.com/apache/iceberg/pull/11728#discussion_r1877861454

## api/src/main/java/org/apache/iceberg/types/Types.java: ##
```java
@@ -412,6 +413,24 @@ public String toString() {
     }
   }

+  public static class NullType extends PrimitiveType {
```

Review Comment: @Ppei-Wang this would entail extending the [Iceberg table specification](https://iceberg.apache.org/spec/) by adding a new NULL type. Hive might have a NullType, but other query engines might not, so we need to make sure that it is adopted correctly. Changes to the specification need a discussion in the community and should go through an [Iceberg improvement proposal](https://iceberg.apache.org/contribute/#apache-iceberg-improvement-proposals).
Re: [PR] feat: add `DataFileWriter` tests for schema and partition [iceberg-rust]
Fokko commented on PR #768: URL: https://github.com/apache/iceberg-rust/pull/768#issuecomment-2531216814

@jonathanc-n Could you rebase the PR? I'll merge it right away after that 👍
Re: [PR] Simplify partition structures [iceberg-rust]
Fokko closed pull request #763: Simplify partition structures URL: https://github.com/apache/iceberg-rust/pull/763
Re: [PR] Simplify partition structures [iceberg-rust]
Fokko commented on PR #763: URL: https://github.com/apache/iceberg-rust/pull/763#issuecomment-2531222156

Closed in favor of https://github.com/apache/iceberg-rust/pull/771
Re: [PR] Core: Fix loading a table in CachingCatalog with metadata table name [iceberg]
manuzhang commented on code in PR #11738: URL: https://github.com/apache/iceberg/pull/11738#discussion_r1878295947

## core/src/main/java/org/apache/iceberg/CachingCatalog.java: ##
```java
@@ -144,14 +144,16 @@ public Table loadTable(TableIdentifier ident) {
       return cached;
     }

-    if (MetadataTableUtils.hasMetadataTableName(canonicalized)) {
+    Table table = tableCache.get(canonicalized, catalog::loadTable);
+
+    if (table instanceof BaseMetadataTable) {
+      // Cache underlying table
       TableIdentifier originTableIdentifier =
           TableIdentifier.of(canonicalized.namespace().levels());
       Table originTable = tableCache.get(originTableIdentifier, catalog::loadTable);

-      // share TableOperations instance of origin table for all metadata tables, so that metadata
-      // table instances are
-      // also refreshed as well when origin table instance is refreshed.
+      // Share TableOperations instance of origin table for all metadata tables, so that metadata
```

Review Comment: Do we still need the code below when we already cache the metadata table above?
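In other words, the proposed flow reads roughly like the sketch below (a simplification, not the actual `CachingCatalog` code; `tableCache` and `catalog` stand in for the corresponding fields). The question above is whether the origin-table lookup and `TableOperations` sharing are still needed once the metadata table itself is cached:

```java
// Simplified sketch of the proposed loadTable flow.
Table loadTableSketch(TableIdentifier canonicalized) {
  // Load-or-cache the identifier first, whatever kind of table it names.
  Table table = tableCache.get(canonicalized, catalog::loadTable);
  if (table instanceof BaseMetadataTable) {
    // It names a metadata table: make sure the origin table is cached as well.
    TableIdentifier origin = TableIdentifier.of(canonicalized.namespace().levels());
    tableCache.get(origin, catalog::loadTable);
  }
  return table;
}
```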
Re: [I] Compatibility Issue with pydantic and annotated-types in pyiceberg 0.8.1 [iceberg-python]
Fokko commented on issue #1418: URL: https://github.com/apache/iceberg-python/issues/1418#issuecomment-2531608924

@djouallah do you know which version of Pydantic you're using? You can easily check it:

```
Python 3.10.14 (main, Mar 19 2024, 21:46:16) [Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pydantic
>>> pydantic.__version__
'2.10.3'
```
Re: [PR] Bump mkdocs-material from 9.5.47 to 9.5.48 [iceberg-python]
Fokko merged PR #1419: URL: https://github.com/apache/iceberg-python/pull/1419
Re: [PR] Add basic CI to build iceberg and example [iceberg-cpp]
gaborkaszab commented on code in PR #7: URL: https://github.com/apache/iceberg-cpp/pull/7#discussion_r1878384745

## .github/workflows/test.yml: ##
```yaml
jobs:
  ubuntu:
    name: AMD64 Ubuntu latest
    runs-on: ubuntu-latest
```

Review Comment: By pinning to a specific version we could document that as the supported version, and moving to a later version could be done in a more controlled fashion. With `latest` we might see surprise breakages even without changing code in this project, and those wouldn't be easy to investigate. Merging code would also be blocked until we fix the incompatibility with a new Ubuntu version, even though we could still deliver code using the previous release. I'm +1 on pinning this to a specific version.
Re: [PR] REST: AuthManager API [iceberg]
nastra commented on code in PR #10753: URL: https://github.com/apache/iceberg/pull/10753#discussion_r1878738170

## core/src/main/java/org/apache/iceberg/rest/auth/AuthSession.java: ##
```java
package org.apache.iceberg.rest.auth;

import org.apache.iceberg.rest.HTTPRequest;

/**
 * An authentication session that can be used to authenticate outgoing HTTP requests.
 *
 * Authentication sessions are usually immutable, but may hold resources that need to be released
 * when the session is no longer needed. Implementations should override {@link #close()} to
 * release any resources.
 */
public interface AuthSession extends AutoCloseable {
```

Review Comment: I would prefer to do the rename here. @danielcweeks do you have any suggestions in terms of a better name for this interface so that we don't clash with the `AuthSession` in `OAuth2Util`?
Re: [PR] Add clang format [iceberg-cpp]
raulcd commented on code in PR #4: URL: https://github.com/apache/iceberg-cpp/pull/4#discussion_r1877791436

## .clang-format: ##
```
---
BasedOnStyle: Google
ColumnLimit: 90
```

Review Comment: From my side, both 90 and 100 sound reasonable; I don't have a strong preference. With current screen widths and resolutions, I don't see 100 being a problem.
Re: [PR] Add clang format [iceberg-cpp]
pitrou commented on code in PR #4: URL: https://github.com/apache/iceberg-cpp/pull/4#discussion_r1877805402

## .clang-format: ##
```
---
BasedOnStyle: Google
ColumnLimit: 90
```

Review Comment:
I still think we should be mindful of diverse situations:

* people with a smaller screen (such as a laptop)
* side-by-side diffs
* various window arrangements
* people who need a larger font because of eyesight issues

There is also the issue that the longer the line, the farther the eye has to scan horizontally. It's not always very comfortable to do so; this is why newspapers typically use several narrow columns side by side.

In the end, of course, the difference between 90 and 100 will not be fundamental.
Re: [PR] Add clang format [iceberg-cpp]
gaborkaszab commented on code in PR #4: URL: https://github.com/apache/iceberg-cpp/pull/4#discussion_r1877769090

## .clang-format: ##
```
---
BasedOnStyle: Google
ColumnLimit: 90
```

Review Comment:
> I just checked impala's .clang-format, it has ColumnLimit: 90 for both Cpp and Java.

Yes, and I'm involved with the Impala project and can tell you that everyone I talked to in the community hates the 90-char limit and would prefer to raise it to 100.

> We don't. It's in the case where we decrease the value that adjustments are needed.

I don't think this statement is correct. If you increase the line length, there are lines that were broken in two that will now fit on one line. If you have auto-formatting, those will be rewritten, losing the git blame info.

I don't want to block this PR, don't get me wrong. I just want to be more careful than simply rushing commits like this, as they will have an effect on the project forever.

> someone does not agree with others

This gives me the impression that you're saying it's me against everyone else :) Fokko also politely mentioned the char limits of other projects. I see that there is support for the 90-char limit from people coming from the Arrow community, so let's simply give other people a bit more time to chime in, and then we can resolve this comment if nothing changes.
Re: [PR] AWS: Enable RetryMode for AWS KMS client [iceberg]
hsiang-c commented on code in PR #11420: URL: https://github.com/apache/iceberg/pull/11420#discussion_r1877559085

## aws/src/test/java/org/apache/iceberg/aws/kms/TestKmsClientProperties.java: ##
```java
package org.apache.iceberg.aws.kms;

import static org.assertj.core.api.Assertions.assertThat;

import org.apache.iceberg.aws.AwsClientProperties;
import org.junit.jupiter.api.Test;
import software.amazon.awssdk.core.retry.RetryMode;
import software.amazon.awssdk.services.kms.KmsClient;
import software.amazon.awssdk.services.kms.KmsClientBuilder;

public class TestKmsClientProperties {
  @Test
  public void testApplyRetryConfiguration() {
    AwsClientProperties awsClientProperties = new AwsClientProperties();

    KmsClientBuilder builder = KmsClient.builder();
    awsClientProperties.applyRetryConfigurations(builder);
    RetryMode retryPolicy = builder.overrideConfiguration().retryMode().get();

    assertThat(retryPolicy).as("retry mode was not set").isEqualTo(RetryMode.ADAPTIVE_V2);
```

Review Comment: Sorry I missed this one.
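For context on what the test asserts, here is a minimal sketch of applying a retry mode to an AWS SDK v2 KMS client through its override configuration. This is generic SDK usage under stated assumptions (the region is a placeholder, and credentials configuration is omitted), not necessarily the exact logic inside `AwsClientProperties.applyRetryConfigurations`:

```java
import software.amazon.awssdk.core.client.config.ClientOverrideConfiguration;
import software.amazon.awssdk.core.retry.RetryMode;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.kms.KmsClient;

public class KmsRetryModeExample {
  public static void main(String[] args) {
    // Apply a retry mode through the client's override configuration.
    // Region is a placeholder; a real setup would also configure credentials.
    try (KmsClient kms = KmsClient.builder()
        .region(Region.US_EAST_1)
        .overrideConfiguration(
            ClientOverrideConfiguration.builder()
                .retryPolicy(RetryMode.ADAPTIVE_V2)
                .build())
        .build()) {
      // use the client ...
    }
  }
}
```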
Re: [PR] Core, Flink, Spark: Drop deprecated APIs scheduled for removal in 1.8.0 [iceberg]
findepi commented on code in PR #11721: URL: https://github.com/apache/iceberg/pull/11721#discussion_r1877655052

## flink/v1.20/flink/src/test/java/org/apache/iceberg/flink/data/TestRowProjection.java: ##
```java
@@ -369,19 +346,7 @@ public void testMapProjection() throws IOException {
     assertThat(projected.getMap(0)).isEqualTo(properties);
   }

-  private Map toStringMap(Map map) {
```

Review Comment: The IDE didn't flag it as unused there... turns out it was too lazy. Thanks for catching!
Re: [PR] feat!: Remove `BoundPartitionSpec` (WIP) [iceberg-rust]
Fokko commented on code in PR #771: URL: https://github.com/apache/iceberg-rust/pull/771#discussion_r1877888621

## crates/iceberg/src/spec/table_metadata.rs: ##
```rust
@@ -119,9 +119,11 @@ pub struct TableMetadata {
     /// ID of the table's current schema.
     pub(crate) current_schema_id: i32,
     /// A list of partition specs, stored as full partition spec objects.
     pub(crate) partition_specs: HashMap<i32, PartitionSpecRef>,
     /// ID of the "current" spec that writers should use by default.
-    pub(crate) default_spec: BoundPartitionSpecRef,
+    pub(crate) default_spec: PartitionSpecRef,
+    /// Partition type of the default partition spec.
+    pub(crate) default_partition_type: StructType,
```

Review Comment: I had to think about this one, but I think I like it 👍
Re: [PR] feat!: Remove `BoundPartitionSpec` (WIP) [iceberg-rust]
Fokko commented on code in PR #771: URL: https://github.com/apache/iceberg-rust/pull/771#discussion_r1877890713

## crates/iceberg/src/spec/table_metadata.rs: ##
```rust
@@ -776,17 +782,12 @@ pub(super) mod _serde {
                 .map(|x| (x.spec_id(), Arc::new(x))),
         );
         let default_spec_id = value.default_spec_id;
-        let default_spec = partition_specs
+        let default_spec: PartitionSpecRef = partition_specs
             .get(&value.default_spec_id)
-            .map(|schemaless_spec| {
-                (*schemaless_spec.clone())
-                    .clone()
-                    .bind(current_schema.clone())
-            })
-            .transpose()?
+            .map(|schemaless_spec| (**schemaless_spec).clone())
```

Review Comment:
```suggestion
            .map(|spec| (**spec).clone())
```