[PR] Revert "feat: Introduce snapshot summary properties (#1336)" [iceberg-rust]
liurenjie1024 opened a new pull request, #1390: URL: https://github.com/apache/iceberg-rust/pull/1390 ## Which issue does this PR close? This reverts commit 8bb9a882c7e5f6cfd2a9762136e874421cd7fb59. See https://github.com/apache/iceberg-rust/pull/1336#issuecomment-2915710134
Re: [I] Add a method in TableCommit to apply changes to TableMetadata to create a new TableMetadata [iceberg-rust]
liurenjie1024 commented on issue #1386: URL: https://github.com/apache/iceberg-rust/issues/1386#issuecomment-2915785708 cc @CTTY
Re: [PR] Change `FieldSummary` `{upper,lower}_bound` to `ByteBuf` [iceberg-rust]
liurenjie1024 commented on code in PR #1369: URL: https://github.com/apache/iceberg-rust/pull/1369#discussion_r2111508289 ## crates/iceberg/src/expr/visitors/manifest_evaluator.rs: ## @@ -154,9 +155,19 @@ impl BoundPredicateVisitor for ManifestFilterVisitor<'_> { _predicate: &BoundPredicate, ) -> crate::Result { let field = self.field_summary_for_reference(reference); + match &field.lower_bound { -Some(bound) if datum <= bound => ROWS_CANNOT_MATCH, -Some(_) => ROWS_MIGHT_MATCH, +Some(bound_bytes) => { +let bound = ManifestFilterVisitor::bytes_to_datum( Review Comment: The cost depends on the type, but I think it's fine for now since it's not expected to be very expensive.
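For readers unfamiliar with the change under discussion: the filter now decodes the serialized bound into a typed value before comparing, so the per-field cost is one deserialization whose price depends on the column type. A rough Java-side analogue using the Java library's `Conversions.fromByteBuffer`; the predicate logic here is a simplified sketch, not the actual visitor:

```java
import java.nio.ByteBuffer;
import org.apache.iceberg.types.Conversions;
import org.apache.iceberg.types.Types;

class BoundPruning {
    // For a `col < literal` predicate: if every value in the manifest is
    // already >= literal (i.e. lower bound >= literal), rows cannot match.
    static boolean rowsMightMatchLessThan(ByteBuffer lowerBound, long literal) {
        if (lowerBound == null) {
            return true; // no stats recorded: we cannot prune
        }
        // One decode per manifest/field; the cost depends on the type decoded.
        Long decoded = Conversions.fromByteBuffer(Types.LongType.get(), lowerBound);
        return decoded < literal;
    }
}
```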
Re: [PR] Change `FieldSummary` `{upper,lower}_bound` to `ByteBuf` [iceberg-rust]
liurenjie1024 merged PR #1369: URL: https://github.com/apache/iceberg-rust/pull/1369
Re: [PR] feat: add retryable property for Error [iceberg-rust]
liurenjie1024 commented on code in PR #1383: URL: https://github.com/apache/iceberg-rust/pull/1383#discussion_r2111516594 ## crates/iceberg/src/error.rs: ## @@ -134,6 +134,8 @@ pub struct Error { source: Option, backtrace: Backtrace, + +retryable: bool, Review Comment: This is quite confusing. Whether an error is retryable should be determined by the consumer from its `ErrorKind`.
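To illustrate the design point (consumer-side retry decisions keyed off the error kind, rather than a `retryable` flag stored in the error itself), here is a minimal Java sketch; the enum values are illustrative stand-ins, not the actual iceberg-rust `ErrorKind` variants:

```java
class RetryPolicy {
    // Illustrative stand-in for iceberg-rust's ErrorKind.
    enum ErrorKind { DATA_INVALID, FEATURE_UNSUPPORTED, CATALOG_COMMIT_CONFLICTS, UNEXPECTED }

    // The library reports what went wrong; the consumer decides whether to retry.
    static boolean shouldRetry(ErrorKind kind) {
        switch (kind) {
            case CATALOG_COMMIT_CONFLICTS:
                return true;  // optimistic-concurrency conflict: retrying can succeed
            default:
                return false; // invalid data or logic errors: retrying won't help
        }
    }
}
```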
Re: [PR] Flink: Migrate Flink TableSchema for IcebergSource [iceberg]
mxm commented on PR #13072: URL: https://github.com/apache/iceberg/pull/13072#issuecomment-2915875085 What are the obstacles to completely getting rid of TableSchema?
Re: [I] Feature request: support to update snapshot summary [iceberg-rust]
Xuanwo closed issue #1329: Feature request: support to update snapshot summary URL: https://github.com/apache/iceberg-rust/issues/1329
Re: [PR] Hive: Throw exception for when listing a non-existing namespace [iceberg]
pvary commented on code in PR #13130: URL: https://github.com/apache/iceberg/pull/13130#discussion_r2110933234 ## hive-metastore/src/test/java/org/apache/iceberg/hive/TestHiveCatalog.java: ## @@ -1209,11 +1208,4 @@ public void testDatabaseLocationWithSlashInWarehouseDir() { assertThat(database.getLocationUri()).isEqualTo("s3://bucket/database.db"); } - - @Test - @Override - @Disabled("Hive currently returns an empty list instead of throwing a NoSuchNamespaceException") - public void testListNonExistingNamespace() { -super.testListNonExistingNamespace(); - } Review Comment: @stevenzwu: Since you'll be the next release manager, do you know what needs to be done to add a note about the change to the release notes?
Re: [I] Running MERGE INTO with more than one WHEN condition fails if the number of columns in the target table is > 321 [iceberg]
andreaschiappacasse commented on issue #10294: URL: https://github.com/apache/iceberg/issues/10294#issuecomment-2915516882 +1 on the issue, since it is still a problem in our workloads. @nastra do you think it would be possible to re-open it? Even though Trino solved the upstream issue, the problem still persists on AWS via Athena.
Re: [PR] SPARK: Remove dependency on hadoop's filesystem class from remove orphan files [iceberg]
pvary commented on code in PR #12254: URL: https://github.com/apache/iceberg/pull/12254#discussion_r2111329707 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/actions/TestRemoveOrphanFilesAction.java: ## @@ -107,9 +115,12 @@ public abstract class TestRemoveOrphanFilesAction extends TestBase { protected Map properties; @Parameter private int formatVersion; - @Parameters(name = "formatVersion = {0}") + @Parameter(index = 1) + private boolean usePrefixListing; + + @Parameters(name = "formatVersion = {0}, usePrefixListing = {1}") protected static List parameters() { -return Arrays.asList(2, 3); +return Arrays.asList(new Object[] {2, true}, new Object[] {2, false}, new Object[] {3, false}); Review Comment: I believe @RussellSpitzer thought about something like this:

```java
return Arrays.stream(TestHelpers.ALL_VERSIONS)
    .filter(version -> version > 1)
    .flatMap(version -> Stream.of(
        new Object[] {version, true},
        new Object[] {version, false}))
    .collect(Collectors.toList());
```
[I] Running MERGE INTO with more than one WHEN condition fails if the number of columns in the target table is > 321 [iceberg]
andreaschiappacasse opened a new issue, #10294: URL: https://github.com/apache/iceberg/issues/10294

### Apache Iceberg version

None

### Query engine

Athena (engine v3)

### Please describe the bug 🐞

Hello everyone, today my team ran into a very strange bug using Iceberg via Athena. I'll describe the steps we used to reproduce the error below:

**1. We create an Iceberg table with an "id" column and 321 other columns with random strings - in the example below we use awswrangler to create the table, but the same happens when the table is created using Athena directly.**

``` python
import awswrangler as wr
import pandas as pd
import random, string

NUM_COLS = 322

def get_random_string(length):
    letters = string.ascii_lowercase
    result_str = ''.join(random.choice(letters) for i in range(length))
    return result_str

columns = ['id'] + [get_random_string(5) for i in range(NUM_COLS - 1)]
data = pd.DataFrame(data=[columns], columns=columns)

wr.athena.to_iceberg(
    data,
    workgroup="my-workgroup",
    database="my_database",
    table="iceberg_limits_322",
    table_location="s3://my_bucket/iceberg_limits",
)
```

**2. We then run the following query in Athena to insert a random value**

``` sql
MERGE INTO my_database.iceberg_limits_322 as existing
using (
    SELECT 'something' as id
) as new
on existing.id = new.id
WHEN NOT MATCHED THEN INSERT (id) VALUES (new.id)
WHEN MATCHED THEN DELETE
```

**3. Which results in the error:**

`[ErrorCode: INTERNAL_ERROR_QUERY_ENGINE] Amazon Athena experienced an internal error while executing this query. Please contact AWS support for further assistance. You will not be charged for this query. We apologize for the inconvenience.`

Notice that the error **only occurs when multiple WHEN clauses are used in the MERGE INTO query!** - in case one WHEN clause is used (just to insert or to delete records) everything works fine, and the table can be used normally. We can replicate this behaviour on multiple AWS accounts and with different tables/databases/s3 locations. After trying with different numbers of columns, we consistently found that 321 is the maximum number of columns the table can have; everything works fine below this threshold.
Re: [PR] feat: Introduce snapshot summary properties [iceberg-rust]
liurenjie1024 commented on PR #1336: URL: https://github.com/apache/iceberg-rust/pull/1336#issuecomment-2915710134 Hi, @Xuanwo @dentiny We need to revert this PR; the newly added update doesn't exist in the OpenAPI spec: https://github.com/apache/iceberg/blob/cab0decbb0e32bf314039e30807eb033c50665d5/open-api/rest-catalog-open-api.yaml#L3058
Re: [PR] feat: Introduce snapshot summary properties [iceberg-rust]
liurenjie1024 commented on PR #1336: URL: https://github.com/apache/iceberg-rust/pull/1336#issuecomment-2915721862 Also, I don't think we should allow users to update snapshot summaries alone; it's dangerous.
Re: [PR] feat: Introduce snapshot summary properties [iceberg-rust]
dentiny commented on PR #1336: URL: https://github.com/apache/iceberg-rust/pull/1336#issuecomment-2915720658 > Hi, @Xuanwo @dentiny We need to revert this PR; the newly added update doesn't exist in the OpenAPI spec: https://github.com/apache/iceberg/blob/cab0decbb0e32bf314039e30807eb033c50665d5/open-api/rest-catalog-open-api.yaml#L3058 Hi @liurenjie1024 , there's a `SnapshotUpdate` action in the Java implementation, which supports setting summary properties. https://github.com/apache/iceberg/blob/main/api/src/main/java/org/apache/iceberg/SnapshotUpdate.java
Re: [PR] feat: Introduce snapshot summary properties [iceberg-rust]
dentiny commented on PR #1336: URL: https://github.com/apache/iceberg-rust/pull/1336#issuecomment-2915729660 > Also, I don't think we should allow users to update snapshot summaries alone; it's dangerous. Hi @liurenjie1024, curious whether you have other suggestions for updating the snapshot summary? My feature request is to write customized metadata for the snapshot. The Java impl I linked above seems to fulfill the requirement, and it is exposed as public? Would like to know your thoughts :)
Re: [PR] Revert "feat: Introduce snapshot summary properties (#1336)" [iceberg-rust]
liurenjie1024 merged PR #1390: URL: https://github.com/apache/iceberg-rust/pull/1390
[I] Data race in `inclusiveMetricsEval` [iceberg-go]
psarna opened a new issue, #443: URL: https://github.com/apache/iceberg-go/issues/443

### Apache Iceberg version

None

### Please describe the bug 🐞

When running the `TestMergeManifests` test with the `-race` flag, a data race report is produced (attached at the end of this message). I only skimmed the code, but it looks like the culprit is the concurrent fetching of manifest entries [here](https://github.com/apache/iceberg-go/blob/85f700808061fcea3655412245ef5d59f1784b69/table/scanner.go#L356). All concurrent requests write to the same structure [here](https://github.com/apache/iceberg-go/blob/85f700808061fcea3655412245ef5d59f1784b69/table/evaluators.go#L754-L756). I confirmed that setting the concurrency here to `1` makes all the tests pass the data race check.

```
$ go test -race -run TestTableWriting/TestMergeManifests
==================
WARNING: DATA RACE
Write at 0x00c000f7dc20 by goroutine 353:
  github.com/apache/iceberg-go/table.(*inclusiveMetricsEval).Eval()
      /dev/repo/iceberg-go/table/evaluators.go:754 +0xb9
  github.com/apache/iceberg-go/table.(*inclusiveMetricsEval).Eval-fm()
      :1 +0x47
  github.com/apache/iceberg-go/table.openManifest()
      /dev/repo/iceberg-go/table/scanner.go:142 +0x1b2
  github.com/apache/iceberg-go/table.(*Scan).collectManifestEntries.func1()
      /dev/repo/iceberg-go/table/scanner.go:358 +0xd0
  golang.org/x/sync/errgroup.(*Group).add.func1()
      /dev/go/pkg/mod/golang.org/x/sync@v0.14.0/errgroup/errgroup.go:130 +0x141

Previous write at 0x00c000f7dc20 by goroutine 354:
  github.com/apache/iceberg-go/table.(*inclusiveMetricsEval).Eval()
      /dev/repo/iceberg-go/table/evaluators.go:754 +0xb9
  github.com/apache/iceberg-go/table.(*inclusiveMetricsEval).Eval-fm()
      :1 +0x47
  github.com/apache/iceberg-go/table.openManifest()
      /dev/repo/iceberg-go/table/scanner.go:142 +0x1b2
  github.com/apache/iceberg-go/table.(*Scan).collectManifestEntries.func1()
      /dev/repo/iceberg-go/table/scanner.go:358 +0xd0
  golang.org/x/sync/errgroup.(*Group).add.func1()
      /dev/go/pkg/mod/golang.org/x/sync@v0.14.0/errgroup/errgroup.go:130 +0x141

Goroutine 353 (running) created at:
  golang.org/x/sync/errgroup.(*Group).add()
      /dev/go/pkg/mod/golang.org/x/sync@v0.14.0/errgroup/errgroup.go:98 +0xe4
  golang.org/x/sync/errgroup.(*Group).Go()
      /dev/go/pkg/mod/golang.org/x/sync@v0.14.0/errgroup/errgroup.go:93 +0x791
  github.com/apache/iceberg-go/table.(*Scan).collectManifestEntries()
      /dev/repo/iceberg-go/table/scanner.go:356 +0x531
  github.com/apache/iceberg-go/table.(*Scan).PlanFiles()
      /dev/repo/iceberg-go/table/scanner.go:399 +0xb1
  github.com/apache/iceberg-go/table.(*Scan).ToArrowRecords()
      /dev/repo/iceberg-go/table/scanner.go:441 +0x64
  github.com/apache/iceberg-go/table.(*Scan).ToArrowTable()
      /dev/repo/iceberg-go/table/scanner.go:474 +0x8c
  github.com/apache/iceberg-go/table_test.(*TableWritingTestSuite).TestMergeManifests()
      /dev/repo/iceberg-go/table/table_test.go:1125 +0x21f0
  github.com/apache/iceberg-go/table_test.arrowTableWithNull()
      /dev/repo/iceberg-go/table/table_test.go:974 +0x14b
  github.com/apache/iceberg-go/table_test.(*TableWritingTestSuite).TestMergeManifests()
      /dev/repo/iceberg-go/table/table_test.go:1067 +0x7d1
  runtime.call16()
      /dev/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.24.2.linux-amd64/src/runtime/asm_amd64.s:775 +0x42
  reflect.Value.Call()
      /dev/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.24.2.linux-amd64/src/reflect/value.go:368 +0xb5
  github.com/stretchr/testify/suite.Run.func1()
      /dev/go/pkg/mod/github.com/stretchr/testify@v1.10.0/suite/suite.go:202 +0x6ed
  testing.tRunner()
      /dev/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.24.2.linux-amd64/src/testing/testing.go:1792 +0x225
  testing.(*T).Run.gowrap1()
      /dev/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.24.2.linux-amd64/src/testing/testing.go:1851 +0x44

Goroutine 354 (finished) created at:
  golang.org/x/sync/errgroup.(*Group).add()
      /dev/go/pkg/mod/golang.org/x/sync@v0.14.0/errgroup/errgroup.go:98 +0xe4
  golang.org/x/sync/errgroup.(*Group).Go()
      /dev/go/pkg/mod/golang.org/x/sync@v0.14.0/errgroup/errgroup.go:93 +0x791
  github.com/apache/iceberg-go/table.(*Scan).collectManifestEntries()
      /dev/repo/iceberg-go/table/scanner.go:356 +0x531
  github.com/apache/iceberg-go/table.(*Scan).PlanFiles()
      /dev/repo/iceberg-go/table/scanner.go:399 +0xb1
  github.com/apache/iceberg-go/table.(*Scan).ToArrowRecords()
      /dev/repo/iceberg-go/table/scanner.go:441 +0x64
  github.com/apache/iceberg-go/table.(*Scan).ToArrowTable()
      /dev/repo/iceberg-go/table/scanner.go:474 +0x8c
  github.com/apache/iceberg-go/table_test.(*TableWritingTestS
```
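The trace shows every errgroup worker sharing one `inclusiveMetricsEval` value whose `Eval` method writes to the same field. The standard remedies are to make the evaluator stateless or to give each worker its own instance. A minimal Java sketch of the per-worker-instance fix (hypothetical names; the real code is Go, this only shows the shape of the change):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PerWorkerEvaluator {
    // Stand-in for an evaluator that mutates internal state during eval().
    static class MetricsEvaluator {
        private String scratch; // mutable state: unsafe to share across threads

        boolean eval(String manifestEntry) {
            this.scratch = manifestEntry; // corresponds to the racy write in the report
            return !scratch.isEmpty();
        }
    }

    // Fix: build a fresh evaluator inside each task instead of sharing one.
    static List<Boolean> planFiles(List<String> entries) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Boolean>> futures = new ArrayList<>();
            for (String entry : entries) {
                futures.add(pool.submit(() -> new MetricsEvaluator().eval(entry)));
            }
            List<Boolean> results = new ArrayList<>();
            for (Future<Boolean> f : futures) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```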
Re: [PR] Docs: add column descriptions for entries metadata table [iceberg]
manuzhang commented on code in PR #13104: URL: https://github.com/apache/iceberg/pull/13104#discussion_r2111095523 ## docs/docs/spark-queries.md: ## @@ -301,6 +301,16 @@ SELECT * FROM prod.db.table.entries; | -- | -- | -- | -- | -- | -- | | 2 | 57897183625154 | 0 | 0 | {"content":0,"file_path":"s3:/.../table/data/00047-25-833044d0-127b-415c-b874-038a4f978c29-00612.parquet","file_format":"PARQUET","spec_id":0,"record_count":15,"file_size_in_bytes":473,"column_sizes":{1:103},"value_counts":{1:15},"null_value_counts":{1:0},"nan_value_counts":{},"lower_bounds":{1:},"upper_bounds":{1:},"key_metadata":null,"split_offsets":[4],"equality_ids":null,"sort_order_id":0} | {"c1":{"column_size":103,"value_count":15,"null_value_count":0,"nan_value_count":null,"lower_bound":1,"upper_bound":3}} | +Note: + +1. The columns of the `entries` table correspond to the fields of the `manifest_entry` struct (see the [manifest file schema](../../spec.md#manifests) for the full definition): +- `status`: Used to track additions and deletions +- `snapshot_id`: The ID of the snapshot in which the file was added or removed +- `sequence_number`: Used for ordering changes across snapshots +- `file_sequence_number`: Indicates when the file was added +- `data_file`: Metadata about the data file Review Comment: The issue is we don't have anchors for `manifest_entry` and `data_file` to link to.
[I] Switch metadata pointer to new location for different catalogs (sql, hive, glue, etc) [iceberg-rust]
liurenjie1024 opened a new issue, #1389: URL: https://github.com/apache/iceberg-rust/issues/1389 (no comment)
[I] Glue catalog: avoid reloading table in `commit_table` [iceberg-python]
psavalle opened a new issue, #2051: URL: https://github.com/apache/iceberg-python/issues/2051 ### Feature Request / Improvement The Glue catalog implementation [always reloads the table](https://github.com/apache/iceberg-python/blob/main/pyiceberg/catalog/glue.py#L505) in `GlueCatalog.commit_table`, even though a `Table` object is passed in. It would be useful to have the opportunity to skip that for use cases that write frequently and already have an up-to-date `Table` object, as the extra table reload can take some time (esp. on tables with lots of checkpoints).
Re: [PR] feat: add asan and ubsan support to cmake [iceberg-cpp]
wgtmac commented on code in PR #107: URL: https://github.com/apache/iceberg-cpp/pull/107#discussion_r2111250416 ## .github/workflows/sanitizer_test.yml: ## @@ -0,0 +1,63 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +name: ASAN and UBSAN Tests + +on: + push: +branches: + - '**' + - '!dependabot/**' +tags: + - '**' + pull_request: + +concurrency: + group: ${{ github.repository }}-${{ github.head_ref || github.sha }}-${{ github.workflow }} + cancel-in-progress: true + +permissions: + contents: read + +jobs: + sanitizer-test: +name: "ASAN and UBSAN Tests" +runs-on: ubuntu-24.04 +strategy: + fail-fast: false +steps: + - name: Checkout iceberg-cpp +uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2 + - name: Configure and Build with ASAN & UBSAN +run: | + mkdir build && cd build + cmake .. -DCMAKE_BUILD_TYPE=Debug -DICEBERG_ENABLE_ASAN=ON -DICEBERG_ENABLE_UBSAN=ON + cmake --build . --verbose + - name: Run Tests +working-directory: build +env: + ASAN_OPTIONS: log_path=out.log:detect_leaks=1:symbolize=1:strict_string_checks=1:halt_on_error=0:detect_container_overflow=0 Review Comment: ```suggestion ASAN_OPTIONS: log_path=out.log:detect_leaks=1:symbolize=1:strict_string_checks=1:halt_on_error=1:detect_container_overflow=0 ``` @raulcd Perhaps we need to enable `halt_on_error` to avoid the errors being swallowed by the log files. ## .github/workflows/sanitizer_test.yml: ## @@ -0,0 +1,63 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. 
+ +name: ASAN and UBSAN Tests + +on: + push: +branches: + - '**' + - '!dependabot/**' +tags: + - '**' + pull_request: + +concurrency: + group: ${{ github.repository }}-${{ github.head_ref || github.sha }}-${{ github.workflow }} + cancel-in-progress: true + +permissions: + contents: read + +jobs: + sanitizer-test: +name: "ASAN and UBSAN Tests" +runs-on: ubuntu-24.04 +strategy: + fail-fast: false +steps: + - name: Checkout iceberg-cpp +uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2 + - name: Configure and Build with ASAN & UBSAN +run: | + mkdir build && cd build + cmake .. -DCMAKE_BUILD_TYPE=Debug -DICEBERG_ENABLE_ASAN=ON -DICEBERG_ENABLE_UBSAN=ON + cmake --build . --verbose + - name: Run Tests +working-directory: build +env: + ASAN_OPTIONS: log_path=out.log:detect_leaks=1:symbolize=1:strict_string_checks=1:halt_on_error=0:detect_container_overflow=0 + LSAN_OPTIONS: suppressions=${{ github.workspace }}/.github/lsan-suppressions.txt + UBSAN_OPTIONS: log_path=out.log:print_stacktrace=1:suppressions=${{ github.workspace }}/.github/ubsan-suppressions.txt Review Comment: ```suggestion UBSAN_OPTIONS: log_path=out.log:halt_on_error=1:print_stacktrace=1:suppressions=${{ github.workspace }}/.github/ubsan-suppressions.txt ```
Re: [PR] feat: add asan and ubsan support to cmake [iceberg-cpp]
raulcd commented on PR #107: URL: https://github.com/apache/iceberg-cpp/pull/107#issuecomment-2915302693 I am slightly confused: the log output from the job (see the artifact uploaded here: https://github.com/apache/iceberg-cpp/actions/runs/15293677546?pr=107) suggests there are several leaks, for example:

```
Indirect leak of 31 byte(s) in 1 object(s) allocated from:
    #0 0x7f1fcbcfe548 in operator new(unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cpp:95
    #1 0x557a45d3e428 in std::__new_allocator::allocate(unsigned long, void const*) /usr/include/c++/13/bits/new_allocator.h:151
    #2 0x557a45cf56e5 in std::allocator::allocate(unsigned long) /usr/include/c++/13/bits/allocator.h:198
    #3 0x557a45cf56e5 in std::allocator_traits >::allocate(std::allocator&, unsigned long) /usr/include/c++/13/bits/alloc_traits.h:482
    #4 0x557a45cf56e5 in std::__cxx11::basic_string, std::allocator >::_S_allocate(std::allocator&, unsigned long) /usr/include/c++/13/bits/basic_string.h:126
    #5 0x557a45cf3657 in std::__cxx11::basic_string, std::allocator >::_M_create(unsigned long&, unsigned long) /usr/include/c++/13/bits/basic_string.tcc:159
    #6 0x557a45cf5196 in std::__cxx11::basic_string, std::allocator >::_M_mutate(unsigned long, unsigned long, char const*, unsigned long) /usr/include/c++/13/bits/basic_string.tcc:332
    #7 0x557a45cddb0c in std::__cxx11::basic_string, std::allocator >::_M_replace_aux(unsigned long, unsigned long, unsigned long, char) /usr/include/c++/13/bits/basic_string.tcc:468
    #8 0x557a45e0eefd in std::__cxx11::basic_string, std::allocator >::append(unsigned long, char) /usr/include/c++/13/bits/basic_string.h:1488
    #9 0x557a45e1ac3a in std::__cxx11::basic_string, std::allocator >::resize(unsigned long, char) /usr/include/c++/13/bits/basic_string.tcc:405
    #10 0x557a45e1737f in std::__cxx11::basic_string, std::allocator >::resize(unsigned long) /usr/include/c++/13/bits/basic_string.h:1114
    #11 0x557a465e9c49 in EncodeMetadata /home/runner/work/iceberg-cpp/iceberg-cpp/build/_deps/vendoredarrow-src/cpp/src/arrow/c/bridge.cc:154
    #12 0x557a465eb370 in ExportMetadata /home/runner/work/iceberg-cpp/iceberg-cpp/build/_deps/vendoredarrow-src/cpp/src/arrow/c/bridge.cc:296
    #13 0x557a465ea162 in ExportField /home/runner/work/iceberg-cpp/iceberg-cpp/build/_deps/vendoredarrow-src/cpp/src/arrow/c/bridge.cc:189
    #14 0x557a465eb1b2 in ExportChildren /home/runner/work/iceberg-cpp/iceberg-cpp/build/_deps/vendoredarrow-src/cpp/src/arrow/c/bridge.cc:283
    #15 0x557a465ea5cb in ExportSchema /home/runner/work/iceberg-cpp/iceberg-cpp/build/_deps/vendoredarrow-src/cpp/src/arrow/c/bridge.cc:209
    #16 0x557a465ed9db in arrow::ExportSchema(arrow::Schema const&, ArrowSchema*) /home/runner/work/iceberg-cpp/iceberg-cpp/build/_deps/vendoredarrow-src/cpp/src/arrow/c/bridge.cc:523
    #17 0x557a45c92944 in iceberg::FromArrowSchemaTest_StructType_Test::TestBody() /home/runner/work/iceberg-cpp/iceberg-cpp/test/arrow_test.cc:357
    #18 0x557a46fb1d22 in void testing::internal::HandleSehExceptionsInMethodIfSupported(testing::Test*, void (testing::Test::*)(), char const*) /home/runner/work/iceberg-cpp/iceberg-cpp/build/_deps/googletest-src/googletest/src/gtest.cc:2638
    #19 0x557a46faac32 in void testing::internal::HandleExceptionsInMethodIfSupported(testing::Test*, void (testing::Test::*)(), char const*) /home/runner/work/iceberg-cpp/iceberg-cpp/build/_deps/googletest-src/googletest/src/gtest.cc:2674
    #20 0x557a46f85b37 in testing::Test::Run() /home/runner/work/iceberg-cpp/iceberg-cpp/build/_deps/googletest-src/googletest/src/gtest.cc:2713
    #21 0x557a46f865f5 in testing::TestInfo::Run() /home/runner/work/iceberg-cpp/iceberg-cpp/build/_deps/googletest-src/googletest/src/gtest.cc:2859
    #22 0x557a46f86fbd in testing::TestSuite::Run() /home/runner/work/iceberg-cpp/iceberg-cpp/build/_deps/googletest-src/googletest/src/gtest.cc:3037
    #23 0x557a46f96fe3 in testing::internal::UnitTestImpl::RunAllTests() /home/runner/work/iceberg-cpp/iceberg-cpp/build/_deps/googletest-src/googletest/src/gtest.cc:5967
    #24 0x557a46fb31a7 in bool testing::internal::HandleSehExceptionsInMethodIfSupported(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/runner/work/iceberg-cpp/iceberg-cpp/build/_deps/googletest-src/googletest/src/gtest.cc:2638
    #25 0x557a46fabeda in bool testing::internal::HandleExceptionsInMethodIfSupported(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/runner/work/iceberg-cpp/iceberg-cpp/build/_deps/googletest-src/googletest/src/gtest.cc:2674
    #26 0x557a46f954cf in testing::UnitTest::Run() /home/runner/work/iceberg-cpp/iceberg-cpp/build/_deps/googletest-src/googletest/src/gtest.cc:5546
    #27 0x557a45e51ea1 in RUN_ALL_TESTS() /home/runner
```
Re: [PR] Flink: Dynamic Iceberg Sink Contribution [iceberg]
b-rick commented on code in PR #12424: URL: https://github.com/apache/iceberg/pull/12424#discussion_r2111204802 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/sink/dynamic/RowDataEvolver.java: ## @@ -0,0 +1,169 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.flink.sink.dynamic; + +import java.time.LocalDate; +import java.time.LocalDateTime; +import java.time.LocalTime; +import java.util.List; +import java.util.Map; +import org.apache.flink.table.data.ArrayData; +import org.apache.flink.table.data.DecimalData; +import org.apache.flink.table.data.GenericArrayData; +import org.apache.flink.table.data.GenericMapData; +import org.apache.flink.table.data.GenericRowData; +import org.apache.flink.table.data.MapData; +import org.apache.flink.table.data.RowData; +import org.apache.flink.table.data.TimestampData; +import org.apache.flink.table.types.logical.ArrayType; +import org.apache.flink.table.types.logical.DecimalType; +import org.apache.flink.table.types.logical.LogicalType; +import org.apache.flink.table.types.logical.MapType; +import org.apache.flink.table.types.logical.RowType; +import org.apache.iceberg.Schema; +import org.apache.iceberg.flink.FlinkSchemaUtil; +import org.apache.iceberg.relocated.com.google.common.collect.Maps; + +public class RowDataEvolver { + private RowDataEvolver() {} + + public static RowData convert(RowData sourceData, Schema sourceSchema, Schema targetSchema) { +return convertStruct( +sourceData, FlinkSchemaUtil.convert(sourceSchema), FlinkSchemaUtil.convert(targetSchema)); Review Comment: Indeed, but the case where the record schema does not match the table schema is very common. E.g. a producer that only ever published a schema with a single column "a", and now the schema evolves and the producer only publishes column "b". The table schema will be {"a", "b"} (with a unioning strategy), and we expect to continue publishing without fault. But because of the schema difference, we hit this slow code path for every single record.
[PR] feat: implement avro file reader [iceberg-cpp]
wgtmac opened a new pull request, #113: URL: https://github.com/apache/iceberg-cpp/pull/113 (no comment)
Re: [PR] feat: Introduce snapshot summary properties [iceberg-rust]
liurenjie1024 commented on code in PR #1336: URL: https://github.com/apache/iceberg-rust/pull/1336#discussion_r2111438158 ## crates/iceberg/src/catalog/mod.rs: ## @@ -495,6 +495,12 @@ pub enum TableUpdate { /// Schema IDs to remove. schema_ids: Vec, }, +/// Add snapshot summary properties. +#[serde(rename_all = "kebab-case")] +AddSnapshotSummaryProperties { Review Comment: This table update doesn't exist in the spec's list of table updates; see https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml
Re: [PR] feat: Introduce snapshot summary properties [iceberg-rust]
liurenjie1024 commented on PR #1336: URL: https://github.com/apache/iceberg-rust/pull/1336#issuecomment-2915765649 > > Also, I don't think we should allow users to update snapshot summaries alone; it's dangerous. > > Hi @liurenjie1024, curious whether you have other suggestions for updating the snapshot summary? My feature request is to write customized metadata for the snapshot. > > The Java impl I linked above seems to fulfill the requirement, and it is exposed as public? Would like to know your thoughts :) `SnapshotUpdate` is a super-interface of the tx actions that can produce a new snapshot, so in theory you could use any tx action that produces a new snapshot to do that. Currently, you could use `FastAppend`.
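For comparison, this is what that pattern looks like in the Java library: any action implementing `SnapshotUpdate` (such as the `AppendFiles` returned by `newFastAppend()`) exposes `set(property, value)`, and the property lands in the summary of the snapshot the action commits. A sketch assuming a `Table` and `DataFile` already exist:

```java
import org.apache.iceberg.DataFile;
import org.apache.iceberg.Table;

class AppendWithSummary {
    // Attach a custom summary property to the snapshot produced by a fast append.
    static void appendWithSummary(Table table, DataFile dataFile) {
        table.newFastAppend()
            .appendFile(dataFile)
            .set("my-app.ingest-batch", "batch-42") // SnapshotUpdate#set(String, String)
            .commit();
    }
}
```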
Re: [PR] Revert "feat: Introduce snapshot summary properties (#1336)" [iceberg-rust]
liurenjie1024 commented on PR #1390: URL: https://github.com/apache/iceberg-rust/pull/1390#issuecomment-2915775716 cc @Xuanwo @sdd @Fokko @kevinjqliu
Re: [I] [EPIC] Support for appending data to iceberg table. [iceberg-rust]
liurenjie1024 commented on issue #1382: URL: https://github.com/apache/iceberg-rust/issues/1382#issuecomment-2915783715 > Hi [@liurenjie1024](https://github.com/liurenjie1024) , thanks for having this! > > Would it be ok if I take up the commit path work? I can start working on a POC tomorrow Cool, thanks!
Re: [I] [EPIC] Support for appending data to iceberg table. [iceberg-rust]
liurenjie1024 commented on issue #1382: URL: https://github.com/apache/iceberg-rust/issues/1382#issuecomment-2915783358 > I don't get why we need this. Is this used by the catalog implementation, e.g. SqlCatalog::update_table? Could you elaborate more on the commit process and where we need this? Yes. Catalogs use this new method to generate new table metadata, and then point the table to the new metadata. #964 duplicates this issue.
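A rough sketch of the commit path being described, with hypothetical types standing in for the iceberg-rust ones: the catalog checks the commit's requirements against the current metadata, applies the updates to build new metadata, and then persisting the file and switching the pointer (issue #1389) is catalog-specific.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical stand-ins for iceberg-rust's TableMetadata / TableCommit.
record TableMetadata(String location, int version) {}

interface Requirement {
    void check(TableMetadata base); // throws on conflict
}

record TableCommit(List<Requirement> requirements,
                   List<UnaryOperator<TableMetadata>> updates) {

    // The method issue #1386 asks for: fold the updates into new metadata.
    TableMetadata applyTo(TableMetadata base) {
        TableMetadata result = base;
        for (UnaryOperator<TableMetadata> update : updates) {
            result = update.apply(result);
        }
        return result;
    }
}

class CatalogCommit {
    TableMetadata commit(TableCommit commitRequest, TableMetadata base) {
        // 1. Fail fast if the table changed under us (optimistic concurrency).
        commitRequest.requirements().forEach(req -> req.check(base));
        // 2. Produce the new metadata from the accumulated updates.
        TableMetadata updated = commitRequest.applyTo(base);
        // 3. Writing the metadata file and atomically switching the pointer is
        //    left to each catalog implementation (sql, hive, glue, ...).
        return updated;
    }
}
```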
Re: [PR] SPARK: Remove dependency on hadoop's filesystem class from remove orphan files [iceberg]
liziyan-lzy commented on code in PR #12254: URL: https://github.com/apache/iceberg/pull/12254#discussion_r237312 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/DeleteOrphanFilesSparkAction.java: ## @@ -303,39 +310,89 @@ private Dataset listedFileDS() { List subDirs = Lists.newArrayList(); List matchingFiles = Lists.newArrayList(); -Predicate predicate = file -> file.getModificationTime() < olderThanTimestamp; PathFilter pathFilter = PartitionAwareHiddenPathFilter.forSpecs(table.specs()); -// list at most MAX_DRIVER_LISTING_DEPTH levels and only dirs that have -// less than MAX_DRIVER_LISTING_DIRECT_SUB_DIRS direct sub dirs on the driver -listDirRecursively( -location, -predicate, -hadoopConf.value(), -MAX_DRIVER_LISTING_DEPTH, -MAX_DRIVER_LISTING_DIRECT_SUB_DIRS, -subDirs, -pathFilter, -matchingFiles); - -JavaRDD matchingFileRDD = sparkContext().parallelize(matchingFiles, 1); - -if (subDirs.isEmpty()) { +if (usePrefixListing) { + Preconditions.checkArgument( + table.io() instanceof SupportsPrefixOperations, + "Table file io should prefix operations when using prefix list."); Review Comment: Thank you very much for identifying this issue. I’ve corrected it now. ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/DeleteOrphanFilesSparkAction.java: ## @@ -303,39 +310,89 @@ private Dataset listedFileDS() { List subDirs = Lists.newArrayList(); List matchingFiles = Lists.newArrayList(); -Predicate predicate = file -> file.getModificationTime() < olderThanTimestamp; PathFilter pathFilter = PartitionAwareHiddenPathFilter.forSpecs(table.specs()); -// list at most MAX_DRIVER_LISTING_DEPTH levels and only dirs that have -// less than MAX_DRIVER_LISTING_DIRECT_SUB_DIRS direct sub dirs on the driver -listDirRecursively( -location, -predicate, -hadoopConf.value(), -MAX_DRIVER_LISTING_DEPTH, -MAX_DRIVER_LISTING_DIRECT_SUB_DIRS, -subDirs, -pathFilter, -matchingFiles); - -JavaRDD matchingFileRDD = sparkContext().parallelize(matchingFiles, 1); - -if (subDirs.isEmpty()) { +if (usePrefixListing) { + Preconditions.checkArgument( + table.io() instanceof SupportsPrefixOperations, + "Table file io should prefix operations when using prefix list."); + + Predicate predicate = + fileInfo -> fileInfo.createdAtMillis() < olderThanTimestamp; + listDirRecursivelyWithFileIO( + (SupportsPrefixOperations) table.io(), location, predicate, pathFilter, matchingFiles); + + JavaRDD matchingFileRDD = sparkContext().parallelize(matchingFiles, 1); return spark().createDataset(matchingFileRDD.rdd(), Encoders.STRING()); +} else { + Predicate predicate = file -> file.getModificationTime() < olderThanTimestamp; + // list at most MAX_DRIVER_LISTING_DEPTH levels and only dirs that have + // less than MAX_DRIVER_LISTING_DIRECT_SUB_DIRS direct sub dirs on the driver + listDirRecursivelyWithHadoop( + location, + predicate, + hadoopConf.value(), + MAX_DRIVER_LISTING_DEPTH, + MAX_DRIVER_LISTING_DIRECT_SUB_DIRS, + subDirs, + pathFilter, + matchingFiles); + + JavaRDD matchingFileRDD = sparkContext().parallelize(matchingFiles, 1); + + if (subDirs.isEmpty()) { +return spark().createDataset(matchingFileRDD.rdd(), Encoders.STRING()); + } + + int parallelism = Math.min(subDirs.size(), listingParallelism); + JavaRDD subDirRDD = sparkContext().parallelize(subDirs, parallelism); + + Broadcast conf = sparkContext().broadcast(hadoopConf); + ListDirsRecursively listDirs = new ListDirsRecursively(conf, olderThanTimestamp, pathFilter); + JavaRDD matchingLeafFileRDD = subDirRDD.mapPartitions(listDirs); + + JavaRDD 
completeMatchingFileRDD = matchingFileRDD.union(matchingLeafFileRDD); + return spark().createDataset(completeMatchingFileRDD.rdd(), Encoders.STRING()); +} + } + + private static void listDirRecursivelyWithFileIO( + SupportsPrefixOperations io, + String dir, + Predicate predicate, + PathFilter pathFilter, + List matchingFiles) { +String listPath = dir; +if (!dir.endsWith("/")) { + listPath = dir + "/"; +} + +Iterable files = io.listPrefix(listPath); +for (org.apache.iceberg.io.FileInfo file : files) { + Path path = new Path(file.location()); Review Comment: Thanks for highlighting this. I'll verify the issue to ensure proper path handling for S3, etc., and will follow up if it is an issue.
Re: [PR] SPARK: Remove dependency on hadoop's filesystem class from remove orphan files [iceberg]
liziyan-lzy commented on code in PR #12254: URL: https://github.com/apache/iceberg/pull/12254#discussion_r237789 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/actions/TestRemoveOrphanFilesAction.java: ## @@ -344,6 +374,8 @@ public void testWapFilesAreKept() { // wap write df.select("c1", "c2", "c3").write().format("iceberg").mode("append").save(tableLocation); +spark.conf().unset(SparkSQLProperties.WAP_ID); Review Comment: This configuration may cause test failures if it stays enabled across multiple test runs; this line ensures it is removed after each test completes. ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/actions/TestRemoveOrphanFilesAction.java: ## @@ -1067,6 +1148,38 @@ public void testRemoveOrphanFileActionWithDeleteMode() { DeleteOrphanFiles.PrefixMismatchMode.DELETE); } + @TestTemplate + public void testDefaultToHadoopListing() { +assumeThat(usePrefixListing) +.as("Should not test both prefix listing and Hadoop file listing (redundant)") Review Comment: Fixed. Thanks for pointing this out.
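The cleanup concern generalizes: a session-level Spark conf set by one test leaks into the next unless it is unset, even when the test body fails. A small Java sketch of the usual try/finally shape (the helper name is illustrative, not part of the PR):

```java
import org.apache.spark.sql.SparkSession;

class WapTestUtil {
    // Ensure a session-level conf never outlives the test body, even on failure.
    static void withWapId(SparkSession spark, String wapId, Runnable body) {
        spark.conf().set("spark.wap.id", wapId); // SparkSQLProperties.WAP_ID
        try {
            body.run();
        } finally {
            spark.conf().unset("spark.wap.id");
        }
    }
}
```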
Re: [PR] AWS: KeyManagementClient implementation that works with AWS KMS [iceberg]
jbonofre commented on PR #13136: URL: https://github.com/apache/iceberg/pull/13136#issuecomment-2915277304 I will take a look as well. Thanks!
Re: [PR] AWS: KeyManagementClient implementation that works with AWS KMS [iceberg]
jbonofre commented on PR #13136: URL: https://github.com/apache/iceberg/pull/13136#issuecomment-2915276780 @nastra @Fokko ^^
Re: [PR] SPARK: Remove dependency on hadoop's filesystem class from remove orphan files [iceberg]
liziyan-lzy commented on code in PR #12254: URL: https://github.com/apache/iceberg/pull/12254#discussion_r245839 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/actions/TestRemoveOrphanFilesAction.java: ## @@ -107,9 +115,12 @@ public abstract class TestRemoveOrphanFilesAction extends TestBase { protected Map properties; @Parameter private int formatVersion; - @Parameters(name = "formatVersion = {0}") + @Parameter(index = 1) + private boolean usePrefixListing; + + @Parameters(name = "formatVersion = {0}, usePrefixListing = {1}") protected static List parameters() { -return Arrays.asList(2, 3); +return Arrays.asList(new Object[] {2, true}, new Object[] {2, false}, new Object[] {3, false}); Review Comment: Thanks for your input. I've now updated it to ensure tests are run across all versions to cover everything thoroughly.
Re: [PR] Flink: Dynamic Iceberg Sink Contribution [iceberg]
mxm commented on code in PR #12424: URL: https://github.com/apache/iceberg/pull/12424#discussion_r265456 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/sink/dynamic/RowDataEvolver.java: ## @@ -0,0 +1,169 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.flink.sink.dynamic; + +import java.time.LocalDate; +import java.time.LocalDateTime; +import java.time.LocalTime; +import java.util.List; +import java.util.Map; +import org.apache.flink.table.data.ArrayData; +import org.apache.flink.table.data.DecimalData; +import org.apache.flink.table.data.GenericArrayData; +import org.apache.flink.table.data.GenericMapData; +import org.apache.flink.table.data.GenericRowData; +import org.apache.flink.table.data.MapData; +import org.apache.flink.table.data.RowData; +import org.apache.flink.table.data.TimestampData; +import org.apache.flink.table.types.logical.ArrayType; +import org.apache.flink.table.types.logical.DecimalType; +import org.apache.flink.table.types.logical.LogicalType; +import org.apache.flink.table.types.logical.MapType; +import org.apache.flink.table.types.logical.RowType; +import org.apache.iceberg.Schema; +import org.apache.iceberg.flink.FlinkSchemaUtil; +import org.apache.iceberg.relocated.com.google.common.collect.Maps; + +public class RowDataEvolver { + private RowDataEvolver() {} + + public static RowData convert(RowData sourceData, Schema sourceSchema, Schema targetSchema) { +return convertStruct( +sourceData, FlinkSchemaUtil.convert(sourceSchema), FlinkSchemaUtil.convert(targetSchema)); Review Comment: Thanks for pointing this out. This code path will only be called under certain conditions, i.e. when the input schema changes in a way that lets us adapt the input records to match the table schema; for example, when a field in the input schema has type int but the table type is long. Also, we can reorder fields without changing the table schema. It is worthwhile to consider removing this functionality for simplicity, or adding extensive caching like we have for the other hot paths.
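The caching mentioned here would typically memoize the expensive schema conversion on the (source, target) schema pair, so the per-record hot path reduces to a map lookup. A hedged sketch with hypothetical names (not the actual sink code); it assumes the schema type has sensible equals/hashCode semantics:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;
import java.util.function.UnaryOperator;

// Cache converters keyed by the (source, target) schema pair.
class ConverterCache<R, S> {
    private record Key<T>(T source, T target) {}

    private final Map<Key<S>, UnaryOperator<R>> cache = new ConcurrentHashMap<>();

    // `build` performs the expensive conversion (e.g. resolving logical types);
    // it runs at most once per distinct schema pair.
    UnaryOperator<R> converterFor(S source, S target, BiFunction<S, S, UnaryOperator<R>> build) {
        return cache.computeIfAbsent(new Key<>(source, target),
            key -> build.apply(key.source(), key.target()));
    }
}
```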
Re: [PR] Core: Fix incremental compute of partition stats with COW deletes [iceberg]
ajantha-bhat commented on code in PR #13163: URL: https://github.com/apache/iceberg/pull/13163#discussion_r2111663422 ## core/src/main/java/org/apache/iceberg/PartitionStatsHandler.java: ## @@ -336,16 +336,29 @@ private static PartitionMap computeStatsDiff( Sets.newHashSet( SnapshotUtil.ancestorIdsBetween( toSnapshot.snapshotId(), fromSnapshot.snapshotId(), table::snapshot)); -Predicate manifestFilePredicate = -manifestFile -> snapshotIdsRange.contains(manifestFile.snapshotId()); -return computeStats(table, toSnapshot, manifestFilePredicate, true /* incremental */); +return computeStats(table, toSnapshot, snapshotIdsRange); } private static PartitionMap computeStats( - Table table, Snapshot snapshot, Predicate predicate, boolean incremental) { + Table table, Snapshot snapshot, Set snapshotIdsRange) { StructType partitionType = Partitioning.partitionType(table); -List manifests = - snapshot.allManifests(table.io()).stream().filter(predicate).collect(Collectors.toList()); +boolean incremental = !snapshotIdsRange.isEmpty(); + +List manifests; +if (incremental) { + // DELETED manifest entries are not carried over to subsequent snapshots. + // So, for incremental computation, gather the manifests added by each snapshot + // instead of relying solely on those from the latest snapshot. + manifests = + snapshotIdsRange.stream() + .flatMap( + id -> + table.snapshot(id).allManifests(table.io()).stream() + .filter(file -> file.snapshotId().equals(id))) + .collect(Collectors.toList()); Review Comment: Also note that, because of the snapshot id filter, each snapshot's added manifest files are considered only once in the computation, so reused manifests won't be counted again. If manifests are rewritten, entries will be marked as EXISTING and won't be considered for incremental compute by the existing logic in `collectStatsForManifest`. So, IMO it works for all the scenarios now, and we have test cases covering all of them.
Re: [PR] feat: Introduce snapshot summary properties [iceberg-rust]
Xuanwo merged PR #1336: URL: https://github.com/apache/iceberg-rust/pull/1336 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] Incrementally computing partition stats can miss deleted files [iceberg]
ajantha-bhat commented on issue #13155: URL: https://github.com/apache/iceberg/issues/13155#issuecomment-2916012247 I found a clean way. The existing testcase was failing because I included the same manifest twice when it was reused. Now, I added a snapshot id check, so each snapshot only considers the manifests it added. The PR is ready for review. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Core: Fix incremental compute of partition stats [iceberg]
ajantha-bhat commented on code in PR #13163: URL: https://github.com/apache/iceberg/pull/13163#discussion_r2111640567 ## core/src/main/java/org/apache/iceberg/PartitionStatsHandler.java: ## @@ -336,16 +336,22 @@ private static PartitionMap<PartitionStats> computeStatsDiff( Sets.newHashSet( SnapshotUtil.ancestorIdsBetween( toSnapshot.snapshotId(), fromSnapshot.snapshotId(), table::snapshot)); -Predicate<ManifestFile> manifestFilePredicate = -manifestFile -> snapshotIdsRange.contains(manifestFile.snapshotId()); -return computeStats(table, toSnapshot, manifestFilePredicate, true /* incremental */); +return computeStats(table, toSnapshot, snapshotIdsRange, true /* incremental */); } private static PartitionMap<PartitionStats> computeStats( - Table table, Snapshot snapshot, Predicate<ManifestFile> predicate, boolean incremental) { + Table table, Snapshot snapshot, Set<Long> snapshotIdsRange, boolean incremental) { StructType partitionType = Partitioning.partitionType(table); + +List<ManifestFile> manifests; +if (incremental) { Review Comment: done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Core: Fix incremental compute of partition stats [iceberg]
ajantha-bhat commented on code in PR #13163: URL: https://github.com/apache/iceberg/pull/13163#discussion_r2111642709 ## core/src/main/java/org/apache/iceberg/PartitionStatsHandler.java: ## @@ -276,7 +275,8 @@ private static Collection<PartitionStats> computeAndMergeStatsIncremental( PartitionMap<PartitionStats> statsMap = PartitionMap.create(table.specs()); // read previous stats, note that partition field will be read as GenericRecord try (CloseableIterable<PartitionStats> oldStats = -readPartitionStatsFile(schema(partitionType), Files.localInput(previousStatsFile.path()))) { +readPartitionStatsFile( +schema(partitionType), table.io().newInputFile(previousStatsFile.path()))) { Review Comment: Just using the FileIO, as pointed out in the PR. It's unrelated to this bug fix, but related to this feature. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Core: Fix incremental compute of partition stats with COW deletes [iceberg]
ajantha-bhat commented on code in PR #13163: URL: https://github.com/apache/iceberg/pull/13163#discussion_r2111647311 ## core/src/main/java/org/apache/iceberg/PartitionStatsHandler.java: ## @@ -336,16 +336,29 @@ private static PartitionMap<PartitionStats> computeStatsDiff( Sets.newHashSet( SnapshotUtil.ancestorIdsBetween( toSnapshot.snapshotId(), fromSnapshot.snapshotId(), table::snapshot)); -Predicate<ManifestFile> manifestFilePredicate = -manifestFile -> snapshotIdsRange.contains(manifestFile.snapshotId()); -return computeStats(table, toSnapshot, manifestFilePredicate, true /* incremental */); +return computeStats(table, toSnapshot, snapshotIdsRange); } private static PartitionMap<PartitionStats> computeStats( - Table table, Snapshot snapshot, Predicate<ManifestFile> predicate, boolean incremental) { + Table table, Snapshot snapshot, Set<Long> snapshotIdsRange) { StructType partitionType = Partitioning.partitionType(table); -List<ManifestFile> manifests = - snapshot.allManifests(table.io()).stream().filter(predicate).collect(Collectors.toList()); +boolean incremental = !snapshotIdsRange.isEmpty(); + +List<ManifestFile> manifests; +if (incremental) { + // DELETED manifest entries are not carried over to subsequent snapshots. + // So, for incremental computation, gather the manifests added by each snapshot + // instead of relying solely on those from the latest snapshot. + manifests = + snapshotIdsRange.stream() + .flatMap( + id -> + table.snapshot(id).allManifests(table.io()).stream() + .filter(file -> file.snapshotId().equals(id))) + .collect(Collectors.toList()); Review Comment: I also checked that if snapshots are expired, we cannot find previous stats for the table in the caller. So, it will fall back to full compute. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
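A sketch of the caller-side fallback being described (method name assumed; `PartitionStatisticsFile` and `Table#snapshot` are the real Iceberg APIs):

```java
import org.apache.iceberg.PartitionStatisticsFile;
import org.apache.iceberg.Table;

final class IncrementalCheckSketch {
  private IncrementalCheckSketch() {}

  // If the snapshot that produced the previous stats file has been expired,
  // an incremental diff is impossible and the caller falls back to a full compute.
  static boolean canComputeIncrementally(Table table, PartitionStatisticsFile previous) {
    return previous != null && table.snapshot(previous.snapshotId()) != null;
  }
}
```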
Re: [PR] Flink 1.19, 1.20: Remove the MiniClusterWithClientResource dependency [iceberg]
JeonDaehong commented on PR #13165: URL: https://github.com/apache/iceberg/pull/13165#issuecomment-2916040410 @nastra Hello, I’ve completed the backport. Please review and merge it when you get a chance! Also, on a separate topic — after my previous PR was merged, I was expecting to see my name listed on the [contributors page](https://github.com/apache/iceberg/graphs/contributors), but it hasn’t shown up yet. Is merging into main not enough for that? Is there anything else I need to do to be listed? I’m looking forward to continuing to contribute to the Iceberg community and actively helping resolve issues! Thanks always 🙇 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] Rest Catalog and Writing data to Minio Raises `OSError: When initiating multiple part upload` [iceberg-python]
hasanozciftcii commented on issue #974: URL: https://github.com/apache/iceberg-python/issues/974#issuecomment-2916096063 > As a workaround, you can define the S3 configurations (endpoint, secret access key, and access key ID) in the load_rest method. > > catalog = load_rest( > name="rest", > conf={ > "uri": "http://localhost:8181/", > "s3.endpoint": "http://localhost:9000", > "s3.access-key-id": "admin", > "s3.secret-access-key": "password", > }, > ) > For some reason, the iceberg-rest-image does not return the overrides in the getConfig endpoint. > > If you test with Tabular or other catalogs, you will not have this problem. this works on my end, thanks for the help -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Core: Fix incremental compute of partition stats with COW deletes [iceberg]
pvary commented on code in PR #13163: URL: https://github.com/apache/iceberg/pull/13163#discussion_r2111741154 ## core/src/main/java/org/apache/iceberg/PartitionStatsHandler.java: ## @@ -336,16 +336,29 @@ private static PartitionMap<PartitionStats> computeStatsDiff( Sets.newHashSet( SnapshotUtil.ancestorIdsBetween( toSnapshot.snapshotId(), fromSnapshot.snapshotId(), table::snapshot)); -Predicate<ManifestFile> manifestFilePredicate = -manifestFile -> snapshotIdsRange.contains(manifestFile.snapshotId()); -return computeStats(table, toSnapshot, manifestFilePredicate, true /* incremental */); +return computeStats(table, toSnapshot, snapshotIdsRange); } private static PartitionMap<PartitionStats> computeStats( - Table table, Snapshot snapshot, Predicate<ManifestFile> predicate, boolean incremental) { + Table table, Snapshot snapshot, Set<Long> snapshotIdsRange) { Review Comment: I think the parametrization of this method is hard to understand now, especially with the hidden "empty" `snapshotIdsRange`. What if: - Move the manifest list calculation outside of this method, and use a `List<ManifestFile> manifests` input parameter. - Keep the `boolean incremental` parameter -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
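A sketch of what the suggested shape could look like (the exact parameter list of the private `collectStatsForManifest` helper is assumed):

```java
// Suggested shape: manifests are resolved by the callers (full vs. incremental),
// and the flag is passed explicitly instead of being inferred from an empty set.
private static PartitionMap<PartitionStats> computeStats(
    Table table, List<ManifestFile> manifests, boolean incremental) {
  StructType partitionType = Partitioning.partitionType(table);
  PartitionMap<PartitionStats> statsMap = PartitionMap.create(table.specs());
  for (ManifestFile manifest : manifests) {
    collectStatsForManifest(table, manifest, partitionType, statsMap, incremental);
  }
  return statsMap;
}
```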
[PR] Fix(catalog): Only remove metadata files if TableOp created a new one [iceberg]
fivetran-raulpersa opened a new pull request, #13169: URL: https://github.com/apache/iceberg/pull/13169 Fixes #13014 In multiple Table ops (Glue, Dynamo, BigQuery and Nessie), a failing operation will clean up the metadata file. However, it should only do so if it was the one that created the metadata file in the first place. Otherwise, an operation like `registerTable` could corrupt other catalogs that rely on the existence of the metadata file -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
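A hedged sketch of the guard the PR describes (all names illustrative, not the exact TableOperations code):

```java
// Track whether this commit wrote a new metadata file; on failure, only
// delete the file if this commit created it, so registerTable's
// pre-existing metadata is never removed.
abstract class CommitSketch {
  abstract String writeNewMetadata();          // assumed helper
  abstract void persistPointer(String loc);    // assumed helper
  abstract void deleteFile(String loc);        // assumed helper

  void commit(boolean requiresNewMetadata, String existingMetadataLocation) {
    String metadataLocation = existingMetadataLocation; // e.g. registerTable
    boolean createdNewMetadata = false;
    if (requiresNewMetadata) {
      metadataLocation = writeNewMetadata();
      createdNewMetadata = true;
    }
    try {
      persistPointer(metadataLocation);
    } catch (RuntimeException e) {
      if (createdNewMetadata) {
        deleteFile(metadataLocation); // safe: this commit created the file
      }
      throw e;
    }
  }
}
```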
Re: [PR] Flink 1.19, 1.20: Remove the MiniClusterWithClientResource dependency [iceberg]
pvary merged PR #13165: URL: https://github.com/apache/iceberg/pull/13165 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[PR] feat: set snapshot summary property in append action [iceberg-rust]
dentiny opened a new pull request, #1391: URL: https://github.com/apache/iceberg-rust/pull/1391 ## Which issue does this PR close? - Closes https://github.com/apache/iceberg-rust/issues/1329 ## What changes are included in this PR? This PR tries to do the same thing as the reverted [PR](https://github.com/apache/iceberg-rust/pull/1336), which added the capability to set snapshot summary properties. Following up on the thread https://github.com/apache/iceberg-rust/pull/1336#issuecomment-2915765649, in this PR I took a different approach, exposing the interfaces on the fast append action. ## Are these changes tested? Yes, I added new unit tests. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
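For context, the Java API models this the same way: the summary property is set on the snapshot-producing action itself via `SnapshotUpdate#set`. A minimal Java example:

```java
import org.apache.iceberg.DataFile;
import org.apache.iceberg.Table;

final class SummarySketch {
  private SummarySketch() {}

  static void appendWithSummary(Table table, DataFile dataFile) {
    table
        .newFastAppend()
        .appendFile(dataFile)
        .set("custom-key", "custom-value") // lands in the snapshot summary
        .commit();
  }
}
```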
[PR] fix(table): fix race condition in scanner [iceberg-go]
zeroshade opened a new pull request, #445: URL: https://github.com/apache/iceberg-go/pull/445 fixes #443 Eliminates potential race condition in scanning caused by the metrics evaluator used for manifest pruning. Since the stats are only used for a given run of Eval, we don't need to modify the bound version of the `inclusiveMetricsEval` object and can instead copy the type and expression (cheap) to populate for the call. This avoids the race condition without giving up performance (no locking needed). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
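An illustration (in Java, with stand-in types) of the pattern the fix uses: keep the shared evaluator immutable and give each eval call its own short-lived scratch object, so no locking is needed:

```java
import java.util.Map;

// Toy illustration: the shared object holds only immutable fields; mutable
// per-run state lives in a fresh object per call, so concurrent eval()
// calls cannot race.
final class MetricsEvaluator {
  private final String boundExpression; // stand-in for the bound predicate

  MetricsEvaluator(String boundExpression) {
    this.boundExpression = boundExpression;
  }

  boolean eval(Map<String, Long> fileStats) {
    return new EvalRun(boundExpression, fileStats).run();
  }

  private static final class EvalRun {
    private final String expr;
    private final Map<String, Long> stats; // per-run state, never shared

    EvalRun(String expr, Map<String, Long> stats) {
      this.expr = expr;
      this.stats = stats;
    }

    boolean run() {
      // real code would walk the bound predicate against the stats here
      return stats.containsKey(expr);
    }
  }
}
```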
[PR] fix(catalog/glue): Fix nil map assignment issue in glue catalog [iceberg-go]
lliangyu-lin opened a new pull request, #446: URL: https://github.com/apache/iceberg-go/pull/446 ### Description Calling NewCatalog on the glue catalog without any AWS properties causes the catalog properties to be nil ``` props: iceberg.Properties(glueOps.awsProperties), ``` Hence, when creating a staged table that needs to clone the catalog properties in https://github.com/apache/iceberg-go/blob/d51f8b8c5b8249a706467910d42ee51eb6032ccd/catalog/internal/utils.go#L137-L138, it results in a nil map assignment. The same happens in ``` func prepareProperties(icebergProperties iceberg.Properties, newMetadataLocation string) iceberg.Properties { glueProperties := maps.Clone(icebergProperties) ``` where, if the staged table has no properties in its metadata, the clone returns nil, again causing a nil map assignment. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
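The defensive pattern the fix needs, shown as a Java-flavored sketch (in Go the failure mode is a runtime panic on assignment into a nil map; the Java analogue would be an NPE on a null map):

```java
import java.util.HashMap;
import java.util.Map;

final class PropsSketch {
  private PropsSketch() {}

  // Cloning a possibly-null property map must yield a usable, non-null map
  // before anything is assigned into it.
  static Map<String, String> cloneProps(Map<String, String> props) {
    return props == null ? new HashMap<>() : new HashMap<>(props);
  }
}
```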
Re: [PR] feat: support decompress gzip metadata [iceberg-cpp]
Fokko merged PR #108: URL: https://github.com/apache/iceberg-cpp/pull/108 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: support decompress gzip metadata [iceberg-cpp]
Fokko commented on PR #108: URL: https://github.com/apache/iceberg-cpp/pull/108#issuecomment-2917339595 LGTM @dongxiao1198, thanks for the review @wgtmac, @lidavidm and @yingcai-cy -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[I] Append-only option for Spark incremental append read that throws [iceberg]
smaheshwar-pltr opened a new issue, #13179: URL: https://github.com/apache/iceberg/issues/13179 ### Feature Request / Improvement Spark's [incremental append reading](https://iceberg.apache.org/docs/latest/spark-queries/#incremental-read) is useful for append-only incremental workflows. It's documented as only reading snapshots with "append" snapshot types - it ignores any snapshots in the range that don't have this type, e.g. `DELETE` snapshots. I think this is sensible default behaviour, but I'm wondering what the community thinks about allowing an `append-only` option to be set that *throws* if there are non-append snapshots in the range, to prevent the data integrity of a downstream table from being compromised in strictly append-only workflows. (It would be unset by default.) ### Query engine None ### Willingness to contribute - [x] I can contribute this improvement/feature independently - [ ] I would be willing to contribute this improvement/feature with guidance from the Iceberg community - [ ] I cannot contribute this improvement/feature at this time -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
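How the proposed option might sit next to the existing incremental-read options (`append-only` is the hypothetical new flag; the two snapshot-id options already exist in the Spark docs linked above):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

final class IncrementalReadSketch {
  private IncrementalReadSketch() {}

  static Dataset<Row> readAppendsOnly(SparkSession spark) {
    return spark.read()
        .format("iceberg")
        .option("start-snapshot-id", "10963874102873") // existing option (exclusive)
        .option("end-snapshot-id", "63874143573109")   // existing option (inclusive)
        .option("append-only", "true")                 // proposed: throw on non-append snapshots
        .load("db.table");
  }
}
```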
Re: [PR] Incremental Append Scan [iceberg-python]
smaheshwar-pltr commented on PR #2031: URL: https://github.com/apache/iceberg-python/pull/2031#issuecomment-2917350349 Put up https://github.com/apache/iceberg/issues/13179 re an append-only option -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] Append-only option for Spark incremental append read that throws [iceberg]
smaheshwar-pltr commented on issue #13179: URL: https://github.com/apache/iceberg/issues/13179#issuecomment-2917349624 Not sure who to ping here so CC-ing folks I see on previous Spark incremental read work 😄: @aokolnychyi @huaxingao @szehon-ho @RussellSpitzer, please let me know your thoughts! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[I] Build: Fix errorprone warnings [iceberg]
ajantha-bhat opened a new issue, #13178: URL: https://github.com/apache/iceberg/issues/13178 ### Apache Iceberg version None ### Query engine None ### Please describe the bug 🐞 `./gradlew clean build -x test -x integrationTest --no-build-cache` A default-profile build with the above command results in a few errorprone warnings. (Screenshot: https://github.com/user-attachments/assets/77a1912d-8b36-4910-a707-d0f1fc64a4ee) Good to keep the build green. ### Willingness to contribute - [ ] I can contribute a fix for this bug independently - [ ] I would be willing to contribute a fix for this bug with guidance from the Iceberg community - [ ] I cannot contribute a fix for this bug at this time -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: set snapshot summary property in append action [iceberg-rust]
dentiny commented on code in PR #1391: URL: https://github.com/apache/iceberg-rust/pull/1391#discussion_r2112436407 ## Cargo.lock: ## @@ -3712,7 +3712,7 @@ dependencies = [ "iceberg-catalog-rest", "iceberg-datafusion", "iceberg_test_utils", - "ordered-float 4.6.0", + "ordered-float 2.10.1", Review Comment: I didn't make the change; `cargo test` did -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] build(deps): bump the gomod_updates group with 3 updates [iceberg-go]
zeroshade merged PR #440: URL: https://github.com/apache/iceberg-go/pull/440 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] Feature request: expose `DataFile` builder as public [iceberg-rust]
dentiny closed issue #1392: Feature request: expose `DataFile` builder as public URL: https://github.com/apache/iceberg-rust/issues/1392 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] Feature request: expose `DataFile` builder as public [iceberg-rust]
dentiny commented on issue #1392: URL: https://github.com/apache/iceberg-rust/issues/1392#issuecomment-2917202154 Sorry, I got fooled by GPT and DeepSeek; the default visibility is public. Ref: https://docs.rs/derive_builder/latest/derive_builder/#setter-visibility -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] Build: Fix errorprone warnings [iceberg]
TomasDarquier commented on issue #13178: URL: https://github.com/apache/iceberg/issues/13178#issuecomment-2917209543 - [x] I can contribute a fix for this bug independently Hi, I’d like to work on this issue — it seems like a great fit for my first contribution. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] Build: Fix errorprone warnings [iceberg]
ajantha-bhat commented on issue #13178: URL: https://github.com/apache/iceberg/issues/13178#issuecomment-2917115167 Good first issue for new contributors. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[I] Feature request: expose `DataFile` builder as public [iceberg-rust]
dentiny opened a new issue, #1392: URL: https://github.com/apache/iceberg-rust/issues/1392 ### Is your feature request related to a problem or challenge? My workflow looks like: - Accumulate arrow batches in-memory, flush to the local filesystem when they reach a certain threshold - Periodically import the local parquet files into the iceberg table iceberg-rust doesn't provide a util function or class to generate `DataFile`, which is needed for the fast append action (apart from `DataFileWriter`), so I have to read back the whole arrow batches and write them with `DataFileWriter`, which is definitely inefficient. I would like to expose `DataFileBuilder` as a public struct, so I could construct these fields myself. ### Describe the solution you'd like Expose `DataFileBuilder` as public ### Willingness to contribute I can contribute to this feature independently -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
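For comparison, the Java API exposes exactly this kind of public builder (`org.apache.iceberg.DataFiles`), which lets already-written files be registered without reading them back:

```java
import org.apache.iceberg.DataFile;
import org.apache.iceberg.DataFiles;
import org.apache.iceberg.FileFormat;
import org.apache.iceberg.Table;

final class ImportSketch {
  private ImportSketch() {}

  // Build a DataFile from file-level metadata collected at write time,
  // then register it with a fast append; the parquet data is never re-read.
  static void importParquetFile(Table table, String path, long sizeInBytes, long recordCount) {
    DataFile file =
        DataFiles.builder(table.spec())
            .withPath(path)
            .withFormat(FileFormat.PARQUET)
            .withFileSizeInBytes(sizeInBytes)
            .withRecordCount(recordCount)
            .build();
    table.newFastAppend().appendFile(file).commit();
  }
}
```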
Re: [PR] Flink 1.19, 1.20: Remove the MiniClusterWithClientResource dependency [iceberg]
pvary commented on PR #13165: URL: https://github.com/apache/iceberg/pull/13165#issuecomment-2916219167 Merged to main. Thanks @JeonDaehong for the backport and @nastra for the review. >After my previous PR was merged, I was expecting to see my name listed on the [contributors page](https://github.com/apache/iceberg/graphs/contributors), but it hasn’t shown up yet. > > Is merging into main not enough for that? > Is there anything else I need to do to be listed? I see people with 7-8 commits on the list. Maybe that's the threshold? > I’m looking forward to continuing to contribute to the Iceberg community and actively helping resolve issues! We are looking forward to your contributions! Welcome to the team! 😄 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: add retryable property for Error [iceberg-rust]
ZENOTME commented on code in PR #1383: URL: https://github.com/apache/iceberg-rust/pull/1383#discussion_r2111914945 ## crates/iceberg/src/error.rs: ## @@ -134,6 +134,8 @@ pub struct Error { source: Option<anyhow::Error>, backtrace: Backtrace, + +retryable: bool, Review Comment: > This is quite confusing. Whether it's retryable should be determined by consumer by ErrorKind. `temporary: bool` +1. I think the basic idea here is to keep the retryable or temporary property orthogonal to ErrorKind. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
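The two designs under discussion, sketched in Java (kind names are illustrative, not the iceberg-rust enum):

```java
enum ErrorKind { CATALOG_COMMIT_CONFLICTS, DATA_INVALID, UNEXPECTED }

final class RetrySketch {
  private RetrySketch() {}

  // (a) the consumer derives retryability from the kind alone
  static boolean retryableByKind(ErrorKind kind) {
    return kind == ErrorKind.CATALOG_COMMIT_CONFLICTS;
  }
}

// (b) the producer marks each error; the flag stays orthogonal to the kind,
// so two errors of the same kind can differ in retryability
final class MarkedError extends RuntimeException {
  final ErrorKind kind;
  final boolean retryable;

  MarkedError(ErrorKind kind, boolean retryable, String message) {
    super(message);
    this.kind = kind;
    this.retryable = retryable;
  }
}
```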
Re: [PR] Core: Fix incremental compute of partition stats with COW deletes [iceberg]
ajantha-bhat commented on code in PR #13163: URL: https://github.com/apache/iceberg/pull/13163#discussion_r2111918206 ## core/src/main/java/org/apache/iceberg/PartitionStatsHandler.java: ## @@ -336,16 +336,29 @@ private static PartitionMap<PartitionStats> computeStatsDiff( Sets.newHashSet( SnapshotUtil.ancestorIdsBetween( toSnapshot.snapshotId(), fromSnapshot.snapshotId(), table::snapshot)); -Predicate<ManifestFile> manifestFilePredicate = -manifestFile -> snapshotIdsRange.contains(manifestFile.snapshotId()); -return computeStats(table, toSnapshot, manifestFilePredicate, true /* incremental */); +return computeStats(table, toSnapshot, snapshotIdsRange); } private static PartitionMap<PartitionStats> computeStats( - Table table, Snapshot snapshot, Predicate<ManifestFile> predicate, boolean incremental) { + Table table, Snapshot snapshot, Set<Long> snapshotIdsRange) { Review Comment: Done. I had considered it earlier as well, but decided to leave it as-is since I felt I might be overthinking the refactoring of this private method. But it makes sense, as it is very clean now. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] Support commit retrie [iceberg-rust]
ZENOTME closed issue #964: Support commit retrie URL: https://github.com/apache/iceberg-rust/issues/964 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Flink 1.19, 1.20: Remove the MiniClusterWithClientResource dependency [iceberg]
JeonDaehong commented on PR #13165: URL: https://github.com/apache/iceberg/pull/13165#issuecomment-2916382580 > Merged to main. Thanks @JeonDaehong for the backport and @nastra for the review. > > > After my previous PR was merged, I was expecting to see my name listed on the [contributors page](https://github.com/apache/iceberg/graphs/contributors), but it hasn’t shown up yet. > > Is merging into main not enough for that? > > Is there anything else I need to do to be listed? > > I see people with 7-8 commits on the list. Maybe that's the threshold? > > > I’m looking forward to continuing to contribute to the Iceberg community and actively helping resolve issues! > > We are looking forward to your contributions! Welcome to the team! 😄 Thank you for merging! I’ll continue to do my best and stay active! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Proposal: IRC Events endpoint [iceberg]
c-thiel commented on code in PR #12584: URL: https://github.com/apache/iceberg/pull/12584#discussion_r2112113418 ## open-api/rest-catalog-open-api.yaml: ## @@ -3964,6 +4172,302 @@ components: metadata: $ref: '#/components/schemas/TableMetadata' +EventsResponse: + type: object + required: +- highest-processed-timestamp-ms +- events + properties: +next-page-token: + $ref: "#/components/schemas/PageToken" +highest-processed-timestamp-ms: + description: > +The highest timestamp processed by the server when generating this response. +This may not necessarily appear in the returned changes if it was filtered out. + +Clients can use this value as the `after-timestamp-ms` parameter in subsequent +requests to continue retrieving changes after this point. + type: integer + format: int64 +events: + type: array + items: +$ref: "#/components/schemas/Event" + +Event: + type: object + required: +- event-id +- request-id +- event-count +- timestamp-ms +- operation + properties: +event-id: + type: string + description: Unique ID of this event. Clients should perform deduplication based on this ID. +request-id: + description: ID of the request this change belongs to. + type: string +event-count: + type: integer + description: Number of events in the request / batch of events +timestamp-ms: + type: integer + format: int64 + description: > +Timestamp when this transaction occurred (epoch milliseconds). +Timestamps are not guaranteed to be unique. Typically all events in +a transaction will have the same timestamp. +actor-chain: + type: array + items: +$ref: "#/components/schemas/Actor" + description: > +An ordered list of actors involved in the operation, with the most direct actor +(the one who actually performed the operation) first, followed by delegating actors +in order. For example, if a service account (actor[0]) performed an operation +on behalf of a role (actor[1]) assumed by a user (actor[2]), the chain represents +this delegation path.
+operation: + type: object + discriminator: +propertyName: operation-type +mapping: + create-table: "#/components/schemas/CreateTableOperation" + register-table: "#/components/schemas/RegisterTableOperation" + drop-table: "#/components/schemas/DropTableOperation" + update-table: "#/components/schemas/UpdateTableOperation" + rename-table: "#/components/schemas/RenameTableOperation" + create-view: "#/components/schemas/CreateViewOperation" + drop-view: "#/components/schemas/DropViewOperation" + replace-view: "#/components/schemas/ReplaceViewOperation" + rename-view: "#/components/schemas/RenameTableOperation" + create-namespace: "#/components/schemas/CreateNamespaceOperation" + update-namespace-properties: "#/components/schemas/UpdateNamespacePropertiesOperation" + drop-namespace: "#/components/schemas/DropNamespaceOperation" + custom: "#/components/schemas/CustomOperation" + oneOf: +- $ref: "#/components/schemas/CreateTableOperation" +- $ref: "#/components/schemas/RegisterTableOperation" +- $ref: "#/components/schemas/DropTableOperation" +- $ref: "#/components/schemas/UpdateTableOperation" +- $ref: "#/components/schemas/RenameTableOperation" +- $ref: "#/components/schemas/CreateViewOperation" +- $ref: "#/components/schemas/DropViewOperation" +- $ref: "#/components/schemas/ReplaceViewOperation" +- $ref: "#/components/schemas/RenameTableOperation" +- $ref: "#/components/schemas/CreateNamespaceOperation" +- $ref: "#/components/schemas/UpdateNamespacePropertiesOperation" +- $ref: "#/components/schemas/DropNamespaceOperation" +- $ref: "#/components/schemas/CustomOperation" + +CreateTableOperation: + required: +- operation-type +- identifier +- table-uuid +- metadata + properties: +operation-type: + $ref: "#/components/schemas/OperationType" + const: "create-table" +identifier: + $ref: "#/components/schemas/TableIdentifier" +table-uuid: + type: string + format: uuid +metadata: + $ref: "#/components/schemas/TableMetadata" + +RegisterTableOperation
Re: [PR] Proposal: IRC Events endpoint [iceberg]
c-thiel commented on code in PR #12584: URL: https://github.com/apache/iceberg/pull/12584#discussion_r2112129175 ## open-api/rest-catalog-open-api.py: ## @@ -1389,6 +1558,101 @@ class CommitTableResponse(BaseModel): metadata: TableMetadata +class EventsResponse(BaseModel): +next_page_token: Optional[PageToken] = Field(None, alias='next-page-token') +highest_processed_timestamp_ms: int = Field( +..., +alias='highest-processed-timestamp-ms', +description='The highest timestamp processed by the server when generating this response. This may not necessarily appear in the returned changes if it was filtered out.\nClients can use this value as the `after-timestamp-ms` parameter in subsequent requests to continue retrieving changes after this point.\n', +) +events: List[Event] + + +class Event(BaseModel): +event_id: str = Field( +..., +alias='event-id', +description='Unique ID of this event. Clients should perform deduplication based on this ID.', +) +request_id: str = Field( +..., alias='request-id', description='ID of the request this change belongs to.' +) +event_count: int = Field( +..., +alias='event-count', +description='Number of events in the request / batch of events', +) +timestamp_ms: int = Field( +..., +alias='timestamp-ms', +description='Timestamp when this transaction occurred (epoch milliseconds). Timestamps are not guaranteed to be unique. Typically all events in a transaction will have the same timestamp.\n', +) +actor_chain: Optional[List[Actor]] = Field( +None, +alias='actor-chain', +description='An ordered list of actors involved in the operation, with the most direct actor (the one who actually performed the operation) first, followed by delegating actors in order. For example, if a service account (actor[0]) performed an operation on behalf of a role (actor[1]) assumed by a user (actor[2]), the chain represents this delegation path.\n', +) +operation: Union[ +CreateTableOperation, +RegisterTableOperation, +DropTableOperation, +UpdateTableOperation, +RenameTableOperation, +CreateViewOperation, +DropViewOperation, +ReplaceViewOperation, +CreateNamespaceOperation, +UpdateNamespacePropertiesOperation, +DropNamespaceOperation, +CustomOperation, +] = Field(..., discriminator='operation_type') + + +class CreateTableOperation(BaseModel): +operation_type: Literal['create-table'] = Field( +..., alias='operation-type', const=True +) +identifier: TableIdentifier +table_uuid: UUID = Field(..., alias='table-uuid') +metadata: TableMetadata + + +class RegisterTableOperation(BaseModel): +operation_type: Literal['register-table'] = Field( +..., alias='operation-type', const=True +) +identifier: TableIdentifier +table_uuid: UUID = Field(..., alias='table-uuid') +metadata: TableMetadata Review Comment: Metadata is only returned for the initial `createTable` event. Other events only contain incremental updates. As this event is emitted at the time of table creation, TableMetadata is typically quite short, so I don't see a big problem storing it. REST Catalogs already store Metadata somewhere internally, and this is certainly one of the smaller ones as it is the first in the lifecycle of the table. Use-cases are outlined in the docs for this PR; I do see value especially for Federation & Workflow triggers. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Flink: port range distribution to v2 iceberg sink [iceberg]
rodmeneses commented on code in PR #12071: URL: https://github.com/apache/iceberg/pull/12071#discussion_r2112131384 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergSink.java: ## @@ -645,72 +711,135 @@ private DataStream distributeDataStream(DataStream input) { DistributionMode mode = flinkWriteConf.distributionMode(); Schema schema = table.schema(); PartitionSpec spec = table.spec(); +SortOrder sortOrder = table.sortOrder(); + LOG.info("Write distribution mode is '{}'", mode.modeName()); switch (mode) { case NONE: -if (equalityFieldIds.isEmpty()) { - return input; -} else { - LOG.info("Distribute rows by equality fields, because there are equality fields set"); - return input.keyBy(new EqualityFieldKeySelector(schema, flinkRowType, equalityFieldIds)); -} - +return distributeDataStreamByNoneDistributionMode(input, schema); case HASH: -if (equalityFieldIds.isEmpty()) { - if (table.spec().isUnpartitioned()) { -LOG.warn( -"Fallback to use 'none' distribution mode, because there are no equality fields set " -+ "and table is unpartitioned"); -return input; - } else { -if (BucketPartitionerUtil.hasOneBucketField(spec)) { - return input.partitionCustom( - new BucketPartitioner(spec), - new BucketPartitionKeySelector(spec, schema, flinkRowType)); -} else { - return input.keyBy(new PartitionKeySelector(spec, schema, flinkRowType)); -} - } -} else { - if (spec.isUnpartitioned()) { -LOG.info( -"Distribute rows by equality fields, because there are equality fields set " -+ "and table is unpartitioned"); -return input.keyBy( -new EqualityFieldKeySelector(schema, flinkRowType, equalityFieldIds)); - } else { -for (PartitionField partitionField : spec.fields()) { - Preconditions.checkState( - equalityFieldIds.contains(partitionField.sourceId()), - "In 'hash' distribution mode with equality fields set, partition field '%s' " - + "should be included in equality fields: '%s'", - partitionField, - equalityFieldColumns); -} -return input.keyBy(new PartitionKeySelector(spec, schema, flinkRowType)); - } -} - +return distributeDataStreamByHashDistributionMode(input, schema, spec); case RANGE: -if (equalityFieldIds.isEmpty()) { - LOG.warn( - "Fallback to use 'none' distribution mode, because there are no equality fields set " - + "and {}=range is not supported yet in flink", - WRITE_DISTRIBUTION_MODE); - return input; -} else { - LOG.info( - "Distribute rows by equality fields, because there are equality fields set " - + "and{}=range is not supported yet in flink", - WRITE_DISTRIBUTION_MODE); - return input.keyBy(new EqualityFieldKeySelector(schema, flinkRowType, equalityFieldIds)); -} - +return distributeDataStreamByRangeDistributionMode(input, schema, spec, sortOrder); default: throw new RuntimeException("Unrecognized " + WRITE_DISTRIBUTION_MODE + ": " + mode); } } + private DataStream distributeDataStreamByNoneDistributionMode( + DataStream input, Schema iSchema) { +if (equalityFieldIds.isEmpty()) { + return input; +} else { + LOG.info("Distribute rows by equality fields, because there are equality fields set"); + return input.keyBy(new EqualityFieldKeySelector(iSchema, flinkRowType, equalityFieldIds)); +} + } + + private DataStream distributeDataStreamByHashDistributionMode( + DataStream input, Schema iSchema, PartitionSpec partitionSpec) { +if (equalityFieldIds.isEmpty()) { + if (partitionSpec.isUnpartitioned()) { +LOG.warn( +"Fallback to use 'none' distribution mode, because there are no equality fields set " ++ "and table is unpartitioned"); +return input; + } else { +return 
input.keyBy(new PartitionKeySelector(partitionSpec, iSchema, flinkRowType)); + } +} else { + if (partitionSpec.isUnpartitioned()) { +LOG.info( +"Distribute rows by equality fields, because there are equality fields set " ++ "and table is unpartitioned"); +return input.keyBy(new EqualityFieldKeySelector(iSchema, flinkRowType, equalityFieldIds)); + } else { +for (PartitionField partitionField : partitionSpec.fields()) { + Preconditions.checkState( + equalityFieldIds.c
Re: [PR] Proposal: IRC Events endpoint [iceberg]
c-thiel commented on code in PR #12584: URL: https://github.com/apache/iceberg/pull/12584#discussion_r2112110143 ## open-api/rest-catalog-open-api.yaml: ## @@ -3405,6 +3488,131 @@ components: allOf: - $ref: '#/components/schemas/ScanTasks' +Actor: + type: object + description: Represents an actor that performed operations in the catalog + required: +- type +- actor-id + properties: +type: + type: string + description: The type of actor (e.g., "user", "principal", "role") +actor-id: + type: string + description: The identifier of the actor +metadata: + type: object + description: Additional metadata about the actor that may vary between implementations + additionalProperties: +type: string + example: +type: "role" +actor-id: "1234" +metadata: + permissions: "read-write" + expiration: "2025-06-07T12:00:00Z" + name: "Admin Role" +assumed-by: + type: "user" + actor-id: "4567" + metadata: +name: "Peter Cold" +email: "pe...@example.com" + +GetEventsRequest: + type: object + properties: +next-page-token: + $ref: "#/components/schemas/PageToken" +page-size: + type: integer + format: int32 + description: > +The maximum number of events to return in a single response. +If not provided, the server may choose a default page size. +after-timestamp-ms: + type: integer + format: int64 + description: > +The (server) timestamp in milliseconds to start consuming events from (inclusive). +If not provided, the first available timestamp is used. Review Comment: We do have the `next-page-token` for pagination with opaque tokens. This filter here is meant merely as a rough parameter to limit events obtained in the first request. Do you think a more precise description is missing? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
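A sketch of the client loop this design implies (the client and response classes are hypothetical; the field names follow the draft spec quoted above):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Poll the events endpoint, deduplicating on event-id as the spec requires,
// and resume from the server-reported high-water mark between calls.
final class EventsPollerSketch {
  private final Set<String> seenEventIds = new HashSet<>();
  private long afterTimestampMs;

  void pollOnce(EventsClient client) {           // hypothetical client
    String pageToken = null;
    do {
      EventsPage page = client.getEvents(afterTimestampMs, pageToken);
      for (EventsPage.Event event : page.events()) {
        if (seenEventIds.add(event.eventId())) { // dedupe on event-id
          handle(event);
        }
      }
      afterTimestampMs = page.highestProcessedTimestampMs();
      pageToken = page.nextPageToken();
    } while (pageToken != null);
  }

  private void handle(EventsPage.Event event) { /* apply to the downstream system */ }

  interface EventsClient {
    EventsPage getEvents(long afterTimestampMs, String pageToken);
  }

  interface EventsPage {
    List<Event> events();
    long highestProcessedTimestampMs();
    String nextPageToken();

    interface Event {
      String eventId();
    }
  }
}
```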
Re: [I] INT96 timestamp is read as OffsetDateTime, not LocalDateTime [iceberg]
anesterenok commented on issue #12266: URL: https://github.com/apache/iceberg/issues/12266#issuecomment-2916672959 While I'm quite in favor of deprecating int96 storage, I must say that it's only a projection of Iceberg types. If one uses the Iceberg `timestamp` type, the value must be considered a local timestamp and processed as `LocalDateTime`. If one uses the Iceberg `timestamptz` type, the value must be considered an instant and processed as `OffsetDateTime` (well, you'd be better off using `Instant` actually, as no zone offset could really be stored for `timestamptz`). Hence, instead of a single TimestampInt96Reader returning `OffsetDateTime`, there should actually have been 2 readers: -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
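A sketch of what those two readers could look like (class and method names assumed; the Parquet INT96 layout is 8 little-endian bytes of nanos-of-day followed by 4 bytes of Julian day, stored relative to UTC):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

final class Int96Readers {
  private static final int UNIX_EPOCH_JULIAN_DAY = 2_440_588;

  private Int96Readers() {}

  // Iceberg `timestamp`: surface the stored fields directly as a local value.
  static LocalDateTime readAsTimestamp(byte[] int96) {
    ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
    long nanosOfDay = buf.getLong();
    int julianDay = buf.getInt();
    return LocalDateTime.of(
        LocalDate.ofEpochDay(julianDay - UNIX_EPOCH_JULIAN_DAY),
        LocalTime.ofNanoOfDay(nanosOfDay));
  }

  // Iceberg `timestamptz`: the same instant, anchored to UTC.
  static OffsetDateTime readAsTimestamptz(byte[] int96) {
    return readAsTimestamp(int96).atOffset(ZoneOffset.UTC);
  }
}
```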
Re: [PR] Spark: Add 'skip_file_list' option to RewriteTablePathProcedure for optional file-list generation [iceberg]
manuzhang commented on code in PR #12844: URL: https://github.com/apache/iceberg/pull/12844#discussion_r2112148631 ## api/src/main/java/org/apache/iceberg/actions/RewriteTablePath.java: ## @@ -86,6 +86,16 @@ public interface RewriteTablePath extends Action
Re: [PR] Docs, Flink: Fix equality fields requirement in upsert mode [iceberg]
pvary merged PR #13127: URL: https://github.com/apache/iceberg/pull/13127 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Docs, Flink: Fix equality fields requirement in upsert mode [iceberg]
pvary commented on PR #13127: URL: https://github.com/apache/iceberg/pull/13127#issuecomment-2916483984 Merged to main. Thanks @manuzhang for the PR and @mxm for the review! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: add asan and ubsan support to cmake [iceberg-cpp]
raulcd commented on code in PR #107: URL: https://github.com/apache/iceberg-cpp/pull/107#discussion_r2112029921 ## .github/workflows/sanitizer_test.yml: ## @@ -0,0 +1,63 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +name: ASAN and UBSAN Tests + +on: + push: +branches: + - '**' + - '!dependabot/**' +tags: + - '**' + pull_request: + +concurrency: + group: ${{ github.repository }}-${{ github.head_ref || github.sha }}-${{ github.workflow }} + cancel-in-progress: true + +permissions: + contents: read + +jobs: + sanitizer-test: +name: "ASAN and UBSAN Tests" +runs-on: ubuntu-24.04 +strategy: + fail-fast: false +steps: + - name: Checkout iceberg-cpp +uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2 + - name: Configure and Build with ASAN & UBSAN +run: | + mkdir build && cd build + cmake .. -DCMAKE_BUILD_TYPE=Debug -DICEBERG_ENABLE_ASAN=ON -DICEBERG_ENABLE_UBSAN=ON + cmake --build . --verbose + - name: Run Tests +working-directory: build +env: + ASAN_OPTIONS: log_path=out.log:detect_leaks=1:symbolize=1:strict_string_checks=1:halt_on_error=0:detect_container_overflow=0 Review Comment: I'll let @wgtmac and yourself decide but I would rather merge a failing CI job on this case, knowing that it has to be fixed on a subsequent PR, than a false positive. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Proposal: IRC Events endpoint [iceberg]
c-thiel commented on code in PR #12584: URL: https://github.com/apache/iceberg/pull/12584#discussion_r2112105575

## open-api/rest-catalog-open-api.yaml: ##
@@ -3041,6 +3096,34 @@ components:
         - $ref: '#/components/schemas/AddViewVersionUpdate'
         - $ref: '#/components/schemas/SetCurrentViewVersionUpdate'
 
+    OperationType:
+      type: string
+      description: >
+        Defines the type of operation, either a standard operation defined by the Iceberg REST API
+        specification or a custom operation specific to an implementation.
+      anyOf:
+        - enum:
+            - create-table

Review Comment: The matrix approach that you propose can lead to invalid combinations; already today, "register" wouldn't be easy to distinguish from "create". Also, "replace" is only valid for views, not for namespaces, for example. Because of this, I would prefer to stick to the current long list. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
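To make the invalid-combination concern concrete, here is a hedged sketch of what a matrix-style shape could look like; the field names below are hypothetical and do not come from the proposal. Crossing a resource dimension with an action dimension admits pairs the spec never defines, which is why the flat enum names only valid operations:

```yaml
# Hypothetical matrix-style alternative (NOT what the PR proposes):
OperationType:
  type: object
  properties:
    resource:
      type: string
      enum: [table, view, namespace]
    action:
      type: string
      enum: [create, register, replace, drop]
# Invalid or ambiguous combinations the cross product allows:
#   {resource: namespace, action: replace}  # replace applies only to views
#   {resource: table, action: register}     # hard to distinguish from
#   {resource: table, action: create}       #   plain create, as noted above
# The flat list (create-table, register-table, ...) enumerates only the
# operations the spec actually defines.
```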
Re: [I] Spark, Avro and Iceberg Timestamp, Timestamp NTZ definition need more clarifications [iceberg]
RussellSpitzer commented on issue #12751: URL: https://github.com/apache/iceberg/issues/12751#issuecomment-2916840926 Is this an issue for data files written with Avro instead of Parquet? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Infra: Set 1.9.1 to Latest Release [iceberg]
RussellSpitzer merged PR #13177: URL: https://github.com/apache/iceberg/pull/13177 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Core: Fix a cast that is too narrow [iceberg]
angelo-DNAStack commented on PR #12743: URL: https://github.com/apache/iceberg/pull/12743#issuecomment-2917511977 Checking in on this PR again; please let me know if any additional changes are required. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Docs: Adds 1.9.1 Versioned JavaDocs [iceberg]
RussellSpitzer commented on PR #13173: URL: https://github.com/apache/iceberg/pull/13173#issuecomment-2917518749 Thanks @ajantha-bhat and @danielcweeks for reviewing! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Docs: Adds 1.9.1 Versioned JavaDocs [iceberg]
RussellSpitzer merged PR #13173: URL: https://github.com/apache/iceberg/pull/13173 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Infra: Set 1.9.1 to Latest Release [iceberg]
RussellSpitzer commented on PR #13177: URL: https://github.com/apache/iceberg/pull/13177#issuecomment-2917517509 Thanks @ajantha-bhat, @flyrain + @singhpk234 for the reviews! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Site: Updates for 1.9.1 Release [iceberg]
RussellSpitzer merged PR #13176: URL: https://github.com/apache/iceberg/pull/13176 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Site: Updates for 1.9.1 Release [iceberg]
RussellSpitzer commented on PR #13176: URL: https://github.com/apache/iceberg/pull/13176#issuecomment-2917516161 Thanks @ajantha-bhat + @danielcweeks for review! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Site: Add Versioned Docs for 1.9.1 [iceberg]
RussellSpitzer merged PR #13172: URL: https://github.com/apache/iceberg/pull/13172 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Site: Add Versioned Docs for 1.9.1 [iceberg]
RussellSpitzer commented on PR #13172: URL: https://github.com/apache/iceberg/pull/13172#issuecomment-2917520514 Thanks @ajantha-bhat + @danielcweeks for reviewing! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] SPARK: Remove dependency on hadoop's filesystem class from remove orphan files [iceberg]
RussellSpitzer commented on code in PR #12254: URL: https://github.com/apache/iceberg/pull/12254#discussion_r2112716340

## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/actions/TestRemoveOrphanFilesAction.java: ##
@@ -107,9 +115,12 @@ public abstract class TestRemoveOrphanFilesAction extends TestBase {
   protected Map properties;
   @Parameter private int formatVersion;
 
-  @Parameters(name = "formatVersion = {0}")
+  @Parameter(index = 1)
+  private boolean usePrefixListing;
+
+  @Parameters(name = "formatVersion = {0}, usePrefixListing = {1}")
   protected static List parameters() {
-    return Arrays.asList(2, 3);
+    return Arrays.asList(new Object[] {2, true}, new Object[] {2, false}, new Object[] {3, false});
   }

Review Comment: I mostly just don't want to have another explicit "version number". I'm fine even if we only test on the default version number, really. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] SPARK: Remove dependency on hadoop's filesystem class from remove orphan files [iceberg]
RussellSpitzer commented on code in PR #12254: URL: https://github.com/apache/iceberg/pull/12254#discussion_r2112725064

## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/actions/TestRemoveOrphanFilesAction.java: ##
@@ -344,6 +374,8 @@ public void testWapFilesAreKept() {
     // wap write
     df.select("c1", "c2", "c3").write().format("iceberg").mode("append").save(tableLocation);
 
+    spark.conf().unset(SparkSQLProperties.WAP_ID);

Review Comment: We should be using withSQLConf for this test then, like:
```java
withSQLConf(Map.of(SparkSQLProperties.WAP_ID, "1"), () -> {
  // wap write
  df.select("c1", "c2", "c3").write().format("iceberg").mode("append").save(tableLocation);

  Dataset<Row> resultDF = spark.read().format("iceberg").load(tableLocation);
  List<ThreeColumnRecord> actualRecords =
      resultDF.as(Encoders.bean(ThreeColumnRecord.class)).collectAsList();

  // TODO: currently fails because DVs delete stuff from WAP branch
  assertThat(actualRecords)
      .as("Should not return data from the staged snapshot")
      .isEqualTo(records);
});
```
or a cloned Spark Session. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] fix(Catalog): Handle NotFound exception for missing metadata file [iceberg]
nastra commented on code in PR #13143: URL: https://github.com/apache/iceberg/pull/13143#discussion_r2112280095

## core/src/test/java/org/apache/iceberg/catalog/CatalogTests.java: ##
@@ -959,6 +959,16 @@ public void testLoadMissingTable() {
         .hasMessageStartingWith("Table does not exist: ns.tbl");
   }
 
+  @Test
+  public void testLoadTableWithMissingMetadatafile() {
+    C catalog = catalog();
+
+    assertThat(catalog.tableExists(TBL)).as("Table should not exist").isFalse();
+    assertThatThrownBy(() -> catalog.loadTable(TBL))
+        .isInstanceOf(NoSuchTableException.class)

Review Comment: the test doesn't reproduce the issue you're describing, and it passes even without the fix that is being proposed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[PR] Docs: Adds 1.9.1 Versioned JavaDocs [iceberg]
RussellSpitzer opened a new pull request, #13173: URL: https://github.com/apache/iceberg/pull/13173 (no comment) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Spark 3.4: streaming-skip-overwrite-snapshots fix [iceberg]
huaxingao merged PR #13168: URL: https://github.com/apache/iceberg/pull/13168 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Spark 3.4: streaming-skip-overwrite-snapshots fix [iceberg]
huaxingao commented on PR #13168: URL: https://github.com/apache/iceberg/pull/13168#issuecomment-2916961579 Merged. Thanks @wypoon for the PR! Thanks @singhpk234 for the review! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[PR] Site: Add Versioned Docs for 1.9.1 [iceberg]
RussellSpitzer opened a new pull request, #13172: URL: https://github.com/apache/iceberg/pull/13172 (no comment) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[I] ERROR: Procedure system.refresh_table not found [iceberg]
vishwanthp12 opened a new issue, #13171: URL: https://github.com/apache/iceberg/issues/13171

Command I'm using in spark-sql: `CALL iceberg_catalog.system.refresh_table('iceberg_catalog.$db.$table');`

I'm using a JDBC catalog configured on MariaDB (MySQL). Here is my spark-defaults.conf:

```
# Iceberg extensions
spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions

# JDBC catalog for MariaDB
spark.sql.catalog.iceberg_catalog=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.iceberg_catalog.catalog-impl=org.apache.iceberg.jdbc.JdbcCatalog
spark.sql.catalog.iceberg_catalog.uri=jdbc:mysql://localhost:3306/iceberg_catalog
spark.sql.catalog.iceberg_catalog.jdbc.user=iceberg
spark.sql.catalog.iceberg_catalog.jdbc.password=@xxx
spark.sql.catalog.iceberg_catalog.warehouse=s3://xxx-xxx-x/spark-test/iceberg-warehouse
spark.sql.catalog.iceberg_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO
```

The error I get from the CALL command is: ERROR: Procedure system.refresh_table not found

What am I missing? I also tried switching to a Hadoop catalog, with the same issue. Can someone share their input on this, please? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Core: Add basic classes for writing table format-version 4 [iceberg]
ajantha-bhat commented on code in PR #13123: URL: https://github.com/apache/iceberg/pull/13123#discussion_r2112311313

## api/src/test/java/org/apache/iceberg/TestHelpers.java: ##
@@ -54,7 +54,7 @@ public class TestHelpers {
   private TestHelpers() {}
 
-  public static final int MAX_FORMAT_VERSION = 3;
+  public static final int MAX_FORMAT_VERSION = 4;

Review Comment: I think it's worth focusing on what's technically correct rather than solely relying on what has been done previously. Since we publish nightly builds, there's a chance users could start creating v4 tables before any spec is officially finalized. With the 1.10.0 discussions already underway, this change could potentially be included in a release without any approved v4 spec. My suggestion was simply to finalize at least one v4 spec change before beginning v4 development on the main branch, so that users have a clear understanding of what features come with enabling v4. That said, I may be overthinking this or being too cautious 😅. Please feel free to go with the broader consensus from the community on this PR (other reviewers' opinions). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[PR] Spec: Mark version 3 as completed [iceberg]
ajantha-bhat opened a new pull request, #13175: URL: https://github.com/apache/iceberg/pull/13175 Voted on the mailing list: https://lists.apache.org/thread/9oncq0j0222nrm6scvm89xdq3gozg24p -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org