vustef opened a new issue, #1636:
URL: https://github.com/apache/iceberg-rust/issues/1636

   ### Is your feature request related to a problem or challenge?
   
   
   Currently, iceberg-rust doesn't provide a way to see the changes between two 
snapshots. In Spark, through the Iceberg Java implementation, this is done using 
[create_changelog_view](https://iceberg.apache.org/docs/nightly/spark-procedures/#create_changelog_view).
 This is very useful for change data capture on top of Iceberg tables.
   
   
   
   ### Describe the solution you'd like
   
   The output for Spark's `create_changelog_view`, in default mode, is 
something like this:
   
   <img width="468" height="151" alt="Image" 
src="https://github.com/user-attachments/assets/5c17e3a4-3f57-4bd5-b497-f4cce7579663"
 />
   
   where each row shows its user-defined columns, with the addition of three 
metadata columns (`_change_type`, `_change_ordinal`, `_commit_snapshot_id`).
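   To make the row shape concrete, here is a minimal Rust sketch of one changelog row. The names `ChangeType` and `ChangelogRow` are hypothetical and are not an existing iceberg-rust API. The four change-type values mirror the ones Iceberg's changelog view can emit (`INSERT`, `DELETE`, `UPDATE_BEFORE`, `UPDATE_AFTER`); the integer widths are an assumption based on the Java types.

```rust
/// Hypothetical value of the `_change_type` metadata column.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ChangeType {
    Insert,
    Delete,
    UpdateBefore,
    UpdateAfter,
}

impl ChangeType {
    /// String form as it would appear in the `_change_type` column.
    pub fn as_str(self) -> &'static str {
        match self {
            ChangeType::Insert => "INSERT",
            ChangeType::Delete => "DELETE",
            ChangeType::UpdateBefore => "UPDATE_BEFORE",
            ChangeType::UpdateAfter => "UPDATE_AFTER",
        }
    }
}

/// Hypothetical changelog row: the user-defined columns plus the
/// three metadata columns described above.
#[derive(Debug, Clone, PartialEq)]
pub struct ChangelogRow<T> {
    /// User-defined columns of the table row.
    pub data: T,
    /// `_change_type`: what kind of change this row represents.
    pub change_type: ChangeType,
    /// `_change_ordinal`: order of the change within the scanned range.
    pub change_ordinal: i32,
    /// `_commit_snapshot_id`: snapshot in which the change was committed.
    pub commit_snapshot_id: i64,
}
```

   In an actual implementation these would more likely be extra Arrow columns appended to each record batch than a per-row struct; the struct is only meant to show what each row carries.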
   
   The Java implementation is incremental: only the data between the optional 
start and end timestamps (or snapshot IDs) is processed. Here are some 
references:

   - `openChangelogScanTask` in 
https://github.com/apache/iceberg/blob/efbfb7ef9addeb33e72208c927936e50b92d3357/spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/source/ChangelogRowReader.java
   - `doPlanFiles` in 
https://github.com/apache/iceberg/blob/6ec3de390d3fa6e797c6975b1eaaea41719db0fe/core/src/main/java/org/apache/iceberg/BaseIncrementalChangelogScan.java
   - The task types 
[BaseAddedRowsScanTask](https://github.com/apache/iceberg/blob/c07f2aabc0a1d02f068ecf1514d2479c0fbdd3b0/core/src/main/java/org/apache/iceberg/BaseAddedRowsScanTask.java)
 and 
[BaseDeletedDataFileScanTask](https://github.com/apache/iceberg/blob/c07f2aabc0a1d02f068ecf1514d2479c0fbdd3b0/core/src/main/java/org/apache/iceberg/BaseDeletedDataFileScanTask.java).

   
[BaseDeletedRowsScanTask](https://github.com/apache/iceberg/blob/c07f2aabc0a1d02f068ecf1514d2479c0fbdd3b0/core/src/main/java/org/apache/iceberg/BaseDeletedRowsScanTask.java)
 is unused, which means that for the changelog scan Spark doesn't support 
row-level deletes, only copy-on-write style deletes. It would be good if Rust 
supported row-level deletes as well; I see no particular reason why this 
wasn't supported in Spark.
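   As a rough illustration of the incremental planning described above, the following Rust sketch walks a linear snapshot history and emits one task per added or removed data file. Everything here is hypothetical: `SnapshotChanges`, `ChangelogScanTask`, and `plan_changelog_tasks` are illustrative names, not iceberg-rust or Iceberg Java APIs. It assumes the history is ordered oldest first, that the start snapshot is exclusive and the end snapshot inclusive, and that ordinals start at 0 and increase by one per snapshot in the range. Row-level deletes are not modelled.

```rust
/// Hypothetical summary of what one snapshot changed.
pub struct SnapshotChanges {
    pub snapshot_id: i64,
    /// Data files added by this snapshot.
    pub added_files: Vec<String>,
    /// Data files removed by this snapshot (copy-on-write style deletes).
    pub removed_files: Vec<String>,
}

/// Hypothetical task kinds, loosely modelled on the Java task types.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ChangelogScanTask {
    /// Every row in the file is emitted as an insert.
    AddedRows { file_path: String, change_ordinal: i32, commit_snapshot_id: i64 },
    /// Every row in the file is emitted as a delete.
    DeletedDataFile { file_path: String, change_ordinal: i32, commit_snapshot_id: i64 },
}

/// Plans tasks for snapshots after `from_exclusive` up to and including
/// `to_inclusive`. `history` must be ordered oldest first.
/// Assumption of this sketch: an unknown start ID means "scan from the
/// beginning", and an unknown end ID means "scan to the end".
pub fn plan_changelog_tasks(
    history: &[SnapshotChanges],
    from_exclusive: Option<i64>,
    to_inclusive: i64,
) -> Vec<ChangelogScanTask> {
    // Index of the first snapshot inside the requested range.
    let start = from_exclusive
        .and_then(|id| history.iter().position(|s| s.snapshot_id == id))
        .map(|i| i + 1)
        .unwrap_or(0);

    let mut tasks = Vec::new();
    for (ordinal, snap) in history.iter().skip(start).enumerate() {
        let change_ordinal = ordinal as i32;
        for path in &snap.removed_files {
            tasks.push(ChangelogScanTask::DeletedDataFile {
                file_path: path.clone(),
                change_ordinal,
                commit_snapshot_id: snap.snapshot_id,
            });
        }
        for path in &snap.added_files {
            tasks.push(ChangelogScanTask::AddedRows {
                file_path: path.clone(),
                change_ordinal,
                commit_snapshot_id: snap.snapshot_id,
            });
        }
        // Stop once the end of the requested range has been processed.
        if snap.snapshot_id == to_inclusive {
            break;
        }
    }
    tasks
}
```

   Supporting row-level deletes would presumably add a third task variant that pairs a data file with the delete files applied to it in the range, which is the part the Java changelog scan currently leaves unused.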
   
   The `create_changelog_view` procedure has several options. We probably 
don't have to support them all in Rust immediately and could add them over time.
   
   ### Willingness to contribute
   
   I would be willing to contribute to this feature with guidance from the 
Iceberg Rust community


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

