rdblue commented on code in PR #16092:
URL: https://github.com/apache/iceberg/pull/16092#discussion_r3244853485


##########
core/src/main/java/org/apache/iceberg/TrackingStruct.java:
##########
@@ -246,4 +266,78 @@ public String toString() {
         .add("replaced_positions", replacedPositions == null ? "null" : 
"(binary)")
         .toString();
   }
+
+  static class Builder {
+    private EntryStatus status = null;
+    private Long snapshotId = null;
+    private Long dataSequenceNumber = null;
+    private Long fileSequenceNumber = null;
+    private Long dvSnapshotId = null;
+    private Long firstRowId = null;
+    private byte[] deletedPositions = null;
+    private byte[] replacedPositions = null;
+
+    Builder status(EntryStatus entryStatus) {

Review Comment:
   Entry status has a specific lifecycle: when files are written, they are 
`ADDED`. When read and written again, they move to `EXISTING`. And when either 
an `ADDED` or `EXISTING` entry is deleted, it is set to `DELETED`. Also, the 
`snapshotId` field is related: it is required for `ADDED`, must not be modified 
for `EXISTING`, and again must be supplied for `DELETED`. (I think `REPLACED` 
will have similar behavior and it is easier to think about the API by omitting 
it for now.)
   
   I think that it makes more sense to have the builder handle these cases than 
to allow creating possibly misconfigured `Tracking` structs. I would handle 
these requirements in a way that match the lifecycle:
   
   ```java
     Builder added(long snapshotId); // creates a new Tracking for an addition
     Builder existing(Tracking); // creates Tracking for an existing file based 
on Tracking read from a manifest
     Builder deleted(Tracking, long snapshotId); // creates Tracking to delete 
a file
   ```
   
   Passing the existing tracking in makes it so the builder can keep values 
that must be preserved. And this approach allows the builder to use the initial 
intent (added/existing/deleted) to perform better validation. For instance, you 
can't call `dvSnapshotId` for a deleted entry -- it has to come from the 
existing/added entry and can't be changed.
   
   I should also note that a builder may not be the right way to create 
`Tracking` instances. It could be that we have a builder for `ADDED`, but then 
use instance methods to change from one state to the next, like 
`toExisting(...)`. I think going through the thought exercise of restricting 
what you can do through the builder is the right path forward. If it doesn't 
really work then we can consider other approaches.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to