rdblue commented on code in PR #16408:
URL: https://github.com/apache/iceberg/pull/16408#discussion_r3336916077
##########
core/src/main/java/org/apache/iceberg/TrackingStruct.java:
##########
@@ -249,95 +254,43 @@ protected <T> void internalSet(int pos, T value) {
}
}
- static Builder builder() {
- return new Builder();
+ /** Creates a builder for a newly added file in the given snapshot. */
+ static TrackingBuilder added(long snapshotId) {
+ return new TrackingBuilder(snapshotId);
+ }
+
+ /**
+ * Creates a builder for a tracking row derived from {@code source} at the
current snapshot.
+ *
+ * <p>Without MODIFIED status, this produces an EXISTING row. Once MODIFIED
lands, the status will
+ * be auto-derived from the source, the snapshot, and which mutation methods
are called.
+ */
+ // TODO: when MODIFIED is added, derive status from source +
currentSnapshotId + mutations.
+ static TrackingBuilder builder(Tracking source, long currentSnapshotId) {
Review Comment:
I'm strongly in favor of keeping the logic for updating status in the
builder. One of my main concerns about introducing the `MODIFIED` status is
that writers would get it wrong. Having this in the builder ensures that a
single place makes the decision to use that status.
This also makes the builder easier to use for callers. The caller doesn't
need to go through its changes to decide what kind of builder to use. It just
needs to configure the builder properly.
Here's a rough sketch of what it looks like to delegate modified/existing to
the caller:
```java
if (!hasDVUpdate && !hasColumnUpdate) {
return tracking.asExisting();
}
Builder trackingBuilder = Builder.modified(tracking, newSnapshotId);
if (hasDVUpdate) {
trackingBuilder.dvUpdated();
}
if (hasColumnUpdate) {
trackingBuilder.columnUpdated();
}
return trackingBuilder.build();
```
And here's the one where status is in the builder:
```java
Builder trackingBuilder = Builder.builder(tracking, newSnapshotId);
if (hasDVUpdate) {
trackingBuilder.dvUpdated();
}
if (hasColumnUpdate) {
trackingBuilder.columnUpdated();
}
return trackingBuilder.build();
```
The builder has to be updated for a column update or DV anyway, so it makes
little sense to use a different builder. And in the first case, the builder has
to verify that the right status was used anyway. As long as the builder is
responsible for checking that status is correct, we may as well skip leaking
status logic into the calling code and do it in the builder.
> I think, for simplicity we could introduce an API add `Tracking
Tracking.asExisting()`
I don't think this actually simplifies the use of the API, since it leaks
state logic into the caller, as you can see above.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]