rdblue commented on code in PR #16689:
URL: https://github.com/apache/iceberg/pull/16689#discussion_r3364092997
##########
core/src/main/java/org/apache/iceberg/Tracking.java:
##########
@@ -28,13 +28,13 @@ interface Tracking {
0,
"status",
Types.IntegerType.get(),
- "Entry status: 0=existing, 1=added, 2=deleted, 3=replaced");
+ "Entry status: 0=existing, 1=added, 2=deleted, 3=replaced, 4=modified");
Types.NestedField SNAPSHOT_ID =
Types.NestedField.optional(
1,
"snapshot_id",
Types.LongType.get(),
- "Snapshot ID where the file was added or deleted");
+ "Snapshot ID where the file was added, deleted, replaced, or modified");
Review Comment:
Have we agreed to modify the snapshot ID for a replaced entry? I thought
that we were not going to change replaced entries.
We change the snapshot ID for deleted entries but not for existing entries,
so there's precedent both ways. If you're scanning for changes, the snapshot ID
is useful for filtering out changes that are left over from older snapshots.
For instance, I may rewrite a manifest and delete a file in it. If I later
scan that manifest for changes, I can check whether the delete entry belongs
to the snapshot ID I'm getting changes for.
The counter-argument is that a manifest would probably only be scanned for
changes if its own snapshot ID matches the snapshot you're getting changes
for. To decide whether to scan the manifest, you'd first check its snapshot ID
(when it was added) and skip it otherwise.
Overall, I think the right thing is to update the snapshot ID as you have
here. That way if any implementation reads files it doesn't need to, it has
enough information to filter out the entries.
This would be good to note in the spec, @stevenzwu and @amogh-jahagirdar.
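To make the filtering argument concrete, here is a minimal sketch of what a reader could do when every non-existing entry carries the snapshot ID of the change that produced it. It does not use the Iceberg API: `SnapshotFilterSketch`, `Entry`, and `changesIn` are hypothetical names invented for illustration, and the status codes are simply the ones listed in the field doc in the diff above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for manifest entry tracking info; not the Iceberg API.
final class SnapshotFilterSketch {
  // Status codes as listed in the field doc in the diff above.
  static final int EXISTING = 0;
  static final int ADDED = 1;
  static final int DELETED = 2;
  static final int REPLACED = 3;
  static final int MODIFIED = 4;

  static final class Entry {
    final int status;
    final Long snapshotId; // the field is optional, so it may be null
    final String path;

    Entry(int status, Long snapshotId, String path) {
      this.status = status;
      this.snapshotId = snapshotId;
      this.path = path;
    }
  }

  // Keep only entries that record a change made in the given snapshot.
  // Changes left over from older snapshots are dropped by the ID check.
  static List<Entry> changesIn(List<Entry> entries, long snapshotId) {
    List<Entry> changes = new ArrayList<>();
    for (Entry e : entries) {
      boolean isChange = e.status != EXISTING;
      boolean sameSnapshot =
          e.snapshotId != null && e.snapshotId.longValue() == snapshotId;
      if (isChange && sameSnapshot) {
        changes.add(e);
      }
    }
    return changes;
  }

  public static void main(String[] args) {
    List<Entry> entries = Arrays.asList(
        new Entry(EXISTING, 10L, "a.parquet"),
        new Entry(DELETED, 10L, "b.parquet"),  // left over from snapshot 10
        new Entry(DELETED, 12L, "c.parquet"),
        new Entry(REPLACED, 12L, "d.parquet"));

    for (Entry e : changesIn(entries, 12L)) {
      System.out.println(e.status + " " + e.path);
    }
  }
}
```

If replaced entries kept their original snapshot ID, the `sameSnapshot` check in this sketch could not tell a current replacement from a left-over one, which is the case the updated snapshot ID is meant to cover.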
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]