rdblue commented on code in PR #16263:
URL: https://github.com/apache/iceberg/pull/16263#discussion_r3221746387
##########
core/src/main/java/org/apache/iceberg/ManifestReader.java:
##########
@@ -417,14 +417,9 @@ public ManifestEntry<F> apply(ManifestEntry<F> entry) {
}
};
} else {
- // data file's first_row_id is null when the manifest's first_row_id is null
- return entry -> {
- if (entry.file() instanceof BaseFile) {
- ((BaseFile<?>) entry.file()).setFirstRowId(null);
- }
-
- return entry;
- };
+ // Preserve the source entry’s first row ID even if the manifest hasn’t assigned one since it
+ // may be EXISTING
+ return Function.identity();
Review Comment:
I'm not sure about this change.
It looks like this case covers when the manifest's `first_row_id` is null.
That should only happen when reading a snapshot from an older version without
row IDs. In that case, the snapshot's `first-row-id` is null and we don't
assign `first_row_id` to manifests or to data files. This is covered by
[Row Lineage for Upgraded Tables](https://iceberg.apache.org/spec/#row-lineage-for-upgraded-tables)
in the spec.
If the snapshot's `first-row-id` is assigned, then every manifest should have a
`first_row_id` assigned, so this branch should not be reached.
Is it possible that the problem is further up and a manifest is somehow
missing a `first_row_id`?
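
To make the expectation above concrete, here is a minimal sketch of the inheritance rule as I read it. `RowIdInheritanceSketch`, `FileInfo`, and `resolveFirstRowIds` are hypothetical names for illustration only, not Iceberg's classes or the actual `ManifestReader` code, and the non-null branch assumes the running-counter behavior described in the spec.

```java
import java.util.ArrayList;
import java.util.List;

public class RowIdInheritanceSketch {
  // Hypothetical stand-in for the data file in a manifest entry (not Iceberg's API).
  static final class FileInfo {
    final Long firstRowId;   // null when the entry carries no explicit first_row_id
    final long recordCount;

    FileInfo(Long firstRowId, long recordCount) {
      this.firstRowId = firstRowId;
      this.recordCount = recordCount;
    }
  }

  // Returns the first_row_id each file would report after reading the manifest.
  static List<Long> resolveFirstRowIds(Long manifestFirstRowId, List<FileInfo> files) {
    List<Long> resolved = new ArrayList<>();

    if (manifestFirstRowId == null) {
      // Assumed case: snapshot written before row lineage, so nothing is assigned
      // to the manifest or to any of its data files.
      for (int i = 0; i < files.size(); i++) {
        resolved.add(null);
      }
      return resolved;
    }

    // Manifest has a first_row_id: files without an explicit ID inherit from a
    // running counter; files with an explicit ID keep it and do not advance it.
    long next = manifestFirstRowId;
    for (FileInfo file : files) {
      if (file.firstRowId != null) {
        resolved.add(file.firstRowId);
      } else {
        resolved.add(next);
        next += file.recordCount;
      }
    }
    return resolved;
  }
}
```

If the reading above is right, the null-manifest branch should only ever see files that have no row IDs to preserve, which is why I would look for the problem upstream first.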