stevenzwu commented on code in PR #16263:
URL: https://github.com/apache/iceberg/pull/16263#discussion_r3214307973


##########
core/src/main/java/org/apache/iceberg/ManifestReader.java:
##########
@@ -417,14 +417,8 @@ public ManifestEntry<F> apply(ManifestEntry<F> entry) {
         }
       };
     } else {
-      // data file's first_row_id is null when the manifest's first_row_id is 
null

Review Comment:
   The global change reads correctly. The `else` branch only runs when 
`manifest.firstRowId` is null, and in every such case the on-disk 
`file.firstRowId` is what should be preserved:
   
   - pre-v3 manifests: `file.firstRowId` is null in the data, identity 
preserves null → unchanged behavior
   - v3 EXISTING entries in a transient filtered/merged manifest: identity 
preserves the original assignment → the fix
   - v3 ADDED entries in a freshly-written manifest: `file.firstRowId` is null 
in the data (assigned at read time once the manifest gets a `firstRowId`), 
identity preserves null → unchanged behavior
   
   I am also wondering about the reason for [the previous behavior 
change](https://github.com/apache/iceberg/pull/12843/changes#diff-79baa98562948784b4c255ee2dd14ae3a26f875d246f1607f76d58668285ef31).
 Maybe it is a defensive check  covering: `ManifestFiles.copyAppendManifest`, 
the user-facing `appendManifest()` flow. A user can construct a `ManifestFile` 
externally with a non-null file-level `first_row_id` set on an ADDED entry — 
nothing in the public API stops that.
   
   Under the previous reader-side clobber, the value was stripped before the 
writer saw it. Under `Function.identity()`, the value reaches the writer — but 
`writer.add(entry)` → `wrapAppend` → `Delegates.suppressFirstRowId`, so the 
wrapper's `firstRowId()` returns null and the bytes written for the ADDED entry 
have null on disk. Identical on-disk result.
   
   The reader-side clobber was redundant on this path; the writer-side 
suppression in `MergingSnapshotProducer.add` and 
`GenericManifestEntry.wrapAppend` carries the "no override from client" 
enforcement. The global change preserves that property.



##########
core/src/main/java/org/apache/iceberg/ManifestReader.java:
##########
@@ -417,14 +417,8 @@ public ManifestEntry<F> apply(ManifestEntry<F> entry) {
         }
       };
     } else {
-      // data file's first_row_id is null when the manifest's first_row_id is 
null
-      return entry -> {
-        if (entry.file() instanceof BaseFile) {
-          ((BaseFile<?>) entry.file()).setFirstRowId(null);
-        }
-
-        return entry;
-      };
+      // preserve any existing file-level first_row_id

Review Comment:
   Suggested wording that names the specific case the branch handles:
   
   ```suggestion
         // preserve any existing file-level first_row_id. EXISTING entries 
carried over
         // from a rewrite or merge. Keep their original first_row_id even when 
the containing
         // manifest hasn't been assigned its own first_row_id yet.
   ```
   
   A future reader sees exactly which call path is defended — the 
merge-of-just-written-filtered-manifest case where the previous null-clobber 
lost row IDs.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to