yihua commented on code in PR #18279:
URL: https://github.com/apache/hudi/pull/18279#discussion_r3035041062


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/RollbackHelperV1.java:
##########


Review Comment:
   **Line 419:** 🤖 The `StoragePathInfo` is constructed with dummy values 
`(false, (short) 0, 0, 0)` for `isDirectory`, `replication`, `blockSize`, 
`modificationTime`. While this is safe for map key usage (since 
`StoragePathInfo.equals()` only compares the path per Hadoop's FileStatus 
contract), the `fileSize` captured via `writer.getCurrentSize()` is meaningful 
and will differ from the old behavior where `getPathInfo()` returned the 
on-disk size after close.
   
   The writer's `getCurrentSize()` reflects bytes written to the output stream, 
which may differ from the final on-disk file size (e.g., due to buffering, 
compression, or filesystem block alignment). For the rollback stats tracking 
purpose this is likely fine, but worth a brief comment noting why 
`getCurrentSize()` is an acceptable approximation.



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/RollbackHelperV1.java:
##########


Review Comment:
   **Line 389:** 🤖 When `preComputedVersion` is present, both `withLogVersion` 
and `withLogWriteToken` are overridden. The write token was already set a few 
lines above via `CommonClientUtils.generateWriteToken(taskContextSupplier)`. 
Since `HoodieLogFormatWriter.getOutputStream()` rolls over to version+1 when 
the file at the given version already exists, the pre-computed token (from the 
existing file on disk) becomes the `rolloverLogWriteToken` for the new file.
   
   In the original code (without pre-computation), the builder's listing 
discovers the latest version N, uses it with a **fresh** write token T_new, and 
since no file exists at `(N, T_new)`, it creates the file directly at version 
N. With pre-computation, the builder uses `(N, T_existing)`, the file exists, 
so it rolls over to `(N+1, rolloverToken)`.
   
   Both approaches produce a valid rollback command block log file, but at 
different version numbers (N vs N+1). Could you confirm this version difference 
is intentional/acceptable? My understanding is it should be fine since the log 
reader processes all log files for a file group regardless of version, but 
wanted to flag the behavioral difference.



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/RollbackHelper.java:
##########


Review Comment:
   **Line 307:** 🤖 The filter `path -> 
path.getName().contains(HoodieLogFile.DELTA_EXTENSION)` uses `contains` rather 
than a stricter pattern match. This could match unexpected filenames like 
`foo.log.backup` or `data.log.tmp`. While the `InvalidHoodiePathException` 
catch on line 318 handles such cases gracefully, a stricter filter (e.g., 
checking that the extension appears at the expected position in the Hudi log 
file naming convention) would avoid constructing `HoodieLogFile` objects only 
to discard them.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to