yihua commented on code in PR #18279: URL: https://github.com/apache/hudi/pull/18279#discussion_r3035041062
########## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/RollbackHelperV1.java: ########## Review Comment: **Line 419:** 🤖 The `StoragePathInfo` is constructed with dummy values `(false, (short) 0, 0, 0)` for `isDirectory`, `replication`, `blockSize`, `modificationTime`. While this is safe for map key usage (since `StoragePathInfo.equals()` only compares the path per Hadoop's FileStatus contract), the `fileSize` captured via `writer.getCurrentSize()` is meaningful and will differ from the old behavior where `getPathInfo()` returned the on-disk size after close. The writer's `getCurrentSize()` reflects bytes written to the output stream, which may differ from the final on-disk file size (e.g., due to buffering, compression, or filesystem block alignment). For the rollback stats tracking purpose this is likely fine, but worth a brief comment noting why `getCurrentSize()` is an acceptable approximation. ########## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/RollbackHelperV1.java: ########## Review Comment: **Line 389:** 🤖 When `preComputedVersion` is present, both `withLogVersion` and `withLogWriteToken` are overridden. The write token was already set a few lines above via `CommonClientUtils.generateWriteToken(taskContextSupplier)`. Since `HoodieLogFormatWriter.getOutputStream()` rolls over to version+1 when the file at the given version already exists, the pre-computed token (from the existing file on disk) becomes the `rolloverLogWriteToken` for the new file. In the original code (without pre-computation), the builder's listing discovers the latest version N, uses it with a **fresh** write token T_new, and since no file exists at `(N, T_new)`, it creates the file directly at version N. With pre-computation, the builder uses `(N, T_existing)`, the file exists, so it rolls over to `(N+1, rolloverToken)`. Both approaches produce a valid rollback command block log file, but at different version numbers (N vs N+1). Could you confirm this version difference is intentional/acceptable? My understanding is it should be fine since the log reader processes all log files for a file group regardless of version, but wanted to flag the behavioral difference. ########## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/RollbackHelper.java: ########## Review Comment: **Line 307:** 🤖 The filter `path -> path.getName().contains(HoodieLogFile.DELTA_EXTENSION)` uses `contains` rather than a stricter pattern match. This could match unexpected filenames like `foo.log.backup` or `data.log.tmp`. While the `InvalidHoodiePathException` catch on line 318 handles such cases gracefully, a stricter filter (e.g., checking that the extension appears at the expected position in the Hudi log file naming convention) would avoid constructing `HoodieLogFile` objects only to discard them. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
