Re: [PR] Minor 18 [hudi]

via GitHub Tue, 07 Apr 2026 00:37:17 -0700


yihua commented on code in PR #18476:
URL: https://github.com/apache/hudi/pull/18476#discussion_r3043520609



##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java:
##########
@@ -427,7 +426,6 @@ private Dataset<Row> readRecordsForGroupAsRow(JavaSparkContext jsc,
 
     HashMap<String, String> params = new HashMap<>();

Review Comment:
   🤖 Could you confirm that `createRelation` with explicit 
`hoodie.datasource.read.paths` + `glob.paths` truly suppresses Hudi's 
file-group-based log file auto-discovery for MoR? My concern: if the relation 
still scans for *all* log files belonging to a file group (rather than only the 
ones in `paths`), a concurrent commit that completed between clustering 
scheduling and execution and wrote new log files to the same file group could 
be silently included. The clustered output would then cover a wider time range 
than the plan intended.
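
   To make the concern concrete, here is a minimal sketch of the kind of check 
that would catch it. It is not Hudi code: `PlannedLogFileCheck`, 
`unplannedLogFiles`, and the file names are hypothetical, and it assumes the 
set of paths in the clustering plan and the list of files the relation actually 
resolved are both available as plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Hudi: flags files that a reader resolved
// for a file group but that the clustering plan never listed.
public class PlannedLogFileCheck {

  // Returns the discovered paths that are absent from the plan, in discovery order.
  public static List<String> unplannedLogFiles(Set<String> plannedPaths,
                                               List<String> discoveredPaths) {
    List<String> extras = new ArrayList<>();
    for (String path : discoveredPaths) {
      if (!plannedPaths.contains(path)) {
        extras.add(path);
      }
    }
    return extras;
  }

  public static void main(String[] args) {
    // Illustrative names only; real Hudi log file names use a different format.
    Set<String> planned = new HashSet<>(Arrays.asList("fg1/base.parquet", "fg1/log.1"));
    // "fg1/log.2" stands in for a log file written by a concurrent commit
    // after the clustering plan was scheduled.
    List<String> discovered = Arrays.asList("fg1/base.parquet", "fg1/log.1", "fg1/log.2");
    System.out.println(unplannedLogFiles(planned, discovered));
  }
}
```

   A non-empty result would mean the read covered more than the plan listed, 
which is the case I'd like ruled out here.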



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
