andygrove opened a new issue, #4194: URL: https://github.com/apache/datafusion-comet/issues/4194
Sub-issue of #4098. ## Description Three tests fail in \`Spark 4.1, JDK 17/auto [exec]\` (\`CometTaskMetricsSuite\`): - \`native_datafusion scan reports task-level input metrics matching Spark\` - \`input metrics aggregate across multiple native scans in a join\` - \`input metrics aggregate across multiple native scans in a union\` Symptom (one example): \`\`\` 9.6 was greater than or equal to 0.7, but 9.6 was not less than or equal to 1.3 bytesRead ratio out of range: comet=90498, spark=9427, ratio=9.6 \`\`\` Two more failures with similar 6.4 and 13.9 ratios. ## Suspected root cause Spark 4.1 changed what \`inputMetrics.bytesRead\` accounts for, most likely now reports a smaller subset (e.g. only bytes actually read into row buffers, versus full Parquet footer plus row group). Compare \`ParquetFileReader\` / \`PartitionedFile\` accounting between 4.0 and 4.1 and adjust Comet's metric source accordingly. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
