RussellSpitzer commented on code in PR #10659: URL: https://github.com/apache/iceberg/pull/10659#discussion_r1693320217
########## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/source/TestSparkScan.java: ########## @@ -734,6 +801,21 @@ private Expression[] expressions(Expression... expressions) { return expressions; } + private void checkStatistics(SparkScan scan, long expectedRowCount, boolean expectedColumnStats) { Review Comment: This may be over parameterized, we only use this function 2 times and we use them for relatively different things. So I'm not sure it makes things clearer. It's also missing a bit to be completely parameterized. So if it was completely parameterized we would want probably two functions like ```java checkStatisticsNotReported(scan) checkReportedStatistics(scan, rowCount, map<String, Int> distinctValueCounts) ``` As it is we hard code in the column name "id" and the distinct value count "4l" but parameterize the expectedRowCount. So we couldn't really re-use this function for any reason. Now since we only actually use this function for 1 example above it may be ok to not even parameterize it at all. So I think there are 2 ways to go here. Find more uses for the function and fully parameterize Deparameterize and inline. I think we have a few more use cases we should probably test out just for fun so maybe route 1 is better? For example Test where 1 column has NDV and the other does not. Test where stats reporting is enabled, but the table does not have stats Test with different distinct values for columns in the table -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org