RussellSpitzer commented on code in PR #10659:
URL: https://github.com/apache/iceberg/pull/10659#discussion_r1693320217


##########
spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/source/TestSparkScan.java:
##########
@@ -734,6 +801,21 @@ private Expression[] expressions(Expression... 
expressions) {
     return expressions;
   }
 
+  private void checkStatistics(SparkScan scan, long expectedRowCount, boolean 
expectedColumnStats) {

Review Comment:
   This may be over parameterized, we only use this function 2 times and we use 
them for relatively different things. So I'm not sure it makes things clearer. 
It's also missing a bit to be completely parameterized.
   
   So if it was completely parameterized we would want probably two functions 
like
   ```java
   checkStatisticsNotReported(scan)
   checkReportedStatistics(scan, rowCount, map<String, Int> distinctValueCounts)
   ```
   As it is we hard code in the column name "id" and the distinct value count 
"4l" but parameterize the expectedRowCount. So we couldn't really re-use this 
function for any reason.
   
   Now since we only actually use this function for 1 example above it may be 
ok to not even parameterize it at all. So I think there are 2 ways to go here.
   
   Find more uses for the function and fully parameterize
   Deparameterize and inline.
   
   I think we have a few more use cases we should probably test out just for 
fun so maybe route 1 is better?
   
   For example
   
   Test where 1 column has NDV and the other does not.
   Test where stats reporting is enabled, but the table does not have stats
   Test with different distinct values for columns in the table



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to