huaxingao commented on code in PR #6622:
URL: https://github.com/apache/iceberg/pull/6622#discussion_r1096628616


##########
spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/sql/TestAggregatePushDown.java:
##########
@@ -459,9 +492,157 @@ public void testAggregatePushDownForTimeTravel() {
     if (explainString2.contains("count(id)")) {
       explainContainsPushDownAggregates2 = true;
     }
+
     Assert.assertTrue("count pushed down", explainContainsPushDownAggregates2);
 
     List<Object[]> actual2 = sql("SELECT count(id) FROM %s", tableName);
     assertEquals("count push down", expected2, actual2);
   }
+
+  @Test
+  public void testAllNull() {
+    sql("CREATE TABLE %s (id int, data int) USING iceberg PARTITIONED BY (id)", tableName);
+    sql(
+        "INSERT INTO %s VALUES (1, null),"
+            + "(1, null), "
+            + "(2, null), "
+            + "(2, null), "
+            + "(3, null), "
+            + "(3, null)",
+        tableName);
+    String select = "SELECT count(*), max(data), min(data), count(data) FROM %s";
+
+    List<Object[]> explain = sql("EXPLAIN " + select, tableName);
+    String explainString = explain.get(0)[0].toString();
+    boolean explainContainsPushDownAggregates = false;
+    if (explainString.contains("max(data)")
+        && explainString.contains("min(data)")
+        && explainString.contains("count(data)")) {
+      explainContainsPushDownAggregates = true;
+    }
+
+    Assert.assertTrue(
+        "explain should contain the pushed down aggregates", explainContainsPushDownAggregates);
+
+    List<Object[]> actual = sql(select, tableName);
+    List<Object[]> expected = Lists.newArrayList();
+    expected.add(new Object[] {6L, null, null, 0L});
+    assertEquals("min/max/count push down", expected, actual);
+  }
+
+  @Test
+  public void testAllNaN() {
+    sql("CREATE TABLE %s (id int, data float) USING iceberg PARTITIONED BY (id)", tableName);
+    sql(
+        "INSERT INTO %s VALUES (1, float('nan')),"
+            + "(1, float('nan')), "
+            + "(2, float('nan')), "
+            + "(2, float('nan')), "
+            + "(3, float('nan')), "
+            + "(3, float('nan'))",
+        tableName);
+    String select = "SELECT count(*), max(data), min(data), count(data) FROM %s";
+
+    List<Object[]> explain = sql("EXPLAIN " + select, tableName);
+    String explainString = explain.get(0)[0].toString();
+    boolean explainContainsPushDownAggregates = false;
+    if (explainString.contains("max(data)")
+        && explainString.contains("min(data)")
+        && explainString.contains("count(data)")) {
+      explainContainsPushDownAggregates = true;
+    }
+
+    Assert.assertTrue(
+        "explain should contain the pushed down aggregates", explainContainsPushDownAggregates);
+
+    List<Object[]> actual = sql(select, tableName);
+    List<Object[]> expected = Lists.newArrayList();
+    //    expected.add(
+    //            new Object[] {6L, Float.NaN, Float.NaN, 6L});
+    expected.add(new Object[] {6L, null, null, 6L});

Review Comment:
   There is a problem here: if all the values in the column are `NaN`, then we 
don't have stats for `upper_bounds` and `lower_bounds` for this column, so the 
pushed-down min and max are `null`. They should be `NaN`. The correct expected 
value should be:
   ```
       expected.add(new Object[] {6L, Float.NaN, Float.NaN, 6L});
   ```
   
   I am not sure what the correct fix for this problem is. Should we return 
`NaN` for `upper_bounds` and `lower_bounds` when all the values are `NaN`?
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to