[GitHub] [iceberg] rdblue commented on a diff in pull request #7886: Spark 3.4: Support pushing down system functions by V2 filters

via GitHub Fri, 23 Jun 2023 14:53:31 -0700


rdblue commented on code in PR #7886:
URL: https://github.com/apache/iceberg/pull/7886#discussion_r1240437035



##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/SparkV2Filters.java:
##########
@@ -322,10 +423,77 @@ private static boolean hasNoInFilter(Predicate predicate) {
   }
 
   private static boolean isSupportedInPredicate(Predicate predicate) {
-    if (!isRef(childAtIndex(predicate, 0))) {
+    if (!couldConvert(childAtIndex(predicate, 0))) {
       return false;
     } else {
       return Arrays.stream(predicate.children()).skip(1).allMatch(SparkV2Filters::isLiteral);
     }
   }
+
+  private static <I, T> UnboundTerm<T> toTerm(I input) {
+    if (input instanceof NamedReference) {
+      return Expressions.ref(SparkUtil.toColumnName((NamedReference) input));
+    } else if (input instanceof UserDefinedScalarFunc) {
+      return udfToTerm((UserDefinedScalarFunc) input);
+    } else {
+      return null;
+    }
+  }
+
+  @VisibleForTesting
+  @SuppressWarnings("unchecked")
+  static <T> UnboundTerm<T> udfToTerm(UserDefinedScalarFunc udf) {
+    switch (udf.name().toLowerCase(Locale.ROOT)) {
+      case "years":
+        Preconditions.checkArgument(
+            udf.children().length == 1, "years function should have only one children (column)");
+        if (isRef(udf.children()[0])) {
+          return year(SparkUtil.toColumnName((NamedReference) udf.children()[0]));
+        }
+        return null;
+      case "months":
+        Preconditions.checkArgument(
+            udf.children().length == 1, "months function should have only one children (column)");
+        if (isRef(udf.children()[0])) {
+          return month(SparkUtil.toColumnName((NamedReference) udf.children()[0]));
+        }
+        return null;
+      case "days":
+        Preconditions.checkArgument(
+            udf.children().length == 1, "days function should have only one children (column)");
+        if (isRef(udf.children()[0])) {
+          return day(SparkUtil.toColumnName((NamedReference) udf.children()[0]));
+        }
+        return null;
+      case "hours":
+        Preconditions.checkArgument(
+            udf.children().length == 1, "hours function should have only one children (column)");
+        if (isRef(udf.children()[0])) {
+          return hour(SparkUtil.toColumnName((NamedReference) udf.children()[0]));
+        }
+        return null;
+      case "bucket":
+        Preconditions.checkArgument(
+            udf.children().length == 2,
+            "bucket function should have two children (numBuckets and column)");
+        if (isLiteral(udf.children()[0]) && isRef(udf.children()[1])) {
+          return bucket(
+              SparkUtil.toColumnName((NamedReference) udf.children()[1]),
+              convertLiteral((Literal<Integer>) udf.children()[0]));
+        }
+        return null;
+      case "truncate":
+        Preconditions.checkArgument(
+            udf.children().length == 2,
+            "truncate function should have two children (width and column)");
+        if (isLiteral(udf.children()[0]) && isRef(udf.children()[1])) {
+          return truncate(
+              SparkUtil.toColumnName((NamedReference) udf.children()[1]),
+              convertLiteral((Literal<Integer>) udf.children()[0]));
+        }
+        return null;
+      default:
+        return null;

Review Comment:
   This needs more whitespace to comply with the style guidelines. Please add a
blank line between each control flow block and the statement that follows it.
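
   As a rough illustration of the requested layout (not the PR's actual code), the sketch below assumes the comment asks for an empty line between the closing brace of each `if` block and the `return null;` that follows. `WhitespaceStyleSketch` and `udfToTermName` are hypothetical stand-ins that return plain strings rather than Iceberg `UnboundTerm` objects, so the example compiles without Iceberg or Spark on the classpath.

```java
import java.util.Locale;

// Hypothetical stand-in for the udfToTerm switch in the diff. It returns a
// plain String instead of an Iceberg UnboundTerm so it compiles on its own.
final class WhitespaceStyleSketch {
  private WhitespaceStyleSketch() {}

  static String udfToTermName(String name, String[] children) {
    switch (name.toLowerCase(Locale.ROOT)) {
      case "years":
        if (children.length == 1 && children[0] != null) {
          return "year(" + children[0] + ")";
        }

        // The empty line above separates the if block from this statement.
        return null;
      case "bucket":
        if (children.length == 2 && children[0] != null && children[1] != null) {
          return "bucket(" + children[1] + ", " + children[0] + ")";
        }

        return null;
      default:
        return null;
    }
  }

  public static void main(String[] args) {
    System.out.println(udfToTermName("years", new String[] {"ts"}));
    System.out.println(udfToTermName("bucket", new String[] {"16", "id"}));
    System.out.println(udfToTermName("unknown", new String[] {}));
  }
}
```

   The same spacing would apply to each `case` in the quoted diff; only the blank lines change, not the behavior.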



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

