stevenzwu commented on code in PR #14101:
URL: https://github.com/apache/iceberg/pull/14101#discussion_r2628488194


##########
api/src/main/java/org/apache/iceberg/expressions/Expression.java:
##########
@@ -44,6 +44,8 @@ enum Operation {
     OR,
     STARTS_WITH,
     NOT_STARTS_WITH,
+    ST_INTERSECTS,

Review Comment:
   nit: add to the end of the list?



##########
api/src/test/java/org/apache/iceberg/expressions/TestExpressionUtil.java:
##########
@@ -1315,6 +1322,48 @@ private static VariantArray createArrayWithNestedTypes() 
{
     return (VariantArray) VariantValue.from(metadata, variantBB);
   }
 
+  private static Stream<Arguments> geospatialPredicateParameters() {
+    return Stream.of(
+        Arguments.of(Expression.Operation.ST_INTERSECTS, "geom"),
+        Arguments.of(Expression.Operation.ST_INTERSECTS, "geog"),
+        Arguments.of(Expression.Operation.ST_DISJOINT, "geom"),
+        Arguments.of(Expression.Operation.ST_DISJOINT, "geog"));
+  }
+
+  @ParameterizedTest
+  @MethodSource("geospatialPredicateParameters")
+  public void testSanitizeGeospatialPredicates(Expression.Operation operation, 
String columnName) {
+    // Create a schema with geometry and geography fields
+    Schema geoSchema =

Review Comment:
   nit: make this a static constant?



##########
api/src/test/java/org/apache/iceberg/types/TestConversions.java:
##########
@@ -45,6 +47,7 @@
 public class TestConversions {
 
   @Test
+  @SuppressWarnings("MethodLength")

Review Comment:
   nit: I don't know why we put all those tests in one giant test method. For 
the new geo byte buffer conversion, maybe we can put them into a separate test 
method.



##########
api/src/test/java/org/apache/iceberg/expressions/TestPredicateBinding.java:
##########
@@ -648,4 +651,53 @@ public void 
testNotInPredicateBindingConversionToExpression() {
         .as("Should change NOT_IN to alwaysTrue expression")
         .isEqualTo(Expressions.alwaysTrue());
   }
+
+  @Test
+  public void testGeospatialPredicateBinding() {
+    StructType struct =
+        StructType.of(
+            required(20, "geom", Types.GeometryType.crs84()),
+            required(21, "geog", Types.GeographyType.crs84()));
+
+    // Create a bounding box for testing
+    GeospatialBound min = GeospatialBound.createXY(1.0, 2.0);
+    GeospatialBound max = GeospatialBound.createXY(3.0, 4.0);
+    BoundingBox bbox = new BoundingBox(min, max);
+
+    // Test ST_INTERSECTS with geometry
+    UnboundPredicate<ByteBuffer> stIntersectsGeom =
+        Expressions.geospatialPredicate(Expression.Operation.ST_INTERSECTS, 
"geom", bbox);

Review Comment:
   wondering if we should make `Expressions.geospatialPredicate` private or 
package private. it seems to be only used by test code. 
   
   - In some tests, It can be simply replaced by calling the 
`Expressions.stIntersects` public method. 
   - Some have parameterized tests (like `TestExpressionUtil`). wondering if 
the parameters can be replaced as `UnboundPredicate` type.
   
   The goal is to expose minimal scope especially for `api` module.



##########
api/src/main/java/org/apache/iceberg/expressions/UnboundPredicate.java:
##########
@@ -284,6 +287,14 @@ public String toString() {
         return term() + " startsWith \"" + literal() + "\"";
       case NOT_STARTS_WITH:
         return term() + " notStartsWith \"" + literal() + "\"";
+      case ST_INTERSECTS:
+      case ST_DISJOINT:
+        {
+          ByteBuffer serializedBoundingBox = (ByteBuffer) literal().value();

Review Comment:
   `BoundingBoxLiteral` should probably implement `toString` method to simplify 
the code here.



##########
api/src/main/java/org/apache/iceberg/expressions/Evaluator.java:
##########
@@ -156,5 +157,15 @@ public <T> Boolean startsWith(Bound<T> valueExpr, 
Literal<T> lit) {
     public <T> Boolean notStartsWith(Bound<T> valueExpr, Literal<T> lit) {
       return !startsWith(valueExpr, lit);
     }
+
+    @Override
+    public <T> Boolean stIntersects(Bound<T> valueExpr, Literal<ByteBuffer> 
literal) {
+      throw new UnsupportedOperationException("Evaluation of stIntersects is 
not implemented.");

Review Comment:
   should this class change be postponed to the next PR?



##########
core/src/main/java/org/apache/iceberg/expressions/ExpressionParser.java:
##########
@@ -229,6 +242,44 @@ private void unboundLiteral(Object object) throws 
IOException {
       }
     }
 
+    private void geospatialBoundingBox(BoundingBox value) throws IOException {

Review Comment:
   maybe move the bounding box code to a separate package private 
`BoundingBoxParser` class?



##########
api/src/test/java/org/apache/iceberg/expressions/TestEvaluator.java:
##########
@@ -809,4 +816,35 @@ public void testNotInExceptions() {
         .isInstanceOf(ValidationException.class)
         .hasMessageContaining("Invalid value for conversion to type int");
   }
+
+  private static Stream<Arguments> geospatialPredicateParameters() {
+    return Stream.of(
+        Arguments.of(Expression.Operation.ST_INTERSECTS, "geom"),

Review Comment:
   also let's use the full name like `geometry` and `geography`. The current 
shorter name doesn't seem like commonly recognizable. 



##########
core/src/main/java/org/apache/iceberg/expressions/ExpressionParser.java:
##########
@@ -159,8 +162,13 @@ public <T> Void predicate(BoundPredicate<T> pred) {
 
             if (pred.isLiteralPredicate()) {
               gen.writeFieldName(VALUE);
-              SingleValueParser.toJson(
-                  pred.term().type(), 
pred.asLiteralPredicate().literal().value(), gen);
+              if (pred.op() == Operation.ST_INTERSECTS || pred.op() == 
Operation.ST_DISJOINT) {

Review Comment:
   since bounding box literal is a byte buffer type, it can be json serialized 
as as binary types. I guess the special handling here is to make the json 
serialization more human readable.



##########
api/src/test/java/org/apache/iceberg/expressions/TestExpressionUtil.java:
##########
@@ -1315,6 +1322,48 @@ private static VariantArray createArrayWithNestedTypes() 
{
     return (VariantArray) VariantValue.from(metadata, variantBB);
   }
 
+  private static Stream<Arguments> geospatialPredicateParameters() {
+    return Stream.of(
+        Arguments.of(Expression.Operation.ST_INTERSECTS, "geom"),
+        Arguments.of(Expression.Operation.ST_INTERSECTS, "geog"),
+        Arguments.of(Expression.Operation.ST_DISJOINT, "geom"),
+        Arguments.of(Expression.Operation.ST_DISJOINT, "geog"));
+  }
+
+  @ParameterizedTest
+  @MethodSource("geospatialPredicateParameters")
+  public void testSanitizeGeospatialPredicates(Expression.Operation operation, 
String columnName) {
+    // Create a schema with geometry and geography fields
+    Schema geoSchema =
+        new Schema(
+            Types.NestedField.required(1, "geom", Types.GeometryType.crs84()),
+            Types.NestedField.required(2, "geog", 
Types.GeographyType.crs84()));
+    Types.StructType geoStruct = geoSchema.asStruct();
+
+    // Create a bounding box for testing
+    GeospatialBound min = GeospatialBound.createXY(1.0, 2.0);
+    GeospatialBound max = GeospatialBound.createXY(3.0, 4.0);
+    BoundingBox bbox = new BoundingBox(min, max);
+
+    UnboundPredicate<ByteBuffer> geoPredicate =
+        Expressions.geospatialPredicate(operation, columnName, bbox);
+    Expression predicateSanitized = Expressions.predicate(operation, 
columnName, "(boundingbox)");

Review Comment:
   the 3rd arg string of `(boundingbox)` is a bit weird to me. shouldn't it be 
a byte buffer?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to