jiayuasu commented on code in PR #2946:
URL: https://github.com/apache/sedona/pull/2946#discussion_r3223997210


##########
spark/common/src/main/scala/org/apache/spark/sql/execution/datasources/geoparquet/GeoParquetFileFormat.scala:
##########
@@ -273,6 +273,15 @@ class GeoParquetFileFormat(val spatialFilter: 
Option[GeoParquetSpatialFilter])
         None
       }
 
+      // Spatial filters that translate to Parquet row-group predicates (e.g. 
Box2D bounds
+      // comparisons on a Box2D-typed column) are AND'd into the pushed-down 
filter so Parquet
+      // can skip row groups whose column statistics disprove them.
+      val combinedPushed = spatialFilter.flatMap(_.toParquetFilter) match {
+        case Some(spatialPredicate) =>
+          Some(pushed.fold(spatialPredicate)(p => FilterApi.and(p, 
spatialPredicate)))
+        case None => pushed
+      }

Review Comment:
   Fixed in 0d69d507 — gated the spatial Parquet predicate on 
enableParquetFilterPushDown, so disabling spark.sql.parquet.filterPushdown also 
disables Sedona-injected row-group predicates.



##########
spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/optimization/SpatialFilterPushDownForGeoParquet.scala:
##########
@@ -256,6 +278,16 @@ class SpatialFilterPushDownForGeoParquet(sparkSession: 
SparkSession) extends Rul
     parseColumnPath(name).mkString(".")
   }
 
+  /**
+   * Extract a [[Box2D]] from a Catalyst literal value. Box2DUDT serializes to 
an InternalRow of
+   * four doubles; if the value is something else, the predicate is not 
pushable.
+   */
+  private def extractBox2DLiteral(value: Any): Option[Box2D] = value match {
+    case row: InternalRow if row.numFields == 4 =>
+      Some(new Box2D(row.getDouble(0), row.getDouble(1), row.getDouble(2), 
row.getDouble(3)))

Review Comment:
   Fixed in 0d69d507 — extractBox2DLiteral now returns None for inverted 
bounds, so the predicate falls back to per-row evaluation and surfaces the 
IllegalArgumentException from Predicates.requireOrderedPlanarBox. Added a test 
that asserts the predicate is not pushed and that collect() raises 
IllegalArgumentException in the cause chain.



##########
spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/optimization/SpatialFilterPushDownForGeoParquet.scala:
##########
@@ -144,6 +148,24 @@ class SpatialFilterPushDownForGeoParquet(sparkSession: 
SparkSession) extends Rul
             SpatialPredicate.INTERSECTS,
             GeometryUDT.deserialize(value))
 
+      // Box2D predicates push down as Parquet row-group filters on the Box2D 
column's underlying
+      // (xmin, ymin, xmax, ymax) double leaves. Pruning is done by Parquet's 
stats-based skipping
+      // against the column's recorded min/max, which is sound regardless of 
how the writer chose
+      // the per-row Box2D values.
+      case ST_BoxIntersects(_) =>
+        // Intersects is symmetric — both argument orders produce the same 
predicate.

Review Comment:
   The PR title and body were updated after this review fired — the description 
no longer references the removed opt-in conf or geometry-bbox metadata pruning, 
and now documents the Parquet row-group statistics path that is actually 
implemented.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to