Copilot commented on code in PR #2801:
URL: https://github.com/apache/sedona/pull/2801#discussion_r3006872471


##########
spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/strategy/join/JoinQueryDetector.scala:
##########
@@ -786,6 +786,9 @@ class JoinQueryDetector(sparkSession: SparkSession) extends 
SparkStrategy {
         case (Some(_), _, true, _, false) => "ST_Distance (Geography) <"
         case (None, _, false, _, false) => s"ST_$spatialPredicate"
         case (None, _, false, _, true) => s"RS_$spatialPredicate"
+        case (None, _, true, _, false) => s"ST_$spatialPredicate (Geography)"
+        case (None, _, true, _, true) => s"RS_$spatialPredicate"
+        case (Some(_), _, _, _, true) => s"RS_$spatialPredicate"
       }

Review Comment:
   These new match arms make raster predicates with `distance.isDefined` and/or 
`isGeography=true` appear supported (avoids a `MatchError`), but the broadcast 
join implementation still assumes geometry serialization when `distance` is 
present. That combination can lead to runtime failures (raster bytes fed into 
`GeometrySerializer.deserialize`) or incorrect semantics. Consider adding an 
explicit guard before planning (e.g., reject `distance` joins when 
`isRasterPredicate` is true) so users get a clear error instead of a later 
deserialization failure.



##########
spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/strategy/join/BroadcastIndexJoinExec.scala:
##########
@@ -118,6 +118,7 @@ case class BroadcastIndexJoinExec(
     case (Some(r), _, false) => s"ST_Distance($windowExpression, 
$objectExpression) < $r"
     case (None, _, false) => s"ST_$spatialPredicate($windowExpression, 
$objectExpression)"
     case (None, _, true) => s"RS_$spatialPredicate($windowExpression, 
$objectExpression)"
+    case (Some(r), _, true) => s"RS_$spatialPredicate($windowExpression, 
$objectExpression)"
   }

Review Comment:
   `distance` + `isRasterPredicate=true` is treated as a valid combination 
here, but the execution path still assumes geometry bytes when `distance` is 
defined (e.g., `createStreamShapes` deserializes with 
`GeometrySerializer.deserialize`, and 
`SpatialIndexExec`/`TraitJoinQueryBase.toExpandedEnvelopeRDD` does the same). 
If a raster is on the side being deserialized, this will fail at runtime. 
Consider explicitly rejecting distance joins when either side is `RasterUDT` 
(fail fast with a clear error), or implement raster-aware distance expansion 
(e.g., expand the WGS84 envelope) so the join can execute correctly. Also, if 
distance joins are supported for rasters, the `spatialExpression` string 
currently drops `$r`, which makes plans misleading.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to