jiayuasu opened a new pull request, #2953:
URL: https://github.com/apache/sedona/pull/2953

   ## Did you read the Contributor Guide?
   
   - Yes, I have read the [Contributor 
Rules](https://sedona.apache.org/latest/community/rule/) and [Contributor 
Development Guide](https://sedona.apache.org/latest/community/develop/)
   
   ## Is this PR related to a ticket?
   
   - Yes, and the PR name follows the format `[GH-XXX] my subject`. Closes 
#2939.
   
   ## What changes were proposed in this PR?
   
   Teaches Sedona's spatial join planner about the Box2D predicates from #2926. 
After this PR, both broadcast index joins and distributed range joins handle:
   
   - `ST_BoxIntersects(box_a, box_b)`
   - `ST_BoxContains(box_a, box_b)` (both argument orders)
   
   …using the same machinery (partitioner, R-tree, refine evaluator) that 
already powers `ST_Intersects` / `ST_Contains` joins. No new physical operator, 
no new index implementation, no new partitioning code.
   
   ### How it works
   
   At every executor boundary where a shape column is materialised 
(`TraitJoinQueryBase.toSpatialRDD`, `TraitJoinQueryBase.toExpandedEnvelopeRDD`, 
and `BroadcastIndexJoinExec.createStreamShapes`), the code now dispatches on the 
shape expression's `dataType`. If it is `Box2DUDT`, it reads the four doubles out 
of the serialized `InternalRow` and materialises the implied closed rectangular 
Polygon via `Constructors.polygonFromEnvelope`. Geometry columns continue through 
`GeometrySerializer.deserialize` as before.
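
As a rough illustration of the Box2D branch, the sketch below turns four doubles into the closed ring of an axis-aligned rectangle. It is plain Java with no JTS or Sedona dependency. `BoxRingSketch`, `ringFromBox`, and the vertex order are assumptions made for illustration, not the actual `Constructors.polygonFromEnvelope` code.

```java
import java.util.Arrays;

final class BoxRingSketch {
    /**
     * Returns the closed five-vertex ring of the rectangle
     * [xmin, xmax] x [ymin, ymax]. The first and last vertex coincide.
     * Vertex order is an assumption; no validation of inverted boxes is done.
     */
    static double[][] ringFromBox(double xmin, double ymin, double xmax, double ymax) {
        return new double[][] {
            {xmin, ymin}, {xmin, ymax}, {xmax, ymax}, {xmax, ymin}, {xmin, ymin}
        };
    }

    public static void main(String[] args) {
        // Print the ring for a 2 x 1 box anchored at the origin.
        System.out.println(Arrays.deepToString(ringFromBox(0, 0, 2, 1)));
    }
}
```

A ring of this shape is what a rectangle check such as `Polygon.isRectangle()` looks for: five points, closed, with every vertex on the envelope.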
   
   The materialised Polygon flows through `SpatialRDD<T extends Geometry>`, the 
spatial partitioner's sample step, `IndexBuilder`'s R-tree, and 
`SpatialPredicateEvaluators` unchanged. JTS already short-circuits predicates on 
axis-aligned rectangles via `RectangleIntersects` / `RectangleContains` (gated on 
`Polygon.isRectangle()`), and `polygonFromEnvelope` produces exactly such 
rectangles. The refine step therefore pays roughly a four-double envelope 
comparison per pair, which is the saving the user expects from "we know the data 
is a box".
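
For two axis-aligned rectangles, the intersects test reduces to a handful of comparisons on the bounds. A minimal sketch in plain Java, assuming closed intervals and a hypothetical `{xmin, ymin, xmax, ymax}` array layout; it mirrors the idea only and is not the JTS `RectangleIntersects` code.

```java
final class BoxIntersectsSketch {
    // Hypothetical layout: box[0]=xmin, box[1]=ymin, box[2]=xmax, box[3]=ymax.
    static boolean intersects(double[] a, double[] b) {
        return a[0] <= b[2] && b[0] <= a[2]   // x ranges overlap or touch
            && a[1] <= b[3] && b[1] <= a[3];  // y ranges overlap or touch
    }

    public static void main(String[] args) {
        double[] a = {0, 0, 2, 2};
        double[] touching = {2, 1, 3, 3};  // shares the edge x = 2 with a
        double[] apart = {5, 5, 6, 6};     // disjoint from a
        System.out.println(intersects(a, touching)); // true
        System.out.println(intersects(a, apart));    // false
    }
}
```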
   
   ### Predicate-kind mapping
   
   | Source predicate | `SpatialPredicate` |
   | ---------------- | ------------------ |
   | `ST_BoxIntersects(a, b)` | `INTERSECTS` |
   | `ST_BoxContains(a, b)`   | `COVERS` |
   
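   `ST_BoxContains` deliberately maps to `COVERS`, not `CONTAINS`. 
PostGIS-style closed-interval containment counts edge-touching pairs as 
contained. JTS `Geometry.contains` additionally requires the two interiors to 
intersect, so it rejects a geometry that lies entirely on the container's 
boundary, whereas `Geometry.covers` has no such exception and matches 
closed-interval semantics in every case, which is what we want.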
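
To make the closed-interval behaviour concrete, here is a minimal sketch in plain Java, again assuming a hypothetical `{xmin, ymin, xmax, ymax}` array layout. `covers` is the closed-interval check the mapping is meant to preserve. `strictlyInside` is a hypothetical open-interval check included only for contrast and does not model any JTS method.

```java
final class BoxCoversSketch {
    /** Closed-interval containment: shared edges are allowed. */
    static boolean covers(double[] outer, double[] inner) {
        return outer[0] <= inner[0] && outer[1] <= inner[1]
            && inner[2] <= outer[2] && inner[3] <= outer[3];
    }

    /** Open-interval variant, for contrast only: inner may not touch outer's edges. */
    static boolean strictlyInside(double[] outer, double[] inner) {
        return outer[0] < inner[0] && outer[1] < inner[1]
            && inner[2] < outer[2] && inner[3] < outer[3];
    }

    public static void main(String[] args) {
        double[] outer = {0, 0, 10, 10};
        double[] edgeTouching = {0, 2, 5, 6};  // shares part of the edge x = 0
        System.out.println(covers(outer, edgeTouching));         // true
        System.out.println(strictlyInside(outer, edgeTouching)); // false
    }
}
```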
   
   ### What's left untouched
   
   - The `Box2D` class hierarchy (still a value class).
   - Storage / on-wire layout (still a struct of four doubles via `Box2DUDT`).
   - `SpatialRDD`, the partitioners, the R-tree, the refine evaluator.
   - The raster join path and all Geography join paths.
   
   ### Scope notes
   
   `ST_DWithin` (distance join) only has `(Geometry, Geometry, distance)` and 
`(Geography, Geography, distance)` overloads today, so Box2D × Box2D distance 
joins are out of scope for this PR. Adding a `(Box2D, Box2D, distance)` 
overload is a follow-up.
   
   ## How was this patch tested?
   
   New `Box2DJoinSuite` under 
`spark/common/src/test/scala/org/apache/sedona/sql/`:
   
   - `ST_BoxIntersects` broadcast index join produces the expected 4 pairs from 
a 3×3 fixture, with `BroadcastIndexJoinExec` in the plan.
   - `ST_BoxIntersects(R, L)` produces the same 4 pairs (argument order 
symmetric).
   - `ST_BoxContains` broadcast index join produces the expected 2 
closed-interval containment pairs.
   - Edge-touching containment (closed-interval) is counted — locks in the 
`COVERS`-not-`CONTAINS` mapping.
   - Non-broadcast range join produces the same 4 pairs with `RangeJoinExec` in 
the plan.
   - Result is equivalent (same ordered row pairs) to `ST_Intersects` on the 
same data materialised as polygons via `ST_GeomFromBox2D`.
   
   Run locally against Spark 3.5 / Scala 2.12. Regression runs: 
`BroadcastIndexJoinSuite` 65/65, `SpatialJoinSuite` 160/160, `Box2DUDTSuite` 
5/5, `Box2DCastResolutionRuleSuite` 3/3, `GeoParquetSpatialFilterPushDownSuite` 
25/25 — all still pass.
   
   ## Did this PR include necessary documentation updates?
   
   - No. On its own, this PR does not change any publicly documented SQL API. 
The new spatial-join behavior for `ST_BoxIntersects` / `ST_BoxContains` is 
covered by the consolidated Phase 1+2+3 Box2D docs update.
   

