jiayuasu opened a new issue, #2939: URL: https://github.com/apache/sedona/issues/2939
Follow-up to the Phase 1 + Wave 1 Box2D work (#2877, #2925, #2926).

## Scope

Extend Sedona's spatial join detection so that a `JOIN ... ON ST_BoxIntersects(a.bbox, b.bbox)` (or `ST_BoxContains`) gets routed through the partitioned spatial join (broadcast index join or range join, whichever the existing optimizer picks for `ST_Intersects` on geometry columns).

Today these predicates work as scalar filters but do not trigger any partitioning / index-based optimization on join, so two large bbox-bearing tables joined on `ST_BoxIntersects` would degrade to an O(N×M) cross product. The Box2D type was meant to make these joins cheaper, so this issue is the missing planner half.

## Why this matters

Pre-computed bbox columns are a common pattern: extract a bbox once, then repeatedly join multiple datasets against it. Each join should:

1. Skip geometry deserialization on both sides (Box2D = 4 doubles, no JTS round-trip).
2. Use the existing R-tree / partitioner machinery. It already operates on bboxes internally; the work is at the predicate-recognition layer, not the index layer.

## Implementation outline

- Find the rule that recognizes spatial join predicates today (likely `JoinQueryDetector` or a similar Catalyst rule) and add `ST_BoxIntersects` / `ST_BoxContains` to the recognized set.
- Adapt the input plumbing so the join physical operator can extract `Box2D` envelopes directly without a Geometry deserialization step.
- For `ST_BoxContains` joins, treat as the asymmetric-containment variant of the existing range join (matches the semantics of `ST_Contains(geom, geom)` join detection).
- Mixed Box2D / Geometry join predicates wait on the implicit cast from #2927 or explicit overloads.

## Tests

- Two `Box2D` columns joined with `ST_BoxIntersects` produces the correct result and uses a partitioned plan (verify via `explain()` not falling back to BroadcastNestedLoopJoin / SortMergeJoin without an inequality condition).
- Same for `ST_BoxContains`.
- Compare runtime against the equivalent `ST_Intersects(geom_a, geom_b)` join on the same data. It should be at least as fast (typically faster because no geometry deserialization).

## Depends on

- This issue. (Standalone: works on top of the existing Phase 1 + Wave 1 Box2D surface.)

## Out of scope

- A specialized R-tree index keyed by `Box2D` (skip the JTS Envelope round-trip in the index itself). Tracked separately as a perf follow-up, only worth doing if profiling shows the round-trip is hot.
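
To make the difference between the O(N×M) cross product and a partitioned join concrete, here is a minimal pure-Python sketch. It is not Sedona code: `Box2D`, `box_intersects`, `box_contains`, `nested_loop_join`, and `partitioned_join` are hypothetical names for illustration, the uniform grid stands in for whatever partitioner and R-tree Sedona actually uses, and closed-boundary semantics (touching boxes count as intersecting) are an assumption rather than the documented behavior of `ST_BoxIntersects`.

```python
from collections import defaultdict
from math import floor
from typing import Callable, Iterator, NamedTuple, Sequence, Set, Tuple


class Box2D(NamedTuple):
    """Hypothetical stand-in for a bbox value: four doubles, no geometry."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


Predicate = Callable[[Box2D, Box2D], bool]


def box_intersects(a: Box2D, b: Box2D) -> bool:
    # Symmetric test; assumes touching edges count as intersecting.
    return (a.xmin <= b.xmax and b.xmin <= a.xmax
            and a.ymin <= b.ymax and b.ymin <= a.ymax)


def box_contains(a: Box2D, b: Box2D) -> bool:
    # Asymmetric test: a must fully cover b.
    return (a.xmin <= b.xmin and b.xmax <= a.xmax
            and a.ymin <= b.ymin and b.ymax <= a.ymax)


def nested_loop_join(left: Sequence[Box2D], right: Sequence[Box2D],
                     predicate: Predicate) -> Set[Tuple[int, int]]:
    # Baseline: evaluates the predicate on every (left, right) pair.
    return {(i, j)
            for i, a in enumerate(left)
            for j, b in enumerate(right)
            if predicate(a, b)}


def cells(box: Box2D, cell_size: float) -> Iterator[Tuple[int, int]]:
    # Yield every grid cell the box overlaps.
    for cx in range(floor(box.xmin / cell_size), floor(box.xmax / cell_size) + 1):
        for cy in range(floor(box.ymin / cell_size), floor(box.ymax / cell_size) + 1):
            yield (cx, cy)


def partitioned_join(left: Sequence[Box2D], right: Sequence[Box2D],
                     predicate: Predicate,
                     cell_size: float = 10.0) -> Set[Tuple[int, int]]:
    # Bucket the right side by grid cell, then probe only matching cells.
    grid = defaultdict(list)
    for j, b in enumerate(right):
        for cell in cells(b, cell_size):
            grid[cell].append(j)

    result = set()  # a set removes duplicates from boxes spanning several cells
    for i, a in enumerate(left):
        for cell in cells(a, cell_size):
            for j in grid.get(cell, ()):
                if predicate(a, right[j]):
                    result.add((i, j))
    return result


left = [Box2D(0, 0, 5, 5), Box2D(20, 20, 30, 30), Box2D(100, 100, 101, 101)]
right = [Box2D(4, 4, 6, 6), Box2D(22, 22, 25, 25), Box2D(50, 50, 60, 60)]

print(sorted(partitioned_join(left, right, box_intersects)))  # [(0, 0), (1, 1)]
print(sorted(partitioned_join(left, right, box_contains)))    # [(1, 1)]
```

Both predicates can share the same partitioning because any pair that intersects or nests must overlap at least one common cell; only the final per-pair check differs, which mirrors the point above that the work sits at the predicate-recognition layer rather than the index layer.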
