jiayuasu opened a new issue, #2938: URL: https://github.com/apache/sedona/issues/2938
Follow-up to the Phase 1 + Wave 1 Box2D work (#2877, #2925, #2926).

## Scope

Teach the GeoParquet predicate-pushdown machinery to recognize `ST_BoxIntersects(box_col, box_lit)` and `ST_BoxContains(box_col, box_lit)`. Translate them into the same row-group / partition pruning path that already handles `ST_Intersects(geom_col, geom_lit)` against bbox covering columns.

This is the highest-leverage piece of Phase 3 because it works on **existing GeoParquet 1.1 files**, which already carry bbox covering columns, without any other planner change. Users who pre-compute a `Box2D` column or read Parquet files with covering columns get pruning for free.

## Implementation outline

- Extend `SpatialFilterPushDownForGeoParquet` (or its modern equivalent) to recognize the new predicates when the right-hand side is a literal Box2D.
- Convert the recognized predicate into the existing `GeoParquetSpatialFilter` shape. These are the same `xmin/ymin/xmax/ymax` ranges that the geometry path produces, just sourced directly from the literal Box2D.
- For `ST_BoxContains(box_col, box_lit)`, the pruning is analogous to intersection but tighter. Covering cells that are not fully contained can be pruned for join keys but not for filters, so we should probably push down only `ST_BoxIntersects` initially and document `ST_BoxContains` as a non-pushdown predicate. (Alternative: push down both as conservative `ST_BoxIntersects` filters and refine in memory.)

## Tests

- A DataFrame query `WHERE ST_BoxIntersects(bbox_col, lit(some_box))` reads only the row groups whose bbox metadata overlaps the literal.
- The same query against a file with no bbox covering metadata falls back cleanly (no pruning, but correct results).
- A NULL bbox literal short-circuits to no rows (or all rows, consistent with the existing geometry-side behavior).

## Pairs naturally with

- **Reader auto-materialization** of GeoParquet bbox covering columns as `Box2D` (deferred from #2886).
  That makes `WHERE ST_BoxIntersects(box_col, lit(b))` the canonical way to express bbox-pruned reads: the typed column comes from disk, and the predicate prunes the disk read. Worth scoping these together.

## Out of scope

- Two-sided pushdown (`box_col_a` vs `box_col_b`): that is the spatial-join planner work tracked separately.
- Pushdown for `ST_BoxIntersects(geom_col, lit(box))` / mixed inputs: this depends on the implicit cast from #2927 or on explicit mixed overloads; revisit afterwards.
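
The pruning rule described in the implementation outline can be sketched roughly as follows. This is a minimal Python sketch, not Sedona code. It assumes that the bbox covering column exposes per-row-group min/max statistics for its `xmin`, `ymin`, `xmax` and `ymax` fields. All names here (`Box2D`, `RowGroupStats`, `row_group_may_match`, `prune_row_groups`) are hypothetical stand-ins, not the real `SpatialFilterPushDownForGeoParquet` or `GeoParquetSpatialFilter` APIs.

The sketch shows why a literal box maps onto four range predicates. A stored box `b` can intersect the literal `q` only if `b.xmin <= q.xmax`, `b.xmax >= q.xmin`, `b.ymin <= q.ymax` and `b.ymax >= q.ymin`. A row group can therefore be skipped when its statistics rule out any one of those conditions. `ST_BoxContains` follows the "conservative intersects" alternative above, and the NULL-literal case assumes the "no rows" option.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Box2D:
    """Hypothetical stand-in for a Box2D literal: an axis-aligned box."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


@dataclass(frozen=True)
class RowGroupStats:
    """Hypothetical min/max statistics of the bbox covering fields for one row group.

    A field is None when the file carries no statistics for it.
    """
    min_xmin: Optional[float]
    min_ymin: Optional[float]
    max_xmax: Optional[float]
    max_ymax: Optional[float]


def row_group_may_match(stats: RowGroupStats, q: Box2D) -> bool:
    """Return False only when no box in the row group can intersect q.

    A stored box b intersects q iff
      b.xmin <= q.xmax and b.xmax >= q.xmin and b.ymin <= q.ymax and b.ymax >= q.ymin.
    The smallest xmin/ymin and the largest xmax/ymax in the group are the most
    permissive values, so testing them against q is sufficient.
    """
    fields = (stats.min_xmin, stats.min_ymin, stats.max_xmax, stats.max_ymax)
    if any(v is None for v in fields):
        # No bbox covering metadata: fall back to reading the row group (no pruning).
        return True
    return (
        stats.min_xmin <= q.xmax
        and stats.max_xmax >= q.xmin
        and stats.min_ymin <= q.ymax
        and stats.max_ymax >= q.ymin
    )


def prune_row_groups(
    predicate: str, q: Optional[Box2D], groups: List[RowGroupStats]
) -> List[int]:
    """Return the indices of row groups that still have to be read.

    For non-empty boxes, containment in either argument order implies
    intersection. ST_BoxContains is therefore pushed down as the conservative
    intersects filter, and the exact predicate is still evaluated in memory
    on the surviving rows.
    """
    if predicate not in ("ST_BoxIntersects", "ST_BoxContains"):
        raise ValueError(f"not a pushdown candidate: {predicate}")
    if q is None:
        # Assumption: SQL semantics, where a NULL literal makes the predicate NULL,
        # so no row can pass the WHERE clause and every row group is skipped.
        return []
    return [i for i, g in enumerate(groups) if row_group_may_match(g, q)]


groups = [
    RowGroupStats(0, 0, 10, 10),            # overlaps the query box
    RowGroupStats(50, 50, 60, 60),          # lies entirely outside the query box
    RowGroupStats(None, None, None, None),  # file without bbox statistics
]
query = Box2D(5, 5, 20, 20)
print(prune_row_groups("ST_BoxIntersects", query, groups))  # [0, 2]
print(prune_row_groups("ST_BoxIntersects", None, groups))   # []
```

Keeping the row group when statistics are missing is what gives the "falls back cleanly" behavior from the test list. Correctness never depends on pruning; pruning only ever skips row groups that provably cannot match.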
