Copilot commented on code in PR #2966:
URL: https://github.com/apache/sedona/pull/2966#discussion_r3255758482

##########
docs/api/sql/Optimizer.md:
##########
@@ -416,3 +416,53 @@ We can compare the metrics of querying the GeoParquet dataset with or without th
 |  |  |

 Spatial predicate push-down to GeoParquet is enabled by default. Users can manually disable it by setting the Spark configuration `spark.sedona.geoparquet.spatialFilterPushDown` to `false`.
+
+## Box2D filter pushdown
+
+When a query filters on a `Box2D` column (see [Box2D Functions](box2d/Box2D-Functions.md)) using `ST_BoxIntersects` or `ST_BoxContains` against a literal `Box2D`, Sedona translates the predicate into Parquet row-group inequalities on the column's underlying `xmin` / `ymin` / `xmax` / `ymax` leaves and pushes them down via `ParquetInputFormat.setFilterPredicate`. Parquet's row-group statistics machinery then skips row groups whose recorded min/max disprove the predicate — no file metadata scan is required.
+
+This works for any writer that produces a `Box2D` column (including the `<geom>_bbox` covering column auto-generated by Sedona when writing GeoParquet 1.1), because the pruning operates on the actual stored values' statistics rather than on a separate geometry-column bbox.

Review Comment:
This overstates the supported source of Box2D pushdown. The auto-generated GeoParquet `<geom>_bbox` column is written as a plain `StructType` (`GeoParquetWriteSupport` generates `struct<xmin,ymin,xmax,ymax>`), while `ST_BoxIntersects` / `ST_BoxContains` require `Box2DUDT` inputs. As written, users may expect these predicates to work directly on the auto-generated covering column after reading a GeoParquet file, but that column is not a `Box2D` unless it was explicitly created as one (for example with `ST_Box2D`) before writing.

##########
docs/api/sql/box2d/Box2D-Functions.md:
##########
@@ -0,0 +1,82 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+ -->
+
+# Box2D Functions
+
+The `Box2D` type in Sedona represents a planar axis-aligned bounding box — a rectangle described by four `Double` values: `xmin`, `ymin`, `xmax`, `ymax`. It is a first-class SQL type backed by a Spark UDT and serialises to a struct of four non-nullable doubles, so columns of `Box2D` round-trip natively through Parquet and align with GeoParquet 1.1 bbox covering columns.
+
+`Box2D` complements the [Geometry](../Geometry-Functions.md) and [Geography](../geography/Geography-Functions.md) types. Use it when you need a compact, comparable bounding rectangle — for example, as a covering column on a GeoParquet table that lets the reader prune row groups, or as the join key in a spatial join that only needs an envelope-level match.
+
+## Semantic notes
+
+- `Box2D` values use closed-interval semantics: edge-touching boxes are considered intersecting and (per [ST_BoxContains](Box2D-Predicates/ST_BoxContains.md)) contained.
+- Absence is represented by SQL `NULL` rather than an in-band sentinel.
+- Bounds are required to be ordered (`xmin <= xmax`, `ymin <= ymax`). Inverted-bound values are reserved for a future antimeridian-wraparound semantics on geography bboxes; predicates and join planning throw `IllegalArgumentException` on inverted input today.
+- Unlike [ST_Envelope](../Bounding-Box-Functions/ST_Envelope.md), which returns a `Geometry` polygon, [ST_Box2D](Box2D-Constructors/ST_Box2D.md) returns a typed `Box2D` value. Prefer the typed form when downstream code only needs the four bounds, and prefer the polygon when downstream code expects a `Geometry`.

Review Comment:
`ST_Envelope` does not always return a polygon: Sedona delegates to JTS `geometry.getEnvelope()`, so degenerate inputs can return a `POINT` or `LINESTRING` (and empty geometries preserve their type). Describing it as a Geometry polygon makes the contrast with `ST_Box2D` inaccurate for point, line, and empty inputs.

##########
mkdocs.yml:
##########
@@ -75,6 +75,7 @@ nav:
 - Quick start: api/sql/Overview.md
 - Vector data:
   - Geometry Functions: api/sql/Geometry-Functions.md
+  - Box2D Functions: api/sql/box2d/Box2D-Functions.md

Review Comment:
This new navigation label has no corresponding entry under the Chinese `nav_translations` block, while the neighboring SQL Vector data labels (`Geometry Functions`, `Geography Functions`, etc.) are translated there. The zh build will therefore show this new item in English even though the rest of this navigation group is localized.

##########
docs/api/sql/box2d/Box2D-Constructors/ST_Box2D.md:
##########
@@ -0,0 +1,46 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+ -->
+
+# ST_Box2D
+
+Introduction: Return the planar bounding box of a Geometry as a typed `Box2D` value (four doubles: `xmin`, `ymin`, `xmax`, `ymax`).
+
+`ST_Box2D` is the typed counterpart to [ST_Envelope](../../Bounding-Box-Functions/ST_Envelope.md). `ST_Envelope` returns a `Geometry` polygon; `ST_Box2D` returns a `Box2D` value that serialises to a struct of four non-nullable doubles and round-trips through Parquet without WKB overhead.

Review Comment:
`ST_Envelope` can return lower-dimensional geometries for degenerate inputs (for example a point envelope for a point, or a line envelope for a vertical/horizontal line), not only polygons. This sentence should describe it as returning a `Geometry` envelope rather than a polygon to avoid contradicting Sedona's `ST_Envelope` behavior.

##########
docs/api/sql/box2d/Box2D-Functions.md:
##########
@@ -0,0 +1,82 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+ -->
+
+# Box2D Functions
+
+The `Box2D` type in Sedona represents a planar axis-aligned bounding box — a rectangle described by four `Double` values: `xmin`, `ymin`, `xmax`, `ymax`. It is a first-class SQL type backed by a Spark UDT and serialises to a struct of four non-nullable doubles, so columns of `Box2D` round-trip natively through Parquet and align with GeoParquet 1.1 bbox covering columns.
+
+`Box2D` complements the [Geometry](../Geometry-Functions.md) and [Geography](../geography/Geography-Functions.md) types. Use it when you need a compact, comparable bounding rectangle — for example, as a covering column on a GeoParquet table that lets the reader prune row groups, or as the join key in a spatial join that only needs an envelope-level match.
+
+## Semantic notes
+
+- `Box2D` values use closed-interval semantics: edge-touching boxes are considered intersecting and (per [ST_BoxContains](Box2D-Predicates/ST_BoxContains.md)) contained.
+- Absence is represented by SQL `NULL` rather than an in-band sentinel.
+- Bounds are required to be ordered (`xmin <= xmax`, `ymin <= ymax`). Inverted-bound values are reserved for a future antimeridian-wraparound semantics on geography bboxes; predicates and join planning throw `IllegalArgumentException` on inverted input today.
+- Unlike [ST_Envelope](../Bounding-Box-Functions/ST_Envelope.md), which returns a `Geometry` polygon, [ST_Box2D](Box2D-Constructors/ST_Box2D.md) returns a typed `Box2D` value. Prefer the typed form when downstream code only needs the four bounds, and prefer the polygon when downstream code expects a `Geometry`.
+
+## Box2D Constructors
+
+| Function | Return type | Description | Since |
+| :--- | :--- | :--- | :--- |
+| [ST_Box2D](Box2D-Constructors/ST_Box2D.md) | Box2D | Return the planar bounding box of a Geometry as a Box2D. | v1.9.1 |
+| [ST_MakeBox2D](Box2D-Constructors/ST_MakeBox2D.md) | Box2D | Build a Box2D from two corner POINT geometries. | v1.9.1 |
+| [ST_GeomFromBox2D](Box2D-Constructors/ST_GeomFromBox2D.md) | Geometry | Convert a Box2D to a closed rectangular polygon Geometry (degenerate boxes return a Point or LineString). | v1.9.1 |
+
+## Box2D Accessors
+
+| Function | Return type | Description | Since |
+| :--- | :--- | :--- | :--- |
+| [ST_XMin](Box2D-Accessors/ST_XMin.md) | Double | Return the minimum X coordinate of a Box2D. | v1.9.1 |
+| [ST_YMin](Box2D-Accessors/ST_YMin.md) | Double | Return the minimum Y coordinate of a Box2D. | v1.9.1 |
+| [ST_XMax](Box2D-Accessors/ST_XMax.md) | Double | Return the maximum X coordinate of a Box2D. | v1.9.1 |
+| [ST_YMax](Box2D-Accessors/ST_YMax.md) | Double | Return the maximum Y coordinate of a Box2D. | v1.9.1 |
+
+The same `ST_XMin` / `ST_YMin` / `ST_XMax` / `ST_YMax` functions also accept `Geometry` inputs — see [Bounding Box Functions](../Geometry-Functions.md#bounding-box-functions).
+
+## Box2D Predicates
+
+| Function | Return type | Description | Since |
+| :--- | :--- | :--- | :--- |
+| [ST_BoxIntersects](Box2D-Predicates/ST_BoxIntersects.md) | Boolean | Closed-interval bbox intersection over two Box2D arguments. Matches PostGIS `&&` on `box2d`. | v1.9.1 |
+| [ST_BoxContains](Box2D-Predicates/ST_BoxContains.md) | Boolean | Closed-interval bbox containment over two Box2D arguments. Matches PostGIS `~` on `box2d`. | v1.9.1 |
+
+## Box2D Functions
+
+| Function | Return type | Description | Since |
+| :--- | :--- | :--- | :--- |
+| [ST_Expand](Box2D-Functions/ST_Expand.md) | Box2D | Expand a Box2D by a per-axis or uniform delta. | v1.9.1 |
+| [ST_AsText](Box2D-Functions/ST_AsText.md) | String | Return the `BOX(xmin ymin, xmax ymax)` text representation of a Box2D. | v1.9.1 |
+

Review Comment:
The Box2D surface documented here omits `ST_Extent`, which is registered as a SQL aggregate and returns a `Box2D` for a geometry column. Without an aggregate entry (or link to an aggregate page), users won't discover one of the public Box2D-producing functions from the new Box2D reference page.

##########
docs/api/sql/Optimizer.md:
##########
@@ -416,3 +416,53 @@ We can compare the metrics of querying the GeoParquet dataset with or without th
 |  |  |

 Spatial predicate push-down to GeoParquet is enabled by default. Users can manually disable it by setting the Spark configuration `spark.sedona.geoparquet.spatialFilterPushDown` to `false`.
+
+## Box2D filter pushdown
+
+When a query filters on a `Box2D` column (see [Box2D Functions](box2d/Box2D-Functions.md)) using `ST_BoxIntersects` or `ST_BoxContains` against a literal `Box2D`, Sedona translates the predicate into Parquet row-group inequalities on the column's underlying `xmin` / `ymin` / `xmax` / `ymax` leaves and pushes them down via `ParquetInputFormat.setFilterPredicate`. Parquet's row-group statistics machinery then skips row groups whose recorded min/max disprove the predicate — no file metadata scan is required.
+
+This works for any writer that produces a `Box2D` column (including the `<geom>_bbox` covering column auto-generated by Sedona when writing GeoParquet 1.1), because the pruning operates on the actual stored values' statistics rather than on a separate geometry-column bbox.
+
+SQL Example
+
+```sql
+SELECT *
+FROM geoparquet_dataset
+WHERE ST_BoxIntersects(
+ geom_bbox,
+ ST_MakeBox2D(ST_Point(0.0, 0.0), ST_Point(10.0, 10.0)))
+```
+
+Predicate types and the per-row inequality system they translate to:
+
+| Predicate | Pushed-down conjunction (per row) |
+| --- | --- |
+| `ST_BoxIntersects(box_col, lit)` | `box.xmax >= lit.xmin AND box.xmin <= lit.xmax AND box.ymax >= lit.ymin AND box.ymin <= lit.ymax` (symmetric — reverse arg order is identical) |
+| `ST_BoxContains(box_col, lit)` | `box.xmin <= lit.xmin AND box.xmax >= lit.xmax AND box.ymin <= lit.ymin AND box.ymax >= lit.ymax` |
+| `ST_BoxContains(lit, box_col)` | `box.xmin >= lit.xmin AND box.xmax <= lit.xmax AND box.ymin >= lit.ymin AND box.ymax <= lit.ymax` |
+
+Pushdown is enabled by default; it is gated by the same Spark setting as ordinary Parquet predicate pushdown (`spark.sql.parquet.filterPushdown`). Inverted-bound literals (`xmin > xmax` / `ymin > ymax`) are not pushed down — the predicate falls back to per-row evaluation so callers see the expected `IllegalArgumentException` from the scalar contract.

Review Comment:
This only mentions `spark.sql.parquet.filterPushdown`, but the optimizer rule that attaches Box2D spatial filters is also gated by `spark.sedona.geoparquet.spatialFilterPushDown`. If users disable Sedona spatial filter pushdown, no Box2D Parquet predicate is injected even when Spark's Parquet filter pushdown remains enabled, so this sentence should document both controls.
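
Illustrative note on the `ST_BoxIntersects(box_col, lit)` row of the inequality table quoted in the Optimizer.md hunk: the Python sketch below models how that per-row conjunction could be relaxed into a conservative row-group test over min/max statistics. It is not Sedona's or Parquet's implementation. The names `Box`, `LeafStats`, `RowGroupStats`, `box_intersects`, and `row_group_may_intersect` are hypothetical, and the statistics model (one min/max pair per leaf column) is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box with ordered bounds (xmin <= xmax, ymin <= ymax)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


@dataclass(frozen=True)
class LeafStats:
    """Hypothetical min/max statistics of one leaf column in a row group."""
    lo: float
    hi: float


@dataclass(frozen=True)
class RowGroupStats:
    """Hypothetical per-row-group statistics for the four box leaves."""
    xmin: LeafStats
    ymin: LeafStats
    xmax: LeafStats
    ymax: LeafStats


def box_intersects(box: Box, lit: Box) -> bool:
    """Per-row closed-interval test, matching the quoted conjunction."""
    return (box.xmax >= lit.xmin and box.xmin <= lit.xmax
            and box.ymax >= lit.ymin and box.ymin <= lit.ymax)


def row_group_may_intersect(stats: RowGroupStats, lit: Box) -> bool:
    """Return False only when the statistics disprove the predicate for every row.

    `col >= v` can hold for some row only if max(col) >= v, and
    `col <= v` can hold for some row only if min(col) <= v.
    """
    return (stats.xmax.hi >= lit.xmin and stats.xmin.lo <= lit.xmax
            and stats.ymax.hi >= lit.ymin and stats.ymin.lo <= lit.ymax)


lit = Box(0.0, 0.0, 10.0, 10.0)
# Every box in this row group starts at x >= 20, so it can be skipped.
far = RowGroupStats(LeafStats(20, 30), LeafStats(0, 5), LeafStats(25, 40), LeafStats(5, 9))
# This row group may contain a matching row, so it must be read.
near = RowGroupStats(LeafStats(-5, 8), LeafStats(0, 5), LeafStats(2, 12), LeafStats(5, 9))
print(row_group_may_intersect(far, lit), row_group_may_intersect(near, lit))
```

The row-group test is deliberately one-sided: a `True` result only means the group cannot be ruled out, so the per-row predicate still has to run on the rows that are read.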
