jiayuasu commented on code in PR #2966:
URL: https://github.com/apache/sedona/pull/2966#discussion_r3256933363
##########
docs/api/sql/Optimizer.md:
##########
@@ -416,3 +416,53 @@ We can compare the metrics of querying the GeoParquet
dataset with or without th
|  |  |
Spatial predicate push-down to GeoParquet is enabled by default. Users can
manually disable it by setting the Spark configuration
`spark.sedona.geoparquet.spatialFilterPushDown` to `false`.
+
+## Box2D filter pushdown
+
+When a query filters on a `Box2D` column (see [Box2D
Functions](box2d/Box2D-Functions.md)) using `ST_BoxIntersects` or
`ST_BoxContains` against a literal `Box2D`, Sedona translates the predicate
into Parquet row-group inequalities on the column's underlying `xmin` / `ymin`
/ `xmax` / `ymax` leaves and pushes them down via
`ParquetInputFormat.setFilterPredicate`. Parquet's row-group statistics
machinery then skips row groups whose recorded min/max disprove the predicate —
no file metadata scan is required.
+
+This works for any writer that produces a `Box2D` column (including the
`<geom>_bbox` covering column auto-generated by Sedona when writing GeoParquet
1.1), because the pruning operates on the actual stored values' statistics
rather than on a separate geometry-column bbox.
Review Comment:
Fixed in 3ef5ef5e — rewrote the paragraph to clarify that the pushdown
applies to Box2DUDT-typed columns (obtained via ST_Box2D(geom) or the SQL
cast). The auto-generated <geom>_bbox column is a plain
struct<xmin,ymin,xmax,ymax>; it satisfies the GeoParquet covering contract but
is not a Box2D, so these predicates don't target it directly — users on that
column rely on the existing file-metadata pushdown described in the previous
section.
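
For readers following the thread, the row-group skipping that the quoted paragraph describes can be modelled in a few lines of plain Python. This is only a sketch of the statistics check for the `ST_BoxIntersects` case. It does not call Sedona or Parquet, and the `Box`, `RowGroupStats`, and `can_skip_for_intersects` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    xmin: float
    ymin: float
    xmax: float
    ymax: float


@dataclass(frozen=True)
class RowGroupStats:
    """Recorded min/max of the four leaf columns for one row group (hypothetical model)."""
    xmin_min: float  # smallest xmin stored in the group
    xmax_max: float  # largest xmax stored in the group
    ymin_min: float  # smallest ymin stored in the group
    ymax_max: float  # largest ymax stored in the group


def can_skip_for_intersects(stats: RowGroupStats, lit: Box) -> bool:
    """Return True when the statistics disprove the pushed-down conjunction
    xmax >= lit.xmin AND xmin <= lit.xmax AND ymax >= lit.ymin AND ymin <= lit.ymax
    for every row in the group."""
    return (
        stats.xmax_max < lit.xmin      # every stored box lies left of the literal
        or stats.xmin_min > lit.xmax   # every stored box lies right of it
        or stats.ymax_max < lit.ymin   # every stored box lies below it
        or stats.ymin_min > lit.ymax   # every stored box lies above it
    )


query = Box(0.0, 0.0, 10.0, 10.0)
far_group = RowGroupStats(xmin_min=20.0, xmax_max=35.0, ymin_min=0.0, ymax_max=5.0)
near_group = RowGroupStats(xmin_min=5.0, xmax_max=35.0, ymin_min=0.0, ymax_max=5.0)
```

Under this model `far_group` can be skipped for `query`, while `near_group` must still be read and filtered row by row.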
##########
docs/api/sql/Optimizer.md:
##########
@@ -416,3 +416,53 @@ We can compare the metrics of querying the GeoParquet
dataset with or without th
|  |  |
+
+SQL Example
+
+```sql
+SELECT *
+FROM geoparquet_dataset
+WHERE ST_BoxIntersects(
+    geom_bbox,
+    ST_MakeBox2D(ST_Point(0.0, 0.0), ST_Point(10.0, 10.0)))
+```
+
+Predicate types and the per-row inequality system they translate to:
+
+| Predicate | Pushed-down conjunction (per row) |
+| --- | --- |
+| `ST_BoxIntersects(box_col, lit)` | `box.xmax >= lit.xmin AND box.xmin <= lit.xmax AND box.ymax >= lit.ymin AND box.ymin <= lit.ymax` (symmetric; reverse arg order is identical) |
+| `ST_BoxContains(box_col, lit)` | `box.xmin <= lit.xmin AND box.xmax >= lit.xmax AND box.ymin <= lit.ymin AND box.ymax >= lit.ymax` |
+| `ST_BoxContains(lit, box_col)` | `box.xmin >= lit.xmin AND box.xmax <= lit.xmax AND box.ymin >= lit.ymin AND box.ymax <= lit.ymax` |
+
+Pushdown is enabled by default; it is gated by the same Spark setting as
ordinary Parquet predicate pushdown (`spark.sql.parquet.filterPushdown`).
Inverted-bound literals (`xmin > xmax` / `ymin > ymax`) are not pushed down —
the predicate falls back to per-row evaluation so callers see the expected
`IllegalArgumentException` from the scalar contract.
Review Comment:
Fixed in 3ef5ef5e — documented both flags:
spark.sedona.geoparquet.spatialFilterPushDown gates rule attachment,
spark.sql.parquet.filterPushdown gates Parquet honouring it. Disabling either
disables Box2D pushdown.
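
As an illustration of the predicate table and the two flags discussed here, the following plain-Python sketch restates the per-row conjunctions with closed-interval comparisons. It is not Sedona code. `Box`, `check_ordered`, `box_intersects`, `box_contains`, and `box2d_pushdown_active` are hypothetical names, `ValueError` stands in for the `IllegalArgumentException` mentioned in the docs, and the configuration is modelled as a simple dict of strings in which both settings are assumed to default to `true`.

```python
from typing import Dict, NamedTuple


class Box(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def check_ordered(b: Box) -> None:
    # ValueError is a stand-in for the IllegalArgumentException named in the docs.
    if b.xmin > b.xmax or b.ymin > b.ymax:
        raise ValueError(f"inverted bounds: {b}")


def box_intersects(a: Box, b: Box) -> bool:
    """Closed-interval intersection: edge-touching boxes count as intersecting."""
    check_ordered(a)
    check_ordered(b)
    return (a.xmax >= b.xmin and a.xmin <= b.xmax
            and a.ymax >= b.ymin and a.ymin <= b.ymax)


def box_contains(outer: Box, inner: Box) -> bool:
    """Closed-interval containment: a box contains itself."""
    check_ordered(outer)
    check_ordered(inner)
    return (outer.xmin <= inner.xmin and outer.xmax >= inner.xmax
            and outer.ymin <= inner.ymin and outer.ymax >= inner.ymax)


def box2d_pushdown_active(conf: Dict[str, str]) -> bool:
    """Both flags must be on; switching either one off disables the pushdown."""
    sedona_flag = conf.get("spark.sedona.geoparquet.spatialFilterPushDown", "true")
    parquet_flag = conf.get("spark.sql.parquet.filterPushdown", "true")
    return sedona_flag == "true" and parquet_flag == "true"
```

Swapping the arguments of `box_contains` gives the third table row, where the literal is the outer box and the column value the inner one.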
##########
docs/api/sql/box2d/Box2D-Functions.md:
##########
@@ -0,0 +1,82 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+ -->
+
+# Box2D Functions
+
+The `Box2D` type in Sedona represents a planar axis-aligned bounding box — a
rectangle described by four `Double` values: `xmin`, `ymin`, `xmax`, `ymax`. It
is a first-class SQL type backed by a Spark UDT and serialises to a struct of
four non-nullable doubles, so columns of `Box2D` round-trip natively through
Parquet and align with GeoParquet 1.1 bbox covering columns.
+
+`Box2D` complements the [Geometry](../Geometry-Functions.md) and
[Geography](../geography/Geography-Functions.md) types. Use it when you need a
compact, comparable bounding rectangle — for example, as a covering column on a
GeoParquet table that lets the reader prune row groups, or as the join key in a
spatial join that only needs an envelope-level match.
+
+## Semantic notes
+
+- `Box2D` values use closed-interval semantics: edge-touching boxes are
considered intersecting and (per
[ST_BoxContains](Box2D-Predicates/ST_BoxContains.md)) contained.
+- Absence is represented by SQL `NULL` rather than an in-band sentinel.
+- Bounds are required to be ordered (`xmin <= xmax`, `ymin <= ymax`).
Inverted-bound values are reserved for a future antimeridian-wraparound
semantics on geography bboxes; predicates and join planning throw
`IllegalArgumentException` on inverted input today.
+- Unlike [ST_Envelope](../Bounding-Box-Functions/ST_Envelope.md), which
returns a `Geometry` polygon, [ST_Box2D](Box2D-Constructors/ST_Box2D.md)
returns a typed `Box2D` value. Prefer the typed form when downstream code only
needs the four bounds, and prefer the polygon when downstream code expects a
`Geometry`.
Review Comment:
Fixed in 3ef5ef5e — softened the contrast: ST_Envelope returns the envelope
as a Geometry (typically a polygon, but Point/LineString for degenerate
inputs), while ST_Box2D always returns a typed Box2D.
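
The degenerate cases mentioned in this reply can be made concrete with a small plain-Python sketch. It only classifies which geometry kind an envelope collapses to for a given set of bounds. The `envelope_kind` helper is hypothetical and does not reflect how Sedona constructs the geometry internally.

```python
def envelope_kind(xmin: float, ymin: float, xmax: float, ymax: float) -> str:
    """Name the geometry kind that an axis-aligned envelope with these bounds forms."""
    if xmin == xmax and ymin == ymax:
        return "Point"        # zero width and zero height
    if xmin == xmax or ymin == ymax:
        return "LineString"   # zero extent along exactly one axis
    return "Polygon"          # a proper rectangle
```

A typed box carries the same four bounds in all three cases, which is why the typed form has a single return type while the geometry form does not.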
##########
docs/api/sql/box2d/Box2D-Constructors/ST_Box2D.md:
##########
@@ -0,0 +1,46 @@
+
+# ST_Box2D
+
+Introduction: Return the planar bounding box of a Geometry as a typed `Box2D`
value (four doubles: `xmin`, `ymin`, `xmax`, `ymax`).
+
+`ST_Box2D` is the typed counterpart to
[ST_Envelope](../../Bounding-Box-Functions/ST_Envelope.md). `ST_Envelope`
returns a `Geometry` polygon; `ST_Box2D` returns a `Box2D` value that
serialises to a struct of four non-nullable doubles and round-trips through
Parquet without WKB overhead.
Review Comment:
Fixed in 3ef5ef5e — same softening on ST_Box2D.md.
##########
docs/api/sql/box2d/Box2D-Functions.md:
##########
@@ -0,0 +1,82 @@
+
+## Box2D Constructors
+
+| Function | Return type | Description | Since |
+| :--- | :--- | :--- | :--- |
+| [ST_Box2D](Box2D-Constructors/ST_Box2D.md) | Box2D | Return the planar bounding box of a Geometry as a Box2D. | v1.9.1 |
+| [ST_MakeBox2D](Box2D-Constructors/ST_MakeBox2D.md) | Box2D | Build a Box2D from two corner POINT geometries. | v1.9.1 |
+| [ST_GeomFromBox2D](Box2D-Constructors/ST_GeomFromBox2D.md) | Geometry | Convert a Box2D to a closed rectangular polygon Geometry (degenerate boxes return a Point or LineString). | v1.9.1 |
+
+## Box2D Accessors
+
+| Function | Return type | Description | Since |
+| :--- | :--- | :--- | :--- |
+| [ST_XMin](Box2D-Accessors/ST_XMin.md) | Double | Return the minimum X coordinate of a Box2D. | v1.9.1 |
+| [ST_YMin](Box2D-Accessors/ST_YMin.md) | Double | Return the minimum Y coordinate of a Box2D. | v1.9.1 |
+| [ST_XMax](Box2D-Accessors/ST_XMax.md) | Double | Return the maximum X coordinate of a Box2D. | v1.9.1 |
+| [ST_YMax](Box2D-Accessors/ST_YMax.md) | Double | Return the maximum Y coordinate of a Box2D. | v1.9.1 |
+
+The same `ST_XMin` / `ST_YMin` / `ST_XMax` / `ST_YMax` functions also accept
`Geometry` inputs — see [Bounding Box
Functions](../Geometry-Functions.md#bounding-box-functions).
+
+## Box2D Predicates
+
+| Function | Return type | Description | Since |
+| :--- | :--- | :--- | :--- |
+| [ST_BoxIntersects](Box2D-Predicates/ST_BoxIntersects.md) | Boolean | Closed-interval bbox intersection over two Box2D arguments. Matches PostGIS `&&` on `box2d`. | v1.9.1 |
+| [ST_BoxContains](Box2D-Predicates/ST_BoxContains.md) | Boolean | Closed-interval bbox containment over two Box2D arguments. Matches PostGIS `~` on `box2d`. | v1.9.1 |
+
+## Box2D Functions
+
+| Function | Return type | Description | Since |
+| :--- | :--- | :--- | :--- |
+| [ST_Expand](Box2D-Functions/ST_Expand.md) | Box2D | Expand a Box2D by a per-axis or uniform delta. | v1.9.1 |
+| [ST_AsText](Box2D-Functions/ST_AsText.md) | String | Return the `BOX(xmin ymin, xmax ymax)` text representation of a Box2D. | v1.9.1 |
+
Review Comment:
Added in 3ef5ef5e: new docs/api/sql/Aggregate-Functions/ST_Extent.md (the
function is already SQL-registered via Catalog.aggregateExpressions; only the
docs page was missing). Cross-referenced from Geometry-Functions.md's Aggregate
Functions table (Geometry input) and from a new "Box2D Aggregates" subsection
on Box2D-Functions.md (Box2D output).
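
To show what an extent-style aggregate computes, here is a minimal plain-Python sketch that folds per-row bounding boxes into one overall box. It assumes each input geometry has already been reduced to its `(xmin, ymin, xmax, ymax)` bounds, and that NULL inputs are skipped and an empty input yields NULL. Both are assumptions for illustration, not confirmed `ST_Extent` behaviour, and the `extent` helper is a hypothetical name.

```python
from typing import Iterable, Optional, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def extent(boxes: Iterable[Optional[Box]]) -> Optional[Box]:
    """Fold bounding boxes into the smallest box that covers all of them."""
    result: Optional[Box] = None
    for b in boxes:
        if b is None:
            continue  # assumed: NULL inputs do not contribute
        if result is None:
            result = b
        else:
            result = (
                min(result[0], b[0]),
                min(result[1], b[1]),
                max(result[2], b[2]),
                max(result[3], b[3]),
            )
    return result  # None models SQL NULL for empty or all-NULL input
```

Because the fold only uses `min` and `max`, partial results can be merged in any order, which is what makes this kind of aggregate easy to distribute.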
##########
mkdocs.yml:
##########
@@ -75,6 +75,7 @@ nav:
- Quick start: api/sql/Overview.md
- Vector data:
- Geometry Functions: api/sql/Geometry-Functions.md
+ - Box2D Functions: api/sql/box2d/Box2D-Functions.md
Review Comment:
Fixed in 3ef5ef5e — added "Box2D Functions: Box2D 函数" under nav_translations
alongside the Geometry / Geography entries.