jiayuasu opened a new pull request, #2921:
URL: https://github.com/apache/sedona/pull/2921

   ## Did you read the Contributor Guide?
   
   - Yes, I have read the [Contributor 
Rules](https://sedona.apache.org/latest/community/rule/) and [Contributor 
Development Guide](https://sedona.apache.org/latest/community/develop/)
   
   ## Is this PR related to a ticket?
   
   - Yes, and the PR name follows the format `[GH-XXX] my subject`. Closes #2886
   
   ## What changes were proposed in this PR?
   
   When a user has a `Box2D`-typed column in the schema being written, both the 
explicit covering option (`geoparquet.covering[.geom]=<col>`) and the 
auto-detect `<geom>_bbox` path now register it in GeoParquet metadata as the 
bbox covering column for the associated geometry.
   
   `Box2DUDT.sqlType` is `struct<xmin, ymin, xmax, ymax: double>` — exactly the 
shape required by the GeoParquet 1.1 covering spec — so **no data-path change 
is needed**: the existing `UserDefinedType` → struct fallback in 
`GeoParquetWriteSupport.makeWriter` already serializes correctly. Only the 
metadata path needed to learn that a UDT-wrapped struct is a valid covering 
source.
   
   The change is one new case in 
`GeoParquetMetaData.createCoveringColumnMetadata` that recognizes `Box2DUDT` 
and dispatches to the existing struct-shaped covering builder.
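   
   For reviewers unfamiliar with the covering layout, here is a minimal Python sketch of the metadata shape this PR emits. It is not the Scala code in `GeoParquetMetaData`; the helper names `covering_metadata` and `auto_detect_bbox_column` are hypothetical, and the auto-detect check is simplified to a name match only.

```python
import json
from typing import List, Optional

# Field names required by the GeoParquet 1.1 bbox covering.
BBOX_FIELDS = ("xmin", "ymin", "xmax", "ymax")


def covering_metadata(bbox_column: str) -> dict:
    """Build the `covering` object for one geometry column.

    Each entry is a path: [column name, struct field name].
    """
    return {"bbox": {field: [bbox_column, field] for field in BBOX_FIELDS}}


def auto_detect_bbox_column(geometry_column: str,
                            schema_columns: List[str]) -> Optional[str]:
    """Hypothetical helper: return `<geom>_bbox` if the schema has that column.

    Simplification: only the name is checked here, not the column type.
    """
    candidate = f"{geometry_column}_bbox"
    return candidate if candidate in schema_columns else None


# Example schema with a geometry column and a matching bbox column.
columns = ["id", "geometry", "geometry_bbox"]
bbox_col = auto_detect_bbox_column("geometry", columns)

geometry_meta = {"encoding": "WKB", "geometry_types": []}
if bbox_col is not None:
    geometry_meta["covering"] = covering_metadata(bbox_col)

print(json.dumps(geometry_meta, indent=2))
```

   The explicit option path would call the same builder with the user-supplied column name instead of the auto-detected one.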
   
   ## Why Float32 + conservative rounding is deferred
   
   The original issue scoped Float32 storage with `Math.nextUp` / `Math.nextDown` 
outward rounding, both for size and for bit-compatibility with 
`apache/sedona-db`'s `next_after` writer. Shipping that without a paired 
reader-side change would create a write/read asymmetry: the on-disk Float32 
schema no longer matches `Box2DUDT.sqlType` (which is Float64), so PySpark's 
UDT resolver doesn't claim the column, and a written `Box2D` column comes back 
as a generic `struct<float>` rather than a `Box2D`.
   
   That asymmetry would regress the user-visible type round-trip we just 
shipped across #2890–#2906. Pairing Float32 with reader auto-materialization is 
its own follow-up; this PR delivers the metadata side cleanly without 
introducing that read-path regression.
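   
   To illustrate what "conservative" (outward) rounding means for the deferred work, here is a rough Python sketch of the general technique. It is not the deferred implementation and not `apache/sedona-db`'s writer; all function names are hypothetical, and stepping only when the nearest Float32 falls on the wrong side is an assumption of this sketch.

```python
import math
import struct


def to_f32(x: float) -> float:
    """Round a Float64 to the nearest Float32 (returned as a Python float).

    Raises OverflowError if x is outside the finite Float32 range.
    """
    return struct.unpack("<f", struct.pack("<f", x))[0]


def f32_next_up(f: float) -> float:
    """Smallest Float32 strictly greater than f (f must already be a Float32)."""
    if math.isnan(f) or f == math.inf:
        return f
    if f == 0.0:
        bits = 1  # smallest positive subnormal, for both +0.0 and -0.0
    else:
        bits = struct.unpack("<I", struct.pack("<f", f))[0]
        # Positive values grow with their bit pattern; negative values shrink.
        bits += 1 if f > 0 else -1
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def f32_next_down(f: float) -> float:
    """Largest Float32 strictly less than f, by symmetry with f32_next_up."""
    return -f32_next_up(-f)


def f32_floor(x: float) -> float:
    """Largest Float32 that is <= x (for xmin / ymin)."""
    f = to_f32(x)
    return f if f <= x else f32_next_down(f)


def f32_ceil(x: float) -> float:
    """Smallest Float32 that is >= x (for xmax / ymax)."""
    f = to_f32(x)
    return f if f >= x else f32_next_up(f)


def round_bbox_outward(xmin: float, ymin: float, xmax: float, ymax: float):
    """Return a Float32 box that always contains the original Float64 box."""
    return (f32_floor(xmin), f32_floor(ymin), f32_ceil(xmax), f32_ceil(ymax))


print(round_bbox_outward(-0.1, 0.1, 0.3, 0.7))
```

   The rounded box can only grow, so a filter that uses it may admit extra candidates but never drops a geometry the Float64 box would have matched.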
   
   ## How was this patch tested?
   
   `geoparquetIOTests`:
   - "GeoParquet supports writing covering metadata from a Box2D column" — 
user-supplied Box2D column referenced via `geoparquet.covering.geometry`; 
verifies the GeoParquet metadata has matching 
`covering.bbox.{xmin,ymin,xmax,ymax}` paths.
   - "GeoParquet auto populates covering metadata for a Box2D `<geom>_bbox` 
column" — auto-detect path: a Box2D-typed `geometry_bbox` column gets 
registered without explicit configuration.
   
   ## Did this PR include necessary documentation updates?
   
   - No. On its own, this PR does not change any publicly documented SQL API. 
Documentation for the Phase 1 Box2D surface (#2877) will land as a single 
coherent docs update once the deferred follow-ups (Float32 covering writer + 
reader auto-materialization) are scoped.

