jiayuasu opened a new pull request, #2863: URL: https://github.com/apache/sedona/pull/2863
## Did you read the Contributor Guide? - Yes, I have read the [Contributor Rules](https://sedona.apache.org/latest/community/rule/) and [Contributor Development Guide](https://sedona.apache.org/latest/community/develop/) ## Is this PR related to a ticket? - Yes, the PR name follows the format `[GH-XXX] my subject`. Closes #2857 ## What changes were proposed in this PR? `ST_S2CellIDs` returned a covering that under-covered the planar input polygon along long non-meridional edges (~0.92% of the reporter's polygon at level 12). Sedona geometry-type objects are planar — an edge between two vertices is a straight line in `(longitude, latitude)` space — but S2 connects the same vertices with great-circle arcs on the sphere. The two interpretations agree at the vertices but diverge along the edges (a great-circle arc between two points at the same northern latitude bulges poleward of the planar chord). Handing JTS vertices to S2 directly produces a spherical polygon whose interior is *smaller* than the planar polygon's, and the S2 covering of that smaller polygon misses thin slivers of the original input. This PR JTS-buffers the input by an upper bound on the worst-case great-circle/chord deviation before converting to S2: - **Polygon**: per-edge midpoint deviation across exterior + interior rings. Buffering offsets edges in place, so per-edge analysis is tight. - **LineString**: bbox-corner pairs (SW–NE diagonal, NW–SE diagonal, worst-case east-west). Buffering turns the line into a polygon corridor whose long edges span the line's full envelope (especially for collinear segments JTS simplifies away), which per-segment analysis under-bounds. Diagonal great-circle arcs deviate more than east-west arcs at high latitudes, so we probe both. - A 1.5× safety multiplier absorbs numerical error and the small additional deviation the buffered polygon's own (slightly different) edges introduce. Multi-geometries are decomposed by \`GeomUtils.extractGeometryCollection\` upstream and each leaf is buffered/covered independently. ### Status of the reported bug | Metric | Before (master) | After (this PR) | |----------------------------------|-----------------|-----------------| | Original polygon area | 22.1941 deg² | 22.1941 deg² | | Cells returned at level 12 | 49,086 | 53,192 (+8.4%) | | Uncovered area | 0.20349708 deg² | **0.0** | | Uncovered fraction | 0.916897% | **0.000000%** | | \`coverage.covers(input)\` | **false** | **true** |  The dark-blue outline is the input polygon. Light-blue fill is the union of cell geometries returned by \`ST_S2ToGeom(ST_S2CellIDs(...))\`. Red is \`input.difference(coverage)\` — the planar input area not covered. The bottom row zooms in on the southern boundary where the issue showed the slivers concentrated. ### Side effect for \`LineString\` After the buffer, a line input becomes a polygon corridor before S2 sees it. Returned cells therefore cover a thin strip *around* the line rather than only cells the line geometrically traverses. This is documented in the new "Planar input, spherical cells" note in the SQL / Flink / Snowflake \`ST_S2CellIDs\` pages. Use a sufficiently fine \`level\` to keep the corridor narrow. ## How was this patch tested? \`mvn -pl common test -Dtest=FunctionsTest\` — 284 tests, all green. New test cases in \`common/src/test/java/org/apache/sedona/common/FunctionsTest.java\`: - \`testS2CoverageContainsInput\` — the reporter's exact polygon at level 12; reproduces \`0.20349708 deg² uncovered\` on master, passes after the fix. - \`testS2CoverageContainsLineString\` — long east-west line at mid-latitude. - \`testS2CoverageContainsMultiPolygon\` — disjoint polygons, decomposition + per-leaf buffer. - \`testS2CoverageContainsMultiLineString\` — north / south / diagonal multi-segment lines (the diagonal case is what required the diagonal-aware envelope bound). - \`testS2CoverageContainsGeometryCollection\` — heterogeneous Polygon + LineString. - \`testS2CoverageContainsPolygonWithHole\` — exterior + interior ring deviations. - \`testS2CoverageContainsHighLatitudePolygon\` — 80°N polygon with 60° east-west edges. Existing tests (including \`S2CellIDs\` spatial-join correctness, MBR equivalence, single-cell polygons, lines, points) continue to pass. ## Did this PR include necessary documentation updates? - Yes, I have updated the documentation. Added a "Planar input, spherical cells" note to \`ST_S2CellIDs.md\` for SQL, Flink, and Snowflake explaining the planar-vs-spherical mismatch, when no extra cells are added (meridional/equator-aligned edges), and the LineString corridor consequence. No public API surface changed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
