jiayuasu opened a new pull request, #2863:
URL: https://github.com/apache/sedona/pull/2863

   ## Did you read the Contributor Guide?
   
   - Yes, I have read the [Contributor 
Rules](https://sedona.apache.org/latest/community/rule/) and [Contributor 
Development Guide](https://sedona.apache.org/latest/community/develop/)
   
   ## Is this PR related to a ticket?
   
   - Yes, the PR name follows the format `[GH-XXX] my subject`. Closes #2857
   
   ## What changes were proposed in this PR?
   
   `ST_S2CellIDs` returned a covering that under-covered the planar input 
polygon along long non-meridional edges (~0.92% of the reporter's polygon at 
level 12). Sedona geometry-type objects are planar — an edge between two 
vertices is a straight line in `(longitude, latitude)` space — but S2 connects 
the same vertices with great-circle arcs on the sphere. The two interpretations 
agree at the vertices but diverge along the edges (a great-circle arc between 
two points at the same northern latitude bulges poleward of the planar chord). 
Handing JTS vertices to S2 directly produces a spherical polygon whose interior 
is *smaller* than the planar polygon's, and the S2 covering of that smaller 
polygon misses thin slivers of the original input.
   
   This PR JTS-buffers the input by an upper bound on the worst-case 
great-circle/chord deviation before converting to S2:
   
   - **Polygon**: per-edge midpoint deviation across exterior + interior rings. 
Buffering offsets edges in place, so per-edge analysis is tight.
   - **LineString**: bbox-corner pairs (SW–NE diagonal, NW–SE diagonal, 
worst-case east-west). Buffering turns the line into a polygon corridor whose 
long edges span the line's full envelope (especially for collinear segments JTS 
simplifies away), which per-segment analysis under-bounds. Diagonal 
great-circle arcs deviate more than east-west arcs at high latitudes, so we 
probe both.
   - A 1.5× safety multiplier absorbs numerical error and the small additional 
deviation the buffered polygon's own (slightly different) edges introduce.
   
   Multi-geometries are decomposed by \`GeomUtils.extractGeometryCollection\` 
upstream and each leaf is buffered/covered independently.
   
   ### Status of the reported bug
   
   | Metric                           | Before (master) | After (this PR) |
   |----------------------------------|-----------------|-----------------|
   | Original polygon area            | 22.1941 deg²    | 22.1941 deg²    |
   | Cells returned at level 12       | 49,086          | 53,192 (+8.4%)  |
   | Uncovered area                   | 0.20349708 deg² | **0.0**         |
   | Uncovered fraction               | 0.916897%       | **0.000000%**   |
   | \`coverage.covers(input)\`       | **false**       | **true**        |
   
   ![GH-2857 before / 
after](https://github.com/jiayuasu/sedona/raw/fix/s2-cellids-coverage/docs/image/GH-2857/coverage_before_after.png)
   
   The dark-blue outline is the input polygon. Light-blue fill is the union of 
cell geometries returned by \`ST_S2ToGeom(ST_S2CellIDs(...))\`. Red is 
\`input.difference(coverage)\` — the planar input area not covered. The bottom 
row zooms in on the southern boundary where the issue showed the slivers 
concentrated.
   
   ### Side effect for \`LineString\`
   
   After the buffer, a line input becomes a polygon corridor before S2 sees it. 
Returned cells therefore cover a thin strip *around* the line rather than only 
cells the line geometrically traverses. This is documented in the new "Planar 
input, spherical cells" note in the SQL / Flink / Snowflake \`ST_S2CellIDs\` 
pages. Use a sufficiently fine \`level\` to keep the corridor narrow.
   
   ## How was this patch tested?
   
   \`mvn -pl common test -Dtest=FunctionsTest\` — 284 tests, all green.
   
   New test cases in 
\`common/src/test/java/org/apache/sedona/common/FunctionsTest.java\`:
   
   - \`testS2CoverageContainsInput\` — the reporter's exact polygon at level 
12; reproduces \`0.20349708 deg² uncovered\` on master, passes after the fix.
   - \`testS2CoverageContainsLineString\` — long east-west line at mid-latitude.
   - \`testS2CoverageContainsMultiPolygon\` — disjoint polygons, decomposition 
+ per-leaf buffer.
   - \`testS2CoverageContainsMultiLineString\` — north / south / diagonal 
multi-segment lines (the diagonal case is what required the diagonal-aware 
envelope bound).
   - \`testS2CoverageContainsGeometryCollection\` — heterogeneous Polygon + 
LineString.
   - \`testS2CoverageContainsPolygonWithHole\` — exterior + interior ring 
deviations.
   - \`testS2CoverageContainsHighLatitudePolygon\` — 80°N polygon with 60° 
east-west edges.
   
   Existing tests (including \`S2CellIDs\` spatial-join correctness, MBR 
equivalence, single-cell polygons, lines, points) continue to pass.
   
   ## Did this PR include necessary documentation updates?
   
   - Yes, I have updated the documentation. Added a "Planar input, spherical 
cells" note to \`ST_S2CellIDs.md\` for SQL, Flink, and Snowflake explaining the 
planar-vs-spherical mismatch, when no extra cells are added 
(meridional/equator-aligned edges), and the LineString corridor consequence. No 
public API surface changed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to