Copilot commented on code in PR #2863: URL: https://github.com/apache/sedona/pull/2863#discussion_r3145211104
########## docs/api/snowflake/vector-data/Spatial-Indexing/ST_S2CellIDs.md: ########## @@ -27,6 +27,36 @@ Format: `ST_S2CellIDs(geom: geometry, level: Int)` Return type: `Array<Long>` +!!! note "Planar input, spherical cells" + Sedona geometry type objects are planar: an edge between two vertices is a straight line in + `(longitude, latitude)` space. S2 cells are spherical: an edge between two vertices is a + great-circle arc on the unit sphere. The two interpretations agree at the vertices but not + along the edges — for example, a great-circle arc connecting two points at the same + non-equatorial latitude bulges *toward the nearer pole* rather than following the parallel. + + Without compensation this would let the returned cells under-cover the original planar + geometry along long, non-meridional edges (the bug reported in + [GH-2857](https://github.com/apache/sedona/issues/2857)). To prevent that, `ST_S2CellIDs` + JTS-buffers the input by an upper bound on the great-circle/chord deviation before + converting it to an S2 region. The result is a covering that always *contains* the + original planar geometry, at the cost of a small number of extra boundary cells. Inputs + that are themselves spherical (e.g. polygons whose edges are explicitly meridians or the + equator) see no additional cells. + + For `LineString` and `MultiLineString` inputs the buffer turns the line into a polygon + corridor, so the returned cells cover a thin strip *around* the line rather than only + cells the line geometrically passes through. Use a sufficiently fine `level` to keep the + corridor narrow. Review Comment: This note claims `ST_S2CellIDs` buffers inputs before converting to an S2 region, but the implementation skips buffering for antimeridian-crossing geometries (see `S2Utils.toS2Region` / `crossesAntimeridian`). Please document that exception here because it affects output (no extra boundary cells / no LineString corridor) for those cases. ```suggestion JTS-buffers inputs that do **not** cross the antimeridian by an upper bound on the great-circle/chord deviation before converting them to an S2 region. The result is a covering that always *contains* the original planar geometry, at the cost of a small number of extra boundary cells. Inputs that are themselves spherical (e.g. polygons whose edges are explicitly meridians or the equator) see no additional cells. Geometries that cross the antimeridian are an exception: they are converted to an S2 region without this compensating buffer, so they do not get the extra boundary cells described above. For `LineString` and `MultiLineString` inputs where buffering is applied, the buffer turns the line into a polygon corridor, so the returned cells cover a thin strip *around* the line rather than only cells the line geometrically passes through. Antimeridian-crossing lines are the exception because the buffer is skipped for them. Use a sufficiently fine `level` to keep the corridor narrow. ``` ########## common/src/main/java/org/apache/sedona/common/utils/S2Utils.java: ########## @@ -107,13 +107,267 @@ public static Polygon toJTSPolygon(S2CellId cellId) { return new GeometryFactory().createPolygon(coords); } + /** + * Convert a JTS planar geometry into an S2Region whose lat/lng projection is guaranteed to + * contain the input geometry. + * + * <p>Why a buffer is needed: Sedona geometries are planar — an edge between two vertices is a + * straight line in (lng, lat) space — but S2 connects the same vertices with a great-circle arc + * on the sphere. The two interpretations agree at the vertices but diverge along the edges (e.g. + * the great-circle arc between two points at the same northern latitude bulges northward, leaving + * the parallel that would form the planar chord). If we hand the JTS vertices to S2 directly, the + * spherical polygon's interior is *smaller* than the planar polygon's interior along + * non-meridional edges, so the S2 covering misses thin slivers of the original planar polygon + * (see GH-2857). + * + * <p>To compensate, we JTS-buffer the input by an upper bound on the worst-case great-circle + * deviation before converting to S2. A side effect for {@link LineString} inputs is that the + * buffer turns the line into a polygon corridor; downstream callers therefore see cells in a thin + * strip around the line rather than only cells the line geometrically traverses. + */ Review Comment: `toS2Region`’s Javadoc says the input is JTS-buffered before converting to S2, but the implementation explicitly skips buffering when `crossesAntimeridian(geom)` is true (`eps = 0.0`). Please update the Javadoc to document this exception (and the rationale), otherwise the “guaranteed to contain”/buffer description is misleading for antimeridian-crossing inputs. ########## docs/api/flink/Spatial-Indexing/ST_S2CellIDs.md: ########## @@ -29,6 +29,36 @@ Return type: `Array<Long>` Since: `v1.4.0` +!!! note "Planar input, spherical cells" + Sedona geometry type objects are planar: an edge between two vertices is a straight line in + `(longitude, latitude)` space. S2 cells are spherical: an edge between two vertices is a + great-circle arc on the unit sphere. The two interpretations agree at the vertices but not + along the edges — for example, a great-circle arc connecting two points at the same + non-equatorial latitude bulges *toward the nearer pole* rather than following the parallel. + + Without compensation this would let the returned cells under-cover the original planar + geometry along long, non-meridional edges (the bug reported in + [GH-2857](https://github.com/apache/sedona/issues/2857)). To prevent that, `ST_S2CellIDs` + JTS-buffers the input by an upper bound on the great-circle/chord deviation before + converting it to an S2 region. The result is a covering that always *contains* the + original planar geometry, at the cost of a small number of extra boundary cells. Inputs + that are themselves spherical (e.g. polygons whose edges are explicitly meridians or the + equator) see no additional cells. + + For `LineString` and `MultiLineString` inputs the buffer turns the line into a polygon + corridor, so the returned cells cover a thin strip *around* the line rather than only + cells the line geometrically passes through. Use a sufficiently fine `level` to keep the + corridor narrow. Review Comment: This note says `ST_S2CellIDs` buffers the input before S2 conversion, but the code skips buffering for antimeridian-crossing geometries (see `S2Utils.toS2Region`, `crossesAntimeridian`). Please mention this exception here since it changes behavior (no extra boundary cells / no LineString corridor) for those inputs. ```suggestion normally JTS-buffers the input by an upper bound on the great-circle/chord deviation before converting it to an S2 region. The result is a covering that always *contains* the original planar geometry, at the cost of a small number of extra boundary cells. Antimeridian-crossing geometries are an exception: they are converted without this extra buffer, so they do not gain those additional boundary cells from buffering. Inputs that are themselves spherical (e.g. polygons whose edges are explicitly meridians or the equator) see no additional cells. For `LineString` and `MultiLineString` inputs, when this buffer is applied, it turns the line into a polygon corridor, so the returned cells cover a thin strip *around* the line rather than only cells the line geometrically passes through. Antimeridian- crossing lines are again an exception because buffering is skipped for them. Use a sufficiently fine `level` to keep the corridor narrow. ``` ########## common/src/main/java/org/apache/sedona/common/utils/S2Utils.java: ########## @@ -107,13 +107,267 @@ public static Polygon toJTSPolygon(S2CellId cellId) { return new GeometryFactory().createPolygon(coords); } + /** + * Convert a JTS planar geometry into an S2Region whose lat/lng projection is guaranteed to + * contain the input geometry. + * + * <p>Why a buffer is needed: Sedona geometries are planar — an edge between two vertices is a + * straight line in (lng, lat) space — but S2 connects the same vertices with a great-circle arc + * on the sphere. The two interpretations agree at the vertices but diverge along the edges (e.g. + * the great-circle arc between two points at the same northern latitude bulges northward, leaving + * the parallel that would form the planar chord). If we hand the JTS vertices to S2 directly, the + * spherical polygon's interior is *smaller* than the planar polygon's interior along + * non-meridional edges, so the S2 covering misses thin slivers of the original planar polygon + * (see GH-2857). + * + * <p>To compensate, we JTS-buffer the input by an upper bound on the worst-case great-circle + * deviation before converting to S2. A side effect for {@link LineString} inputs is that the + * buffer turns the line into a polygon corridor; downstream callers therefore see cells in a thin + * strip around the line rather than only cells the line geometrically traverses. + */ public static S2Region toS2Region(Geometry geom) throws IllegalArgumentException { + if (!(geom instanceof Polygon) && !(geom instanceof LineString)) { + throw new IllegalArgumentException( + "only Polygon or LineString (including LinearRing) types can be converted to S2Region"); + } + // JTS planar buffer doesn't understand antimeridian crossing — for inputs that + // straddle the antimeridian, buffering produces a polygon that goes the wrong way + // around the globe and explodes the S2 covering. Detect antimeridian crossing + // per-edge rather than via envelope width: an envelope-width heuristic fires for + // any polygon whose bbox spans > 180° of longitude even if no individual edge + // does (e.g. a tessellated wide polygon with intermediate vertices), and we'd + // incorrectly skip the buffer for those, reintroducing the GH-2857 miscoverage on + // their long non-meridional edges. Per-edge |Δlng| > 180° fires only when the + // input genuinely contains an edge that wraps the antimeridian. (For a coarse + // polygon with a single edge spanning > 180° both heuristics agree, and skipping + // the buffer is the safe choice — JTS buffer can't usefully process such an edge + // anyway.) + boolean spansAntimeridian = crossesAntimeridian(geom); + double eps = spansAntimeridian ? 0.0 : arcChordBufferDegrees(geom); + Geometry buffered = (eps > 0) ? geom.buffer(eps) : geom; + if (buffered instanceof Polygon) { + return S2Utils.toS2Polygon((Polygon) buffered); + } else if (buffered instanceof LineString) { + // Only reachable when eps == 0 (e.g. a single-point degenerate line). Normal lines + // become Polygon corridors after buffer and are handled above. + return S2Utils.toS2PolyLine((LineString) buffered); + } else if (buffered instanceof MultiPolygon && buffered.getNumGeometries() > 0) { + // JTS buffer of self-touching geometries can collapse to MultiPolygon. Build a single + // S2Polygon containing every component's loops so the resulting region still contains + // every part of the buffered geometry; dropping components would silently break the + // containment guarantee for the discarded shells. + List<S2Loop> loops = new ArrayList<>(); + for (int i = 0; i < buffered.getNumGeometries(); i++) { + Polygon p = (Polygon) buffered.getGeometryN(i); + loops.add(toS2Loop(p.getExteriorRing())); + for (int j = 0; j < p.getNumInteriorRing(); j++) { + loops.add(toS2Loop(p.getInteriorRingN(j))); + } + } + return new S2Polygon(loops); + } + throw new IllegalArgumentException( + "only Polygon or LineString (including LinearRing) types can be converted to S2Region"); + } + + /** + * Compute the JTS buffer amount (in degrees) needed so that the spherical interpretation of the + * buffered geometry fully contains the original planar geometry. + * + * <p>The buffer must be at least as large as the largest great-circle/chord deviation among the + * edges that S2 will eventually see. Polygons and lines need different bounds because JTS buffer + * affects their edges differently: + * + * <ul> + * <li><b>Polygon</b>: each existing edge is offset perpendicularly in place; corners get + * rounded into many short arcs, but no edge is dramatically lengthened. The buffered + * polygon's edges therefore have ~the same length as the originals, so the original + * polygon's per-edge deviation is a tight upper bound on what the buffered polygon's edges + * will exhibit. We use {@link #ringMaxDeviationDegrees}. + * <li><b>LineString</b>: buffering produces a corridor whose long top/bottom edges span the + * line's full envelope — far longer than any individual segment when consecutive segments + * are collinear (JTS often simplifies them away). Per-segment deviation severely + * under-bounds the corridor's actual edge deviation. We bound by virtual edges across the + * envelope via {@link #envelopeDeviationDegrees}. + * </ul> + * + * <p>The 1.5× safety multiplier absorbs numerical error and the small additional deviation the + * buffered polygon's own (slightly different) edges introduce on top of the bound. + */ + private static double arcChordBufferDegrees(Geometry geom) { + double maxDev = 0.0; if (geom instanceof Polygon) { - return S2Utils.toS2Polygon((Polygon) geom); + Polygon poly = (Polygon) geom; + maxDev = Math.max(maxDev, ringMaxDeviationDegrees(poly.getExteriorRing().getCoordinates())); + for (int i = 0; i < poly.getNumInteriorRing(); i++) { + maxDev = + Math.max(maxDev, ringMaxDeviationDegrees(poly.getInteriorRingN(i).getCoordinates())); + } } else if (geom instanceof LineString) { - return S2Utils.toS2PolyLine((LineString) geom); + maxDev = envelopeDeviationDegrees(geom); } - throw new IllegalArgumentException( - "only object of Polygon, LinearRing, LineString type can be converted to S2Region"); + return maxDev * 1.5; + } + + /** + * Conservative deviation upper bound for a geometry, derived from its bounding envelope rather + * than its actual edges. + * + * <p>Used for {@link LineString} inputs because, after JTS buffer, the corridor's long edges are + * not the line's segments — they connect the line's extreme endpoints (or close to it). To bound + * them we probe three virtual edges across the envelope: + * + * <ul> + * <li>The two diagonals (SW–NE and NW–SE) — diagonal great-circle arcs deviate more than + * east-west arcs of the same Δλ at high latitudes, and a buffered corridor's long edges can + * run in either direction depending on the line's orientation. + * <li>The worst-case east-west edge at whichever latitude (top or bottom of the envelope) has + * the larger absolute value — east-west arcs bulge poleward, so the deviation grows with + * |latitude|, and an envelope-spanning east-west arc is what a horizontal collinear line + * would buffer into. + * </ul> + * + * <p>The max across these three bounds the deviation any corridor edge could plausibly exhibit. + * This deliberately over-bounds zigzag lines whose actual corridor edges are short; the + * alternative (per-segment analysis) silently fails on collinear inputs. + */ + private static double envelopeDeviationDegrees(Geometry geom) { + Envelope env = geom.getEnvelopeInternal(); + if (env.isNull()) { + return 0.0; + } + if (crossesAntimeridian(geom)) { + // JTS envelopes don't understand antimeridian crossing — a line from (179, y) to + // (-179, y) reports a 358°-wide envelope even though the actual edge is 2° long. The + // envelope-corner virtual edges would then describe a near-globe-spanning chord and + // produce a meaninglessly huge deviation. For such inputs, fall back to per-segment + // analysis, which works directly off the actual edges and is correct regardless of + // antimeridian crossings. + return ringMaxDeviationDegrees(geom.getCoordinates()); + } Review Comment: In `envelopeDeviationDegrees`, the `if (crossesAntimeridian(geom)) { ... }` branch appears unreachable in the current design: `toS2Region` skips calling `arcChordBufferDegrees` (and therefore `envelopeDeviationDegrees`) entirely when `crossesAntimeridian(geom)` is true (`eps = 0.0`). Consider removing this dead branch (and its comment) or restructuring so the antimeridian handling lives in one place, to avoid misleading future readers about which paths are actually exercised. ```suggestion ``` ########## docs/api/sql/Spatial-Indexing/ST_S2CellIDs.md: ########## @@ -29,6 +29,36 @@ Return type: `Array<Long>` Since: `v1.4.0` +!!! note "Planar input, spherical cells" + Sedona geometry type objects are planar: an edge between two vertices is a straight line in + `(longitude, latitude)` space. S2 cells are spherical: an edge between two vertices is a + great-circle arc on the unit sphere. The two interpretations agree at the vertices but not + along the edges — for example, a great-circle arc connecting two points at the same + non-equatorial latitude bulges *toward the nearer pole* rather than following the parallel. + + Without compensation this would let the returned cells under-cover the original planar + geometry along long, non-meridional edges (the bug reported in + [GH-2857](https://github.com/apache/sedona/issues/2857)). To prevent that, `ST_S2CellIDs` + JTS-buffers the input by an upper bound on the great-circle/chord deviation before + converting it to an S2 region. The result is a covering that always *contains* the + original planar geometry, at the cost of a small number of extra boundary cells. Inputs + that are themselves spherical (e.g. polygons whose edges are explicitly meridians or the + equator) see no additional cells. + + For `LineString` and `MultiLineString` inputs the buffer turns the line into a polygon + corridor, so the returned cells cover a thin strip *around* the line rather than only + cells the line geometrically passes through. Use a sufficiently fine `level` to keep the + corridor narrow. + Review Comment: This note states that `ST_S2CellIDs` “JTS-buffers the input … before converting it to an S2 region”, but the implementation skips buffering for antimeridian-crossing geometries (see `S2Utils.toS2Region`, which sets `eps = 0.0` when an edge has |Δlng|>180). Please document that exception here as it affects output (no extra boundary cells / no LineString corridor) for those inputs. ```suggestion normally JTS-buffers the input by an upper bound on the great-circle/chord deviation before converting it to an S2 region. The result is a covering that always *contains* the original planar geometry, at the cost of a small number of extra boundary cells. Inputs that are themselves spherical (e.g. polygons whose edges are explicitly meridians or the equator) see no additional cells. Exception: buffering is skipped for geometries that cross the antimeridian (that is, an edge has `|Δlng| > 180`). For those inputs, the output reflects the unbuffered S2 region: there are no extra boundary cells from this compensation step. For `LineString` and `MultiLineString` inputs, when buffering is applied it turns the line into a polygon corridor, so the returned cells cover a thin strip *around* the line rather than only cells the line geometrically passes through. Use a sufficiently fine `level` to keep the corridor narrow. Antimeridian-crossing line inputs are not buffered, so they do not get this corridor. ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
