Copilot commented on code in PR #2863:
URL: https://github.com/apache/sedona/pull/2863#discussion_r3145211104


##########
docs/api/snowflake/vector-data/Spatial-Indexing/ST_S2CellIDs.md:
##########
@@ -27,6 +27,36 @@ Format: `ST_S2CellIDs(geom: geometry, level: Int)`
 
 Return type: `Array<Long>`
 
+!!! note "Planar input, spherical cells"
+    Sedona geometry type objects are planar: an edge between two vertices is a 
straight line in
+    `(longitude, latitude)` space. S2 cells are spherical: an edge between two 
vertices is a
+    great-circle arc on the unit sphere. The two interpretations agree at the 
vertices but not
+    along the edges — for example, a great-circle arc connecting two points at 
the same
+    non-equatorial latitude bulges *toward the nearer pole* rather than 
following the parallel.
+
+    Without compensation this would let the returned cells under-cover the 
original planar
+    geometry along long, non-meridional edges (the bug reported in
+    [GH-2857](https://github.com/apache/sedona/issues/2857)). To prevent that, 
`ST_S2CellIDs`
+    JTS-buffers the input by an upper bound on the great-circle/chord 
deviation before
+    converting it to an S2 region. The result is a covering that always 
*contains* the
+    original planar geometry, at the cost of a small number of extra boundary 
cells. Inputs
+    that are themselves spherical (e.g. polygons whose edges are explicitly 
meridians or the
+    equator) see no additional cells.
+
+    For `LineString` and `MultiLineString` inputs the buffer turns the line 
into a polygon
+    corridor, so the returned cells cover a thin strip *around* the line 
rather than only
+    cells the line geometrically passes through. Use a sufficiently fine 
`level` to keep the
+    corridor narrow.

Review Comment:
   This note claims `ST_S2CellIDs` buffers inputs before converting to an S2 
region, but the implementation skips buffering for antimeridian-crossing 
geometries (see `S2Utils.toS2Region` / `crossesAntimeridian`). Please document 
that exception here because it affects output (no extra boundary cells / no 
LineString corridor) for those cases.
   ```suggestion
       JTS-buffers inputs that do **not** cross the antimeridian by an upper 
bound on the
       great-circle/chord deviation before converting them to an S2 region. The 
result is a
       covering that always *contains* the original planar geometry, at the 
cost of a small
       number of extra boundary cells. Inputs that are themselves spherical 
(e.g. polygons
       whose edges are explicitly meridians or the equator) see no additional 
cells. Geometries
       that cross the antimeridian are an exception: they are converted to an 
S2 region without
       this compensating buffer, so they do not get the extra boundary cells 
described above.
   
       For `LineString` and `MultiLineString` inputs where buffering is 
applied, the buffer turns
       the line into a polygon corridor, so the returned cells cover a thin 
strip *around* the
       line rather than only cells the line geometrically passes through. 
Antimeridian-crossing
       lines are the exception because the buffer is skipped for them. Use a 
sufficiently fine
       `level` to keep the corridor narrow.
   ```



##########
common/src/main/java/org/apache/sedona/common/utils/S2Utils.java:
##########
@@ -107,13 +107,267 @@ public static Polygon toJTSPolygon(S2CellId cellId) {
     return new GeometryFactory().createPolygon(coords);
   }
 
+  /**
+   * Convert a JTS planar geometry into an S2Region whose lat/lng projection 
is guaranteed to
+   * contain the input geometry.
+   *
+   * <p>Why a buffer is needed: Sedona geometries are planar — an edge between 
two vertices is a
+   * straight line in (lng, lat) space — but S2 connects the same vertices 
with a great-circle arc
+   * on the sphere. The two interpretations agree at the vertices but diverge 
along the edges (e.g.
+   * the great-circle arc between two points at the same northern latitude 
bulges northward, leaving
+   * the parallel that would form the planar chord). If we hand the JTS 
vertices to S2 directly, the
+   * spherical polygon's interior is *smaller* than the planar polygon's 
interior along
+   * non-meridional edges, so the S2 covering misses thin slivers of the 
original planar polygon
+   * (see GH-2857).
+   *
+   * <p>To compensate, we JTS-buffer the input by an upper bound on the 
worst-case great-circle
+   * deviation before converting to S2. A side effect for {@link LineString} 
inputs is that the
+   * buffer turns the line into a polygon corridor; downstream callers 
therefore see cells in a thin
+   * strip around the line rather than only cells the line geometrically 
traverses.
+   */

Review Comment:
   `toS2Region`’s Javadoc says the input is JTS-buffered before converting to 
S2, but the implementation explicitly skips buffering when 
`crossesAntimeridian(geom)` is true (`eps = 0.0`). Please update the Javadoc to 
document this exception (and the rationale), otherwise the “guaranteed to 
contain”/buffer description is misleading for antimeridian-crossing inputs.



##########
docs/api/flink/Spatial-Indexing/ST_S2CellIDs.md:
##########
@@ -29,6 +29,36 @@ Return type: `Array<Long>`
 
 Since: `v1.4.0`
 
+!!! note "Planar input, spherical cells"
+    Sedona geometry type objects are planar: an edge between two vertices is a 
straight line in
+    `(longitude, latitude)` space. S2 cells are spherical: an edge between two 
vertices is a
+    great-circle arc on the unit sphere. The two interpretations agree at the 
vertices but not
+    along the edges — for example, a great-circle arc connecting two points at 
the same
+    non-equatorial latitude bulges *toward the nearer pole* rather than 
following the parallel.
+
+    Without compensation this would let the returned cells under-cover the 
original planar
+    geometry along long, non-meridional edges (the bug reported in
+    [GH-2857](https://github.com/apache/sedona/issues/2857)). To prevent that, 
`ST_S2CellIDs`
+    JTS-buffers the input by an upper bound on the great-circle/chord 
deviation before
+    converting it to an S2 region. The result is a covering that always 
*contains* the
+    original planar geometry, at the cost of a small number of extra boundary 
cells. Inputs
+    that are themselves spherical (e.g. polygons whose edges are explicitly 
meridians or the
+    equator) see no additional cells.
+
+    For `LineString` and `MultiLineString` inputs the buffer turns the line 
into a polygon
+    corridor, so the returned cells cover a thin strip *around* the line 
rather than only
+    cells the line geometrically passes through. Use a sufficiently fine 
`level` to keep the
+    corridor narrow.

Review Comment:
   This note says `ST_S2CellIDs` buffers the input before S2 conversion, but 
the code skips buffering for antimeridian-crossing geometries (see 
`S2Utils.toS2Region`, `crossesAntimeridian`). Please mention this exception 
here since it changes behavior (no extra boundary cells / no LineString 
corridor) for those inputs.
   ```suggestion
       normally JTS-buffers the input by an upper bound on the 
great-circle/chord deviation
       before converting it to an S2 region. The result is a covering that 
always *contains*
       the original planar geometry, at the cost of a small number of extra 
boundary cells.
       Antimeridian-crossing geometries are an exception: they are converted 
without this
       extra buffer, so they do not gain those additional boundary cells from 
buffering.
       Inputs that are themselves spherical (e.g. polygons whose edges are 
explicitly
       meridians or the equator) see no additional cells.
   
       For `LineString` and `MultiLineString` inputs, when this buffer is 
applied, it turns
       the line into a polygon corridor, so the returned cells cover a thin 
strip *around*
       the line rather than only cells the line geometrically passes through. 
Antimeridian-
       crossing lines are again an exception because buffering is skipped for 
them. Use a
       sufficiently fine `level` to keep the corridor narrow.
   ```



##########
common/src/main/java/org/apache/sedona/common/utils/S2Utils.java:
##########
@@ -107,13 +107,267 @@ public static Polygon toJTSPolygon(S2CellId cellId) {
     return new GeometryFactory().createPolygon(coords);
   }
 
+  /**
+   * Convert a JTS planar geometry into an S2Region whose lat/lng projection 
is guaranteed to
+   * contain the input geometry.
+   *
+   * <p>Why a buffer is needed: Sedona geometries are planar — an edge between 
two vertices is a
+   * straight line in (lng, lat) space — but S2 connects the same vertices 
with a great-circle arc
+   * on the sphere. The two interpretations agree at the vertices but diverge 
along the edges (e.g.
+   * the great-circle arc between two points at the same northern latitude 
bulges northward, leaving
+   * the parallel that would form the planar chord). If we hand the JTS 
vertices to S2 directly, the
+   * spherical polygon's interior is *smaller* than the planar polygon's 
interior along
+   * non-meridional edges, so the S2 covering misses thin slivers of the 
original planar polygon
+   * (see GH-2857).
+   *
+   * <p>To compensate, we JTS-buffer the input by an upper bound on the 
worst-case great-circle
+   * deviation before converting to S2. A side effect for {@link LineString} 
inputs is that the
+   * buffer turns the line into a polygon corridor; downstream callers 
therefore see cells in a thin
+   * strip around the line rather than only cells the line geometrically 
traverses.
+   */
   public static S2Region toS2Region(Geometry geom) throws 
IllegalArgumentException {
+    if (!(geom instanceof Polygon) && !(geom instanceof LineString)) {
+      throw new IllegalArgumentException(
+          "only Polygon or LineString (including LinearRing) types can be 
converted to S2Region");
+    }
+    // JTS planar buffer doesn't understand antimeridian crossing — for inputs 
that
+    // straddle the antimeridian, buffering produces a polygon that goes the 
wrong way
+    // around the globe and explodes the S2 covering. Detect antimeridian 
crossing
+    // per-edge rather than via envelope width: an envelope-width heuristic 
fires for
+    // any polygon whose bbox spans > 180° of longitude even if no individual 
edge
+    // does (e.g. a tessellated wide polygon with intermediate vertices), and 
we'd
+    // incorrectly skip the buffer for those, reintroducing the GH-2857 
miscoverage on
+    // their long non-meridional edges. Per-edge |Δlng| > 180° fires only when 
the
+    // input genuinely contains an edge that wraps the antimeridian. (For a 
coarse
+    // polygon with a single edge spanning > 180° both heuristics agree, and 
skipping
+    // the buffer is the safe choice — JTS buffer can't usefully process such 
an edge
+    // anyway.)
+    boolean spansAntimeridian = crossesAntimeridian(geom);
+    double eps = spansAntimeridian ? 0.0 : arcChordBufferDegrees(geom);
+    Geometry buffered = (eps > 0) ? geom.buffer(eps) : geom;
+    if (buffered instanceof Polygon) {
+      return S2Utils.toS2Polygon((Polygon) buffered);
+    } else if (buffered instanceof LineString) {
+      // Only reachable when eps == 0 (e.g. a single-point degenerate line). 
Normal lines
+      // become Polygon corridors after buffer and are handled above.
+      return S2Utils.toS2PolyLine((LineString) buffered);
+    } else if (buffered instanceof MultiPolygon && buffered.getNumGeometries() 
> 0) {
+      // JTS buffer of self-touching geometries can collapse to MultiPolygon. 
Build a single
+      // S2Polygon containing every component's loops so the resulting region 
still contains
+      // every part of the buffered geometry; dropping components would 
silently break the
+      // containment guarantee for the discarded shells.
+      List<S2Loop> loops = new ArrayList<>();
+      for (int i = 0; i < buffered.getNumGeometries(); i++) {
+        Polygon p = (Polygon) buffered.getGeometryN(i);
+        loops.add(toS2Loop(p.getExteriorRing()));
+        for (int j = 0; j < p.getNumInteriorRing(); j++) {
+          loops.add(toS2Loop(p.getInteriorRingN(j)));
+        }
+      }
+      return new S2Polygon(loops);
+    }
+    throw new IllegalArgumentException(
+        "only Polygon or LineString (including LinearRing) types can be 
converted to S2Region");
+  }
+
+  /**
+   * Compute the JTS buffer amount (in degrees) needed so that the spherical 
interpretation of the
+   * buffered geometry fully contains the original planar geometry.
+   *
+   * <p>The buffer must be at least as large as the largest great-circle/chord 
deviation among the
+   * edges that S2 will eventually see. Polygons and lines need different 
bounds because JTS buffer
+   * affects their edges differently:
+   *
+   * <ul>
+   *   <li><b>Polygon</b>: each existing edge is offset perpendicularly in 
place; corners get
+   *       rounded into many short arcs, but no edge is dramatically 
lengthened. The buffered
+   *       polygon's edges therefore have ~the same length as the originals, 
so the original
+   *       polygon's per-edge deviation is a tight upper bound on what the 
buffered polygon's edges
+   *       will exhibit. We use {@link #ringMaxDeviationDegrees}.
+   *   <li><b>LineString</b>: buffering produces a corridor whose long 
top/bottom edges span the
+   *       line's full envelope — far longer than any individual segment when 
consecutive segments
+   *       are collinear (JTS often simplifies them away). Per-segment 
deviation severely
+   *       under-bounds the corridor's actual edge deviation. We bound by 
virtual edges across the
+   *       envelope via {@link #envelopeDeviationDegrees}.
+   * </ul>
+   *
+   * <p>The 1.5× safety multiplier absorbs numerical error and the small 
additional deviation the
+   * buffered polygon's own (slightly different) edges introduce on top of the 
bound.
+   */
+  private static double arcChordBufferDegrees(Geometry geom) {
+    double maxDev = 0.0;
     if (geom instanceof Polygon) {
-      return S2Utils.toS2Polygon((Polygon) geom);
+      Polygon poly = (Polygon) geom;
+      maxDev = Math.max(maxDev, 
ringMaxDeviationDegrees(poly.getExteriorRing().getCoordinates()));
+      for (int i = 0; i < poly.getNumInteriorRing(); i++) {
+        maxDev =
+            Math.max(maxDev, 
ringMaxDeviationDegrees(poly.getInteriorRingN(i).getCoordinates()));
+      }
     } else if (geom instanceof LineString) {
-      return S2Utils.toS2PolyLine((LineString) geom);
+      maxDev = envelopeDeviationDegrees(geom);
     }
-    throw new IllegalArgumentException(
-        "only object of Polygon, LinearRing, LineString type can be converted 
to S2Region");
+    return maxDev * 1.5;
+  }
+
+  /**
+   * Conservative deviation upper bound for a geometry, derived from its 
bounding envelope rather
+   * than its actual edges.
+   *
+   * <p>Used for {@link LineString} inputs because, after JTS buffer, the 
corridor's long edges are
+   * not the line's segments — they connect the line's extreme endpoints (or 
close to it). To bound
+   * them we probe three virtual edges across the envelope:
+   *
+   * <ul>
+   *   <li>The two diagonals (SW–NE and NW–SE) — diagonal great-circle arcs 
deviate more than
+   *       east-west arcs of the same Δλ at high latitudes, and a buffered 
corridor's long edges can
+   *       run in either direction depending on the line's orientation.
+   *   <li>The worst-case east-west edge at whichever latitude (top or bottom 
of the envelope) has
+   *       the larger absolute value — east-west arcs bulge poleward, so the 
deviation grows with
+   *       |latitude|, and an envelope-spanning east-west arc is what a 
horizontal collinear line
+   *       would buffer into.
+   * </ul>
+   *
+   * <p>The max across these three bounds the deviation any corridor edge 
could plausibly exhibit.
+   * This deliberately over-bounds zigzag lines whose actual corridor edges 
are short; the
+   * alternative (per-segment analysis) silently fails on collinear inputs.
+   */
+  private static double envelopeDeviationDegrees(Geometry geom) {
+    Envelope env = geom.getEnvelopeInternal();
+    if (env.isNull()) {
+      return 0.0;
+    }
+    if (crossesAntimeridian(geom)) {
+      // JTS envelopes don't understand antimeridian crossing — a line from 
(179, y) to
+      // (-179, y) reports a 358°-wide envelope even though the actual edge is 
2° long. The
+      // envelope-corner virtual edges would then describe a 
near-globe-spanning chord and
+      // produce a meaninglessly huge deviation. For such inputs, fall back to 
per-segment
+      // analysis, which works directly off the actual edges and is correct 
regardless of
+      // antimeridian crossings.
+      return ringMaxDeviationDegrees(geom.getCoordinates());
+    }

Review Comment:
   In `envelopeDeviationDegrees`, the `if (crossesAntimeridian(geom)) { ... }` 
branch appears unreachable in the current design: `toS2Region` skips calling 
`arcChordBufferDegrees` (and therefore `envelopeDeviationDegrees`) entirely 
when `crossesAntimeridian(geom)` is true (`eps = 0.0`). Consider removing this 
dead branch (and its comment) or restructuring so the antimeridian handling 
lives in one place, to avoid misleading future readers about which paths are 
actually exercised.
   ```suggestion
   
   ```



##########
docs/api/sql/Spatial-Indexing/ST_S2CellIDs.md:
##########
@@ -29,6 +29,36 @@ Return type: `Array<Long>`
 
 Since: `v1.4.0`
 
+!!! note "Planar input, spherical cells"
+    Sedona geometry type objects are planar: an edge between two vertices is a 
straight line in
+    `(longitude, latitude)` space. S2 cells are spherical: an edge between two 
vertices is a
+    great-circle arc on the unit sphere. The two interpretations agree at the 
vertices but not
+    along the edges — for example, a great-circle arc connecting two points at 
the same
+    non-equatorial latitude bulges *toward the nearer pole* rather than 
following the parallel.
+
+    Without compensation this would let the returned cells under-cover the 
original planar
+    geometry along long, non-meridional edges (the bug reported in
+    [GH-2857](https://github.com/apache/sedona/issues/2857)). To prevent that, 
`ST_S2CellIDs`
+    JTS-buffers the input by an upper bound on the great-circle/chord 
deviation before
+    converting it to an S2 region. The result is a covering that always 
*contains* the
+    original planar geometry, at the cost of a small number of extra boundary 
cells. Inputs
+    that are themselves spherical (e.g. polygons whose edges are explicitly 
meridians or the
+    equator) see no additional cells.
+
+    For `LineString` and `MultiLineString` inputs the buffer turns the line 
into a polygon
+    corridor, so the returned cells cover a thin strip *around* the line 
rather than only
+    cells the line geometrically passes through. Use a sufficiently fine 
`level` to keep the
+    corridor narrow.
+

Review Comment:
   This note states that `ST_S2CellIDs` “JTS-buffers the input … before 
converting it to an S2 region”, but the implementation skips buffering for 
antimeridian-crossing geometries (see `S2Utils.toS2Region`, which sets `eps = 
0.0` when an edge has |Δlng|>180). Please document that exception here as it 
affects output (no extra boundary cells / no LineString corridor) for those 
inputs.
   ```suggestion
       normally JTS-buffers the input by an upper bound on the 
great-circle/chord deviation
       before converting it to an S2 region. The result is a covering that 
always *contains*
       the original planar geometry, at the cost of a small number of extra 
boundary cells.
       Inputs that are themselves spherical (e.g. polygons whose edges are 
explicitly meridians
       or the equator) see no additional cells.
   
       Exception: buffering is skipped for geometries that cross the 
antimeridian (that is, an
       edge has `|Δlng| > 180`). For those inputs, the output reflects the 
unbuffered S2
       region: there are no extra boundary cells from this compensation step.
   
       For `LineString` and `MultiLineString` inputs, when buffering is applied 
it turns the
       line into a polygon corridor, so the returned cells cover a thin strip 
*around* the line
       rather than only cells the line geometrically passes through. Use a 
sufficiently fine
       `level` to keep the corridor narrow. Antimeridian-crossing line inputs 
are not buffered,
       so they do not get this corridor.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to