Copilot commented on code in PR #2863:
URL: https://github.com/apache/sedona/pull/2863#discussion_r3144386793
##########
common/src/test/java/org/apache/sedona/common/FunctionsTest.java:
##########
@@ -1081,6 +1081,163 @@ public void testS2ToGeom() {
assertTrue(polygons[100].intersects(target));
}
+ @Test
+ public void testS2CoverageContainsInput() throws ParseException {
+ String wkt =
+ "POLYGON ((-102.060778 39.9999603, -102.0535384 40.0119065, -101.98532
40.0122906, "
+ + "-95.30829 40.009008, -95.2456364 39.9564784, -95.1982467
39.9455019, "
+ + "-95.1964657 39.9113444, -95.1460439 39.9142017, -95.1316877
39.8855881, "
+ + "-95.087643 39.8717975, -95.0389987 39.8749063, -95.0146232
39.9088422, "
+ + "-94.9403146 39.906409, -94.9183761 39.8846514, -94.9329504
39.8578468, "
+ + "-94.8824331 39.8409102, -94.8675709 39.8227528, -94.878404
39.787242, "
+ + "-94.9266292 39.7786779, -94.9070076 39.7679231, -94.8631596
39.779774, "
+ + "-94.8497103 39.7604914, -94.8545514 39.7397163, -94.8985678
39.7150641, "
+ + "-94.9584897 39.7331377, -94.9637588 39.6814526, -95.0187723
39.6615712, "
+ + "-95.0456083 39.6252182, -95.0430365 39.5826542, -95.0650104
39.5677387, "
+ + "-95.0985514 39.570063, -95.1012286 39.5462821, -95.0459187
39.5064755, "
+ + "-95.0320969 39.4709074, -94.976457 39.4475392, -94.9362514
39.3964717, "
+ + "-94.8781205 39.3949417, -94.8733156 39.3663291, -94.9014695
39.3495654, "
+ + "-94.8963639 39.3135051, -94.8237994 39.2609956, -94.8194328
39.2178517, "
+ + "-94.7789019 39.214907, -94.7398645 39.1789812, -94.6785238
39.1931279, "
+ + "-94.6533801 39.1816662, -94.6517728 39.1640754, -94.605515
39.1696807, "
+ + "-94.5791896 39.1504025, -94.5983384 39.1134256, -94.6095667
36.9948123, "
+ + "-102.045765 36.9847897, -102.060778 39.9999603))";
+ Geometry input = geomFromWKT(wkt, 0);
+ Long[] cellIds = Functions.s2CellIDs(input, 12);
+ Geometry[] polygons =
+
Functions.s2ToGeom(Arrays.stream(cellIds).mapToLong(Long::longValue).toArray());
+ Geometry coverage = Functions.union(polygons);
+ if (!coverage.covers(input)) {
+ // Only compute the (expensive, ~50k-cell) difference on failure, to
surface a useful
+ // diagnostic. Happy path skips the full geometric union/difference work.
Review Comment:
The comment says the happy path “skips the full geometric union/difference
work”, but the union is computed unconditionally just above; only the
`difference` is skipped. Please reword this to avoid implying that the union is
avoided on success.
```suggestion
// diagnostic. The happy path still computes the union above, but
skips this difference.
```
##########
common/src/main/java/org/apache/sedona/common/utils/S2Utils.java:
##########
@@ -107,13 +107,218 @@ public static Polygon toJTSPolygon(S2CellId cellId) {
return new GeometryFactory().createPolygon(coords);
}
+ /**
+ * Convert a JTS planar geometry into an S2Region whose lat/lng projection
is guaranteed to
+ * contain the input geometry.
+ *
+ * <p>Why a buffer is needed: Sedona geometries are planar — an edge between
two vertices is a
+ * straight line in (lng, lat) space — but S2 connects the same vertices
with a great-circle arc
+ * on the sphere. The two interpretations agree at the vertices but diverge
along the edges (e.g.
+ * the great-circle arc between two points at the same northern latitude
bulges northward, leaving
+ * the parallel that would form the planar chord). If we hand the JTS
vertices to S2 directly, the
+ * spherical polygon's interior is *smaller* than the planar polygon's
interior along
+ * non-meridional edges, so the S2 covering misses thin slivers of the
original planar polygon
+ * (see GH-2857).
+ *
+ * <p>To compensate, we JTS-buffer the input by an upper bound on the
worst-case great-circle
+ * deviation before converting to S2. A side effect for {@link LineString}
inputs is that the
+ * buffer turns the line into a polygon corridor; downstream callers
therefore see cells in a thin
+ * strip around the line rather than only cells the line geometrically
traverses.
+ */
public static S2Region toS2Region(Geometry geom) throws
IllegalArgumentException {
- if (geom instanceof Polygon) {
- return S2Utils.toS2Polygon((Polygon) geom);
- } else if (geom instanceof LineString) {
- return S2Utils.toS2PolyLine((LineString) geom);
+ if (!(geom instanceof Polygon) && !(geom instanceof LineString)) {
+ throw new IllegalArgumentException(
+ "only object of Polygon, LinearRing, LineString type can be
converted to S2Region");
+ }
+ // JTS planar buffer doesn't understand antimeridian crossing — for inputs
whose
+ // envelope spans more than 180° in longitude (which only happens when the
input
+ // straddles the antimeridian, since real-world inputs don't span half the
globe in
+ // lng), buffering produces a polygon that goes the wrong way around the
globe and
+ // explodes the S2 covering. Skip the buffer for those inputs; the
original GH-2857
+ // miscoverage only affects edges that go the "long" way in (lng, lat)
space, which
+ // these inputs by definition do not.
+ boolean spansAntimeridian = geom.getEnvelopeInternal().getWidth() > 180.0;
+ double eps = spansAntimeridian ? 0.0 : arcChordBufferDegrees(geom);
+ Geometry buffered = (eps > 0) ? geom.buffer(eps) : geom;
Review Comment:
The antimeridian detection/comment is based on `envelope.getWidth() > 180`,
but that condition can also be true for geometries that simply span a very wide
longitude range without actually straddling the antimeridian. In that case
`eps` is forced to 0 and the buffer/containment fix is skipped, potentially
reintroducing under-coverage for long non-meridional edges. Consider detecting
antimeridian crossing per-edge (e.g., any segment with |Δlng| > 180 after
wrapping) or otherwise updating the heuristic/comment so it doesn’t assert that
width>180 “only happens” for antimeridian straddles.
##########
common/src/test/java/org/apache/sedona/common/FunctionsTest.java:
##########
@@ -1081,6 +1081,163 @@ public void testS2ToGeom() {
assertTrue(polygons[100].intersects(target));
}
+ @Test
+ public void testS2CoverageContainsInput() throws ParseException {
+ String wkt =
+ "POLYGON ((-102.060778 39.9999603, -102.0535384 40.0119065, -101.98532
40.0122906, "
+ + "-95.30829 40.009008, -95.2456364 39.9564784, -95.1982467
39.9455019, "
+ + "-95.1964657 39.9113444, -95.1460439 39.9142017, -95.1316877
39.8855881, "
+ + "-95.087643 39.8717975, -95.0389987 39.8749063, -95.0146232
39.9088422, "
+ + "-94.9403146 39.906409, -94.9183761 39.8846514, -94.9329504
39.8578468, "
+ + "-94.8824331 39.8409102, -94.8675709 39.8227528, -94.878404
39.787242, "
+ + "-94.9266292 39.7786779, -94.9070076 39.7679231, -94.8631596
39.779774, "
+ + "-94.8497103 39.7604914, -94.8545514 39.7397163, -94.8985678
39.7150641, "
+ + "-94.9584897 39.7331377, -94.9637588 39.6814526, -95.0187723
39.6615712, "
+ + "-95.0456083 39.6252182, -95.0430365 39.5826542, -95.0650104
39.5677387, "
+ + "-95.0985514 39.570063, -95.1012286 39.5462821, -95.0459187
39.5064755, "
+ + "-95.0320969 39.4709074, -94.976457 39.4475392, -94.9362514
39.3964717, "
+ + "-94.8781205 39.3949417, -94.8733156 39.3663291, -94.9014695
39.3495654, "
+ + "-94.8963639 39.3135051, -94.8237994 39.2609956, -94.8194328
39.2178517, "
+ + "-94.7789019 39.214907, -94.7398645 39.1789812, -94.6785238
39.1931279, "
+ + "-94.6533801 39.1816662, -94.6517728 39.1640754, -94.605515
39.1696807, "
+ + "-94.5791896 39.1504025, -94.5983384 39.1134256, -94.6095667
36.9948123, "
+ + "-102.045765 36.9847897, -102.060778 39.9999603))";
+ Geometry input = geomFromWKT(wkt, 0);
+ Long[] cellIds = Functions.s2CellIDs(input, 12);
+ Geometry[] polygons =
+
Functions.s2ToGeom(Arrays.stream(cellIds).mapToLong(Long::longValue).toArray());
+ Geometry coverage = Functions.union(polygons);
+ if (!coverage.covers(input)) {
Review Comment:
This test always computes `coverage = Functions.union(polygons)` where
`polygons` can be ~50k cell geometries at level 12, and several other new tests
do the same. `Functions.union(Geometry[])` delegates to
`GeometryCollection.union()`, which is typically expensive for tens of
thousands of polygons and can significantly slow/flake CI. Consider replacing
the full union with a cheaper containment strategy (e.g., spatial index over
cell polygons + point/segment sampling on the input), or at least gating the
union behind a faster preliminary check so the heavy union work only runs when
needed.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]