alexanderdeseranno-tomtom opened a new issue, #2857:
URL: https://github.com/apache/sedona/issues/2857
## Expected behavior

For any polygon `g` and S2 level `L`, unioning the geometries returned by

```sql
ST_S2ToGeom(ST_S2CellIDs(g, L))
```

should produce a region that fully contains `g`. An S2 covering is allowed to overshoot (include area outside `g`) but must never undershoot: every point inside `g` must lie inside at least one returned cell geometry.

## Actual behavior

At level 12, the union of the reconstructed cell geometries misses part of the input polygon:

```
POLYGON ((-102.060778 39.9999603, -102.0535384 40.0119065, -101.98532 40.0122906, -95.30829 40.009008, -95.2456364 39.9564784, -95.1982467 39.9455019, -95.1964657 39.9113444, -95.1460439 39.9142017, -95.1316877 39.8855881, -95.087643 39.8717975, -95.0389987 39.8749063, -95.0146232 39.9088422, -94.9403146 39.906409, -94.9183761 39.8846514, -94.9329504 39.8578468, -94.8824331 39.8409102, -94.8675709 39.8227528, -94.878404 39.787242, -94.9266292 39.7786779, -94.9070076 39.7679231, -94.8631596 39.779774, -94.8497103 39.7604914, -94.8545514 39.7397163, -94.8985678 39.7150641, -94.9584897 39.7331377, -94.9637588 39.6814526, -95.0187723 39.6615712, -95.0456083 39.6252182, -95.0430365 39.5826542, -95.0650104 39.5677387, -95.0985514 39.570063, -95.1012286 39.5462821, -95.0459187 39.5064755, -95.0320969 39.4709074, -94.976457 39.4475392, -94.9362514 39.3964717, -94.8781205 39.3949417, -94.8733156 39.3663291, -94.9014695 39.3495654, -94.8963639 39.3135051, -94.8237994 39.2609956, -94.8194328 39.2178517, -94.7789019 39.214907, -94.7398645 39.1789812, -94.6785238 39.1931279, -94.6533801 39.1816662, -94.6517728 39.1640754, -94.605515 39.1696807, -94.5791896 39.1504025, -94.5983384 39.1134256, -94.6095667 36.9948123, -102.045765 36.9847897, -102.060778 39.9999603))
```

## Measured results (Sedona 1.8.1)

| Metric                           | Value               |
| -------------------------------- | ------------------- |
| S2 level                         | 12                  |
| Cells returned by `ST_S2CellIDs` | 49,086              |
| Original polygon area            | 22.1941 deg²        |
| Uncovered area                   | **0.20349708 deg²** |
| Uncovered fraction               | **~0.917%**         |
| `coverage.covers(input)`         | **false**           |

Areas are planar JTS areas in lon/lat degrees, as computed by the reproducer below.

The uncovered slivers are concentrated along the southern boundary of the polygon, which is the long, roughly east-west edge near 37°N:

<img width="1295" height="455" alt="Image" src="https://github.com/user-attachments/assets/af241f77-0eb1-4859-92d8-1c9cc814c55d" />

## Reproducer

### `pom.xml`

```xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>

  <groupId>indexingtest</groupId>
  <artifactId>sedona-s2-coverage-bug</artifactId>
  <version>1.0.0-SNAPSHOT</version>

  <properties>
    <maven.compiler.source>11</maven.compiler.source>
    <maven.compiler.target>11</maven.compiler.target>
    <scala.binary.version>2.12</scala.binary.version>
    <scala.version>2.12.18</scala.version>
    <spark.version>3.5.3</spark.version>
    <sedona.version>1.8.1</sedona.version>
    <scalatest.version>3.2.19</scalatest.version>
  </properties>

  <dependencies>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>${scala.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_${scala.binary.version}</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.sedona</groupId>
      <artifactId>sedona-spark-shaded-3.5_${scala.binary.version}</artifactId>
      <version>${sedona.version}</version>
    </dependency>
    <dependency>
      <groupId>org.scalatest</groupId>
      <artifactId>scalatest_${scala.binary.version}</artifactId>
      <version>${scalatest.version}</version>
      <scope>test</scope>
    </dependency>
  </dependencies>

  <build>
    <testSourceDirectory>src/test/scala</testSourceDirectory>
    <plugins>
      <plugin>
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <version>4.9.2</version>
        <executions>
          <execution>
            <goals><goal>testCompile</goal></goals>
          </execution>
        </executions>
        <configuration>
          <scalaVersion>${scala.version}</scalaVersion>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <version>3.2.5</version>
        <!-- Tests are run by scalatest-maven-plugin instead. -->
        <configuration>
          <skipTests>true</skipTests>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.scalatest</groupId>
        <artifactId>scalatest-maven-plugin</artifactId>
        <version>2.2.0</version>
        <configuration>
          <argLine>
            --add-opens=java.base/java.lang=ALL-UNNAMED
            --add-opens=java.base/java.lang.invoke=ALL-UNNAMED
            --add-opens=java.base/java.lang.reflect=ALL-UNNAMED
            --add-opens=java.base/java.io=ALL-UNNAMED
            --add-opens=java.base/java.net=ALL-UNNAMED
            --add-opens=java.base/java.nio=ALL-UNNAMED
            --add-opens=java.base/java.util=ALL-UNNAMED
            --add-opens=java.base/java.util.concurrent=ALL-UNNAMED
            --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED
            --add-opens=java.base/sun.nio.ch=ALL-UNNAMED
            --add-opens=java.base/sun.nio.cs=ALL-UNNAMED
            --add-opens=java.base/sun.security.action=ALL-UNNAMED
            --add-opens=java.base/sun.util.calendar=ALL-UNNAMED
          </argLine>
        </configuration>
        <executions>
          <execution>
            <id>test</id>
            <goals><goal>test</goal></goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>
```

### `src/test/scala/indexingtest/S2CoverageBugSpec.scala`

```scala
package indexingtest

import org.apache.sedona.spark.SedonaContext
import org.apache.spark.sql.SparkSession
import org.locationtech.jts.geom.Geometry
import org.locationtech.jts.io.WKTReader
import org.scalatest.BeforeAndAfterAll
import org.scalatest.funsuite.AnyFunSuite

class S2CoverageBugSpec extends AnyFunSuite with BeforeAndAfterAll {

  private val wkt: String =
    "POLYGON ((-102.060778 39.9999603, -102.0535384 40.0119065, -101.98532 40.0122906, " +
      "-95.30829 40.009008, -95.2456364 39.9564784, -95.1982467 39.9455019, " +
      "-95.1964657 39.9113444, -95.1460439 39.9142017, -95.1316877 39.8855881, " +
      "-95.087643 39.8717975, -95.0389987 39.8749063, -95.0146232 39.9088422, " +
      "-94.9403146 39.906409, -94.9183761 39.8846514, -94.9329504 39.8578468, " +
      "-94.8824331 39.8409102, -94.8675709 39.8227528, -94.878404 39.787242, " +
      "-94.9266292 39.7786779, -94.9070076 39.7679231, -94.8631596 39.779774, " +
      "-94.8497103 39.7604914, -94.8545514 39.7397163, -94.8985678 39.7150641, " +
      "-94.9584897 39.7331377, -94.9637588 39.6814526, -95.0187723 39.6615712, " +
      "-95.0456083 39.6252182, -95.0430365 39.5826542, -95.0650104 39.5677387, " +
      "-95.0985514 39.570063, -95.1012286 39.5462821, -95.0459187 39.5064755, " +
      "-95.0320969 39.4709074, -94.976457 39.4475392, -94.9362514 39.3964717, " +
      "-94.8781205 39.3949417, -94.8733156 39.3663291, -94.9014695 39.3495654, " +
      "-94.8963639 39.3135051, -94.8237994 39.2609956, -94.8194328 39.2178517, " +
      "-94.7789019 39.214907, -94.7398645 39.1789812, -94.6785238 39.1931279, " +
      "-94.6533801 39.1816662, -94.6517728 39.1640754, -94.605515 39.1696807, " +
      "-94.5791896 39.1504025, -94.5983384 39.1134256, -94.6095667 36.9948123, " +
      "-102.045765 36.9847897, -102.060778 39.9999603))"

  private val s2Level: Int = 12

  private var spark: SparkSession = _

  override def beforeAll(): Unit = {
    val builder = SedonaContext.builder()
      .master("local[2]")
      .appName("s2-coverage-bug")
      .config("spark.sql.shuffle.partitions", "2")
      .config("spark.ui.enabled", "false")
    spark = SedonaContext.create(builder.getOrCreate())
  }

  override def afterAll(): Unit = {
    if (spark != null) spark.stop()
  }

  test(s"ST_S2CellIDs + ST_S2ToGeom at level $s2Level fully covers the input polygon") {
    // `spark` is a var, so bind it to a stable identifier before importing implicits.
    val sql = spark
    import sql.implicits._

    Seq(wkt).toDF("wkt").createOrReplaceTempView("input")

    // Union of every cell geometry that ST_S2CellIDs returns for the polygon.
    val coverageWkt: String = spark.sql(
      s"""
         |WITH g AS (SELECT ST_GeomFromWKT(wkt) AS geom FROM input),
         |cells AS (SELECT ST_S2CellIDs(geom, $s2Level) AS ids FROM g),
         |cell_geoms AS (SELECT explode(ST_S2ToGeom(ids)) AS cell_geom FROM cells)
         |SELECT ST_AsText(ST_Union_Aggr(cell_geom)) AS coverage_wkt FROM cell_geoms
         |""".stripMargin).collect().head.getString(0)

    val reader = new WKTReader()
    val original: Geometry = reader.read(wkt)
    val coverage: Geometry = reader.read(coverageWkt)

    // Any part of the input left after subtracting the coverage is an undershoot.
    val uncovered = original.difference(coverage)
    val uncoveredArea = uncovered.getArea
    val uncoveredFraction = uncoveredArea / original.getArea
    val covers = coverage.covers(original)

    println(s"Uncovered area: $uncoveredArea deg^2")
    println(s"Uncovered fraction: $uncoveredFraction")
    println(s"covers(input): $covers")

    assert(covers,
      f"Coverage does NOT contain the input. Missing $uncoveredArea%.8f deg^2 (${uncoveredFraction * 100}%.6f%%).")
  }
}
```

The test fails with:

```
Coverage does NOT contain the input. Missing 0.20349708 deg^2 (0.916897%).
```

## Environment

- `sedona-spark-shaded-3.5_2.12`: reproduced on both `1.7.0` and `1.8.1`
- Apache Spark: `3.5.3`
- Scala: `2.12.18`
- Java: OpenJDK 11
- OS: Ubuntu 24.04
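## Possible cause (unverified)

One hypothesis is worth ruling in or out, although it has not been confirmed here. S2 treats the edge between two consecutive vertices as a great-circle arc. JTS, which the reproducer uses for `difference` and `covers`, treats the same edge as a straight line in lon/lat. Between two vertices near 37°N, the great-circle arc bows north of the straight lon/lat line. If `ST_S2CellIDs` hands the polygon vertices to S2 unchanged, a thin strip along the southern edge would lie inside the planar polygon but outside the spherical one. That last point is an assumption about Sedona's internals.

The self-contained sketch below uses only `scala.math`. The object `SouthernEdgeBulge` and its helpers are hypothetical names, not part of Sedona. For the two southern vertices of the reproducer polygon, it estimates:

- how far north the arc sits at the midpoint of the edge;
- the planar area (deg²) between the arc and the straight edge.

That area can then be compared with the uncovered area in the table.

```scala
/** Hypothetical helper (not part of Sedona): compares the straight lon/lat edge
  * with the great-circle arc between the polygon's two southern vertices. */
object SouthernEdgeBulge {
  // Southern edge vertices copied from the reproducer polygon, as (lon, lat) in degrees.
  val a: (Double, Double) = (-94.6095667, 36.9948123)
  val b: (Double, Double) = (-102.045765, 36.9847897)

  /** Unit vector on the sphere for a (lon, lat) pair given in degrees. */
  def toXyz(p: (Double, Double)): Array[Double] = {
    val lon = math.toRadians(p._1)
    val lat = math.toRadians(p._2)
    Array(math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat))
  }

  /** Cross product of two 3-vectors. */
  def cross(u: Array[Double], v: Array[Double]): Array[Double] = Array(
    u(1) * v(2) - u(2) * v(1),
    u(2) * v(0) - u(0) * v(2),
    u(0) * v(1) - u(1) * v(0))

  // Normal vector of the great-circle plane through a and b.
  private val n: Array[Double] = cross(toXyz(a), toXyz(b))

  /** Latitude (degrees) of the great circle through a and b at longitude lonDeg. */
  def greatCircleLat(lonDeg: Double): Double = {
    val lon = math.toRadians(lonDeg)
    // A point p on the circle satisfies n . p = 0, which gives
    // tan(lat) = -(n_x cos(lon) + n_y sin(lon)) / n_z.
    math.toDegrees(math.atan(-(n(0) * math.cos(lon) + n(1) * math.sin(lon)) / n(2)))
  }

  /** Latitude (degrees) of the straight lon/lat segment from a to b at lonDeg. */
  def planarLat(lonDeg: Double): Double = {
    val t = (lonDeg - a._1) / (b._1 - a._1)
    a._2 + t * (b._2 - a._2)
  }

  /** Planar area (deg^2) between the arc and the straight edge, via the midpoint rule. */
  def stripArea(steps: Int): Double = {
    val lo = math.min(a._1, b._1)
    val hi = math.max(a._1, b._1)
    val h = (hi - lo) / steps
    (0 until steps).map { i =>
      val lon = lo + (i + 0.5) * h
      (greatCircleLat(lon) - planarLat(lon)) * h
    }.sum
  }

  def main(args: Array[String]): Unit = {
    val midLon = (a._1 + b._1) / 2
    val bulge = greatCircleLat(midLon) - planarLat(midLon)
    println(f"Northward offset of the arc at the edge midpoint: $bulge%.6f deg")
    println(f"Area between the straight edge and the arc: ${stripArea(10000)}%.6f deg^2")
  }
}
```

To run it, paste it into a Scala REPL and call `SouthernEdgeBulge.main(Array.empty[String])`. If the hypothesis holds, the uncovered area in the table should be of the same order as this strip but no larger than it, because covering cells usually extend past the S2 edge and re-cover part of the strip. This comparison is only a plausibility check, not proof of the root cause.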
