alexanderdeseranno-tomtom opened a new issue, #2857:
URL: https://github.com/apache/sedona/issues/2857

   ## Expected behavior
   
   For any polygon `g` and S2 level `L`, unioning the geometries returned by
   
   ```sql
   ST_S2ToGeom(ST_S2CellIDs(g, L))
   ```
   
   should produce a region that fully contains `g`. An S2 covering is allowed to overshoot (include area outside `g`) but must never undershoot. Every point inside `g` must be inside at least one returned cell geometry.
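   In set notation, with $C_L(g)$ the cell IDs returned by `ST_S2CellIDs(g, L)` and $\operatorname{cell}(c)$ the polygon that `ST_S2ToGeom` returns for cell $c$, the expected invariant and its consequence for the uncovered area are:

   ```math
   g \;\subseteq\; \bigcup_{c \in C_L(g)} \operatorname{cell}(c)
   \quad\Longrightarrow\quad
   \operatorname{area}\Big(g \setminus \bigcup_{c \in C_L(g)} \operatorname{cell}(c)\Big) = 0
   ```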
   
   ## Actual behavior
   
   At level 12, the union of the reconstructed cell geometries misses part of the following input polygon (WKT, longitude/latitude order):
   ```
   POLYGON ((-102.060778 39.9999603, -102.0535384 40.0119065, -101.98532 40.0122906,
     -95.30829 40.009008, -95.2456364 39.9564784, -95.1982467 39.9455019,
     -95.1964657 39.9113444, -95.1460439 39.9142017, -95.1316877 39.8855881,
     -95.087643 39.8717975, -95.0389987 39.8749063, -95.0146232 39.9088422,
     -94.9403146 39.906409, -94.9183761 39.8846514, -94.9329504 39.8578468,
     -94.8824331 39.8409102, -94.8675709 39.8227528, -94.878404 39.787242,
     -94.9266292 39.7786779, -94.9070076 39.7679231, -94.8631596 39.779774,
     -94.8497103 39.7604914, -94.8545514 39.7397163, -94.8985678 39.7150641,
     -94.9584897 39.7331377, -94.9637588 39.6814526, -95.0187723 39.6615712,
     -95.0456083 39.6252182, -95.0430365 39.5826542, -95.0650104 39.5677387,
     -95.0985514 39.570063, -95.1012286 39.5462821, -95.0459187 39.5064755,
     -95.0320969 39.4709074, -94.976457 39.4475392, -94.9362514 39.3964717,
     -94.8781205 39.3949417, -94.8733156 39.3663291, -94.9014695 39.3495654,
     -94.8963639 39.3135051, -94.8237994 39.2609956, -94.8194328 39.2178517,
     -94.7789019 39.214907, -94.7398645 39.1789812, -94.6785238 39.1931279,
     -94.6533801 39.1816662, -94.6517728 39.1640754, -94.605515 39.1696807,
     -94.5791896 39.1504025, -94.5983384 39.1134256, -94.6095667 36.9948123,
     -102.045765 36.9847897, -102.060778 39.9999603))
   ```
   
   ## Measured results (Sedona 1.8.1)
   
   | Metric                           | Value               |
   | -------------------------------- | ------------------- |
   | S2 level                         | 12                  |
   | Cells returned by `ST_S2CellIDs` | 49,086              |
   | Original polygon area            | 22.1941 deg²        |
   | Uncovered area                   | **0.20349708 deg²** |
   | Uncovered fraction               | **~0.917%**         |
   | `coverage.covers(input)`         | **false**           |
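
   The areas are planar areas in squared degrees, as returned by JTS `getArea` on longitude/latitude coordinates, and the uncovered fraction is their ratio:

   ```math
   \frac{0.20349708\ \text{deg}^2}{22.1941\ \text{deg}^2} \approx 0.00917 = 0.917\%
   ```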
   
   The uncovered slivers are concentrated along the southern boundary of the polygon:
   
   <img width="1295" height="455" alt="Image" src="https://github.com/user-attachments/assets/af241f77-0eb1-4859-92d8-1c9cc814c55d" />
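
   One hypothesis worth ruling out (not confirmed here): S2 treats every polygon edge as a great-circle arc, while JTS, which the test uses for `difference` and `covers`, treats the same edge as a straight line in longitude/latitude. In the northern hemisphere a long east-west great-circle arc bows north of that straight line, so along the southern edge the spherical polygon that S2 covers would be thinner than the planar polygon JTS checks, while along the northern edge it would be larger. That pattern would match gaps appearing only in the south. The sketch below, assuming Sedona hands the polygon vertices to S2 unchanged, estimates the size of the effect for the roughly 7.4° southern edge using the closed-form latitude of a great circle's mid-span point; `GreatCircleBulge` is a hypothetical helper, not part of Sedona.

   ```scala
   object GreatCircleBulge {
     /** Latitude (degrees) at mid-longitude of the great-circle arc joining two points that
       * share latitude `latDeg` and are `dLonDeg` apart: tan(latMid) = tan(lat) / cos(dLon / 2). */
     def midLatitude(latDeg: Double, dLonDeg: Double): Double =
       math.toDegrees(math.atan(math.tan(math.toRadians(latDeg)) / math.cos(math.toRadians(dLonDeg / 2))))

     def main(args: Array[String]): Unit = {
       // Southern edge of the reported polygon: (-94.6095667 36.9948123) -> (-102.045765 36.9847897).
       // Assumption: the endpoint latitudes differ by only ~0.01 deg, so their mean is used as the common latitude.
       val lat = (36.9948123 + 36.9847897) / 2
       val dLon = 102.045765 - 94.6095667
       // Positive value = the great-circle edge lies north of the straight lon/lat edge at mid-span.
       val northward = midLatitude(lat, dLon) - lat
       println(f"Great-circle edge is ~$northward%.3f deg of latitude north of the planar edge at mid-span")
     }
   }
   ```

   By this estimate the arc sits roughly 0.06° of latitude (about 6 km) north of the straight edge at mid-span, which is considerably more than the size of a level-12 cell (on the order of a couple of kilometres across).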
   
   ## Reproducer
   
   ### `pom.xml`
   
   ```xml
   <?xml version="1.0" encoding="UTF-8"?>
   <project xmlns="http://maven.apache.org/POM/4.0.0">
       <modelVersion>4.0.0</modelVersion>
       <groupId>indexingtest</groupId>
       <artifactId>sedona-s2-coverage-bug</artifactId>
       <version>1.0.0-SNAPSHOT</version>
   
       <properties>
           <maven.compiler.source>11</maven.compiler.source>
           <maven.compiler.target>11</maven.compiler.target>
           <scala.binary.version>2.12</scala.binary.version>
           <scala.version>2.12.18</scala.version>
           <spark.version>3.5.3</spark.version>
           <sedona.version>1.8.1</sedona.version>
           <scalatest.version>3.2.19</scalatest.version>
       </properties>
   
       <dependencies>
           <dependency>
               <groupId>org.scala-lang</groupId>
               <artifactId>scala-library</artifactId>
               <version>${scala.version}</version>
           </dependency>
           <dependency>
               <groupId>org.apache.spark</groupId>
               <artifactId>spark-sql_${scala.binary.version}</artifactId>
               <version>${spark.version}</version>
           </dependency>
           <dependency>
               <groupId>org.apache.sedona</groupId>
               <artifactId>sedona-spark-shaded-3.5_${scala.binary.version}</artifactId>
               <version>${sedona.version}</version>
           </dependency>
           <dependency>
               <groupId>org.scalatest</groupId>
               <artifactId>scalatest_${scala.binary.version}</artifactId>
               <version>${scalatest.version}</version>
               <scope>test</scope>
           </dependency>
       </dependencies>
   
       <build>
           <testSourceDirectory>src/test/scala</testSourceDirectory>
           <plugins>
               <plugin>
                   <groupId>net.alchim31.maven</groupId>
                   <artifactId>scala-maven-plugin</artifactId>
                   <version>4.9.2</version>
                   <executions><execution><goals><goal>testCompile</goal></goals></execution></executions>
                   <configuration><scalaVersion>${scala.version}</scalaVersion></configuration>
               </plugin>
               <plugin>
                   <groupId>org.apache.maven.plugins</groupId>
                   <artifactId>maven-surefire-plugin</artifactId>
                   <version>3.2.5</version>
                   <!-- Surefire is disabled; the ScalaTest plugin below runs the tests -->
                   <configuration><skipTests>true</skipTests></configuration>
               </plugin>
               <plugin>
                   <groupId>org.scalatest</groupId>
                   <artifactId>scalatest-maven-plugin</artifactId>
                   <version>2.2.0</version>
                   <configuration>
                       <argLine>
                           --add-opens=java.base/java.lang=ALL-UNNAMED
                           --add-opens=java.base/java.lang.invoke=ALL-UNNAMED
                           --add-opens=java.base/java.lang.reflect=ALL-UNNAMED
                           --add-opens=java.base/java.io=ALL-UNNAMED
                           --add-opens=java.base/java.net=ALL-UNNAMED
                           --add-opens=java.base/java.nio=ALL-UNNAMED
                           --add-opens=java.base/java.util=ALL-UNNAMED
                           --add-opens=java.base/java.util.concurrent=ALL-UNNAMED
                           --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED
                           --add-opens=java.base/sun.nio.ch=ALL-UNNAMED
                           --add-opens=java.base/sun.nio.cs=ALL-UNNAMED
                           --add-opens=java.base/sun.security.action=ALL-UNNAMED
                           --add-opens=java.base/sun.util.calendar=ALL-UNNAMED
                       </argLine>
                   </configuration>
                   <executions><execution><id>test</id><goals><goal>test</goal></goals></execution></executions>
               </plugin>
           </plugins>
       </build>
   </project>
   ```
   
   ### `src/test/scala/indexingtest/S2CoverageBugSpec.scala`
   
   ```scala
   package indexingtest
   
   import org.apache.sedona.spark.SedonaContext
   import org.apache.spark.sql.SparkSession
   import org.locationtech.jts.geom.Geometry
   import org.locationtech.jts.io.WKTReader
   import org.scalatest.BeforeAndAfterAll
   import org.scalatest.funsuite.AnyFunSuite
   
   class S2CoverageBugSpec extends AnyFunSuite with BeforeAndAfterAll {
   
     private val wkt: String =
       "POLYGON ((-102.060778 39.9999603, -102.0535384 40.0119065, -101.98532 40.0122906, " +
         "-95.30829 40.009008, -95.2456364 39.9564784, -95.1982467 39.9455019, " +
         "-95.1964657 39.9113444, -95.1460439 39.9142017, -95.1316877 39.8855881, " +
         "-95.087643 39.8717975, -95.0389987 39.8749063, -95.0146232 39.9088422, " +
         "-94.9403146 39.906409, -94.9183761 39.8846514, -94.9329504 39.8578468, " +
         "-94.8824331 39.8409102, -94.8675709 39.8227528, -94.878404 39.787242, " +
         "-94.9266292 39.7786779, -94.9070076 39.7679231, -94.8631596 39.779774, " +
         "-94.8497103 39.7604914, -94.8545514 39.7397163, -94.8985678 39.7150641, " +
         "-94.9584897 39.7331377, -94.9637588 39.6814526, -95.0187723 39.6615712, " +
         "-95.0456083 39.6252182, -95.0430365 39.5826542, -95.0650104 39.5677387, " +
         "-95.0985514 39.570063, -95.1012286 39.5462821, -95.0459187 39.5064755, " +
         "-95.0320969 39.4709074, -94.976457 39.4475392, -94.9362514 39.3964717, " +
         "-94.8781205 39.3949417, -94.8733156 39.3663291, -94.9014695 39.3495654, " +
         "-94.8963639 39.3135051, -94.8237994 39.2609956, -94.8194328 39.2178517, " +
         "-94.7789019 39.214907, -94.7398645 39.1789812, -94.6785238 39.1931279, " +
         "-94.6533801 39.1816662, -94.6517728 39.1640754, -94.605515 39.1696807, " +
         "-94.5791896 39.1504025, -94.5983384 39.1134256, -94.6095667 36.9948123, " +
         "-102.045765 36.9847897, -102.060778 39.9999603))"
   
     private val s2Level: Int = 12
     private var spark: SparkSession = _
   
     override def beforeAll(): Unit = {
       val builder = SedonaContext.builder()
         .master("local[2]")
         .appName("s2-coverage-bug")
         .config("spark.sql.shuffle.partitions", "2")
         .config("spark.ui.enabled", "false")
       spark = SedonaContext.create(builder.getOrCreate())
     }
   
     override def afterAll(): Unit = {
       if (spark != null) spark.stop()
     }
   
     test(s"ST_S2CellIDs + ST_S2ToGeom at level $s2Level fully covers the input polygon") {
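       // `spark` is a var, so alias it to a val: importing implicits (for toDF) needs a stable identifier
       val sql = spark
       import sql.implicits._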
   
       Seq(wkt).toDF("wkt").createOrReplaceTempView("input")
   
       val coverageWkt: String = spark.sql(
         s"""
            |WITH g AS (SELECT ST_GeomFromWKT(wkt) AS geom FROM input),
            |cells AS (SELECT ST_S2CellIDs(geom, $s2Level) AS ids FROM g),
            |cell_geoms AS (SELECT explode(ST_S2ToGeom(ids)) AS cell_geom FROM cells)
            |SELECT ST_AsText(ST_Union_Aggr(cell_geom)) AS coverage_wkt FROM cell_geoms
            |""".stripMargin).collect().head.getString(0)
   
       val reader = new WKTReader()
       val original: Geometry = reader.read(wkt)
       val coverage: Geometry = reader.read(coverageWkt)
   
       val uncovered = original.difference(coverage)
       val uncoveredArea = uncovered.getArea
       val uncoveredFraction = uncoveredArea / original.getArea
       val covers = coverage.covers(original)
   
       println(s"Uncovered area:     $uncoveredArea deg^2")
       println(s"Uncovered fraction: $uncoveredFraction")
       println(s"covers(input):      $covers")
   
       assert(covers,
         f"Coverage does NOT contain the input. Missing $uncoveredArea%.8f deg^2 (${uncoveredFraction * 100}%.6f%%).")
     }
   }
   ```
   
   Test fails with:
   
   ```
   Coverage does NOT contain the input. Missing 0.20349708 deg^2 (0.916897%).
   ```
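
   S2 treats polygon edges as great-circle arcs, whereas the JTS `difference`/`covers` check above treats them as straight lines in longitude/latitude. One way to test whether that difference explains the gap is to densify the input ring along great circles before the JTS comparison, so the planar polygon JTS sees follows approximately the same edges S2 uses. The following is a minimal sketch under that assumption, using only the Scala standard library; `GeodesicDensify`, `densifyRing`, `toWkt`, and the `maxStepDeg` parameter are hypothetical names, not part of Sedona or JTS.

   ```scala
   object GeodesicDensify {
     type LonLat = (Double, Double)

     // (lon, lat) in degrees -> unit vector on the sphere.
     private def toVec(p: LonLat): Array[Double] = {
       val lon = math.toRadians(p._1)
       val lat = math.toRadians(p._2)
       Array(math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat))
     }

     // Unit vector -> (lon, lat) in degrees.
     private def toLonLat(v: Array[Double]): LonLat =
       (math.toDegrees(math.atan2(v(1), v(0))), math.toDegrees(math.atan2(v(2), math.hypot(v(0), v(1)))))

     // Spherical linear interpolation: the point a fraction t along the arc a -> b of angle omega (radians).
     private def slerp(a: Array[Double], b: Array[Double], omega: Double, t: Double): Array[Double] = {
       val s = math.sin(omega)
       val wa = math.sin((1 - t) * omega) / s
       val wb = math.sin(t * omega) / s
       Array.tabulate(3)(i => wa * a(i) + wb * b(i))
     }

     /** Adds great-circle points between consecutive vertices so that no piece spans more than
       * maxStepDeg degrees of arc. Original vertices are kept exactly, so a closed ring stays closed. */
     def densifyRing(ring: Seq[LonLat], maxStepDeg: Double): Vector[LonLat] = {
       require(ring.size >= 2 && maxStepDeg > 0)
       val pieces = ring.sliding(2).flatMap { pair =>
         val (p, q) = (pair(0), pair(1))
         val (a, b) = (toVec(p), toVec(q))
         val dot = math.max(-1.0, math.min(1.0, (0 until 3).map(i => a(i) * b(i)).sum))
         val omega = math.acos(dot) // arc length of this edge in radians
         val steps = math.max(1, math.ceil(math.toDegrees(omega) / maxStepDeg).toInt)
         // Start vertex exactly, then interior points; the end vertex starts the next pair.
         p +: (1 until steps).map(i => toLonLat(slerp(a, b, omega, i.toDouble / steps)))
       }
       pieces.toVector :+ ring.last
     }

     // Formats a closed ring as a single-shell WKT polygon.
     def toWkt(ring: Seq[LonLat]): String =
       ring.map { case (lon, lat) => s"$lon $lat" }.mkString("POLYGON ((", ", ", "))")
   }
   ```

   For example, `GeodesicDensify.densifyRing(original.getCoordinates.map(c => (c.x, c.y)).toSeq, 0.01)` yields a ring whose `toWkt` output can be read with `WKTReader` and compared against `coverage` with `covers` and `difference`, while the original `wkt` stays the input to `ST_S2CellIDs`. A clearly smaller uncovered area against the densified ring would point at the edge interpretation rather than at the covering itself.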
   
   ## Environment
   
   - `sedona-spark-shaded-3.5_2.12`: reproduced on both `1.7.0` and `1.8.1`
   - Apache Spark: `3.5.3`
   - Scala: `2.12.18`
   - Java: OpenJDK 11
   - OS: Ubuntu 24.04 

