zhangfengcdt opened a new pull request, #2831:
URL: https://github.com/apache/sedona/pull/2831

   ## Did you read the Contributor Guide?
   
   - Yes, I have read the [Contributor 
Rules](https://sedona.apache.org/latest/community/rule/) and [Contributor 
Development Guide](https://sedona.apache.org/latest/community/develop/)
   
   ## Is this PR related to a ticket?
   
   - Yes, and the PR name follows the format `[GH-XXX] my subject`. Closes 
#<issue_number>
   
   ## What changes were proposed in this PR?
   
   Implements WKB-based Geography serialization (Option B: WKB with Cached S2) 
and a full set of Geography ST functions.
   
   **Core architecture:**
     - WKBGeography: stores WKB bytes as the primary representation, with 
lazily parsed JTS, S2, and ShapeIndex caches (double-checked locking for thread 
safety)
     - GeographyWKBSerializer — WKB serializer with 0xFF format byte, 
backward-compatible with legacy S2-native format
     - GeographyUDT, implicits.scala, GeometrySerde — switched to WKBSerializer 
for all serialization paths
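
As a rough illustration of the lazy caching described above, here is a minimal Java sketch of double-checked locking over WKB bytes. It is not the code in this PR: `LazyWkbCache`, its `parser` function, and the `String` stand-in for a parsed geometry are hypothetical names introduced only for this example, and the real WKBGeography presumably keeps separate JTS, S2, and ShapeIndex fields.

```java
import java.util.function.Function;

/**
 * Hypothetical holder: the raw WKB bytes are the source of truth and the
 * parsed form is built at most once, on first access.
 */
final class LazyWkbCache<T> {
    private final byte[] wkb;                  // primary representation
    private final Function<byte[], T> parser;  // stand-in for a JTS or S2 decoder
    private final Object lock = new Object();
    private volatile T parsed;                 // volatile gives safe publication

    LazyWkbCache(byte[] wkb, Function<byte[], T> parser) {
        this.wkb = wkb.clone();
        this.parser = parser;
    }

    T get() {
        T local = parsed;                      // first check, without the lock
        if (local == null) {
            synchronized (lock) {
                local = parsed;                // second check, under the lock
                if (local == null) {
                    local = parser.apply(wkb); // parse once
                    parsed = local;            // publish to other threads
                }
            }
        }
        return local;
    }

    public static void main(String[] args) {
        int[] parseCount = {0};
        LazyWkbCache<String> cache = new LazyWkbCache<>(
            new byte[] {1, 2, 3},
            bytes -> {
                parseCount[0]++;
                return "parsed " + bytes.length + " bytes";
            });
        System.out.println(cache.get());
        System.out.println(cache.get());
        System.out.println("parser calls: " + parseCount[0]); // prints "parser calls: 1"
    }
}
```

The `volatile` field is what makes the unlocked first check safe; without it another thread could observe a partially constructed object. This sketch assumes the parser never returns `null`, since a `null` result would be parsed again on every call.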
   
   **Geography functions (13 new):**
     - Level 1 (JTS): ST_AsText, ST_NPoints, ST_GeometryType, ST_NumGeometries, 
ST_Centroid
     - Level 2 (JTS + Spheroid): ST_Distance, ST_Area, ST_Length
     - Level 3 (S2): ST_MaxDistance, ST_ClosestPoint, ST_Contains, 
ST_Intersects, ST_Equals
   
   **Performance:**
     - ST_Distance uses S2ClosestEdgeQuery for true geometry-to-geometry 
distance (consistent with sedona-db)
     - ShapeIndex cached in WKBGeography — 2-6x faster for repeated S2 
operations
     - Configurable spark.sedona.geography.eagerShapeIndex for predicate-heavy 
workloads
     - JMH benchmark module with 4 benchmark classes (single-call, serializer 
comparison, GeoParquet scenario, batch processing)
   
   **Docs**: API docs for all 13 new functions in docs/api/sql/geography/
   
   **Note**: Geography-aware spatial join partitioning using S2 cells will 
follow in a separate PR.
   
   ## How was this patch tested?
   
     - 1032 unit tests pass in common module (28 new in WKBGeographyTest, 24 in 
FunctionTest)
     - GeographyFunctionTest.scala — 34 Spark SQL integration tests covering 
constructors, structural functions, metrics, predicates, DataFrame API, and 
serialization round-trips
     - JMH benchmarks run for point, linestring, and polygon inputs (16/64/500 
vertices); the GeoParquet scenario shows no measurable performance penalty 
versus the S2-parse-from-WKB path
   
   ## Did this PR include necessary documentation updates?
   
   - Yes, I have updated the documentation.
   

