jiayuasu opened a new pull request, #2770:
URL: https://github.com/apache/sedona/pull/2770

   ## Did you read the Contributor Guide?
   
   - Yes, I have read the [Contributor Rules](https://sedona.apache.org/latest/community/rule/) and [Contributor Development Guide](https://sedona.apache.org/latest/community/develop/)
   
   ## Is this PR related to a ticket?
   
   - Yes, and the PR name follows the format `[GH-XXX] my subject`. Closes #2768
   
   ## What changes were proposed in this PR?
   
   Several GeoSeries methods use `len(self) == 0` as an early-return guard for 
empty input. Under the hood, `len()` on a Pandas-on-Spark Series calls 
`DataFrame.count()`, which triggers a full Spark scan of all rows.
   
   This PR adds a private `_is_empty()` helper method that uses 
`self._internal.spark_frame.take(1)` instead, which short-circuits after 
finding a single row rather than counting all rows.
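
The difference between the two guards can be illustrated without a Spark cluster. The sketch below is not the code from this PR: `FakeSparkFrame`, `is_empty_via_count`, and `is_empty_via_take` are hypothetical names, and the stand-in class only mimics `count()` and `take(n)` while recording how many rows each call reads. Real Spark execution (partitions, query planning) is more involved, so treat this as a rough model of the idea.

```python
from itertools import islice
from typing import Iterator, List


class FakeSparkFrame:
    """Hypothetical stand-in for a Spark DataFrame; NOT the pyspark API.

    It mimics only count() and take(n), and records how many rows
    each call had to read.
    """

    def __init__(self, rows: List[int]) -> None:
        self._rows = rows
        self.rows_read = 0

    def _scan(self) -> Iterator[int]:
        # Lazily yield rows, counting each one that is actually read.
        for row in self._rows:
            self.rows_read += 1
            yield row

    def count(self) -> int:
        # Counting has to visit every row.
        return sum(1 for _ in self._scan())

    def take(self, n: int) -> List[int]:
        # Stop reading as soon as n rows have been collected.
        return list(islice(self._scan(), n))


def is_empty_via_count(frame: FakeSparkFrame) -> bool:
    # Models the old `len(self) == 0` guard: a full count.
    return frame.count() == 0


def is_empty_via_take(frame: FakeSparkFrame) -> bool:
    # Models the new guard: fetch at most one row and check the result.
    return len(frame.take(1)) == 0


if __name__ == "__main__":
    old = FakeSparkFrame(list(range(1000)))
    new = FakeSparkFrame(list(range(1000)))
    print(is_empty_via_count(old), old.rows_read)
    print(is_empty_via_take(new), new.rows_read)
```

In this model the count-based guard reads every row before answering, while the `take(1)` guard stops after the first one. Both return the same answer for empty and non-empty input.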
   
   All 6 occurrences of `len(self) == 0` in `geoseries.py` are replaced:
   
   - `crs` (getter)
   - `build_area()`
   - `polygonize()`
   - `union_all()`
   - `intersection_all()`
   - `total_bounds`
   
   ## How was this patch tested?
   
   Existing tests for all affected methods (`test_build_area`, 
`test_polygonize`, `test_union_all`, `test_intersection_all`, 
`test_total_bounds`, `test_crs`, `test_empty_list`) were run and pass.
   
   ## Did this PR include necessary documentation updates?
   
   - No, this PR does not affect any public API, so no documentation changes are needed.
   

