jiayuasu opened a new pull request, #2889:
URL: https://github.com/apache/sedona/pull/2889

   ## Summary
   
   Issue: #2700. Milestone: 1.9.1. Follows #2876 and #2879.
   
   Adds the next workflow notebook in the docker-image refresh series. The 
`sedona.spark.geopandas` package mirrors the public GeoPandas API and runs on 
top of `pyspark.pandas` / Spark; this notebook shows what the *scale up your 
geopandas script with Sedona* path actually looks like end-to-end.
   
   Workflow on the Natural Earth countries shapefile **already shipped** with 
the docker image — no new data, no network:
   
   1. `SedonaContext` setup, including `spark.sql.ansi.enabled=false` (`pyspark.pandas`, the backend for `sedona.spark.geopandas`, does not yet tolerate Spark 4.x ANSI mode).
   2. `read_file(..., format="shapefile")` — drop-in replacement for 
`geopandas.read_file`.
   3. Vanilla GeoPandas idioms: boolean filtering by `CONTINENT`, `.geometry`, 
`.centroid`, `.convex_hull`, `.area`, `.total_bounds`.
   4. Voronoi catchments via SQL aggregation: `ST_VoronoiPolygons(ST_Collect_Agg(ST_Centroid(geometry)))`. The notebook calls out that `GeoSeries.voronoi_polygons()` runs *per row*, which is the wrong tool for "one diagram from many points".
   5. `clip_by_rect(xmin, ymin, xmax, ymax)` (new in 1.9) to crop the Voronoi 
result to a continental bbox.
   6. `to_geopandas()` round-trip + `matplotlib` for the final plot.
   7. `<gdf>.spark.frame()` to drop into SQL on the same dataframe, using `ST_DistanceSpheroid` to find the African country closest to (0°N, 0°E).
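Why step 4 needs one aggregated input can be sketched without Spark. A Voronoi catchment is the region that is closer to its generator point than to any other generator, so membership reduces to a nearest-site test. The stdlib sketch below uses made-up planar coordinates and a hypothetical `nearest_site` helper; it is not how `ST_VoronoiPolygons` works internally (that builds polygon geometry), only an illustration of why all generators have to compete together.

```python
import math


def nearest_site(point, sites):
    """Hypothetical helper: index of the site nearest to `point` (planar distance).

    A Voronoi cell is exactly the set of locations for which its site wins
    this comparison, so this is a brute-force "which cell am I in?" test.
    """
    px, py = point
    return min(range(len(sites)),
               key=lambda i: math.hypot(sites[i][0] - px, sites[i][1] - py))


# Toy planar (x, y) coordinates standing in for country centroids.
centroids = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
queries = [(1.0, 1.0), (9.0, 1.0), (2.0, 8.0)]

# One diagram from many points: every centroid competes for every query.
together = [nearest_site(q, centroids) for q in queries]

# Per row: each centroid is its own one-point "diagram", so its single cell
# claims every query point and the per-row catchments all overlap.
per_row = [[nearest_site(q, [c]) for q in queries] for c in centroids]
```

With all centroids together, each query lands in a different catchment; per row, every query trivially belongs to the only site available, which is the failure mode the notebook warns about.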
   
   The notebook is structured as numbered markdown sections (`## 1.` through `## 7.`) with the prose in markdown and the code cells kept minimal, matching the convention established for `01-mobility-pulse`.
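If the numbered-section convention should stay enforced as more notebooks land, a small stdlib check over the `.ipynb` JSON is one option. This is a hypothetical sketch, not part of this PR or of `docker/test-notebooks.sh`; it assumes nbformat v4 notebooks, where `source` may be either a string or a list of lines.

```python
import json
import re

# Matches a markdown heading such as "## 3. Voronoi catchments".
SECTION_RE = re.compile(r"^##\s+(\d+)\.")


def section_numbers(nb):
    """Return the N of every `## N.` heading in the markdown cells of a notebook dict."""
    numbers = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue  # headings only count in markdown cells
        source = cell.get("source", "")
        text = "".join(source) if isinstance(source, list) else source
        for line in text.splitlines():
            match = SECTION_RE.match(line)
            if match:
                numbers.append(int(match.group(1)))
    return numbers


def is_consecutively_numbered(nb):
    """True if the sections run 1, 2, ..., N with no gaps or repeats."""
    nums = section_numbers(nb)
    return bool(nums) and nums == list(range(1, len(nums) + 1))


def load_notebook(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Usage would be `is_consecutively_numbered(load_notebook(path))`, with `path` pointing at wherever the notebook lives in the image (not assumed here).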
   
   ## Verification
   
   Ran end-to-end through the local mirror of `docker/test-notebooks.sh` (matched stack: Python 3.10, `pyspark==4.0.1`, `apache-sedona==1.9.0`, JDK 17, `DRIVER_MEM=4g`, `local[*]`, Sedona JAR via Maven coordinates in `PYSPARK_SUBMIT_ARGS`).
   
   ```
   PASS  05-geopandas-on-spark  15s elapsed
   ```
   
   Output sanity-checked: São Tomé and Principe is the African country closest 
to (0°N, 0°E) at 750.1 km, with Ghana / Togo / Côte d'Ivoire next — 
geographically correct.
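For eyeballing distances like the 750.1 km above outside Spark, a spherical haversine distance is typically within about 0.5% of the ellipsoidal value that `ST_DistanceSpheroid` computes. The stdlib sketch below is that approximation, not Sedona's implementation; `haversine_km` and `rank_by_distance` are hypothetical helpers, the (lon, lat) argument order is our own choice, and the input points (e.g. country centroids) are up to the caller.

```python
import math

# Mean Earth radius in km. A sphere is an approximation; an ellipsoidal
# (spheroid) distance will differ slightly.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (lon, lat) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def rank_by_distance(points, origin=(0.0, 0.0)):
    """Sort a {name: (lon, lat)} mapping by distance from `origin`, nearest first."""
    ox, oy = origin
    return sorted(
        ((name, haversine_km(ox, oy, x, y)) for name, (x, y) in points.items()),
        key=lambda item: item[1],
    )
```

One degree of longitude along the equator comes out at roughly 111.2 km with this radius, which is a quick check that the units are right before comparing against the notebook's output.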
   
   ## Test plan
   
   - [ ] `docker build -f docker/sedona-docker.dockerfile -t sedona:dev .` 
succeeds.
   - [ ] `docker run --rm sedona:dev /opt/sedona/docker/test-notebooks.sh` 
exits 0 — every shipped notebook (00, 01, 05) passes.
   - [ ] `docker run --rm -e SEDONA_NOTEBOOK_OFFLINE=1 sedona:dev /opt/sedona/docker/test-notebooks.sh` exits 0; 05 runs (no network), 01 is skipped.
   - [ ] Manual smoke test: open in JupyterLab, run all cells, and eyeball the matplotlib figure showing Africa with the Voronoi catchment overlay.

