jiayuasu opened a new pull request, #2889: URL: https://github.com/apache/sedona/pull/2889
## Summary

Issue: #2700. Milestone: 1.9.1. Follows #2876 and #2879.

Adds the next workflow notebook in the docker-image refresh series. The `sedona.spark.geopandas` package mirrors the public GeoPandas API and runs on top of `pyspark.pandas` / Spark; this notebook shows what the *scale up your geopandas script with Sedona* path actually looks like end-to-end.

Workflow on the Natural Earth countries shapefile **already shipped** with the docker image (no new data, no network):

1. SedonaContext setup, including `spark.sql.ansi.enabled=false` (pyspark.pandas, the backend for `sedona.spark.geopandas`, does not yet tolerate Spark 4.x ANSI mode).
2. `read_file(..., format="shapefile")`: drop-in replacement for `geopandas.read_file`.
3. Vanilla GeoPandas idioms: boolean filtering by `CONTINENT`, `.geometry`, `.centroid`, `.convex_hull`, `.area`, `.total_bounds`.
4. Voronoi catchments via SQL aggregator: `ST_VoronoiPolygons(ST_Collect_Agg(ST_Centroid(geometry)))`. Calls out that `GeoSeries.voronoi_polygons()` runs *per row*, which is wrong for "one diagram from many points".
5. `clip_by_rect(xmin, ymin, xmax, ymax)` (new in 1.9) to crop the Voronoi result to a continental bbox.
6. `to_geopandas()` round-trip + `matplotlib` for the final plot.
7. `<gdf>.spark.frame()` to drop into SQL on the same dataframe; uses `ST_DistanceSpheroid` for "closest African capital to (0°N, 0°E)".

Notebook is structured as numbered markdown sections (`## 1.` through `## 7.`) with the prose in markdown and the code cells minimal, matching the convention established for `01-mobility-pulse`.

## Verification

End-to-end through the local mirror of `docker/test-notebooks.sh` (matched stack: Python 3.10, `pyspark==4.0.1`, `apache-sedona==1.9.0`, JDK 17, `DRIVER_MEM=4g`, `local[*]`, Sedona JAR via `PYSPARK_SUBMIT_ARGS` Maven coords).
```
PASS 05-geopandas-on-spark 15s elapsed
```

Output sanity-checked: São Tomé and Principe is the African country closest to (0°N, 0°E) at 750.1 km, with Ghana / Togo / Côte d'Ivoire next, which is geographically correct.

## Test plan

- [ ] `docker build -f docker/sedona-docker.dockerfile -t sedona:dev .` succeeds.
- [ ] `docker run --rm sedona:dev /opt/sedona/docker/test-notebooks.sh` exits 0; every shipped notebook (00, 01, 05) passes.
- [ ] `docker run --rm -e SEDONA_NOTEBOOK_OFFLINE=1 sedona:dev /opt/sedona/docker/test-notebooks.sh` exits 0; 05 runs (no network), 01 is skipped.
- [ ] Manual smoke: open in JupyterLab, run all cells, eyeball the matplotlib figure showing Africa with Voronoi catchments overlay.
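The point in step 4 about aggregating before building the diagram can be illustrated without Spark. A Voronoi cell is the set of locations closer to one site than to any other, so the diagram only carries information when all sites are considered together. The sketch below is plain Python with a hypothetical helper (`nearest_site`) and made-up coordinates; it assigns query points to their nearest site instead of constructing polygons, and it is not how Sedona implements `ST_VoronoiPolygons`.

```python
import math

def nearest_site(point, sites):
    """Index of the site whose Voronoi cell contains `point` (planar distance)."""
    return min(range(len(sites)), key=lambda i: math.dist(point, sites[i]))

# Made-up centroids and query locations, for illustration only.
sites = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
queries = [(1.0, 1.0), (9.0, 2.0), (2.0, 8.0)]

# Aggregate view: every query is compared against all sites at once.
aggregate = [nearest_site(q, sites) for q in queries]

# Per-row view: each "diagram" sees a single site, so every query
# falls in that site's cell and the partition is lost.
per_row = [[nearest_site(q, [s]) for q in queries] for s in sites]

print(aggregate)  # [0, 1, 2]
print(per_row)    # [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
```

With all sites collected first, each query lands in a different cell; with one site per row, every query trivially belongs to the only site available.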

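A rough cross-check of the "closest to (0°N, 0°E)" query in step 7 can be done in plain Python. The sketch below uses the haversine great-circle formula on a sphere of mean radius 6371.0088 km as a stand-in for `ST_DistanceSpheroid`, which works on an ellipsoid, so the two can differ by a fraction of a percent. The function names and the sample points are hypothetical and are not taken from the notebook.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def closest_to(origin, candidates):
    """Return (name, distance_km) of the candidate nearest to `origin`."""
    lat0, lon0 = origin
    return min(
        ((name, haversine_km(lat0, lon0, lat, lon))
         for name, (lat, lon) in candidates.items()),
        key=lambda pair: pair[1],
    )

# Hypothetical sample points as (lat, lon), for illustration only.
sample_points = {"P1": (0.0, 10.0), "P2": (5.0, 0.0), "P3": (-20.0, 30.0)}
name, dist_km = closest_to((0.0, 0.0), sample_points)
print(name, round(dist_km, 1))
```

Here `P2` wins because five degrees of latitude is roughly half the distance of ten degrees of longitude along the equator.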