jiayuasu opened a new pull request, #2900: URL: https://github.com/apache/sedona/pull/2900
## Did you read the Contributor Guide?

- Yes, I have read the [Contributor Rules](https://sedona.apache.org/latest/community/rule/) and [Contributor Development Guide](https://sedona.apache.org/latest/community/develop/)

## Is this PR related to a ticket?

- Yes, and the PR name follows the format `[GH-XXX] my subject`. Closes part of #2700.

## What changes were proposed in this PR?

Continues the docker-image notebook refresh series (issue #2700, milestone 1.9.1). Adds the first notebook in the series that mixes raster algebra, raster→vector zonal aggregation, and vector-on-vector distance joins in one pipeline, the workflow GeoPandas alone can't do.

`docs/usecases/03-fire-risk-fusion.ipynb` answers:

> **Given a county's terrain steepness, fuel load, building footprints, and road network, score every building for wildfire risk weighted by distance from the nearest evacuation route.**

End-to-end:

1. SedonaContext setup.
2. Synthesize `slope.tif` + `fuel.tif` (256×256 single-band float32 tiled GeoTIFFs in `/tmp/fire-risk/`). Slope highest in the east, fuel highest in the north.
3. Load both with `sedona.read.format("raster")` (auto-tiling, GH-2672), keep `(x, y)` tile-index columns, join on those, compute composite risk via two-raster `RS_MapAlgebra` (`0.5 * slope + 0.5 * fuel`). The same SQL works for single-tile inputs and for multi-tile DEM-sized scenes.
4. Build a 4×4 grid of building polygons + two bisector `LINESTRING` roads as Spark DataFrames.
5. Compute each building's distance to its nearest road via `MIN(ST_DistanceSpheroid)` over the building × road cross product (metres regardless of EPSG:4326 lon/lat units).
6. Score: `RS_ZonalStats(composite, footprint, 'mean') × (1 + min(dist_km, 5) / 5)`. Multiplicative form means a building only ranks high when it has *both* high terrain risk and poor road access.
7. Rank, write top-5 as GeoParquet 1.1 (auto covering-bbox + projjson), round-trip read back to verify.
8. matplotlib panel: composite risk as basemap, building footprints filled by `risk_score` (red = high), roads overlaid, top-5 buildings labelled.

Built-in ground truth: slope-east + fuel-north synthesis means corner buildings carry the highest composite risk; the multiplicative evacuation factor then favours buildings far from the bisector roads. **B33 (NE corner) should rank top**, which the harness confirms.

Notebook is structured as numbered markdown sections (`## 1.` through `## 7.`), matching the convention from the prior notebooks. Notebook intro flags `**Requires Sedona ≥ 1.9.0**` for the auto-tiling raster reader.

No new data shipped. No network required.

## How was this patch tested?

End-to-end through the local mirror of `docker/test-notebooks.sh` (matched docker stack: Python 3.10, `pyspark==4.0.1`, `apache-sedona==1.9.0`, JDK 17, `local[*]`, `DRIVER_MEM=4g`, Sedona JAR via `PYSPARK_SUBMIT_ARGS` Maven coords).

```
PASS  03-fire-risk-fusion  19s elapsed
```

Output sanity-checked:

| bid | mean_risk | dist_km | risk_score |
|---|---|---|---|
| **B33** | 0.8772 | 5.21 | **1.7543** |
| B23 | 0.7511 | 4.29 | 1.396 |
| B32 | 0.7524 | 3.42 | 1.2675 |
| B13 | 0.6218 | 4.29 | 1.1558 |
| B31 | 0.6226 | 3.42 | 1.0489 |
| … | | | |
| B00 | 0.1244 | 5.21 | 0.2488 |

Top-ranked building **B33** (north-east corner) matches the synthesis design; the entire NE quadrant clusters at the top of the ranking; B00 (SW corner) ranks bottom with mean_risk≈0.12 (lowest slope + lowest fuel). GeoParquet top-5 round-trip read back identical rows.

The Docker-build CI workflow (path-filter widening landed in #2889) will run on this PR and execute `test-notebooks.sh` against the built image, so the in-container PASS line lands directly in CI.

## Did this PR include necessary documentation updates?

- The notebook is itself the documentation; intro markdown calls out `**Requires Sedona ≥ 1.9.0**` and explains both the multi-tile join pattern and the multiplicative score rationale.
- No new data shipped, so no `docs/usecases/data/README.md` updates.
- No public API changes.
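
For reviewers who want to check the multiplicative score from step 6 against the reported table without starting Spark, here is a minimal plain-Python sketch. The notebook itself computes the score in Sedona SQL; the names `risk_score` and `cap_km` are hypothetical helpers introduced only for this illustration. The inputs are the rounded `mean_risk` and `dist_km` values copied from the table, so the results should only be expected to agree with the `risk_score` column to about three decimal places.

```python
def risk_score(mean_risk: float, dist_km: float, cap_km: float = 5.0) -> float:
    """Mean composite risk scaled by an evacuation factor in [1, 2].

    Hypothetical helper mirroring the formula in step 6:
    mean_risk * (1 + min(dist_km, 5) / 5).
    """
    return mean_risk * (1.0 + min(dist_km, cap_km) / cap_km)


# (bid, mean_risk, dist_km, reported risk_score), copied from the table above.
# The elided middle rows of the table are intentionally not reconstructed.
rows = [
    ("B33", 0.8772, 5.21, 1.7543),
    ("B23", 0.7511, 4.29, 1.396),
    ("B32", 0.7524, 3.42, 1.2675),
    ("B13", 0.6218, 4.29, 1.1558),
    ("B31", 0.6226, 3.42, 1.0489),
    ("B00", 0.1244, 5.21, 0.2488),
]

# Recompute each score and rank from highest to lowest.
scored = sorted(
    ((bid, risk_score(mean, dist), reported) for bid, mean, dist, reported in rows),
    key=lambda r: r[1],
    reverse=True,
)
ranking = [bid for bid, _, _ in scored]

for bid, score, reported in scored:
    print(f"{bid}: recomputed={score:.4f} reported={reported}")
```

Because the distance term is capped at 5 km, B33 and B00 (both 5.21 km from the nearest road) receive the same factor of 2, so their ordering is decided entirely by `mean_risk`.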
