[PR] [GH-2700] Add 03-fire-risk-fusion notebook: raster + vector fusion [sedona]

via GitHub Sun, 03 May 2026 22:50:21 -0700


jiayuasu opened a new pull request, #2900:
URL: https://github.com/apache/sedona/pull/2900


   ## Did you read the Contributor Guide?
   
   - Yes, I have read the [Contributor 
Rules](https://sedona.apache.org/latest/community/rule/) and [Contributor 
Development Guide](https://sedona.apache.org/latest/community/develop/)
   
   ## Is this PR related to a ticket?
   
   - Yes, and the PR name follows the format `[GH-XXX] my subject`. Closes part 
of #2700.
   
   ## What changes were proposed in this PR?
   
   Continues the docker-image notebook refresh series (issue #2700, milestone 
1.9.1). Adds the first notebook in the series that mixes raster algebra, 
raster→vector zonal aggregation, and vector-on-vector distance joins in one 
pipeline, a workflow that GeoPandas alone does not cover.
   
   `docs/usecases/03-fire-risk-fusion.ipynb` answers:
   
   > **Given a county's terrain steepness, fuel load, building footprints, and 
road network, score every building for wildfire risk weighted by distance from 
the nearest evacuation route.**
   
   End-to-end:
   
   1. SedonaContext setup.
   2. Synthesize `slope.tif` + `fuel.tif` (256×256 single-band float32 tiled 
GeoTIFFs in `/tmp/fire-risk/`). Slope highest in the east, fuel highest in the 
north.
   3. Load both with `sedona.read.format("raster")` (auto-tiling, GH-2672), 
keep `(x, y)` tile-index columns, join on those, compute composite risk via 
two-raster `RS_MapAlgebra` (`0.5 * slope + 0.5 * fuel`). The same SQL works for 
single-tile inputs and for multi-tile DEM-sized scenes.
   4. Build a 4×4 grid of building polygons + two bisector `LINESTRING` roads 
as Spark DataFrames.
   5. Compute each building's distance to its nearest road via 
`MIN(ST_DistanceSpheroid)` over the building × road cross product (metres 
regardless of EPSG:4326 lon/lat units).
   6. Score: `RS_ZonalStats(composite, footprint, 'mean') × (1 + min(dist_km, 5) / 5)`.
The distance term is capped at 5 km, so the multiplier ranges from 1 to 2. 
Multiplicative form means a building only ranks high when it has 
*both* high terrain risk and poor road access.
   7. Rank, write top-5 as GeoParquet 1.1 (auto covering-bbox + projjson), 
round-trip read back to verify.
   8. matplotlib panel: composite risk as basemap, building footprints filled 
by `risk_score` (red = high), roads overlaid, top-5 buildings labelled.
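
Step 5 can be sketched without Spark as a plain cross product followed by a per-building minimum. The sketch below is illustrative only: the study-area extent, the `B<row><col>` labelling, the road names, and the vertex sampling are assumptions, and a haversine (spherical) distance stands in for `ST_DistanceSpheroid`, so the values will not match the notebook's.

```python
import math
from itertools import product

# Mean Earth radius in metres; a spherical stand-in for the WGS84 spheroid.
EARTH_RADIUS_M = 6_371_008.8


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Hypothetical 0.1 x 0.1 degree study area (not the notebook's actual extent).
LON0, LAT0, SIZE = -120.10, 38.00, 0.10
CELL = SIZE / 4

# 4x4 grid of building centroids, labelled B<row><col> with row 0 in the south.
buildings = {
    f"B{r}{c}": (LON0 + (c + 0.5) * CELL, LAT0 + (r + 0.5) * CELL)
    for r, c in product(range(4), repeat=2)
}

# Two bisector roads, each approximated by evenly spaced vertices.
N_SAMPLES = 21
roads = {
    "R_EW": [(LON0 + SIZE * i / (N_SAMPLES - 1), LAT0 + SIZE / 2) for i in range(N_SAMPLES)],
    "R_NS": [(LON0 + SIZE / 2, LAT0 + SIZE * i / (N_SAMPLES - 1)) for i in range(N_SAMPLES)],
}


def nearest_road_km(buildings, roads):
    """Return {building id: (nearest road id, distance in km)} via a full cross product."""
    best = {}
    for (bid, (blon, blat)), (rid, vertices) in product(buildings.items(), roads.items()):
        d = min(haversine_m(blon, blat, vlon, vlat) for vlon, vlat in vertices)
        if bid not in best or d < best[bid][1]:
            best[bid] = (rid, d)
    return {bid: (rid, d / 1000.0) for bid, (rid, d) in best.items()}


dist_km = nearest_road_km(buildings, roads)
for bid in ("B00", "B11", "B33"):
    rid, km = dist_km[bid]
    print(f"{bid}: nearest={rid} dist_km={km:.2f}")
```

The cross product is cheap here (16 buildings × 2 roads); for larger inputs a distance join with a search radius would avoid comparing every pair.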
   
   Built-in ground truth: slope-east + fuel-north synthesis means composite 
risk peaks at the north-east corner and bottoms out at the south-west corner; 
the multiplicative evacuation factor then favours buildings far from the 
bisector roads. **B33 (NE corner) should rank top**, which the harness confirms.
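
The ground-truth argument can be checked with a small pure-Python stand-in for the raster pipeline. This assumes linear gradients normalised to [0, 1] and uses 4×4 square zones covering the whole raster in place of the notebook's building footprints; neither detail is confirmed by the PR description.

```python
H = W = 256   # raster size from the PR description
BLOCKS = 4    # 4x4 square zones standing in for the building grid (assumption)


def composite(row, col):
    """Composite risk at one pixel: 0.5 * slope + 0.5 * fuel."""
    slope = col / (W - 1)        # assumed linear gradient, increasing eastwards
    fuel = 1.0 - row / (H - 1)   # row 0 is the northern edge, so fuel peaks there
    return 0.5 * slope + 0.5 * fuel


def zonal_mean(zone_row, zone_col):
    """Mean composite risk over one square zone of the raster."""
    size = H // BLOCKS
    rows = range(zone_row * size, (zone_row + 1) * size)
    cols = range(zone_col * size, (zone_col + 1) * size)
    values = [composite(r, c) for r in rows for c in cols]
    return sum(values) / len(values)


# Keys are (zone_row, zone_col) in raster order: row 0 = north, col 3 = east.
means = {(zr, zc): zonal_mean(zr, zc) for zr in range(BLOCKS) for zc in range(BLOCKS)}
highest = max(means, key=means.get)
lowest = min(means, key=means.get)
print("highest zone:", highest, "lowest zone:", lowest)
```

Under these assumptions the highest zone is `(0, 3)` (north-east) and the lowest is `(3, 0)` (south-west), which is the ordering the ground-truth check relies on before the distance factor is applied.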
   
   The notebook is structured as numbered markdown sections (`## 1.` through 
`## 7.`), matching the convention from the prior notebooks. The notebook intro 
flags `**Requires Sedona ≥ 1.9.0**` for the auto-tiling raster reader.
   
   No new data shipped. No network required.
   
   ## How was this patch tested?
   
   End-to-end through the local mirror of `docker/test-notebooks.sh` (matched 
docker stack: Python 3.10, `pyspark==4.0.1`, `apache-sedona==1.9.0`, JDK 17, 
`local[*]`, `DRIVER_MEM=4g`, Sedona JAR via `PYSPARK_SUBMIT_ARGS` Maven coords).
   
   ```
   PASS  03-fire-risk-fusion  19s elapsed
   ```
   
   Output sanity-checked:
   
   | bid | mean_risk | dist_km | risk_score |
   |---|---|---|---|
   | **B33** | 0.8772 | 5.21 | **1.7543** |
   | B23 | 0.7511 | 4.29 | 1.396  |
   | B32 | 0.7524 | 3.42 | 1.2675 |
   | B13 | 0.6218 | 4.29 | 1.1558 |
   | B31 | 0.6226 | 3.42 | 1.0489 |
   | … | | | |
   | B00 | 0.1244 | 5.21 | 0.2488 |
   
   Top-ranked building **B33** (north-east corner) matches the synthesis 
design; the entire NE quadrant clusters at the top of the ranking; B00 (SW 
corner) ranks bottom with mean_risk≈0.12 (lowest slope + lowest fuel). 
GeoParquet top-5 round-trip read back identical rows.
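
As a cross-check, the scoring formula from step 6 can be re-applied to the rounded values in the table. This is a minimal sketch in which `risk_score` is a hypothetical helper name; small differences are expected because `mean_risk` and `dist_km` are shown rounded.

```python
def risk_score(mean_risk, dist_km, cap_km=5.0):
    """mean_risk x (1 + min(dist_km, cap) / cap), as described in step 6."""
    return mean_risk * (1.0 + min(dist_km, cap_km) / cap_km)


# (bid, mean_risk, dist_km, reported risk_score), copied from the table above.
reported = [
    ("B33", 0.8772, 5.21, 1.7543),
    ("B23", 0.7511, 4.29, 1.396),
    ("B32", 0.7524, 3.42, 1.2675),
    ("B13", 0.6218, 4.29, 1.1558),
    ("B31", 0.6226, 3.42, 1.0489),
    ("B00", 0.1244, 5.21, 0.2488),
]

# Largest gap between the recomputed and the reported score.
max_abs_diff = max(abs(risk_score(m, d) - s) for _, m, d, s in reported)

for bid, mean_risk, dist_km, score in reported:
    print(f"{bid}: recomputed={risk_score(mean_risk, dist_km):.4f} reported={score}")
print(f"max abs diff: {max_abs_diff:.4f}")
```

Every recomputed score lands within 0.001 of the reported one, and the 5 km cap is visible in B33 and B00, whose multiplier is exactly 2.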
   
   The Docker-build CI workflow (path-filter widening landed in #2889) will run 
on this PR and execute `test-notebooks.sh` against the built image, so the 
in-container PASS line lands directly in CI.
   
   ## Did this PR include necessary documentation updates?
   
   - The notebook is itself the documentation; intro markdown calls out 
`**Requires Sedona ≥ 1.9.0**` and explains both the multi-tile join pattern and 
the multiplicative score rationale.
   - No new data shipped, so no `docs/usecases/data/README.md` updates.
   - No public API changes.


