jiayuasu opened a new pull request, #2896:
URL: https://github.com/apache/sedona/pull/2896
## Did you read the Contributor Guide?

- Yes, I have read the [Contributor Rules](https://sedona.apache.org/latest/community/rule/) and [Contributor Development Guide](https://sedona.apache.org/latest/community/develop/)

## Is this PR related to a ticket?

- Yes, and the PR name follows the format `[GH-XXX] my subject`. Closes part of #2700.

## What changes were proposed in this PR?

This PR continues the docker-image notebook refresh series (issue #2700, milestone 1.9.1). It adds the first raster-pipeline notebook in the series. `docs/usecases/02-vegetation-change.ipynb` answers this question:

> **Between two satellite scenes a season apart, which farm parcels in this AOI greened up the most?**

The notebook runs end-to-end on Sedona's 1.9 raster surface:

1. `SedonaContext` setup.
2. Synthesize two 256×256 red+NIR GeoTIFFs in `/tmp/veg-change/` (uint16, EPSG:4326, tiled GeoTIFF).
   - The "before" scene is mostly bare.
   - The "after" scene has a circular field of vegetation in the south-west corner with elevated NIR.
   - Both are written with `tiled=True, blockxsize=256, blockysize=256` because the Sedona raster reader rejects strip-based GeoTIFFs as "too thin".
3. Load both with `sedona.read.format("raster")`, the new auto-tiling reader (GH-2672, 1.9.0).
4. Single-raster `RS_MapAlgebra` to compute NDVI per scene.
5. Two-raster `RS_MapAlgebra` to compute the per-pixel NDVI difference (ΔNDVI).
6. A 4×4 synthetic parcel grid plus `RS_ZonalStats(rast, geom, 'mean')`, the canonical raster→vector aggregation.
7. `RS_Clip` on the top-ranked parcel for a close-up.
8. `RS_AsCOG` (GH-2652, 1.9.0) round-trip through a Cloud-Optimized GeoTIFF. The output is read back via the same `raster` reader to prove it is valid for cloud-hosted streaming.
9. Four-panel matplotlib visualization (NDVI before, NDVI after, ΔNDVI with parcel grid, top-parcel close-up).

The synthesized greening pattern places its peak in parcel **P10**, and that is the parcel the workflow ranks top. This gives the answer a built-in ground truth.
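Steps 4 to 6 can be sketched in plain Python to show what the raster math computes, independent of Spark and Sedona. This is a minimal illustration, not the notebook's code. It rests on several assumptions:

- 64×64 scenes instead of 256×256.
- Made-up DN values: bare soil is red 3000 / NIR 3500, vegetation is red 1500 / NIR 6000.
- A 4×4 parcel grid aligned to pixel boundaries.
- Hypothetical `P<row>_<col>` parcel IDs that do not follow the notebook's `P10` numbering.

Each function stands in for one step: `ndvi` for the single-raster map-algebra step, `difference` for the two-raster ΔNDVI step, and `zonal_means` for the per-parcel mean that `RS_ZonalStats(rast, geom, 'mean')` returns.

```python
from typing import Dict, List, Tuple

Grid = List[List[float]]


def synth_scene(size: int, field: bool) -> Tuple[Grid, Grid]:
    """Return (red, nir) bands for a hypothetical size x size scene.

    Bare soil everywhere; if `field` is True, a circular vegetated field
    (low red, high NIR) is painted in the lower-left (south-west) quarter.
    The DN values are illustrative, not the notebook's.
    """
    red = [[3000.0] * size for _ in range(size)]
    nir = [[3500.0] * size for _ in range(size)]
    if field:
        # Center of parcel P2_1 on a 4x4 grid; the radius keeps the field inside it.
        cx, cy, r = size * 0.375, size * 0.625, size * 0.09375
        for y in range(size):
            for x in range(size):
                if (x + 0.5 - cx) ** 2 + (y + 0.5 - cy) ** 2 <= r * r:
                    red[y][x], nir[y][x] = 1500.0, 6000.0
    return red, nir


def ndvi(red: Grid, nir: Grid) -> Grid:
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red); 0.0 where both bands are 0."""
    return [
        [(n - r) / (n + r) if n + r else 0.0 for r, n in zip(r_row, n_row)]
        for r_row, n_row in zip(red, nir)
    ]


def difference(before: Grid, after: Grid) -> Grid:
    """Per-pixel delta NDVI = after - before (the two-raster step)."""
    return [[a - b for b, a in zip(b_row, a_row)] for b_row, a_row in zip(before, after)]


def zonal_means(grid: Grid, parcels_per_side: int) -> Dict[str, float]:
    """Mean of `grid` inside each square parcel of a parcels_per_side^2 grid.

    Parcel IDs are hypothetical: 'P<row>_<col>', row 0 at the top (north).
    Assumes the raster size is divisible by parcels_per_side.
    """
    size = len(grid)
    step = size // parcels_per_side
    means = {}
    for pr in range(parcels_per_side):
        for pc in range(parcels_per_side):
            cells = [
                grid[y][x]
                for y in range(pr * step, (pr + 1) * step)
                for x in range(pc * step, (pc + 1) * step)
            ]
            means[f"P{pr}_{pc}"] = sum(cells) / len(cells)
    return means


ndvi_before = ndvi(*synth_scene(64, field=False))
ndvi_after = ndvi(*synth_scene(64, field=True))
delta = difference(ndvi_before, ndvi_after)
ranking = sorted(zonal_means(delta, 4).items(), key=lambda kv: kv[1], reverse=True)
print(ranking[0])  # top parcel is P2_1, the only one containing the synthetic field
```

Aligning parcels to pixel boundaries sidesteps the question of which edge pixels belong to a parcel. With real parcel geometries, `RS_ZonalStats` makes that decision itself.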
The notebook is structured as numbered markdown sections (`## 1.` through `## 9.`), matching the convention from `01-mobility-pulse` and `05-geopandas-on-spark`. Its intro flags `**Requires Sedona ≥ 1.9.0.**` explicitly because the auto-tiling raster reader and `RS_AsCOG` are 1.9-only.

No new data is shipped, and no network access is required.

## How was this patch tested?

The notebook was run end-to-end through the local mirror of `docker/test-notebooks.sh` on a stack matched to the docker image:

- Python 3.10
- `pyspark==4.0.1`
- `apache-sedona==1.9.0`
- JDK 17
- `local[*]`, `DRIVER_MEM=4g`
- Sedona JAR via `PYSPARK_SUBMIT_ARGS` Maven coords

```
PASS 02-vegetation-change 13s elapsed
```

Output was sanity-checked:

- The top-greening parcel, `P10`, matches the synthesized field location.
- The COG round-trip reads back as 65×65 REAL_64BITS, as expected.
- All `RS_*` results have the right dimensions.

Three real failure modes surfaced during local verification and were fixed before this commit:

1. macOS `/tmp` pollution intercepted Spark's directory listing for the input glob → use a dedicated `/tmp/veg-change/` subdirectory for the synthetic rasters.
2. The `raster` data source schema is `[rast, x, y, name]` (not `path`); derive the scene label from `name`.
3. Sedona's reader rejects strip-based GeoTIFFs as "too thin"; pass `tiled=True, blockxsize=256, blockysize=256` to `rasterio.open`.

The CI Docker-build workflow (its path-filter widening landed in #2889) will run on this PR. The `apache/sedona:latest` matrix leg builds the image with this notebook bundled and runs `test-notebooks.sh` against it, so the in-container PASS line also shows up in CI.

## Did this PR include necessary documentation updates?

- The notebook is itself the documentation. Its intro markdown calls out `**Requires Sedona ≥ 1.9.0.**` and lists the gotchas: the tiled GeoTIFF requirement, and `name` rather than `path` in the schema.
- No new data is shipped, so `docs/usecases/data/README.md` needs no updates.
- No public API changes.
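The tiled-GeoTIFF gotcha listed above can be caught before Spark ever sees the file. Below is a minimal sketch using only the standard library. It assumes classic (non-BigTIFF) files, and `tiff_layout` and `minimal_tiff` are hypothetical helpers, not part of Sedona or rasterio. The check only looks at the first IFD for tile tags (TileWidth 322, TileOffsets 324) or the strip tag (StripOffsets 273) from the TIFF 6.0 baseline. A file written with `tiled=True, blockxsize=256, blockysize=256` should therefore report `tiled`.

```python
import struct

# Baseline TIFF tag numbers (TIFF 6.0 specification).
STRIP_OFFSETS = 273
TILE_WIDTH = 322
TILE_OFFSETS = 324


def tiff_layout(data: bytes) -> str:
    """Return 'tiled' or 'strip' for the first IFD of a classic (non-BigTIFF) TIFF.

    Hypothetical pre-flight check: it only inspects which tags are present in
    the first IFD and does not validate the rest of the file.
    """
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None:
        raise ValueError("not a TIFF: bad byte-order mark")
    magic, ifd_offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF uses magic 43)")
    (n_entries,) = struct.unpack(order + "H", data[ifd_offset:ifd_offset + 2])
    tags = set()
    for i in range(n_entries):
        start = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes, tag first
        (tag,) = struct.unpack(order + "H", data[start:start + 2])
        tags.add(tag)
    if TILE_WIDTH in tags or TILE_OFFSETS in tags:
        return "tiled"
    if STRIP_OFFSETS in tags:
        return "strip"
    raise ValueError("first IFD has neither strip nor tile offsets")


def minimal_tiff(tags):
    """Build a little-endian TIFF header plus one IFD holding `tags` (values zeroed)."""
    header = b"II" + struct.pack("<HI", 42, 8)  # IFD starts right after the 8-byte header
    ifd = struct.pack("<H", len(tags))
    for tag in sorted(tags):
        ifd += struct.pack("<HHII", tag, 4, 1, 0)  # type 4 = LONG, count 1, value 0
    ifd += struct.pack("<I", 0)  # no next IFD
    return header + ifd


print(tiff_layout(minimal_tiff([256, 257, STRIP_OFFSETS])))  # strip
print(tiff_layout(minimal_tiff([256, 257, TILE_WIDTH, 323, TILE_OFFSETS])))  # tiled
```

For a real file, `tiff_layout(open(path, "rb").read())` is enough for small synthetic rasters like the ones in this notebook.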
