Re: [PR] [GH-2700] Add 02-vegetation-change notebook: end-to-end raster workflow [sedona]

via GitHub Sun, 03 May 2026 10:23:07 -0700


Copilot commented on code in PR #2896:
URL: https://github.com/apache/sedona/pull/2896#discussion_r3178487457



##########
docs/usecases/02-vegetation-change.ipynb:
##########
@@ -0,0 +1,300 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<!--\n",
+    " Licensed to the Apache Software Foundation (ASF) under one\n",
+    " or more contributor license agreements.  See the NOTICE file\n",
+    " distributed with this work for additional information\n",
+    " regarding copyright ownership.  The ASF licenses this file\n",
+    " to you under the Apache License, Version 2.0 (the\n",
+    " \"License\"); you may not use this file except in compliance\n",
+    " with the License.  You may obtain a copy of the License at\n",
+    "\n",
+    "   http://www.apache.org/licenses/LICENSE-2.0\n";,
+    "\n",
+    " Unless required by applicable law or agreed to in writing,\n",
+    " software distributed under the License is distributed on an\n",
+    " \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+    " KIND, either express or implied.  See the License for the\n",
+    " specific language governing permissions and limitations\n",
+    " under the License.\n",
+    "-->\n",
+    "\n",
+    "# Did this farmland green up?\n",
+    "\n",
+    "A complete raster pipeline. We answer:\n",
+    "\n",
+    "> **Between two satellite scenes a season apart, which farm parcels in 
this AOI greened up the most?**\n",
+    "\n",
+    "End-to-end on Sedona's raster surface: load two GeoTIFFs through the new 
tiling raster data source, compute NDVI per scene with `RS_MapAlgebra`, take 
their difference for a \"greening\" delta raster, run `RS_ZonalStats` per 
parcel, rank, clip the delta raster to the top parcel with `RS_Clip`, 
round-trip through `RS_AsCOG` to prove the result is a valid Cloud-Optimized 
GeoTIFF, and visualize the four stages as a single matplotlib figure.\n",
+    "\n",
+    "The two source scenes are **synthesized** at the top of the notebook (256 
\u00d7 256 px, 2 bands each: red + NIR, EPSG:4326) so the notebook is fully 
reproducible and ships no new bytes. The same code path runs unchanged against 
real Sentinel-2 chips \u2014 only the raster filenames change.\n",
+    "\n",
+    "**Requires Sedona \u2265 1.9.0.** The `raster` data source (auto-tiling 
GeoTIFF reader, GH-2672) and `RS_AsCOG` (GH-2652) land in 1.9.0; the notebook 
will fail on older versions of the docker image."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 1. Connect to Spark through SedonaContext"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from sedona.spark import SedonaContext\n",
+    "\n",
+    "config = 
SedonaContext.builder().master(\"spark://localhost:7077\").getOrCreate()\n",
+    "sedona = SedonaContext.create(config)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 2. Synthesize two scenes a season apart\n",
+    "\n",
+    "We build two 256\u00d7256, 2-band GeoTIFFs in `/tmp` representing red and 
near-infrared reflectance over the same AOI. The \"before\" scene is mostly 
bare; the \"after\" scene has a circular field of vegetation in the south-west 
corner with elevated NIR. Pixel values are scaled to the same 0-10000 
reflectance range as Sentinel-2 surface-reflectance products so the NDVI math 
is identical to the real-world case."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": "import os\n\nimport numpy as np\nimport rasterio\nfrom 
rasterio.transform import from_bounds\n\nWORK = 
\"/tmp/veg-change\"\nos.makedirs(WORK, exist_ok=True)\n\nAOI = (-91.10, 41.50, 
-91.00, 41.60)  # (xmin, ymin, xmax, ymax) in EPSG:4326\nW = H = 256\ntransform 
= from_bounds(*AOI, W, H)\nrng = np.random.default_rng(42)\n\nys, xs = 
np.mgrid[0:H, 0:W]\nfield_mask = ((xs - 64) ** 2 + (ys - 192) ** 2) < 50**2  # 
circular field\n\n\ndef synth_scene(green_strength):\n    red = (1500 + 200 * 
rng.standard_normal((H, W))).clip(0, 10000)\n    nir = (1800 + 200 * 
rng.standard_normal((H, W))).clip(0, 10000)\n    nir = np.where(field_mask, nir 
+ green_strength, nir)\n    return red.astype(\"uint16\"), 
nir.astype(\"uint16\")\n\n\nfor name, strength in [(\"before\", 200), 
(\"after\", 4000)]:\n    red, nir = synth_scene(strength)\n    with 
rasterio.open(\n        f\"{WORK}/scene_{name}.tif\",\n        \"w\",\n        
driver=\"GTiff\",\n        tiled=True,\n        blockxsize=256,\n   
      blockysize=256,\n        height=H,\n        width=W,\n        count=2,\n  
      dtype=\"uint16\",\n        crs=\"EPSG:4326\",\n        
transform=transform,\n    ) as dst:\n        dst.write(red, 1)\n        
dst.write(nir, 2)\n        dst.set_band_description(1, \"red\")\n        
dst.set_band_description(2, \"nir\")\n\nfor f in (f\"{WORK}/scene_before.tif\", 
f\"{WORK}/scene_after.tif\"):\n    print(f\"{f}: {os.path.getsize(f)} bytes\")"
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 3. Load both scenes with the `raster` data source\n",
+    "\n",
+    "`sedona.read.format(\"raster\")` (new in 1.9) auto-tiles GeoTIFFs and 
yields one `Raster`-typed row per tile, sidestepping Spark's 2 GB record-size 
ceiling on large GeoTIFFs. Our scenes are tiny so each yields a single tile, 
but the call shape is identical for multi-gigabyte inputs."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": "scenes = (\n    sedona.read.format(\"raster\")\n    
.load(f\"{WORK}/scene_*.tif\")\n    .selectExpr(\"split(name, '\\\\\\\\.')[0] 
as scene\", \"rast\")\n)\nscenes.cache()\nscenes.show(truncate=80)"

Review Comment:
   The notebook text says the `raster` data source call shape is identical for 
multi-gigabyte inputs, but the next steps drop the tile coordinates and later 
assume exactly one row per scene. For multi-tile GeoTIFFs, selecting only 
`scene, rast` makes it impossible to align tiles across scenes for the 
two-raster `RS_MapAlgebra` step and will make the later scalar subqueries fail. 
Consider keeping the `x`/`y` (tile indices) columns from the raster reader and 
describing (or implementing) a join on `(x, y)` when computing ΔNDVI so the 
example actually generalizes to tiled inputs.



##########
docs/usecases/02-vegetation-change.ipynb:
##########
@@ -0,0 +1,300 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<!--\n",
+    " Licensed to the Apache Software Foundation (ASF) under one\n",
+    " or more contributor license agreements.  See the NOTICE file\n",
+    " distributed with this work for additional information\n",
+    " regarding copyright ownership.  The ASF licenses this file\n",
+    " to you under the Apache License, Version 2.0 (the\n",
+    " \"License\"); you may not use this file except in compliance\n",
+    " with the License.  You may obtain a copy of the License at\n",
+    "\n",
+    "   http://www.apache.org/licenses/LICENSE-2.0\n";,
+    "\n",
+    " Unless required by applicable law or agreed to in writing,\n",
+    " software distributed under the License is distributed on an\n",
+    " \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+    " KIND, either express or implied.  See the License for the\n",
+    " specific language governing permissions and limitations\n",
+    " under the License.\n",
+    "-->\n",
+    "\n",
+    "# Did this farmland green up?\n",
+    "\n",
+    "A complete raster pipeline. We answer:\n",
+    "\n",
+    "> **Between two satellite scenes a season apart, which farm parcels in 
this AOI greened up the most?**\n",
+    "\n",
+    "End-to-end on Sedona's raster surface: load two GeoTIFFs through the new 
tiling raster data source, compute NDVI per scene with `RS_MapAlgebra`, take 
their difference for a \"greening\" delta raster, run `RS_ZonalStats` per 
parcel, rank, clip the delta raster to the top parcel with `RS_Clip`, 
round-trip through `RS_AsCOG` to prove the result is a valid Cloud-Optimized 
GeoTIFF, and visualize the four stages as a single matplotlib figure.\n",
+    "\n",
+    "The two source scenes are **synthesized** at the top of the notebook (256 
\u00d7 256 px, 2 bands each: red + NIR, EPSG:4326) so the notebook is fully 
reproducible and ships no new bytes. The same code path runs unchanged against 
real Sentinel-2 chips \u2014 only the raster filenames change.\n",
+    "\n",
+    "**Requires Sedona \u2265 1.9.0.** The `raster` data source (auto-tiling 
GeoTIFF reader, GH-2672) and `RS_AsCOG` (GH-2652) land in 1.9.0; the notebook 
will fail on older versions of the docker image."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 1. Connect to Spark through SedonaContext"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from sedona.spark import SedonaContext\n",
+    "\n",
+    "config = 
SedonaContext.builder().master(\"spark://localhost:7077\").getOrCreate()\n",
+    "sedona = SedonaContext.create(config)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 2. Synthesize two scenes a season apart\n",
+    "\n",
+    "We build two 256\u00d7256, 2-band GeoTIFFs in `/tmp` representing red and 
near-infrared reflectance over the same AOI. The \"before\" scene is mostly 
bare; the \"after\" scene has a circular field of vegetation in the south-west 
corner with elevated NIR. Pixel values are scaled to the same 0-10000 
reflectance range as Sentinel-2 surface-reflectance products so the NDVI math 
is identical to the real-world case."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": "import os\n\nimport numpy as np\nimport rasterio\nfrom 
rasterio.transform import from_bounds\n\nWORK = 
\"/tmp/veg-change\"\nos.makedirs(WORK, exist_ok=True)\n\nAOI = (-91.10, 41.50, 
-91.00, 41.60)  # (xmin, ymin, xmax, ymax) in EPSG:4326\nW = H = 256\ntransform 
= from_bounds(*AOI, W, H)\nrng = np.random.default_rng(42)\n\nys, xs = 
np.mgrid[0:H, 0:W]\nfield_mask = ((xs - 64) ** 2 + (ys - 192) ** 2) < 50**2  # 
circular field\n\n\ndef synth_scene(green_strength):\n    red = (1500 + 200 * 
rng.standard_normal((H, W))).clip(0, 10000)\n    nir = (1800 + 200 * 
rng.standard_normal((H, W))).clip(0, 10000)\n    nir = np.where(field_mask, nir 
+ green_strength, nir)\n    return red.astype(\"uint16\"), 
nir.astype(\"uint16\")\n\n\nfor name, strength in [(\"before\", 200), 
(\"after\", 4000)]:\n    red, nir = synth_scene(strength)\n    with 
rasterio.open(\n        f\"{WORK}/scene_{name}.tif\",\n        \"w\",\n        
driver=\"GTiff\",\n        tiled=True,\n        blockxsize=256,\n   
      blockysize=256,\n        height=H,\n        width=W,\n        count=2,\n  
      dtype=\"uint16\",\n        crs=\"EPSG:4326\",\n        
transform=transform,\n    ) as dst:\n        dst.write(red, 1)\n        
dst.write(nir, 2)\n        dst.set_band_description(1, \"red\")\n        
dst.set_band_description(2, \"nir\")\n\nfor f in (f\"{WORK}/scene_before.tif\", 
f\"{WORK}/scene_after.tif\"):\n    print(f\"{f}: {os.path.getsize(f)} bytes\")"
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 3. Load both scenes with the `raster` data source\n",
+    "\n",
+    "`sedona.read.format(\"raster\")` (new in 1.9) auto-tiles GeoTIFFs and 
yields one `Raster`-typed row per tile, sidestepping Spark's 2 GB record-size 
ceiling on large GeoTIFFs. Our scenes are tiny so each yields a single tile, 
but the call shape is identical for multi-gigabyte inputs."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": "scenes = (\n    sedona.read.format(\"raster\")\n    
.load(f\"{WORK}/scene_*.tif\")\n    .selectExpr(\"split(name, '\\\\\\\\.')[0] 
as scene\", \"rast\")\n)\nscenes.cache()\nscenes.show(truncate=80)"
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 4. Compute NDVI per scene with `RS_MapAlgebra`\n",
+    "\n",
+    "`RS_MapAlgebra(raster, pixelType, script)` runs a single-raster pixel 
script. The script syntax (`out[0] = ...; rast[i]` indexing) is documented 
[here](../api/sql/Raster-map-algebra.md). We coerce the result to `D` (double) 
so the negative side of NDVI survives."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "scenes.createOrReplaceTempView(\"scenes\")\n",
+    "\n",
+    "ndvi = sedona.sql(\"\"\"\n",
+    "    SELECT scene,\n",
+    "           RS_MapAlgebra(\n",
+    "               rast, 'D',\n",
+    "               'out[0] = (rast[1] - rast[0]) / (rast[1] + rast[0] + 
0.000001);'\n",
+    "           ) AS rast\n",
+    "    FROM scenes\n",
+    "\"\"\")\n",
+    "ndvi.cache()\n",
+    "ndvi.createOrReplaceTempView(\"ndvi\")\n",
+    "ndvi.selectExpr(\"scene\", \"RS_BandPixelType(rast, 1) as dtype\").show()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 5. Compute the greening delta raster\n",
+    "\n",
+    "The two-raster form `RS_MapAlgebra(rast0, rast1, pixelType, script, 
noDataValue)` lets us subtract the before-NDVI from the after-NDVI in a single 
pass. The result is a per-pixel \u0394NDVI layer where positive values mean 
\"this pixel got greener\"."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "delta = sedona.sql(\"\"\"\n",
+    "    SELECT RS_MapAlgebra(\n",
+    "               (SELECT rast FROM ndvi WHERE scene = 'scene_after'),\n",
+    "               (SELECT rast FROM ndvi WHERE scene = 'scene_before'),\n",
+    "               'D',\n",
+    "               'out[0] = rast0[0] - rast1[0];',\n",
+    "               -9999.0\n",
+    "           ) AS rast\n",

Review Comment:
   This ΔNDVI computation relies on scalar subqueries `(SELECT rast FROM ndvi 
WHERE scene = ...)`, which will error if either scene produces more than one 
tile/row (the normal case for large rasters). To make this robust, compute 
ΔNDVI per tile by joining the "after" and "before" NDVI DataFrames on tile 
coordinates (e.g., `x`,`y`) and run `RS_MapAlgebra(after.rast, before.rast, 
...)` row-by-row, or explicitly state that the rest of the notebook assumes a 
single-tile raster.



##########
docs/usecases/02-vegetation-change.ipynb:
##########
@@ -0,0 +1,300 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<!--\n",
+    " Licensed to the Apache Software Foundation (ASF) under one\n",
+    " or more contributor license agreements.  See the NOTICE file\n",
+    " distributed with this work for additional information\n",
+    " regarding copyright ownership.  The ASF licenses this file\n",
+    " to you under the Apache License, Version 2.0 (the\n",
+    " \"License\"); you may not use this file except in compliance\n",
+    " with the License.  You may obtain a copy of the License at\n",
+    "\n",
+    "   http://www.apache.org/licenses/LICENSE-2.0\n";,
+    "\n",
+    " Unless required by applicable law or agreed to in writing,\n",
+    " software distributed under the License is distributed on an\n",
+    " \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+    " KIND, either express or implied.  See the License for the\n",
+    " specific language governing permissions and limitations\n",
+    " under the License.\n",
+    "-->\n",
+    "\n",
+    "# Did this farmland green up?\n",
+    "\n",
+    "A complete raster pipeline. We answer:\n",
+    "\n",
+    "> **Between two satellite scenes a season apart, which farm parcels in 
this AOI greened up the most?**\n",
+    "\n",
+    "End-to-end on Sedona's raster surface: load two GeoTIFFs through the new 
tiling raster data source, compute NDVI per scene with `RS_MapAlgebra`, take 
their difference for a \"greening\" delta raster, run `RS_ZonalStats` per 
parcel, rank, clip the delta raster to the top parcel with `RS_Clip`, 
round-trip through `RS_AsCOG` to prove the result is a valid Cloud-Optimized 
GeoTIFF, and visualize the four stages as a single matplotlib figure.\n",
+    "\n",
+    "The two source scenes are **synthesized** at the top of the notebook (256 
\u00d7 256 px, 2 bands each: red + NIR, EPSG:4326) so the notebook is fully 
reproducible and ships no new bytes. The same code path runs unchanged against 
real Sentinel-2 chips \u2014 only the raster filenames change.\n",
+    "\n",
+    "**Requires Sedona \u2265 1.9.0.** The `raster` data source (auto-tiling 
GeoTIFF reader, GH-2672) and `RS_AsCOG` (GH-2652) land in 1.9.0; the notebook 
will fail on older versions of the docker image."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 1. Connect to Spark through SedonaContext"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from sedona.spark import SedonaContext\n",
+    "\n",
+    "config = 
SedonaContext.builder().master(\"spark://localhost:7077\").getOrCreate()\n",
+    "sedona = SedonaContext.create(config)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 2. Synthesize two scenes a season apart\n",
+    "\n",
+    "We build two 256\u00d7256, 2-band GeoTIFFs in `/tmp` representing red and 
near-infrared reflectance over the same AOI. The \"before\" scene is mostly 
bare; the \"after\" scene has a circular field of vegetation in the south-west 
corner with elevated NIR. Pixel values are scaled to the same 0-10000 
reflectance range as Sentinel-2 surface-reflectance products so the NDVI math 
is identical to the real-world case."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": "import os\n\nimport numpy as np\nimport rasterio\nfrom 
rasterio.transform import from_bounds\n\nWORK = 
\"/tmp/veg-change\"\nos.makedirs(WORK, exist_ok=True)\n\nAOI = (-91.10, 41.50, 
-91.00, 41.60)  # (xmin, ymin, xmax, ymax) in EPSG:4326\nW = H = 256\ntransform 
= from_bounds(*AOI, W, H)\nrng = np.random.default_rng(42)\n\nys, xs = 
np.mgrid[0:H, 0:W]\nfield_mask = ((xs - 64) ** 2 + (ys - 192) ** 2) < 50**2  # 
circular field\n\n\ndef synth_scene(green_strength):\n    red = (1500 + 200 * 
rng.standard_normal((H, W))).clip(0, 10000)\n    nir = (1800 + 200 * 
rng.standard_normal((H, W))).clip(0, 10000)\n    nir = np.where(field_mask, nir 
+ green_strength, nir)\n    return red.astype(\"uint16\"), 
nir.astype(\"uint16\")\n\n\nfor name, strength in [(\"before\", 200), 
(\"after\", 4000)]:\n    red, nir = synth_scene(strength)\n    with 
rasterio.open(\n        f\"{WORK}/scene_{name}.tif\",\n        \"w\",\n        
driver=\"GTiff\",\n        tiled=True,\n        blockxsize=256,\n   
      blockysize=256,\n        height=H,\n        width=W,\n        count=2,\n  
      dtype=\"uint16\",\n        crs=\"EPSG:4326\",\n        
transform=transform,\n    ) as dst:\n        dst.write(red, 1)\n        
dst.write(nir, 2)\n        dst.set_band_description(1, \"red\")\n        
dst.set_band_description(2, \"nir\")\n\nfor f in (f\"{WORK}/scene_before.tif\", 
f\"{WORK}/scene_after.tif\"):\n    print(f\"{f}: {os.path.getsize(f)} bytes\")"
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 3. Load both scenes with the `raster` data source\n",
+    "\n",
+    "`sedona.read.format(\"raster\")` (new in 1.9) auto-tiles GeoTIFFs and 
yields one `Raster`-typed row per tile, sidestepping Spark's 2 GB record-size 
ceiling on large GeoTIFFs. Our scenes are tiny so each yields a single tile, 
but the call shape is identical for multi-gigabyte inputs."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": "scenes = (\n    sedona.read.format(\"raster\")\n    
.load(f\"{WORK}/scene_*.tif\")\n    .selectExpr(\"split(name, '\\\\\\\\.')[0] 
as scene\", \"rast\")\n)\nscenes.cache()\nscenes.show(truncate=80)"
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 4. Compute NDVI per scene with `RS_MapAlgebra`\n",
+    "\n",
+    "`RS_MapAlgebra(raster, pixelType, script)` runs a single-raster pixel 
script. The script syntax (`out[0] = ...; rast[i]` indexing) is documented 
[here](../api/sql/Raster-map-algebra.md). We coerce the result to `D` (double) 
so the negative side of NDVI survives."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "scenes.createOrReplaceTempView(\"scenes\")\n",
+    "\n",
+    "ndvi = sedona.sql(\"\"\"\n",
+    "    SELECT scene,\n",
+    "           RS_MapAlgebra(\n",
+    "               rast, 'D',\n",
+    "               'out[0] = (rast[1] - rast[0]) / (rast[1] + rast[0] + 
0.000001);'\n",
+    "           ) AS rast\n",
+    "    FROM scenes\n",
+    "\"\"\")\n",
+    "ndvi.cache()\n",
+    "ndvi.createOrReplaceTempView(\"ndvi\")\n",
+    "ndvi.selectExpr(\"scene\", \"RS_BandPixelType(rast, 1) as dtype\").show()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 5. Compute the greening delta raster\n",
+    "\n",
+    "The two-raster form `RS_MapAlgebra(rast0, rast1, pixelType, script, 
noDataValue)` lets us subtract the before-NDVI from the after-NDVI in a single 
pass. The result is a per-pixel \u0394NDVI layer where positive values mean 
\"this pixel got greener\"."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "delta = sedona.sql(\"\"\"\n",
+    "    SELECT RS_MapAlgebra(\n",
+    "               (SELECT rast FROM ndvi WHERE scene = 'scene_after'),\n",
+    "               (SELECT rast FROM ndvi WHERE scene = 'scene_before'),\n",
+    "               'D',\n",
+    "               'out[0] = rast0[0] - rast1[0];',\n",
+    "               -9999.0\n",
+    "           ) AS rast\n",
+    "\"\"\")\n",
+    "delta.cache()\n",
+    "delta.createOrReplaceTempView(\"delta\")\n",
+    "delta.selectExpr(\n",
+    "    \"RS_BandPixelType(rast, 1) as dtype\",\n",
+    "    \"RS_Width(rast) as w\",\n",
+    "    \"RS_Height(rast) as h\",\n",
+    ").show()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 6. Synthesize parcel polygons and run `RS_ZonalStats`\n",
+    "\n",
+    "We build a 4 \u00d7 4 grid of square parcels covering the AOI and compute 
the mean \u0394NDVI per parcel. `RS_ZonalStats(raster, zone, statType)` is the 
canonical raster\u2192vector aggregation: every pixel inside `zone` contributes 
to the statistic. Parcels are then ranked by mean \u0394NDVI."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from pyspark.sql import Row\n",
+    "\n",
+    "xmin, ymin, xmax, ymax = AOI\n",
+    "step_x = (xmax - xmin) / 4\n",
+    "step_y = (ymax - ymin) / 4\n",
+    "parcel_rows = []\n",
+    "for ix in range(4):\n",
+    "    for iy in range(4):\n",
+    "        x0, x1 = xmin + ix * step_x, xmin + (ix + 1) * step_x\n",
+    "        y0, y1 = ymin + iy * step_y, ymin + (iy + 1) * step_y\n",
+    "        wkt = f\"POLYGON(({x0} {y0}, {x1} {y0}, {x1} {y1}, {x0} {y1}, 
{x0} {y0}))\"\n",
+    "        parcel_rows.append(Row(parcel_id=f\"P{ix}{iy}\", wkt=wkt))\n",
+    "\n",
+    "parcels = sedona.createDataFrame(parcel_rows).selectExpr(\n",
+    "    \"parcel_id\", \"ST_GeomFromText(wkt) as geom\"\n",
+    ")\n",
+    "parcels.createOrReplaceTempView(\"parcels\")\n",
+    "\n",
+    "ranked = sedona.sql(\"\"\"\n",
+    "    SELECT p.parcel_id,\n",
+    "           ROUND(RS_ZonalStats(d.rast, p.geom, 'mean'), 4) AS 
mean_delta_ndvi\n",
+    "    FROM parcels p, delta d\n",
+    "    ORDER BY mean_delta_ndvi DESC\n",
+    "\"\"\")\n",
+    "ranked.show()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 7. Clip the delta raster to the top-greening parcel\n",
+    "\n",
+    "`RS_Clip(raster, band, geom)` crops the raster to the polygon's extent. 
We pick the parcel with the highest mean \u0394NDVI and zoom in on its delta 
values."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "top_id = ranked.first()[\"parcel_id\"]\n",
+    "print(f\"top-greening parcel: {top_id}\")\n",
+    "\n",
+    "clipped = sedona.sql(f\"\"\"\n",
+    "    SELECT RS_Clip(d.rast, 1, p.geom) AS rast\n",
+    "    FROM delta d, parcels p\n",
+    "    WHERE p.parcel_id = '{top_id}'\n",
+    "\"\"\")\n",
+    "clipped.cache()\n",
+    "clipped.createOrReplaceTempView(\"clipped\")\n",
+    "clipped.selectExpr(\n",
+    "    \"RS_Width(rast) as w\",\n",
+    "    \"RS_Height(rast) as h\",\n",
+    ").show()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 8. Round-trip through `RS_AsCOG`\n",
+    "\n",
+    "`RS_AsCOG` returns the raster as a binary Cloud-Optimized GeoTIFF byte 
array. We persist it to disk, then read it back with the same `raster` data 
source we used to load the input \u2014 proving the output is a valid GeoTIFF 
that downstream tools can stream from object storage."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": "cog_bytes = clipped.selectExpr(\"RS_AsCOG(rast) as 
cog\").first()[\"cog\"]\nwith open(f\"{WORK}/delta_topparcel_cog.tif\", \"wb\") 
as fh:\n    fh.write(cog_bytes)\nprint(f\"COG size: {len(cog_bytes):,} 
bytes\")\n\nroundtrip = 
sedona.read.format(\"raster\").load(f\"{WORK}/delta_topparcel_cog.tif\")\nroundtrip.selectExpr(\n
    \"RS_Width(rast) as w\",\n    \"RS_Height(rast) as h\",\n    
\"RS_BandPixelType(rast, 1) as dtype\",\n).show()"
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 9. Visualize the pipeline as a 4-panel figure\n",
+    "\n",
+    "Read the rasters back via `rasterio` and lay them out side by side: NDVI 
before, NDVI after, \u0394NDVI across the full AOI with parcel boundaries 
overlaid, and the top-parcel close-up."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": "import matplotlib.patches as mpatches\nimport matplotlib.pyplot 
as plt\n\n\ndef ndvi_arr(path):\n    with rasterio.open(path) as ds:\n        
red = ds.read(1).astype(\"float32\")\n        nir = 
ds.read(2).astype(\"float32\")\n        return (nir - red) / (nir + red + 
1e-6)\n\n\nndvi_before = ndvi_arr(f\"{WORK}/scene_before.tif\")\nndvi_after = 
ndvi_arr(f\"{WORK}/scene_after.tif\")\ndelta_arr = ndvi_after - 
ndvi_before\nwith rasterio.open(f\"{WORK}/delta_topparcel_cog.tif\") as ds:\n   
 top_arr = ds.read(1)\n    top_extent = (ds.bounds.left, ds.bounds.right, 
ds.bounds.bottom, ds.bounds.top)\n\nfig, axes = plt.subplots(1, 4, figsize=(16, 
4))\nextent = (AOI[0], AOI[2], AOI[1], AOI[3])\naxes[0].imshow(ndvi_before, 
vmin=-0.2, vmax=0.8, cmap=\"RdYlGn\", extent=extent)\naxes[0].set_title(\"NDVI 
before\")\naxes[1].imshow(ndvi_after, vmin=-0.2, vmax=0.8, cmap=\"RdYlGn\", 
extent=extent)\naxes[1].set_title(\"NDVI after\")\naxes[2].imshow(delta_arr, 
vmin=-0.5, vmax=0.5, cmap=\"PiY
 G\", extent=extent)\naxes[2].set_title(\"\u0394NDVI (after \u2212 
before)\")\naxes[3].imshow(top_arr, vmin=-0.5, vmax=0.5, cmap=\"PiYG\", 
extent=top_extent)\naxes[3].set_title(f\"Top parcel ({top_id}) 
\u0394NDVI\")\nfor ax in axes:\n    ax.set_xticks([])\n    
ax.set_yticks([])\nfig.tight_layout()\nfig"

Review Comment:
   The section text says the ΔNDVI panel will have parcel boundaries overlaid, 
but the plotting code never draws the parcel grid, and `matplotlib.patches as 
mpatches` is imported but unused. Either add an overlay (e.g., draw the 4×4 
parcel rectangles on `axes[2]`) or update the markdown description and remove 
the unused import so the visualization matches what the notebook claims.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] [GH-2700] Add 02-vegetation-change notebook: end-to-end raster workflow [sedona]

Reply via email to