Copilot commented on code in PR #2896:
URL: https://github.com/apache/sedona/pull/2896#discussion_r3178487457
##########
docs/usecases/02-vegetation-change.ipynb:
##########
@@ -0,0 +1,300 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "<!--\n",
+ " Licensed to the Apache Software Foundation (ASF) under one\n",
+ " or more contributor license agreements. See the NOTICE file\n",
+ " distributed with this work for additional information\n",
+ " regarding copyright ownership. The ASF licenses this file\n",
+ " to you under the Apache License, Version 2.0 (the\n",
+ " \"License\"); you may not use this file except in compliance\n",
+ " with the License. You may obtain a copy of the License at\n",
+ "\n",
+ " http://www.apache.org/licenses/LICENSE-2.0\n",
+ "\n",
+ " Unless required by applicable law or agreed to in writing,\n",
+ " software distributed under the License is distributed on an\n",
+ " \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+ " KIND, either express or implied. See the License for the\n",
+ " specific language governing permissions and limitations\n",
+ " under the License.\n",
+ "-->\n",
+ "\n",
+ "# Did this farmland green up?\n",
+ "\n",
+ "A complete raster pipeline. We answer:\n",
+ "\n",
+ "> **Between two satellite scenes a season apart, which farm parcels in
this AOI greened up the most?**\n",
+ "\n",
+ "End-to-end on Sedona's raster surface: load two GeoTIFFs through the new
tiling raster data source, compute NDVI per scene with `RS_MapAlgebra`, take
their difference for a \"greening\" delta raster, run `RS_ZonalStats` per
parcel, rank, clip the delta raster to the top parcel with `RS_Clip`,
round-trip through `RS_AsCOG` to prove the result is a valid Cloud-Optimized
GeoTIFF, and visualize the four stages as a single matplotlib figure.\n",
+ "\n",
+ "The two source scenes are **synthesized** at the top of the notebook (256
\u00d7 256 px, 2 bands each: red + NIR, EPSG:4326) so the notebook is fully
reproducible and ships no new bytes. The same code path runs unchanged against
real Sentinel-2 chips \u2014 only the raster filenames change.\n",
+ "\n",
+ "**Requires Sedona \u2265 1.9.0.** The `raster` data source (auto-tiling
GeoTIFF reader, GH-2672) and `RS_AsCOG` (GH-2652) land in 1.9.0; the notebook
will fail on older versions of the docker image."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 1. Connect to Spark through SedonaContext"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sedona.spark import SedonaContext\n",
+ "\n",
+ "config =
SedonaContext.builder().master(\"spark://localhost:7077\").getOrCreate()\n",
+ "sedona = SedonaContext.create(config)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 2. Synthesize two scenes a season apart\n",
+ "\n",
+ "We build two 256\u00d7256, 2-band GeoTIFFs in `/tmp` representing red and
near-infrared reflectance over the same AOI. The \"before\" scene is mostly
bare; the \"after\" scene has a circular field of vegetation in the south-west
corner with elevated NIR. Pixel values are scaled to the same 0-10000
reflectance range as Sentinel-2 surface-reflectance products so the NDVI math
is identical to the real-world case."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": "import os\n\nimport numpy as np\nimport rasterio\nfrom
rasterio.transform import from_bounds\n\nWORK =
\"/tmp/veg-change\"\nos.makedirs(WORK, exist_ok=True)\n\nAOI = (-91.10, 41.50,
-91.00, 41.60) # (xmin, ymin, xmax, ymax) in EPSG:4326\nW = H = 256\ntransform
= from_bounds(*AOI, W, H)\nrng = np.random.default_rng(42)\n\nys, xs =
np.mgrid[0:H, 0:W]\nfield_mask = ((xs - 64) ** 2 + (ys - 192) ** 2) < 50**2 #
circular field\n\n\ndef synth_scene(green_strength):\n red = (1500 + 200 *
rng.standard_normal((H, W))).clip(0, 10000)\n nir = (1800 + 200 *
rng.standard_normal((H, W))).clip(0, 10000)\n nir = np.where(field_mask, nir
+ green_strength, nir)\n return red.astype(\"uint16\"),
nir.astype(\"uint16\")\n\n\nfor name, strength in [(\"before\", 200),
(\"after\", 4000)]:\n red, nir = synth_scene(strength)\n with
rasterio.open(\n f\"{WORK}/scene_{name}.tif\",\n \"w\",\n
driver=\"GTiff\",\n tiled=True,\n blockxsize=256,\n
blockysize=256,\n height=H,\n width=W,\n count=2,\n
dtype=\"uint16\",\n crs=\"EPSG:4326\",\n
transform=transform,\n ) as dst:\n dst.write(red, 1)\n
dst.write(nir, 2)\n dst.set_band_description(1, \"red\")\n
dst.set_band_description(2, \"nir\")\n\nfor f in (f\"{WORK}/scene_before.tif\",
f\"{WORK}/scene_after.tif\"):\n print(f\"{f}: {os.path.getsize(f)} bytes\")"
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 3. Load both scenes with the `raster` data source\n",
+ "\n",
+ "`sedona.read.format(\"raster\")` (new in 1.9) auto-tiles GeoTIFFs and
yields one `Raster`-typed row per tile, sidestepping Spark's 2 GB record-size
ceiling on large GeoTIFFs. Our scenes are tiny so each yields a single tile,
but the call shape is identical for multi-gigabyte inputs."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": "scenes = (\n sedona.read.format(\"raster\")\n
.load(f\"{WORK}/scene_*.tif\")\n .selectExpr(\"split(name, '\\\\\\\\.')[0]
as scene\", \"rast\")\n)\nscenes.cache()\nscenes.show(truncate=80)"
Review Comment:
The notebook text says the `raster` data source call shape is identical for
multi-gigabyte inputs, but the next steps drop the tile coordinates and later
assume exactly one row per scene. For multi-tile GeoTIFFs, selecting only
`scene, rast` makes it impossible to align tiles across scenes for the
two-raster `RS_MapAlgebra` step and will make the later scalar subqueries fail.
Consider keeping the `x`/`y` (tile indices) columns from the raster reader and
describing (or implementing) a join on `(x, y)` when computing ΔNDVI so the
example actually generalizes to tiled inputs.
##########
docs/usecases/02-vegetation-change.ipynb:
##########
@@ -0,0 +1,300 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "<!--\n",
+ " Licensed to the Apache Software Foundation (ASF) under one\n",
+ " or more contributor license agreements. See the NOTICE file\n",
+ " distributed with this work for additional information\n",
+ " regarding copyright ownership. The ASF licenses this file\n",
+ " to you under the Apache License, Version 2.0 (the\n",
+ " \"License\"); you may not use this file except in compliance\n",
+ " with the License. You may obtain a copy of the License at\n",
+ "\n",
+ " http://www.apache.org/licenses/LICENSE-2.0\n",
+ "\n",
+ " Unless required by applicable law or agreed to in writing,\n",
+ " software distributed under the License is distributed on an\n",
+ " \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+ " KIND, either express or implied. See the License for the\n",
+ " specific language governing permissions and limitations\n",
+ " under the License.\n",
+ "-->\n",
+ "\n",
+ "# Did this farmland green up?\n",
+ "\n",
+ "A complete raster pipeline. We answer:\n",
+ "\n",
+ "> **Between two satellite scenes a season apart, which farm parcels in
this AOI greened up the most?**\n",
+ "\n",
+ "End-to-end on Sedona's raster surface: load two GeoTIFFs through the new
tiling raster data source, compute NDVI per scene with `RS_MapAlgebra`, take
their difference for a \"greening\" delta raster, run `RS_ZonalStats` per
parcel, rank, clip the delta raster to the top parcel with `RS_Clip`,
round-trip through `RS_AsCOG` to prove the result is a valid Cloud-Optimized
GeoTIFF, and visualize the four stages as a single matplotlib figure.\n",
+ "\n",
+ "The two source scenes are **synthesized** at the top of the notebook (256
\u00d7 256 px, 2 bands each: red + NIR, EPSG:4326) so the notebook is fully
reproducible and ships no new bytes. The same code path runs unchanged against
real Sentinel-2 chips \u2014 only the raster filenames change.\n",
+ "\n",
+ "**Requires Sedona \u2265 1.9.0.** The `raster` data source (auto-tiling
GeoTIFF reader, GH-2672) and `RS_AsCOG` (GH-2652) land in 1.9.0; the notebook
will fail on older versions of the docker image."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 1. Connect to Spark through SedonaContext"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sedona.spark import SedonaContext\n",
+ "\n",
+ "config =
SedonaContext.builder().master(\"spark://localhost:7077\").getOrCreate()\n",
+ "sedona = SedonaContext.create(config)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 2. Synthesize two scenes a season apart\n",
+ "\n",
+ "We build two 256\u00d7256, 2-band GeoTIFFs in `/tmp` representing red and
near-infrared reflectance over the same AOI. The \"before\" scene is mostly
bare; the \"after\" scene has a circular field of vegetation in the south-west
corner with elevated NIR. Pixel values are scaled to the same 0-10000
reflectance range as Sentinel-2 surface-reflectance products so the NDVI math
is identical to the real-world case."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": "import os\n\nimport numpy as np\nimport rasterio\nfrom
rasterio.transform import from_bounds\n\nWORK =
\"/tmp/veg-change\"\nos.makedirs(WORK, exist_ok=True)\n\nAOI = (-91.10, 41.50,
-91.00, 41.60) # (xmin, ymin, xmax, ymax) in EPSG:4326\nW = H = 256\ntransform
= from_bounds(*AOI, W, H)\nrng = np.random.default_rng(42)\n\nys, xs =
np.mgrid[0:H, 0:W]\nfield_mask = ((xs - 64) ** 2 + (ys - 192) ** 2) < 50**2 #
circular field\n\n\ndef synth_scene(green_strength):\n red = (1500 + 200 *
rng.standard_normal((H, W))).clip(0, 10000)\n nir = (1800 + 200 *
rng.standard_normal((H, W))).clip(0, 10000)\n nir = np.where(field_mask, nir
+ green_strength, nir)\n return red.astype(\"uint16\"),
nir.astype(\"uint16\")\n\n\nfor name, strength in [(\"before\", 200),
(\"after\", 4000)]:\n red, nir = synth_scene(strength)\n with
rasterio.open(\n f\"{WORK}/scene_{name}.tif\",\n \"w\",\n
driver=\"GTiff\",\n tiled=True,\n blockxsize=256,\n
blockysize=256,\n height=H,\n width=W,\n count=2,\n
dtype=\"uint16\",\n crs=\"EPSG:4326\",\n
transform=transform,\n ) as dst:\n dst.write(red, 1)\n
dst.write(nir, 2)\n dst.set_band_description(1, \"red\")\n
dst.set_band_description(2, \"nir\")\n\nfor f in (f\"{WORK}/scene_before.tif\",
f\"{WORK}/scene_after.tif\"):\n print(f\"{f}: {os.path.getsize(f)} bytes\")"
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 3. Load both scenes with the `raster` data source\n",
+ "\n",
+ "`sedona.read.format(\"raster\")` (new in 1.9) auto-tiles GeoTIFFs and
yields one `Raster`-typed row per tile, sidestepping Spark's 2 GB record-size
ceiling on large GeoTIFFs. Our scenes are tiny so each yields a single tile,
but the call shape is identical for multi-gigabyte inputs."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": "scenes = (\n sedona.read.format(\"raster\")\n
.load(f\"{WORK}/scene_*.tif\")\n .selectExpr(\"split(name, '\\\\\\\\.')[0]
as scene\", \"rast\")\n)\nscenes.cache()\nscenes.show(truncate=80)"
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 4. Compute NDVI per scene with `RS_MapAlgebra`\n",
+ "\n",
+ "`RS_MapAlgebra(raster, pixelType, script)` runs a single-raster pixel
script. The script syntax (`out[0] = ...; rast[i]` indexing) is documented
[here](../api/sql/Raster-map-algebra.md). We coerce the result to `D` (double)
so the negative side of NDVI survives."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "scenes.createOrReplaceTempView(\"scenes\")\n",
+ "\n",
+ "ndvi = sedona.sql(\"\"\"\n",
+ " SELECT scene,\n",
+ " RS_MapAlgebra(\n",
+ " rast, 'D',\n",
+ " 'out[0] = (rast[1] - rast[0]) / (rast[1] + rast[0] +
0.000001);'\n",
+ " ) AS rast\n",
+ " FROM scenes\n",
+ "\"\"\")\n",
+ "ndvi.cache()\n",
+ "ndvi.createOrReplaceTempView(\"ndvi\")\n",
+ "ndvi.selectExpr(\"scene\", \"RS_BandPixelType(rast, 1) as dtype\").show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 5. Compute the greening delta raster\n",
+ "\n",
+ "The two-raster form `RS_MapAlgebra(rast0, rast1, pixelType, script,
noDataValue)` lets us subtract the before-NDVI from the after-NDVI in a single
pass. The result is a per-pixel \u0394NDVI layer where positive values mean
\"this pixel got greener\"."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "delta = sedona.sql(\"\"\"\n",
+ " SELECT RS_MapAlgebra(\n",
+ " (SELECT rast FROM ndvi WHERE scene = 'scene_after'),\n",
+ " (SELECT rast FROM ndvi WHERE scene = 'scene_before'),\n",
+ " 'D',\n",
+ " 'out[0] = rast0[0] - rast1[0];',\n",
+ " -9999.0\n",
+ " ) AS rast\n",
Review Comment:
This ΔNDVI computation relies on scalar subqueries `(SELECT rast FROM ndvi
WHERE scene = ...)`, which will error if either scene produces more than one
tile/row (the normal case for large rasters). To make this robust, compute
ΔNDVI per tile by joining the "after" and "before" NDVI DataFrames on tile
coordinates (e.g., `x`,`y`) and run `RS_MapAlgebra(after.rast, before.rast,
...)` row-by-row, or explicitly state that the rest of the notebook assumes a
single-tile raster.
##########
docs/usecases/02-vegetation-change.ipynb:
##########
@@ -0,0 +1,300 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "<!--\n",
+ " Licensed to the Apache Software Foundation (ASF) under one\n",
+ " or more contributor license agreements. See the NOTICE file\n",
+ " distributed with this work for additional information\n",
+ " regarding copyright ownership. The ASF licenses this file\n",
+ " to you under the Apache License, Version 2.0 (the\n",
+ " \"License\"); you may not use this file except in compliance\n",
+ " with the License. You may obtain a copy of the License at\n",
+ "\n",
+ " http://www.apache.org/licenses/LICENSE-2.0\n",
+ "\n",
+ " Unless required by applicable law or agreed to in writing,\n",
+ " software distributed under the License is distributed on an\n",
+ " \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+ " KIND, either express or implied. See the License for the\n",
+ " specific language governing permissions and limitations\n",
+ " under the License.\n",
+ "-->\n",
+ "\n",
+ "# Did this farmland green up?\n",
+ "\n",
+ "A complete raster pipeline. We answer:\n",
+ "\n",
+ "> **Between two satellite scenes a season apart, which farm parcels in
this AOI greened up the most?**\n",
+ "\n",
+ "End-to-end on Sedona's raster surface: load two GeoTIFFs through the new
tiling raster data source, compute NDVI per scene with `RS_MapAlgebra`, take
their difference for a \"greening\" delta raster, run `RS_ZonalStats` per
parcel, rank, clip the delta raster to the top parcel with `RS_Clip`,
round-trip through `RS_AsCOG` to prove the result is a valid Cloud-Optimized
GeoTIFF, and visualize the four stages as a single matplotlib figure.\n",
+ "\n",
+ "The two source scenes are **synthesized** at the top of the notebook (256
\u00d7 256 px, 2 bands each: red + NIR, EPSG:4326) so the notebook is fully
reproducible and ships no new bytes. The same code path runs unchanged against
real Sentinel-2 chips \u2014 only the raster filenames change.\n",
+ "\n",
+ "**Requires Sedona \u2265 1.9.0.** The `raster` data source (auto-tiling
GeoTIFF reader, GH-2672) and `RS_AsCOG` (GH-2652) land in 1.9.0; the notebook
will fail on older versions of the docker image."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 1. Connect to Spark through SedonaContext"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sedona.spark import SedonaContext\n",
+ "\n",
+ "config =
SedonaContext.builder().master(\"spark://localhost:7077\").getOrCreate()\n",
+ "sedona = SedonaContext.create(config)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 2. Synthesize two scenes a season apart\n",
+ "\n",
+ "We build two 256\u00d7256, 2-band GeoTIFFs in `/tmp` representing red and
near-infrared reflectance over the same AOI. The \"before\" scene is mostly
bare; the \"after\" scene has a circular field of vegetation in the south-west
corner with elevated NIR. Pixel values are scaled to the same 0-10000
reflectance range as Sentinel-2 surface-reflectance products so the NDVI math
is identical to the real-world case."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": "import os\n\nimport numpy as np\nimport rasterio\nfrom
rasterio.transform import from_bounds\n\nWORK =
\"/tmp/veg-change\"\nos.makedirs(WORK, exist_ok=True)\n\nAOI = (-91.10, 41.50,
-91.00, 41.60) # (xmin, ymin, xmax, ymax) in EPSG:4326\nW = H = 256\ntransform
= from_bounds(*AOI, W, H)\nrng = np.random.default_rng(42)\n\nys, xs =
np.mgrid[0:H, 0:W]\nfield_mask = ((xs - 64) ** 2 + (ys - 192) ** 2) < 50**2 #
circular field\n\n\ndef synth_scene(green_strength):\n red = (1500 + 200 *
rng.standard_normal((H, W))).clip(0, 10000)\n nir = (1800 + 200 *
rng.standard_normal((H, W))).clip(0, 10000)\n nir = np.where(field_mask, nir
+ green_strength, nir)\n return red.astype(\"uint16\"),
nir.astype(\"uint16\")\n\n\nfor name, strength in [(\"before\", 200),
(\"after\", 4000)]:\n red, nir = synth_scene(strength)\n with
rasterio.open(\n f\"{WORK}/scene_{name}.tif\",\n \"w\",\n
driver=\"GTiff\",\n tiled=True,\n blockxsize=256,\n
blockysize=256,\n height=H,\n width=W,\n count=2,\n
dtype=\"uint16\",\n crs=\"EPSG:4326\",\n
transform=transform,\n ) as dst:\n dst.write(red, 1)\n
dst.write(nir, 2)\n dst.set_band_description(1, \"red\")\n
dst.set_band_description(2, \"nir\")\n\nfor f in (f\"{WORK}/scene_before.tif\",
f\"{WORK}/scene_after.tif\"):\n print(f\"{f}: {os.path.getsize(f)} bytes\")"
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 3. Load both scenes with the `raster` data source\n",
+ "\n",
+ "`sedona.read.format(\"raster\")` (new in 1.9) auto-tiles GeoTIFFs and
yields one `Raster`-typed row per tile, sidestepping Spark's 2 GB record-size
ceiling on large GeoTIFFs. Our scenes are tiny so each yields a single tile,
but the call shape is identical for multi-gigabyte inputs."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": "scenes = (\n sedona.read.format(\"raster\")\n
.load(f\"{WORK}/scene_*.tif\")\n .selectExpr(\"split(name, '\\\\\\\\.')[0]
as scene\", \"rast\")\n)\nscenes.cache()\nscenes.show(truncate=80)"
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 4. Compute NDVI per scene with `RS_MapAlgebra`\n",
+ "\n",
+ "`RS_MapAlgebra(raster, pixelType, script)` runs a single-raster pixel
script. The script syntax (`out[0] = ...; rast[i]` indexing) is documented
[here](../api/sql/Raster-map-algebra.md). We coerce the result to `D` (double)
so the negative side of NDVI survives."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "scenes.createOrReplaceTempView(\"scenes\")\n",
+ "\n",
+ "ndvi = sedona.sql(\"\"\"\n",
+ " SELECT scene,\n",
+ " RS_MapAlgebra(\n",
+ " rast, 'D',\n",
+ " 'out[0] = (rast[1] - rast[0]) / (rast[1] + rast[0] +
0.000001);'\n",
+ " ) AS rast\n",
+ " FROM scenes\n",
+ "\"\"\")\n",
+ "ndvi.cache()\n",
+ "ndvi.createOrReplaceTempView(\"ndvi\")\n",
+ "ndvi.selectExpr(\"scene\", \"RS_BandPixelType(rast, 1) as dtype\").show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 5. Compute the greening delta raster\n",
+ "\n",
+ "The two-raster form `RS_MapAlgebra(rast0, rast1, pixelType, script,
noDataValue)` lets us subtract the before-NDVI from the after-NDVI in a single
pass. The result is a per-pixel \u0394NDVI layer where positive values mean
\"this pixel got greener\"."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "delta = sedona.sql(\"\"\"\n",
+ " SELECT RS_MapAlgebra(\n",
+ " (SELECT rast FROM ndvi WHERE scene = 'scene_after'),\n",
+ " (SELECT rast FROM ndvi WHERE scene = 'scene_before'),\n",
+ " 'D',\n",
+ " 'out[0] = rast0[0] - rast1[0];',\n",
+ " -9999.0\n",
+ " ) AS rast\n",
+ "\"\"\")\n",
+ "delta.cache()\n",
+ "delta.createOrReplaceTempView(\"delta\")\n",
+ "delta.selectExpr(\n",
+ " \"RS_BandPixelType(rast, 1) as dtype\",\n",
+ " \"RS_Width(rast) as w\",\n",
+ " \"RS_Height(rast) as h\",\n",
+ ").show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 6. Synthesize parcel polygons and run `RS_ZonalStats`\n",
+ "\n",
+ "We build a 4 \u00d7 4 grid of square parcels covering the AOI and compute
the mean \u0394NDVI per parcel. `RS_ZonalStats(raster, zone, statType)` is the
canonical raster\u2192vector aggregation: every pixel inside `zone` contributes
to the statistic. Parcels are then ranked by mean \u0394NDVI."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from pyspark.sql import Row\n",
+ "\n",
+ "xmin, ymin, xmax, ymax = AOI\n",
+ "step_x = (xmax - xmin) / 4\n",
+ "step_y = (ymax - ymin) / 4\n",
+ "parcel_rows = []\n",
+ "for ix in range(4):\n",
+ " for iy in range(4):\n",
+ " x0, x1 = xmin + ix * step_x, xmin + (ix + 1) * step_x\n",
+ " y0, y1 = ymin + iy * step_y, ymin + (iy + 1) * step_y\n",
+ " wkt = f\"POLYGON(({x0} {y0}, {x1} {y0}, {x1} {y1}, {x0} {y1},
{x0} {y0}))\"\n",
+ " parcel_rows.append(Row(parcel_id=f\"P{ix}{iy}\", wkt=wkt))\n",
+ "\n",
+ "parcels = sedona.createDataFrame(parcel_rows).selectExpr(\n",
+ " \"parcel_id\", \"ST_GeomFromText(wkt) as geom\"\n",
+ ")\n",
+ "parcels.createOrReplaceTempView(\"parcels\")\n",
+ "\n",
+ "ranked = sedona.sql(\"\"\"\n",
+ " SELECT p.parcel_id,\n",
+ " ROUND(RS_ZonalStats(d.rast, p.geom, 'mean'), 4) AS
mean_delta_ndvi\n",
+ " FROM parcels p, delta d\n",
+ " ORDER BY mean_delta_ndvi DESC\n",
+ "\"\"\")\n",
+ "ranked.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 7. Clip the delta raster to the top-greening parcel\n",
+ "\n",
+ "`RS_Clip(raster, band, geom)` crops the raster to the polygon's extent.
We pick the parcel with the highest mean \u0394NDVI and zoom in on its delta
values."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "top_id = ranked.first()[\"parcel_id\"]\n",
+ "print(f\"top-greening parcel: {top_id}\")\n",
+ "\n",
+ "clipped = sedona.sql(f\"\"\"\n",
+ " SELECT RS_Clip(d.rast, 1, p.geom) AS rast\n",
+ " FROM delta d, parcels p\n",
+ " WHERE p.parcel_id = '{top_id}'\n",
+ "\"\"\")\n",
+ "clipped.cache()\n",
+ "clipped.createOrReplaceTempView(\"clipped\")\n",
+ "clipped.selectExpr(\n",
+ " \"RS_Width(rast) as w\",\n",
+ " \"RS_Height(rast) as h\",\n",
+ ").show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 8. Round-trip through `RS_AsCOG`\n",
+ "\n",
+ "`RS_AsCOG` returns the raster as a binary Cloud-Optimized GeoTIFF byte
array. We persist it to disk, then read it back with the same `raster` data
source we used to load the input \u2014 proving the output is a valid GeoTIFF
that downstream tools can stream from object storage."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": "cog_bytes = clipped.selectExpr(\"RS_AsCOG(rast) as
cog\").first()[\"cog\"]\nwith open(f\"{WORK}/delta_topparcel_cog.tif\", \"wb\")
as fh:\n fh.write(cog_bytes)\nprint(f\"COG size: {len(cog_bytes):,}
bytes\")\n\nroundtrip =
sedona.read.format(\"raster\").load(f\"{WORK}/delta_topparcel_cog.tif\")\nroundtrip.selectExpr(\n
\"RS_Width(rast) as w\",\n \"RS_Height(rast) as h\",\n
\"RS_BandPixelType(rast, 1) as dtype\",\n).show()"
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 9. Visualize the pipeline as a 4-panel figure\n",
+ "\n",
+ "Read the rasters back via `rasterio` and lay them out side by side: NDVI
before, NDVI after, \u0394NDVI across the full AOI with parcel boundaries
overlaid, and the top-parcel close-up."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": "import matplotlib.patches as mpatches\nimport matplotlib.pyplot
as plt\n\n\ndef ndvi_arr(path):\n with rasterio.open(path) as ds:\n
red = ds.read(1).astype(\"float32\")\n nir =
ds.read(2).astype(\"float32\")\n return (nir - red) / (nir + red +
1e-6)\n\n\nndvi_before = ndvi_arr(f\"{WORK}/scene_before.tif\")\nndvi_after =
ndvi_arr(f\"{WORK}/scene_after.tif\")\ndelta_arr = ndvi_after -
ndvi_before\nwith rasterio.open(f\"{WORK}/delta_topparcel_cog.tif\") as ds:\n
top_arr = ds.read(1)\n top_extent = (ds.bounds.left, ds.bounds.right,
ds.bounds.bottom, ds.bounds.top)\n\nfig, axes = plt.subplots(1, 4, figsize=(16,
4))\nextent = (AOI[0], AOI[2], AOI[1], AOI[3])\naxes[0].imshow(ndvi_before,
vmin=-0.2, vmax=0.8, cmap=\"RdYlGn\", extent=extent)\naxes[0].set_title(\"NDVI
before\")\naxes[1].imshow(ndvi_after, vmin=-0.2, vmax=0.8, cmap=\"RdYlGn\",
extent=extent)\naxes[1].set_title(\"NDVI after\")\naxes[2].imshow(delta_arr,
vmin=-0.5, vmax=0.5, cmap=\"PiY
G\", extent=extent)\naxes[2].set_title(\"\u0394NDVI (after \u2212
before)\")\naxes[3].imshow(top_arr, vmin=-0.5, vmax=0.5, cmap=\"PiYG\",
extent=top_extent)\naxes[3].set_title(f\"Top parcel ({top_id})
\u0394NDVI\")\nfor ax in axes:\n ax.set_xticks([])\n
ax.set_yticks([])\nfig.tight_layout()\nfig"
Review Comment:
The section text says the ΔNDVI panel will have parcel boundaries overlaid,
but the plotting code never draws the parcel grid, and `matplotlib.patches as
mpatches` is imported but unused. Either add an overlay (e.g., draw the 4×4
parcel rectangles on `axes[2]`) or update the markdown description and remove
the unused import so the visualization matches what the notebook claims.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]