Re: [PR] [GH-2700] Add 03-fire-risk-fusion notebook: raster + vector fusion [sedona]

via GitHub Sun, 03 May 2026 22:59:26 -0700


Copilot commented on code in PR #2900:
URL: https://github.com/apache/sedona/pull/2900#discussion_r3179614439



##########
docs/usecases/03-fire-risk-fusion.ipynb:
##########
@@ -0,0 +1,404 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<!--\n",
+    " Licensed to the Apache Software Foundation (ASF) under one\n",
+    " or more contributor license agreements.  See the NOTICE file\n",
+    " distributed with this work for additional information\n",
+    " regarding copyright ownership.  The ASF licenses this file\n",
+    " to you under the Apache License, Version 2.0 (the\n",
+    " \"License\"); you may not use this file except in compliance\n",
+    " with the License.  You may obtain a copy of the License at\n",
+    "\n",
+    "   http://www.apache.org/licenses/LICENSE-2.0\n";,
+    "\n",
+    " Unless required by applicable law or agreed to in writing,\n",
+    " software distributed under the License is distributed on an\n",
+    " \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+    " KIND, either express or implied.  See the License for the\n",
+    " specific language governing permissions and limitations\n",
+    " under the License.\n",
+    "-->\n",
+    "\n",
+    "# Wildfire risk score per building\n",
+    "\n",
+    "A raster + vector fusion workflow. We answer:\n",
+    "\n",
+    "> **Given a county's terrain steepness, fuel load, building footprints, 
and road network, score every building for wildfire risk weighted by distance 
from the nearest evacuation route.**\n",
+    "\n",
+    "This is the workflow GeoPandas alone can't do — it mixes raster algebra, 
raster→vector aggregation, and vector-on-vector distance joins in one pipeline. 
Sedona makes it one SQL block.\n",
+    "\n",
+    "**Pipeline:**\n",
+    "1. Synthesize a slope raster and a fuel-load raster.\n",
+    "2. Combine into a composite risk raster with two-raster 
`RS_MapAlgebra`.\n",
+    "3. Synthesize building footprints (4×4 grid of polygons) and a small road 
network.\n",
+    "4. Score each building with `RS_ZonalStats(composite, footprint, 'mean')` 
× evacuation factor from `ST_DistanceSpheroid` to the nearest road.\n",
+    "5. Rank, write top-5 as GeoParquet, plot.\n",
+    "\n",
+    "All inputs are synthesized in the notebook so it's fully reproducible and 
ships no new bytes. Swap the synthesis cell for 
`sedona.read.format(\"raster\").load(\"...\")` over a real DEM-derived slope 
raster and a NLCD-derived fuel raster; everything below is unchanged.\n",
+    "\n",
+    "**Requires Sedona ≥ 1.9.0** for the auto-tiling raster reader and 
proj4sedona-backed `ST_Transform`."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 1. Connect to Spark through SedonaContext"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from sedona.spark import SedonaContext\n",
+    "\n",
+    "config = 
SedonaContext.builder().master(\"spark://localhost:7077\").getOrCreate()\n",
+    "sedona = SedonaContext.create(config)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 2. Synthesize the raster inputs\n",
+    "\n",
+    "Two 256×256 single-band rasters over a 0.1° AOI in eastern Iowa:\n",
+    "\n",
+    "- `slope.tif` — normalized 0-1, highest along the eastern edge (think: 
foothills rising toward the east).\n",
+    "- `fuel.tif` — normalized 0-1, highest along the northern edge (think: 
dense vegetation belt across the top of the AOI).\n",
+    "\n",
+    "Both written as **tiled** GeoTIFFs so the Sedona raster reader (which 
rejects strip-based files as \"too thin\") can ingest them."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "\n",
+    "import numpy as np\n",
+    "import rasterio\n",
+    "from rasterio.transform import from_bounds\n",
+    "\n",
+    "WORK = \"/tmp/fire-risk\"\n",
+    "os.makedirs(WORK, exist_ok=True)\n",
+    "\n",
+    "AOI = (-91.10, 41.50, -91.00, 41.60)  # (xmin, ymin, xmax, ymax) 
EPSG:4326\n",
+    "W = H = 256\n",
+    "transform = from_bounds(*AOI, W, H)\n",
+    "rng = np.random.default_rng(7)\n",
+    "\n",
+    "ys, xs = np.mgrid[0:H, 0:W]\n",
+    "slope = (xs / (W - 1)) + 0.05 * rng.standard_normal((H, W))\n",
+    "fuel = ((H - 1 - ys) / (H - 1)) + 0.05 * rng.standard_normal((H, W))\n",
+    "slope = slope.clip(0, 1).astype(\"float32\")\n",
+    "fuel = fuel.clip(0, 1).astype(\"float32\")\n",
+    "\n",
+    "for name, arr in [(\"slope\", slope), (\"fuel\", fuel)]:\n",
+    "    with rasterio.open(\n",
+    "        f\"{WORK}/{name}.tif\",\n",
+    "        \"w\",\n",
+    "        driver=\"GTiff\",\n",
+    "        tiled=True,\n",
+    "        blockxsize=256,\n",
+    "        blockysize=256,\n",
+    "        height=H,\n",
+    "        width=W,\n",
+    "        count=1,\n",
+    "        dtype=\"float32\",\n",
+    "        crs=\"EPSG:4326\",\n",
+    "        transform=transform,\n",
+    "    ) as dst:\n",
+    "        dst.write(arr, 1)\n",
+    "        dst.set_band_description(1, name)\n",
+    "    print(f\"{name}.tif: min={arr.min():.2f} max={arr.max():.2f} 
mean={arr.mean():.2f}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 3. Combine slope and fuel into a composite risk raster\n",
+    "\n",
+    "Load both rasters with the auto-tiling reader, keep the `(x, y)` 
tile-index columns, then join on those for a tile-aligned two-raster 
`RS_MapAlgebra`. The same query works for single-tile inputs (as here) or for a 
DEM-sized scene that yields thousands of tiles per layer."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "rasters = (\n",
+    "    sedona.read.format(\"raster\")\n",
+    "    .load(f\"{WORK}/*.tif\")\n",
+    "    .selectExpr(\"split(name, '\\\\\\\\.')[0] as layer\", \"x\", \"y\", 
\"rast\")\n",
+    ")\n",
+    "rasters.createOrReplaceTempView(\"rasters\")\n",
+    "\n",
+    "composite = sedona.sql(\"\"\"\n",
+    "    SELECT s.x, s.y,\n",
+    "           RS_MapAlgebra(\n",
+    "               s.rast, f.rast, 'D',\n",
+    "               'out[0] = 0.5 * rast0[0] + 0.5 * rast1[0];',\n",
+    "               -9999.0\n",
+    "           ) AS rast\n",
+    "    FROM (SELECT x, y, rast FROM rasters WHERE layer = 'slope') s\n",
+    "    JOIN (SELECT x, y, rast FROM rasters WHERE layer = 'fuel')  f\n",
+    "      ON s.x = f.x AND s.y = f.y\n",
+    "\"\"\")\n",
+    "composite.cache()\n",
+    "composite.createOrReplaceTempView(\"composite\")\n",
+    "composite.selectExpr(\n",
+    "    \"x\",\n",
+    "    \"y\",\n",
+    "    \"RS_BandPixelType(rast, 1) as dtype\",\n",
+    "    \"RS_Width(rast) as w\",\n",
+    "    \"RS_Height(rast) as h\",\n",
+    ").show()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 4. Synthesize building footprints and road network\n",
+    "\n",
+    "Sixteen small square buildings on a 4×4 grid across the AOI, plus two 
roads (one east-west across the middle, one north-south down the centre). 
Buildings are 0.005° × 0.005° (~ 500 m × 500 m at this latitude); roads are 
simple `LINESTRING`s that bisect the AOI."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from pyspark.sql import Row\n",
+    "\n",
+    "xmin, ymin, xmax, ymax = AOI\n",
+    "step_x = (xmax - xmin) / 4\n",
+    "step_y = (ymax - ymin) / 4\n",
+    "half = 0.0025  # building half-width in degrees\n",
+    "\n",
+    "building_rows = []\n",
+    "for ix in range(4):\n",
+    "    for iy in range(4):\n",
+    "        cx = xmin + (ix + 0.5) * step_x\n",
+    "        cy = ymin + (iy + 0.5) * step_y\n",
+    "        wkt = (\n",
+    "            f\"POLYGON(({cx-half} {cy-half}, {cx+half} {cy-half}, \"\n",
+    "            f\"{cx+half} {cy+half}, {cx-half} {cy+half}, {cx-half} 
{cy-half}))\"\n",
+    "        )\n",
+    "        building_rows.append(Row(bid=f\"B{ix}{iy}\", wkt=wkt))\n",
+    "\n",
+    "buildings = sedona.createDataFrame(building_rows).selectExpr(\n",
+    "    \"bid\", \"ST_GeomFromText(wkt) as geom\"\n",
+    ")\n",
+    "buildings.createOrReplaceTempView(\"buildings\")\n",
+    "\n",
+    "mid_x = (xmin + xmax) / 2\n",
+    "mid_y = (ymin + ymax) / 2\n",
+    "road_rows = [\n",
+    "    Row(road_id=\"EW\", wkt=f\"LINESTRING({xmin} {mid_y}, {xmax} 
{mid_y})\"),\n",
+    "    Row(road_id=\"NS\", wkt=f\"LINESTRING({mid_x} {ymin}, {mid_x} 
{ymax})\"),\n",
+    "]\n",
+    "roads = sedona.createDataFrame(road_rows).selectExpr(\n",
+    "    \"road_id\", \"ST_GeomFromText(wkt) as geom\"\n",
+    ")\n",
+    "roads.createOrReplaceTempView(\"roads\")\n",
+    "\n",
+    "print(f\"{buildings.count()} buildings, {roads.count()} roads\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 5. Score every building\n",
+    "\n",
+    "Two SQL passes:\n",
+    "\n",
+    "- **Distance to nearest road** — group `(building × road)` cross product 
by building, take the `MIN(ST_DistanceSpheroid)`. Spheroidal distance returns 
metres regardless of the geometries' EPSG:4326 lon/lat units.\n",
+    "- **Risk score** — `mean composite risk × (1 + min(dist_to_road_km, 5) / 
5)`. The clamp at 5 km caps the evacuation penalty so a single distant building 
doesn't dominate; the multiplicative form means a building only ranks high when 
it has *both* high terrain risk and poor road access."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "buildings_with_dist = sedona.sql(\"\"\"\n",
+    "    SELECT b.bid, b.geom,\n",
+    "           MIN(ST_DistanceSpheroid(b.geom, r.geom)) AS dist_m\n",
+    "    FROM buildings b, roads r\n",
+    "    GROUP BY b.bid, b.geom\n",

Review Comment:
   Computing the nearest road via an unconstrained `buildings × roads` 
cartesian product will not scale to the county-sized OSM/Overture example 
described in this notebook. Sedona only optimizes distance joins when the 
distance appears in the join predicate (or via `ST_KNN`), so this stays 
O(buildings × roads) and will explode on realistic road/building counts.



##########
docs/usecases/03-fire-risk-fusion.ipynb:
##########
@@ -0,0 +1,404 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<!--\n",
+    " Licensed to the Apache Software Foundation (ASF) under one\n",
+    " or more contributor license agreements.  See the NOTICE file\n",
+    " distributed with this work for additional information\n",
+    " regarding copyright ownership.  The ASF licenses this file\n",
+    " to you under the Apache License, Version 2.0 (the\n",
+    " \"License\"); you may not use this file except in compliance\n",
+    " with the License.  You may obtain a copy of the License at\n",
+    "\n",
+    "   http://www.apache.org/licenses/LICENSE-2.0\n";,
+    "\n",
+    " Unless required by applicable law or agreed to in writing,\n",
+    " software distributed under the License is distributed on an\n",
+    " \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+    " KIND, either express or implied.  See the License for the\n",
+    " specific language governing permissions and limitations\n",
+    " under the License.\n",
+    "-->\n",
+    "\n",
+    "# Wildfire risk score per building\n",
+    "\n",
+    "A raster + vector fusion workflow. We answer:\n",
+    "\n",
+    "> **Given a county's terrain steepness, fuel load, building footprints, 
and road network, score every building for wildfire risk weighted by distance 
from the nearest evacuation route.**\n",
+    "\n",
+    "This is the workflow GeoPandas alone can't do — it mixes raster algebra, 
raster→vector aggregation, and vector-on-vector distance joins in one pipeline. 
Sedona makes it one SQL block.\n",
+    "\n",
+    "**Pipeline:**\n",
+    "1. Synthesize a slope raster and a fuel-load raster.\n",
+    "2. Combine into a composite risk raster with two-raster 
`RS_MapAlgebra`.\n",
+    "3. Synthesize building footprints (4×4 grid of polygons) and a small road 
network.\n",
+    "4. Score each building with `RS_ZonalStats(composite, footprint, 'mean')` 
× evacuation factor from `ST_DistanceSpheroid` to the nearest road.\n",
+    "5. Rank, write top-5 as GeoParquet, plot.\n",
+    "\n",
+    "All inputs are synthesized in the notebook so it's fully reproducible and 
ships no new bytes. Swap the synthesis cell for 
`sedona.read.format(\"raster\").load(\"...\")` over a real DEM-derived slope 
raster and a NLCD-derived fuel raster; everything below is unchanged.\n",
+    "\n",
+    "**Requires Sedona ≥ 1.9.0** for the auto-tiling raster reader and 
proj4sedona-backed `ST_Transform`."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 1. Connect to Spark through SedonaContext"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from sedona.spark import SedonaContext\n",
+    "\n",
+    "config = 
SedonaContext.builder().master(\"spark://localhost:7077\").getOrCreate()\n",
+    "sedona = SedonaContext.create(config)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 2. Synthesize the raster inputs\n",
+    "\n",
+    "Two 256×256 single-band rasters over a 0.1° AOI in eastern Iowa:\n",
+    "\n",
+    "- `slope.tif` — normalized 0-1, highest along the eastern edge (think: 
foothills rising toward the east).\n",
+    "- `fuel.tif` — normalized 0-1, highest along the northern edge (think: 
dense vegetation belt across the top of the AOI).\n",
+    "\n",
+    "Both written as **tiled** GeoTIFFs so the Sedona raster reader (which 
rejects strip-based files as \"too thin\") can ingest them."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "\n",
+    "import numpy as np\n",
+    "import rasterio\n",
+    "from rasterio.transform import from_bounds\n",
+    "\n",
+    "WORK = \"/tmp/fire-risk\"\n",
+    "os.makedirs(WORK, exist_ok=True)\n",
+    "\n",
+    "AOI = (-91.10, 41.50, -91.00, 41.60)  # (xmin, ymin, xmax, ymax) 
EPSG:4326\n",
+    "W = H = 256\n",
+    "transform = from_bounds(*AOI, W, H)\n",
+    "rng = np.random.default_rng(7)\n",
+    "\n",
+    "ys, xs = np.mgrid[0:H, 0:W]\n",
+    "slope = (xs / (W - 1)) + 0.05 * rng.standard_normal((H, W))\n",
+    "fuel = ((H - 1 - ys) / (H - 1)) + 0.05 * rng.standard_normal((H, W))\n",
+    "slope = slope.clip(0, 1).astype(\"float32\")\n",
+    "fuel = fuel.clip(0, 1).astype(\"float32\")\n",
+    "\n",
+    "for name, arr in [(\"slope\", slope), (\"fuel\", fuel)]:\n",
+    "    with rasterio.open(\n",
+    "        f\"{WORK}/{name}.tif\",\n",
+    "        \"w\",\n",
+    "        driver=\"GTiff\",\n",
+    "        tiled=True,\n",
+    "        blockxsize=256,\n",
+    "        blockysize=256,\n",
+    "        height=H,\n",
+    "        width=W,\n",
+    "        count=1,\n",
+    "        dtype=\"float32\",\n",
+    "        crs=\"EPSG:4326\",\n",
+    "        transform=transform,\n",
+    "    ) as dst:\n",
+    "        dst.write(arr, 1)\n",
+    "        dst.set_band_description(1, name)\n",
+    "    print(f\"{name}.tif: min={arr.min():.2f} max={arr.max():.2f} 
mean={arr.mean():.2f}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 3. Combine slope and fuel into a composite risk raster\n",
+    "\n",
+    "Load both rasters with the auto-tiling reader, keep the `(x, y)` 
tile-index columns, then join on those for a tile-aligned two-raster 
`RS_MapAlgebra`. The same query works for single-tile inputs (as here) or for a 
DEM-sized scene that yields thousands of tiles per layer."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "rasters = (\n",
+    "    sedona.read.format(\"raster\")\n",
+    "    .load(f\"{WORK}/*.tif\")\n",
+    "    .selectExpr(\"split(name, '\\\\\\\\.')[0] as layer\", \"x\", \"y\", 
\"rast\")\n",
+    ")\n",
+    "rasters.createOrReplaceTempView(\"rasters\")\n",
+    "\n",
+    "composite = sedona.sql(\"\"\"\n",
+    "    SELECT s.x, s.y,\n",
+    "           RS_MapAlgebra(\n",
+    "               s.rast, f.rast, 'D',\n",
+    "               'out[0] = 0.5 * rast0[0] + 0.5 * rast1[0];',\n",
+    "               -9999.0\n",
+    "           ) AS rast\n",
+    "    FROM (SELECT x, y, rast FROM rasters WHERE layer = 'slope') s\n",
+    "    JOIN (SELECT x, y, rast FROM rasters WHERE layer = 'fuel')  f\n",
+    "      ON s.x = f.x AND s.y = f.y\n",
+    "\"\"\")\n",
+    "composite.cache()\n",
+    "composite.createOrReplaceTempView(\"composite\")\n",
+    "composite.selectExpr(\n",
+    "    \"x\",\n",
+    "    \"y\",\n",
+    "    \"RS_BandPixelType(rast, 1) as dtype\",\n",
+    "    \"RS_Width(rast) as w\",\n",
+    "    \"RS_Height(rast) as h\",\n",
+    ").show()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 4. Synthesize building footprints and road network\n",
+    "\n",
+    "Sixteen small square buildings on a 4×4 grid across the AOI, plus two 
roads (one east-west across the middle, one north-south down the centre). 
Buildings are 0.005° × 0.005° (~ 500 m × 500 m at this latitude); roads are 
simple `LINESTRING`s that bisect the AOI."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from pyspark.sql import Row\n",
+    "\n",
+    "xmin, ymin, xmax, ymax = AOI\n",
+    "step_x = (xmax - xmin) / 4\n",
+    "step_y = (ymax - ymin) / 4\n",
+    "half = 0.0025  # building half-width in degrees\n",
+    "\n",
+    "building_rows = []\n",
+    "for ix in range(4):\n",
+    "    for iy in range(4):\n",
+    "        cx = xmin + (ix + 0.5) * step_x\n",
+    "        cy = ymin + (iy + 0.5) * step_y\n",
+    "        wkt = (\n",
+    "            f\"POLYGON(({cx-half} {cy-half}, {cx+half} {cy-half}, \"\n",
+    "            f\"{cx+half} {cy+half}, {cx-half} {cy+half}, {cx-half} 
{cy-half}))\"\n",
+    "        )\n",
+    "        building_rows.append(Row(bid=f\"B{ix}{iy}\", wkt=wkt))\n",
+    "\n",
+    "buildings = sedona.createDataFrame(building_rows).selectExpr(\n",
+    "    \"bid\", \"ST_GeomFromText(wkt) as geom\"\n",
+    ")\n",
+    "buildings.createOrReplaceTempView(\"buildings\")\n",
+    "\n",
+    "mid_x = (xmin + xmax) / 2\n",
+    "mid_y = (ymin + ymax) / 2\n",
+    "road_rows = [\n",
+    "    Row(road_id=\"EW\", wkt=f\"LINESTRING({xmin} {mid_y}, {xmax} 
{mid_y})\"),\n",
+    "    Row(road_id=\"NS\", wkt=f\"LINESTRING({mid_x} {ymin}, {mid_x} 
{ymax})\"),\n",
+    "]\n",
+    "roads = sedona.createDataFrame(road_rows).selectExpr(\n",
+    "    \"road_id\", \"ST_GeomFromText(wkt) as geom\"\n",
+    ")\n",
+    "roads.createOrReplaceTempView(\"roads\")\n",
+    "\n",
+    "print(f\"{buildings.count()} buildings, {roads.count()} roads\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 5. Score every building\n",
+    "\n",
+    "Two SQL passes:\n",
+    "\n",
+    "- **Distance to nearest road** — group `(building × road)` cross product 
by building, take the `MIN(ST_DistanceSpheroid)`. Spheroidal distance returns 
metres regardless of the geometries' EPSG:4326 lon/lat units.\n",
+    "- **Risk score** — `mean composite risk × (1 + min(dist_to_road_km, 5) / 
5)`. The clamp at 5 km caps the evacuation penalty so a single distant building 
doesn't dominate; the multiplicative form means a building only ranks high when 
it has *both* high terrain risk and poor road access."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "buildings_with_dist = sedona.sql(\"\"\"\n",
+    "    SELECT b.bid, b.geom,\n",
+    "           MIN(ST_DistanceSpheroid(b.geom, r.geom)) AS dist_m\n",
+    "    FROM buildings b, roads r\n",
+    "    GROUP BY b.bid, b.geom\n",
+    "\"\"\")\n",
+    "buildings_with_dist.createOrReplaceTempView(\"buildings_with_dist\")\n",
+    "\n",
+    "scored = sedona.sql(\"\"\"\n",
+    "    SELECT b.bid,\n",
+    "           ROUND(RS_ZonalStats(c.rast, b.geom, 'mean'), 4) AS 
mean_risk,\n",
+    "           ROUND(b.dist_m / 1000.0, 2) AS dist_km,\n",
+    "           ROUND(\n",
+    "               RS_ZonalStats(c.rast, b.geom, 'mean')\n",
+    "                 * (1.0 + LEAST(b.dist_m, 5000.0) / 5000.0),\n",
+    "               4\n",
+    "           ) AS risk_score,\n",
+    "           b.geom\n",
+    "    FROM buildings_with_dist b, composite c\n",

Review Comment:
   This query only produces correct building scores while `composite` has a 
single tile. As soon as the raster reader returns multiple tiles, each building 
emits one row per tile and `RS_ZonalStats(..., 'mean')` becomes a per-tile mean 
instead of a building-wide mean, so the ranking can contain duplicate buildings 
and silently wrong scores despite the surrounding text claiming the same SQL 
works for tiled DEM-sized inputs.
   



##########
docs/usecases/03-fire-risk-fusion.ipynb:
##########
@@ -0,0 +1,404 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<!--\n",
+    " Licensed to the Apache Software Foundation (ASF) under one\n",
+    " or more contributor license agreements.  See the NOTICE file\n",
+    " distributed with this work for additional information\n",
+    " regarding copyright ownership.  The ASF licenses this file\n",
+    " to you under the Apache License, Version 2.0 (the\n",
+    " \"License\"); you may not use this file except in compliance\n",
+    " with the License.  You may obtain a copy of the License at\n",
+    "\n",
+    "   http://www.apache.org/licenses/LICENSE-2.0\n";,
+    "\n",
+    " Unless required by applicable law or agreed to in writing,\n",
+    " software distributed under the License is distributed on an\n",
+    " \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+    " KIND, either express or implied.  See the License for the\n",
+    " specific language governing permissions and limitations\n",
+    " under the License.\n",
+    "-->\n",
+    "\n",
+    "# Wildfire risk score per building\n",
+    "\n",
+    "A raster + vector fusion workflow. We answer:\n",
+    "\n",
+    "> **Given a county's terrain steepness, fuel load, building footprints, 
and road network, score every building for wildfire risk weighted by distance 
from the nearest evacuation route.**\n",
+    "\n",
+    "This is the workflow GeoPandas alone can't do — it mixes raster algebra, 
raster→vector aggregation, and vector-on-vector distance joins in one pipeline. 
Sedona makes it one SQL block.\n",
+    "\n",
+    "**Pipeline:**\n",
+    "1. Synthesize a slope raster and a fuel-load raster.\n",
+    "2. Combine into a composite risk raster with two-raster 
`RS_MapAlgebra`.\n",
+    "3. Synthesize building footprints (4×4 grid of polygons) and a small road 
network.\n",
+    "4. Score each building with `RS_ZonalStats(composite, footprint, 'mean')` 
× evacuation factor from `ST_DistanceSpheroid` to the nearest road.\n",
+    "5. Rank, write top-5 as GeoParquet, plot.\n",
+    "\n",
+    "All inputs are synthesized in the notebook so it's fully reproducible and 
ships no new bytes. Swap the synthesis cell for 
`sedona.read.format(\"raster\").load(\"...\")` over a real DEM-derived slope 
raster and a NLCD-derived fuel raster; everything below is unchanged.\n",
+    "\n",
+    "**Requires Sedona ≥ 1.9.0** for the auto-tiling raster reader and 
proj4sedona-backed `ST_Transform`."

Review Comment:
   This requirement note points at `ST_Transform`, but the notebook never calls 
`ST_Transform` anywhere. Please reference the actual 1.9-only features used 
here; otherwise readers get the wrong explanation for why the notebook needs 
Sedona 1.9.
   



##########
docs/usecases/03-fire-risk-fusion.ipynb:
##########
@@ -0,0 +1,404 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<!--\n",
+    " Licensed to the Apache Software Foundation (ASF) under one\n",
+    " or more contributor license agreements.  See the NOTICE file\n",
+    " distributed with this work for additional information\n",
+    " regarding copyright ownership.  The ASF licenses this file\n",
+    " to you under the Apache License, Version 2.0 (the\n",
+    " \"License\"); you may not use this file except in compliance\n",
+    " with the License.  You may obtain a copy of the License at\n",
+    "\n",
+    "   http://www.apache.org/licenses/LICENSE-2.0\n";,
+    "\n",
+    " Unless required by applicable law or agreed to in writing,\n",
+    " software distributed under the License is distributed on an\n",
+    " \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+    " KIND, either express or implied.  See the License for the\n",
+    " specific language governing permissions and limitations\n",
+    " under the License.\n",
+    "-->\n",
+    "\n",
+    "# Wildfire risk score per building\n",
+    "\n",
+    "A raster + vector fusion workflow. We answer:\n",
+    "\n",
+    "> **Given a county's terrain steepness, fuel load, building footprints, 
and road network, score every building for wildfire risk weighted by distance 
from the nearest evacuation route.**\n",
+    "\n",
+    "This is the workflow GeoPandas alone can't do — it mixes raster algebra, 
raster→vector aggregation, and vector-on-vector distance joins in one pipeline. 
Sedona makes it one SQL block.\n",
+    "\n",
+    "**Pipeline:**\n",
+    "1. Synthesize a slope raster and a fuel-load raster.\n",
+    "2. Combine into a composite risk raster with two-raster 
`RS_MapAlgebra`.\n",
+    "3. Synthesize building footprints (4×4 grid of polygons) and a small road 
network.\n",
+    "4. Score each building with `RS_ZonalStats(composite, footprint, 'mean')` 
× evacuation factor from `ST_DistanceSpheroid` to the nearest road.\n",
+    "5. Rank, write top-5 as GeoParquet, plot.\n",
+    "\n",
+    "All inputs are synthesized in the notebook so it's fully reproducible and 
ships no new bytes. Swap the synthesis cell for 
`sedona.read.format(\"raster\").load(\"...\")` over a real DEM-derived slope 
raster and a NLCD-derived fuel raster; everything below is unchanged.\n",
+    "\n",
+    "**Requires Sedona ≥ 1.9.0** for the auto-tiling raster reader and 
proj4sedona-backed `ST_Transform`."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 1. Connect to Spark through SedonaContext"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from sedona.spark import SedonaContext\n",
+    "\n",
+    "config = 
SedonaContext.builder().master(\"spark://localhost:7077\").getOrCreate()\n",
+    "sedona = SedonaContext.create(config)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 2. Synthesize the raster inputs\n",
+    "\n",
+    "Two 256×256 single-band rasters over a 0.1° AOI in eastern Iowa:\n",
+    "\n",
+    "- `slope.tif` — normalized 0-1, highest along the eastern edge (think: 
foothills rising toward the east).\n",
+    "- `fuel.tif` — normalized 0-1, highest along the northern edge (think: 
dense vegetation belt across the top of the AOI).\n",
+    "\n",
+    "Both written as **tiled** GeoTIFFs so the Sedona raster reader (which 
rejects strip-based files as \"too thin\") can ingest them."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "\n",
+    "import numpy as np\n",
+    "import rasterio\n",
+    "from rasterio.transform import from_bounds\n",
+    "\n",
+    "WORK = \"/tmp/fire-risk\"\n",
+    "os.makedirs(WORK, exist_ok=True)\n",
+    "\n",
+    "AOI = (-91.10, 41.50, -91.00, 41.60)  # (xmin, ymin, xmax, ymax) 
EPSG:4326\n",
+    "W = H = 256\n",
+    "transform = from_bounds(*AOI, W, H)\n",
+    "rng = np.random.default_rng(7)\n",
+    "\n",
+    "ys, xs = np.mgrid[0:H, 0:W]\n",
+    "slope = (xs / (W - 1)) + 0.05 * rng.standard_normal((H, W))\n",
+    "fuel = ((H - 1 - ys) / (H - 1)) + 0.05 * rng.standard_normal((H, W))\n",
+    "slope = slope.clip(0, 1).astype(\"float32\")\n",
+    "fuel = fuel.clip(0, 1).astype(\"float32\")\n",
+    "\n",
+    "for name, arr in [(\"slope\", slope), (\"fuel\", fuel)]:\n",
+    "    with rasterio.open(\n",
+    "        f\"{WORK}/{name}.tif\",\n",
+    "        \"w\",\n",
+    "        driver=\"GTiff\",\n",
+    "        tiled=True,\n",
+    "        blockxsize=256,\n",
+    "        blockysize=256,\n",
+    "        height=H,\n",
+    "        width=W,\n",
+    "        count=1,\n",
+    "        dtype=\"float32\",\n",
+    "        crs=\"EPSG:4326\",\n",
+    "        transform=transform,\n",
+    "    ) as dst:\n",
+    "        dst.write(arr, 1)\n",
+    "        dst.set_band_description(1, name)\n",
+    "    print(f\"{name}.tif: min={arr.min():.2f} max={arr.max():.2f} 
mean={arr.mean():.2f}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 3. Combine slope and fuel into a composite risk raster\n",
+    "\n",
+    "Load both rasters with the auto-tiling reader, keep the `(x, y)` 
tile-index columns, then join on those for a tile-aligned two-raster 
`RS_MapAlgebra`. The same query works for single-tile inputs (as here) or for a 
DEM-sized scene that yields thousands of tiles per layer."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "rasters = (\n",
+    "    sedona.read.format(\"raster\")\n",
+    "    .load(f\"{WORK}/*.tif\")\n",
+    "    .selectExpr(\"split(name, '\\\\\\\\.')[0] as layer\", \"x\", \"y\", 
\"rast\")\n",
+    ")\n",
+    "rasters.createOrReplaceTempView(\"rasters\")\n",
+    "\n",
+    "composite = sedona.sql(\"\"\"\n",
+    "    SELECT s.x, s.y,\n",
+    "           RS_MapAlgebra(\n",
+    "               s.rast, f.rast, 'D',\n",
+    "               'out[0] = 0.5 * rast0[0] + 0.5 * rast1[0];',\n",
+    "               -9999.0\n",
+    "           ) AS rast\n",
+    "    FROM (SELECT x, y, rast FROM rasters WHERE layer = 'slope') s\n",
+    "    JOIN (SELECT x, y, rast FROM rasters WHERE layer = 'fuel')  f\n",
+    "      ON s.x = f.x AND s.y = f.y\n",
+    "\"\"\")\n",
+    "composite.cache()\n",
+    "composite.createOrReplaceTempView(\"composite\")\n",
+    "composite.selectExpr(\n",
+    "    \"x\",\n",
+    "    \"y\",\n",
+    "    \"RS_BandPixelType(rast, 1) as dtype\",\n",
+    "    \"RS_Width(rast) as w\",\n",
+    "    \"RS_Height(rast) as h\",\n",
+    ").show()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 4. Synthesize building footprints and road network\n",
+    "\n",
+    "Sixteen small square buildings on a 4×4 grid across the AOI, plus two 
roads (one east-west across the middle, one north-south down the centre). 
Buildings are 0.005° × 0.005° (~ 500 m × 500 m at this latitude); roads are 
simple `LINESTRING`s that bisect the AOI."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from pyspark.sql import Row\n",
+    "\n",
+    "xmin, ymin, xmax, ymax = AOI\n",
+    "step_x = (xmax - xmin) / 4\n",
+    "step_y = (ymax - ymin) / 4\n",
+    "half = 0.0025  # building half-width in degrees\n",
+    "\n",
+    "building_rows = []\n",
+    "for ix in range(4):\n",
+    "    for iy in range(4):\n",
+    "        cx = xmin + (ix + 0.5) * step_x\n",
+    "        cy = ymin + (iy + 0.5) * step_y\n",
+    "        wkt = (\n",
+    "            f\"POLYGON(({cx-half} {cy-half}, {cx+half} {cy-half}, \"\n",
+    "            f\"{cx+half} {cy+half}, {cx-half} {cy+half}, {cx-half} 
{cy-half}))\"\n",
+    "        )\n",
+    "        building_rows.append(Row(bid=f\"B{ix}{iy}\", wkt=wkt))\n",
+    "\n",
+    "buildings = sedona.createDataFrame(building_rows).selectExpr(\n",
+    "    \"bid\", \"ST_GeomFromText(wkt) as geom\"\n",
+    ")\n",

Review Comment:
   These building geometries are created without an SRID, so they keep SRID 0. 
When `top5` is written as GeoParquet later, Sedona will write `crs: null` 
instead of auto-generating PROJJSON metadata, which means the notebook does not 
actually produce the 4326 CRS metadata promised in section 6.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] [GH-2700] Add 03-fire-risk-fusion notebook: raster + vector fusion [sedona]

Reply via email to