james-willis opened a new issue, #746:
URL: https://github.com/apache/sedona-db/issues/746
### Problem
SedonaDB's raster type models data as 2D spatial grids (width × height) with
bands as a flat list. This can't represent multi-dimensional
geospatial datasets — climate models with time dimensions, hyperspectral
imagery, atmospheric profiles with pressure levels, or Zarr/NetCDF datacubes.
Users must flatten into 2D+band (losing semantics) or leave SedonaDB.
### Approach
Upgrade each band's data from a 2D tile to an N-D chunk with named
dimensions and shape. The band/variable structure is preserved — a Zarr
variable or GeoTIFF band maps to a band, but each band can now have shape
`[time=12, y=256, x=256]` instead of just `[y=256, x=256]`. Legacy rasters load
as bands with shape `[y, x]` — zero change for existing workloads.
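A minimal sketch of the idea above, assuming a hypothetical `BandShape` descriptor (this type and its method names are illustrative and not part of SedonaDB). It shows a 3-D band next to a legacy raster band that loads as `[y, x]`:

```rust
/// Hypothetical descriptor for one band's N-D chunk (illustrative only).
#[derive(Debug, Clone, PartialEq)]
pub struct BandShape {
    pub dim_names: Vec<String>,
    pub shape: Vec<u64>,
}

impl BandShape {
    /// Build a descriptor from (name, size) pairs,
    /// e.g. [("time", 12), ("y", 256), ("x", 256)].
    pub fn new(dims: &[(&str, u64)]) -> Self {
        BandShape {
            dim_names: dims.iter().map(|(n, _)| n.to_string()).collect(),
            shape: dims.iter().map(|(_, s)| *s).collect(),
        }
    }

    /// A legacy 2D band is simply a band whose only dimensions are y and x.
    pub fn legacy_2d(height: u64, width: u64) -> Self {
        Self::new(&[("y", height), ("x", width)])
    }

    /// Number of dimensions.
    pub fn ndim(&self) -> usize {
        self.shape.len()
    }

    /// Size of a named dimension, or None if the band does not have it.
    pub fn dim_size(&self, name: &str) -> Option<u64> {
        self.dim_names
            .iter()
            .position(|n| n.as_str() == name)
            .map(|i| self.shape[i])
    }

    /// Total number of elements in the chunk.
    pub fn num_elements(&self) -> u64 {
        self.shape.iter().product()
    }
}
```

With this sketch, `BandShape::new(&[("time", 12), ("y", 256), ("x", 256)])` describes the N-D case, and `BandShape::legacy_2d(256, 256)` describes what an existing 2D band would look like after loading.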
### Key decisions
1. **Band = variable, each band is N-D** — Zarr's `temperature`, `pressure`,
`wind_u` become 3 bands, each an N-D chunk. GeoTIFF bands map directly. Band
math (`in[0]`, `in[1]`) unchanged.
2. **Named dimensions per band** — Each band stores `dim_names` + `shape`.
`y`/`x` (or `lat`/`lon`) are the spatial axes with hard-coded meaning; all
others (`time`, `wavelength`, `pressure`, ...) are arbitrary. Bands may have
different dimension sets, but any dimension they share must have the same size in every band.
3. **`RS_DimToBand` / `RS_BandToDim`** — Bridge between "everything is a
dimension" (Zarr) and the band model. `RS_DimToBand(raster, 'wavelength')`
promotes a within-band dimension into separate bands so standard band math
works.
4. **Two execution paths** — Native impls for metadata, coordinate
conversion, predicates, and new N-D functions. GDAL-backed impls for
compute-heavy spatial ops (clip, zonal stats, map algebra) — these extract
y/x slices and operate per spatial slice.
5. **Single schema** — Legacy 2D schema retired. All loaders produce N-D
layout directly. No runtime schema detection needed.
6. **Trait-based band storage** — `NdBandRef` trait with `nd_buffer()`
(returns raw buffer + shape + strides for zero-copy access) and
`contiguous_data()` (flat bytes, copies only if strided). Implementations:
`InMemoryBand` (Phase 1), `ZarrBand` + `LazySlicedBand` (Phase 2),
`GeoTiffBand` (Phase 2/3). Strided views are just `InMemoryBand` with
non-standard strides — Arrow BinaryView refcounting handles lifetime.
7. **Affine transform** — Single `transform: List<Float64>` (GDAL
GeoTransform convention) at raster level. Applies to y/x dims only.
8. **OutDb references** — Single `outdb_uri` field per band with
scheme-based dispatch (`zarr://...`, `geotiff://...`).
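The rule in decision 2 (bands must agree on the sizes of shared dimensions) can be sketched as a validation pass. This is only an illustration under stated assumptions: `check_shared_dims` is a hypothetical helper, and each band is passed as a plain `(dim_names, shape)` pair rather than through the real band types.

```rust
use std::collections::HashMap;

/// Check that every dimension name used by more than one band has a single size.
/// Each band is a (dim_names, shape) pair. Returns the merged name -> size map,
/// or an error message describing the first conflict found.
pub fn check_shared_dims(
    bands: &[(Vec<&str>, Vec<u64>)],
) -> Result<HashMap<String, u64>, String> {
    let mut sizes: HashMap<String, u64> = HashMap::new();
    for (band_idx, (names, shape)) in bands.iter().enumerate() {
        if names.len() != shape.len() {
            return Err(format!(
                "band {band_idx}: dim_names and shape differ in length"
            ));
        }
        for (name, &size) in names.iter().zip(shape.iter()) {
            // `copied()` ends the borrow of `sizes` before a possible insert.
            match sizes.get(*name).copied() {
                Some(seen) if seen != size => {
                    return Err(format!(
                        "dimension '{name}' has size {seen} in an earlier band \
                         but {size} in band {band_idx}"
                    ));
                }
                Some(_) => {}
                None => {
                    sizes.insert(name.to_string(), size);
                }
            }
        }
    }
    Ok(sizes)
}
```

A `[time, y, x]` band and a `[y, x]` band pass this check as long as their `y` and `x` sizes match; two bands with different `y` sizes are rejected.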
### Arrow schema
```rust
Struct {
    crs: Utf8View,
    transform: List,       -- [origin_x, scale_x, skew_x, origin_y, skew_y, scale_y]
    bands: List<Struct {
        name: Utf8,        -- e.g. "temperature" (nullable)
        dim_names: List,   -- ["time", "y", "x"]
        shape: List,       -- [12, 256, 256]
        data_type: UInt32,
        nodata: Binary,
        strides: List,     -- per-dim byte strides
        offset: UInt64,
        outdb_uri: Utf8,   -- "zarr://s3://bucket/store#var/0.0.0" (nullable)
        data: BinaryView,  -- row-major N-D array (eager) or empty (lazy)
    }>
}
```
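To make the `transform` field concrete, here is a hedged sketch of applying the six coefficients in the order listed in the schema comment (`[origin_x, scale_x, skew_x, origin_y, skew_y, scale_y]`). The function names `pixel_to_world` and `world_to_pixel` are hypothetical; only the y/x dimensions are involved, as stated in the key decisions.

```rust
/// Map a pixel position (col, row) to world coordinates (x, y).
/// `t` is [origin_x, scale_x, skew_x, origin_y, skew_y, scale_y].
pub fn pixel_to_world(t: &[f64; 6], col: f64, row: f64) -> (f64, f64) {
    let x = t[0] + col * t[1] + row * t[2];
    let y = t[3] + col * t[4] + row * t[5];
    (x, y)
}

/// Inverse mapping from world (x, y) to pixel (col, row).
/// Returns None when the 2x2 linear part of the transform is singular.
pub fn world_to_pixel(t: &[f64; 6], x: f64, y: f64) -> Option<(f64, f64)> {
    let det = t[1] * t[5] - t[2] * t[4];
    if det == 0.0 {
        return None;
    }
    let dx = x - t[0];
    let dy = y - t[3];
    // Inverse of [[scale_x, skew_x], [skew_y, scale_y]] applied to (dx, dy).
    let col = (t[5] * dx - t[2] * dy) / det;
    let row = (-t[4] * dx + t[1] * dy) / det;
    Some((col, row))
}
```

For a north-up raster with origin `(100, 200)` and 10-unit pixels, the transform is `[100, 10, 0, 200, 0, -10]`, and pixel `(col=2, row=3)` maps to world `(120, 170)`.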
### Core traits
```rust
use std::borrow::Cow;

// `Result` and `BandDataType` are assumed to be defined elsewhere in the crate.

pub struct NdBuffer<'a> {
    pub buffer: &'a [u8],
    pub shape: &'a [u64],
    pub strides: &'a [i64],
    pub offset: u64,
    pub data_type: BandDataType,
}

pub trait NdBandRef {
    fn ndim(&self) -> usize;
    fn dim_names(&self) -> &[&str];
    fn shape(&self) -> &[u64];
    fn dim_size(&self, name: &str) -> Option<u64>;
    fn data_type(&self) -> BandDataType;
    fn nodata(&self) -> Option<&[u8]>;

    /// Raw buffer + strides: for zero-copy consumers (numpy, Arrow FFI).
    /// Triggers load for lazy impls.
    fn nd_buffer(&self) -> Result<NdBuffer<'_>>;

    /// Contiguous row-major bytes: copies only if strides are non-standard.
    /// Most RS_* functions use this and never think about strides.
    fn contiguous_data(&self) -> Result<Cow<'_, [u8]>>;
}
```
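The "copies only if strides are non-standard" behaviour of `contiguous_data()` could look roughly like the sketch below. It is not the proposed implementation: `row_major_strides` and `contiguous_bytes` are hypothetical free functions that take the `NdBuffer` fields as plain arguments, with the element size in bytes passed explicitly instead of derived from `BandDataType`.

```rust
use std::borrow::Cow;

/// Byte strides of a contiguous row-major array with the given shape.
pub fn row_major_strides(shape: &[u64], elem_size: u64) -> Vec<i64> {
    let mut strides = vec![0i64; shape.len()];
    let mut step = elem_size as i64;
    for i in (0..shape.len()).rev() {
        strides[i] = step;
        step *= shape[i] as i64;
    }
    strides
}

/// Borrow the bytes when the layout is already row-major starting at `offset`;
/// otherwise gather the elements into a new row-major buffer.
pub fn contiguous_bytes<'a>(
    buffer: &'a [u8],
    shape: &[u64],
    strides: &[i64],
    offset: u64,
    elem_size: usize,
) -> Cow<'a, [u8]> {
    let n = shape.iter().product::<u64>() as usize;
    let total = n * elem_size;
    let start = offset as usize;
    if strides == row_major_strides(shape, elem_size as u64).as_slice() {
        return Cow::Borrowed(&buffer[start..start + total]);
    }
    let mut out = Vec::with_capacity(total);
    let mut index = vec![0u64; shape.len()];
    for _ in 0..n {
        // Byte position of the current element: offset + sum(index[d] * strides[d]).
        let pos = index
            .iter()
            .zip(strides)
            .fold(offset as i64, |acc, (&i, &s)| acc + (i as i64) * s) as usize;
        out.extend_from_slice(&buffer[pos..pos + elem_size]);
        // Advance the multi-index like an odometer, last dimension fastest.
        for d in (0..shape.len()).rev() {
            index[d] += 1;
            if index[d] < shape[d] {
                break;
            }
            index[d] = 0;
        }
    }
    Cow::Owned(out)
}
```

A transposed view (same buffer, swapped shape and strides) takes the copying path, while a standard layout is returned as a borrow with no allocation.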
### Phases
- **Phase 1 (this issue):** N-D schema, `NdRasterRef`/`NdBandRef` traits,
  `InMemoryBand`, reimplementation of all 33 SedonaDB `RS_*` functions against
  the traits, and new N-D functions (`RS_NumDimensions`, `RS_DimNames`,
  `RS_DimSize`, `RS_Shape`, `RS_Slice`, `RS_DimToBand`, `RS_BandToDim`).
  Strides are always contiguous. Crates: `sedona-schema`, `sedona-raster`,
  `sedona-raster-functions`.
- **Phase 2:** Zarr I/O. Add `ZarrBand` (lazy load on first access) and
  `LazySlicedBand` (wraps a lazy band plus a slice spec so `RS_DimToBand` stays
  lazy). Chunk-level caching lives inside the impls.
  `RS_NormalizedDifference(RS_DimToBand(data, 'wavelength'), 77, 54)` loads
  only the chunks for wavelengths 77 and 54.
- **Phase 3:** N-D aggregations (reduce along a dimension), coordinate label
  arrays, dimension algebra.
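As an illustration of what `RS_DimToBand` does to the data, here is an eager sketch over contiguous row-major bytes. It assumes a hypothetical `Band` struct and `dim_to_band` function; the lazy Phase 2 path (`LazySlicedBand`) would defer this slicing rather than copy up front.

```rust
/// Hypothetical eager band: dimension names, shape, and row-major bytes.
#[derive(Debug, Clone, PartialEq)]
pub struct Band {
    pub dim_names: Vec<String>,
    pub shape: Vec<u64>,
    pub data: Vec<u8>,
}

/// Promote dimension `dim` into separate bands, one per index along it.
/// Returns None if the band has no dimension with that name.
pub fn dim_to_band(band: &Band, dim: &str, elem_size: usize) -> Option<Vec<Band>> {
    let k = band.dim_names.iter().position(|n| n.as_str() == dim)?;
    // Elements before the promoted axis, along it, and bytes after it.
    let outer = band.shape[..k].iter().product::<u64>() as usize;
    let len_k = band.shape[k] as usize;
    let inner = (band.shape[k + 1..].iter().product::<u64>() as usize) * elem_size;

    // The output bands keep every dimension except the promoted one.
    let mut dim_names = band.dim_names.clone();
    dim_names.remove(k);
    let mut shape = band.shape.clone();
    shape.remove(k);

    let bands = (0..len_k)
        .map(|i| {
            let mut data = Vec::with_capacity(outer * inner);
            for o in 0..outer {
                // Row-major layout: block (o, i) starts at (o * len_k + i) * inner.
                let start = (o * len_k + i) * inner;
                data.extend_from_slice(&band.data[start..start + inner]);
            }
            Band {
                dim_names: dim_names.clone(),
                shape: shape.clone(),
                data,
            }
        })
        .collect();
    Some(bands)
}
```

Promoting `wavelength` on a `[wavelength=2, y=2, x=2]` band yields two `[y=2, x=2]` bands, which is the form that standard two-band math expects.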