james-willis opened a new pull request, #812:
URL: https://github.com/apache/sedona-db/pull/812
## Summary
Adds `parse_outdb_source(uri) -> (Cow<str>, u32)` in
`rust/sedona-raster-gdal/src/source_uri.rs`. Given an outdb URI it
returns the GDAL-side URI plus a 1-based source band index. Two forms are
recognised, with a fallback for everything else:
- **SedonaDB convention** — `<uri>#band=N` → `(uri without fragment, N)`.
`rsplit_once('#')` is used so a trailing `#band=N` wins over an earlier
`#anchor` in the URI.
- **GDAL native subdataset URI** — `HDF5:"x.h5":/var`,
`NETCDF:"file.nc":var`, `GTIFF_DIR:N:foo.tif`, … → `(uri verbatim, 1)`.
We don't try to interpret the GDAL grammar; we just pass it through.
- **Plain URI / no fragment / malformed fragment** → `(uri, 1)`.
Returns `Cow::Borrowed` for both common cases (no allocation when the
input has no fragment, and only the trimmed prefix is borrowed when a
`#band=N` is present).
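
The rules above can be sketched roughly as follows. This is an illustrative re-implementation written for this description, not the code in `source_uri.rs`. It assumes a `&str` input and assumes that a malformed fragment leaves the URI completely untouched; the real function may differ on both points.

```rust
use std::borrow::Cow;

// Illustrative sketch only; the body is an assumption, not the PR's code.
fn parse_outdb_source(uri: &str) -> (Cow<'_, str>, u32) {
    // `rsplit_once` splits at the LAST '#', so a trailing `#band=N`
    // wins over an earlier `#anchor` in the same URI.
    if let Some((prefix, fragment)) = uri.rsplit_once('#') {
        if let Some(value) = fragment.strip_prefix("band=") {
            // Parsing as u32 rejects negative, overflowing and
            // non-numeric values. Band indices are 1-based, so 0 is
            // rejected explicitly.
            if let Ok(band) = value.parse::<u32>() {
                if band >= 1 {
                    // Borrow the prefix: no allocation needed.
                    return (Cow::Borrowed(prefix), band);
                }
            }
        }
    }
    // Plain URIs, GDAL subdataset URIs and malformed fragments are
    // returned unchanged (assumption) with the default band 1.
    (Cow::Borrowed(uri), 1)
}
```

Under these assumptions, `parse_outdb_source("s3://bucket/scene.tif#band=3")` yields `("s3://bucket/scene.tif", 3)`, while `HDF5:"x.h5":/var` comes back unchanged with band 1. Both results borrow from the input.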
This PR is **purely additive** — no existing call sites are rewired.
The first caller will appear in a follow-up that ports the GDAL loader
to read the source band selector from `outdb_uri` instead of a separate
column. Landing the parser separately keeps the URI-grammar review
small and isolated.
## Why encode the source band index inside `outdb_uri`?
[#787](https://github.com/apache/sedona-db/pull/787) introduced an
`outdb_band_id: u32` column alongside `outdb_url: Utf8`. The follow-up
work on N-D rasters needs the schema to also support GDAL native
subdataset URIs (HDF5 groups, multi-page GeoTIFFs, NetCDF variables).
Those URIs already carry the source-selector inline, so a typed
sibling column would be redundant *and* couldn't represent them.
Folding everything into one `outdb_uri` field plus this parser keeps
the schema honest about what the URI is ("the thing GDAL opens")
and lets every call site go through a single source of truth for the
fragment and GDAL-subdataset conventions.
## Test plan
- [x] `cargo test -p sedona-raster-gdal` — 43 unit tests + 1 doctest pass
- [x] `cargo build -p sedona-schema -p sedona-raster -p sedona-raster-gdal
-p sedona-raster-functions -p sedona-testing`
- [x] `cargo test -p sedona-schema -p sedona-raster -p sedona-raster-gdal
-p sedona-raster-functions -p sedona-testing` — 315 unit tests + 4 doctests, 0
failures
- [x] `cargo clippy -p sedona-schema -p sedona-raster -p sedona-raster-gdal
-p sedona-raster-functions -p sedona-testing --all-targets -- -D warnings` —
clean
- [x] `cargo fmt --all --check` — clean
- [x] `pre-commit run --files rust/sedona-raster-gdal/src/source_uri.rs
rust/sedona-raster-gdal/src/lib.rs` — clean
The 23 unit tests in `source_uri::tests` cover: default band 1; explicit
`band=N`; `band=0` and negative/overflow/non-numeric values falling back
to band 1; URL query strings preserved before the fragment; local paths;
HDF5/NETCDF/GTIFF_DIR subdataset passthrough; trailing `#band=N` winning
over earlier `#anchor`; empty URI; `Cow::Borrowed` invariants.
Cc @kristinholmquist for visibility — the parser will become the
single read-site for the `#band=N` fragment convention introduced in
#787, and is the spot to push back if any URI shapes you expect to
flow through aren't accepted yet.