james-willis opened a new pull request, #849:
URL: https://github.com/apache/sedona-db/pull/849

   ## Summary
   
   Make `BandRef::nd_buffer()`, `contiguous_data()`, and `data()` Just Work for 
OutDb bands (bands whose Arrow `data` column is empty and whose `outdb_uri` 
points elsewhere). Today those methods return `NotYetImplemented`, which blocks 
every UDF and downstream consumer from reading OutDb pixel bytes. Without a 
working byte path, OutDb references serve no purpose as a schema feature.
   
   Approach: a single statically typed function-pointer hook in 
`sedona-raster`, populated by `sedona-raster-gdal` at session bootstrap. No 
trait, no HashMap-keyed registry, no async: "compiled in, not pluggable". 
The only abstraction introduced is the one the crate boundary forces 
(`sedona-raster-gdal` depends on `sedona-raster`, not the reverse, so a direct 
call from `BandRefImpl` to GDAL is impossible).
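
A minimal sketch of the hook shape described above. The names `OutDbLoadRequest`, `OutDbBandLoader`, `set_outdb_band_loader`, and `load_outdb` come from this PR, but their signatures are not shown in this description, so the request fields (`uri`, `expected_len`), the `String` error type, the first-registration-wins behavior, and `mock_loader` are all assumptions made for illustration.

```rust
use std::sync::OnceLock;

/// Hypothetical request shape; the real `OutDbLoadRequest` fields may differ.
#[derive(Debug, Clone, Copy)]
pub struct OutDbLoadRequest<'a> {
    pub uri: &'a str,
    pub expected_len: usize,
}

/// A plain function pointer: no trait object, no keyed registry, no async.
pub type OutDbBandLoader = fn(&OutDbLoadRequest<'_>) -> Result<Vec<u8>, String>;

/// Process-wide slot for the loader, filled at most once.
static OUTDB_BAND_LOADER: OnceLock<OutDbBandLoader> = OnceLock::new();

/// Installs the loader. Returns `false` if one was already installed
/// (assumption: the first registration wins).
pub fn set_outdb_band_loader(loader: OutDbBandLoader) -> bool {
    OUTDB_BAND_LOADER.set(loader).is_ok()
}

/// Dispatches to the registered loader and rejects undersized output.
pub fn load_outdb(req: &OutDbLoadRequest<'_>) -> Result<Vec<u8>, String> {
    let loader = OUTDB_BAND_LOADER
        .get()
        .copied()
        .ok_or_else(|| "no OutDb band loader registered".to_string())?;
    let bytes = loader(req)?;
    if bytes.len() < req.expected_len {
        return Err(format!(
            "loader returned {} bytes for {}, expected at least {}",
            bytes.len(),
            req.uri,
            req.expected_len
        ));
    }
    Ok(bytes)
}

/// Illustrative stand-in for a real loader: returns zeroed bytes.
pub fn mock_loader(req: &OutDbLoadRequest<'_>) -> Result<Vec<u8>, String> {
    Ok(vec![0u8; req.expected_len])
}
```

Under these assumptions, the downstream crate calls `set_outdb_band_loader` once at bootstrap, and the upstream crate only ever sees a function pointer, so the dependency direction between the two crates is preserved.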
   
   ## Changes
   
   - **New** `rust/sedona-raster/src/outdb_loader.rs` — `OutDbLoadRequest`, 
`OutDbBandLoader` fn-pointer type, `OnceLock`, `set_outdb_band_loader`, 
internal `load_outdb` dispatcher.
   - **Modified** `rust/sedona-raster/src/array.rs` — `BandRefImpl` gains 
`outdb_loaded: OnceCell<Vec<u8>>` (lifetime anchor, not a cross-band cache). 
`nd_buffer()`, `contiguous_data()`, `data()` route through a new 
`source_bytes()` helper that zero-copies from the Arrow `data` column for 
schema-InDb bands and falls back to the loader hook (anchored in 
`outdb_loaded`) for schema-OutDb bands. The legacy `data()` accessor collapses 
errors to `&[]` to preserve the pre-N-D contract.
   - **New** `rust/sedona-raster-gdal/src/outdb_loader.rs` — `gdal_load` impl 
using the existing `GDALDatasetCache::get_or_create_outdb_source` (thread-local 
LRU, VSI translation, `#band=N` fragment handling). `pub fn 
register_outdb_loader()` registers it into the hook.
   - **Modified** `rust/sedona/src/context.rs` — calls 
`sedona_raster_gdal::register_outdb_loader()` from 
`SedonaContext::new_from_context()` next to the existing function-set 
registration.
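
The "lifetime anchor" role of `outdb_loaded` can be sketched as follows. This is not the actual `BandRefImpl`: `BandSketch`, the `Loader` alias, the per-band `loader` field (the PR uses a global hook instead), the `String` error type, and the use of an empty `data` slice to detect a schema-OutDb band are all assumptions chosen to keep the example self-contained.

```rust
use std::cell::OnceCell;

/// Hypothetical loader signature used only in this sketch.
pub type Loader = fn(&str) -> Result<Vec<u8>, String>;

/// Hypothetical stand-in for `BandRefImpl` with only the fields the sketch needs.
pub struct BandSketch<'a> {
    /// Bytes borrowed from the Arrow `data` column; empty for OutDb bands.
    indb: &'a [u8],
    outdb_uri: Option<&'a str>,
    /// Owns the loaded bytes so returned slices can borrow from `self`.
    outdb_loaded: OnceCell<Vec<u8>>,
    loader: Loader,
}

impl<'a> BandSketch<'a> {
    pub fn new(indb: &'a [u8], outdb_uri: Option<&'a str>, loader: Loader) -> Self {
        Self { indb, outdb_uri, outdb_loaded: OnceCell::new(), loader }
    }

    /// InDb: zero-copy slice of the column. OutDb: load once, then reuse.
    pub fn source_bytes(&self) -> Result<&[u8], String> {
        if !self.indb.is_empty() {
            return Ok(self.indb);
        }
        if let Some(bytes) = self.outdb_loaded.get() {
            return Ok(bytes.as_slice());
        }
        let uri = self
            .outdb_uri
            .ok_or_else(|| "OutDb band has no outdb_uri".to_string())?;
        // Errors are not cached: a failed load leaves the cell empty.
        let bytes = (self.loader)(uri)?;
        Ok(self.outdb_loaded.get_or_init(|| bytes).as_slice())
    }

    /// Legacy accessor: collapses any error to an empty slice.
    pub fn data(&self) -> &[u8] {
        self.source_bytes().unwrap_or(&[])
    }
}
```

Because the `Vec<u8>` lives inside the band itself, the returned `&[u8]` borrows from `&self` with no copy on repeated calls, and nothing is shared across bands.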
   
   ## Tests
   
   `sedona-raster` (mock loader): `outdb_loader` unit tests cover loader 
registration, returned-bytes round-trip, per-band caching, missing-uri error, 
undersized-loader-output error, and loader-failure propagation.
   
   `sedona-raster-gdal` (real GDAL): three integration tests write tiny 
GeoTIFFs to a temp dir and verify the end-to-end byte path:
   
   - `loads_outdb_band_bytes_from_geotiff` — 4×3 single-band tiff, 
`band.nd_buffer().buffer` matches the file bytes.
   - `second_call_on_same_band_reuses_cache` — verifies 
`BandRefImpl.outdb_loaded` reuse on a second `nd_buffer()` call.
   - `band_fragment_selects_correct_band` — two-band tiff with `#band=N` 
fragment selects the correct band.
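
For readers unfamiliar with the `#band=N` convention exercised by the last test, here is a rough sketch of how such a fragment could be split from a URI. The helper `split_band_fragment` is hypothetical and not part of this PR; the real handling lives in the existing `GDALDatasetCache` code path, and the 1-based numbering and the `None` result for a missing fragment are assumptions.

```rust
/// Hypothetical helper: splits "path#band=N" into the path and a band index.
/// Assumes band numbers are 1-based and that a missing fragment means
/// "no explicit band" (returned as `None`).
pub fn split_band_fragment(uri: &str) -> Result<(&str, Option<usize>), String> {
    match uri.rsplit_once("#band=") {
        // No fragment: hand the URI back untouched.
        None => Ok((uri, None)),
        Some((base, n)) => {
            let band: usize = n
                .parse()
                .map_err(|e| format!("invalid band fragment in {uri:?}: {e}"))?;
            if band == 0 {
                return Err(format!("band index in {uri:?} must be 1 or greater"));
            }
            Ok((base, Some(band)))
        }
    }
}
```

With this shape, a two-band file referenced as `two_band.tif#band=2` would resolve to the path `two_band.tif` and band 2, while a malformed or zero index is reported as an error instead of silently falling back to another band.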
   
   ## Relationship to PR #813 (view machinery)
   
   This PR is **independent of PR-D** (#813 — view machinery / `materialized` 
cell / strided walk) and based directly on `main`. The view-composition path 
remains rejected at construction in `RasterRefImpl::band()`, so the OutDb byte 
path here only needs to handle the identity-view case.
   
   If PR-D lands first, this PR will need a small follow-up to integrate OutDb 
bytes with the strided walk in `data()` (the byte-access surface is the soft 
conflict — both PRs rewrite `data()` / `nd_buffer()` / `contiguous_data()`). If 
this PR lands first, PR-D folds the second `OnceCell` into its existing 
materialization pattern. Either order is manageable; there is no hard 
duplication.
   
   ## Test plan
   
   - [x] `cargo test -p sedona-raster --lib` (74 passing)
   - [x] `cargo test -p sedona-raster-gdal --lib outdb_loader` (3 passing)
   - [x] `cargo clippy --all-targets -p sedona-raster -p sedona-raster-gdal -p 
sedona -- -D warnings`
   - [x] `cargo fmt --all -- --check`
   - [ ] Smoke test: end-to-end SQL against an OutDb raster reading bytes 
through an `RS_*` kernel.

