paleolimbot commented on code in PR #697: URL: https://github.com/apache/sedona-db/pull/697#discussion_r3024679001
########## c/sedona-gdal/src/gdal.rs: ########## @@ -0,0 +1,237 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +//! High-level convenience wrapper around [`GdalApi`]. +//! +//! [`Gdal`] bundles a `&'static GdalApi` reference and exposes ergonomic +//! methods that delegate to the lower-level constructors and free functions +//! scattered across the crate, eliminating the need to pass `api` explicitly +//! at every call site. + +use crate::config; +use crate::dataset::Dataset; +use crate::driver::{Driver, DriverManager}; +use crate::errors::Result; +use crate::gdal_api::GdalApi; +use crate::gdal_dyn_bindgen::{GDALOpenFlags, OGRFieldType}; +use crate::mem::create_mem_dataset; +use crate::raster::polygonize::{polygonize, PolygonizeOptions}; +use crate::raster::rasterband::RasterBand; +use crate::raster::rasterize::{rasterize, RasterizeOptions}; +use crate::raster::rasterize_affine::rasterize_affine; +use crate::raster::types::DatasetOptions; +use crate::raster::types::GdalDataType; +use crate::spatial_ref::SpatialRef; +use crate::vector::feature::FieldDefn; +use crate::vector::geometry::Geometry; +use crate::vector::layer::Layer; +use crate::vrt::VrtDataset; +use crate::vsi; + +/// High-level convenience wrapper around [`GdalApi`]. +/// +/// Stores a `&'static GdalApi` reference and provides ergonomic methods that +/// delegate to the various constructors and free functions in the crate. +pub struct Gdal { + api: &'static GdalApi, +} + +impl Gdal { + /// Create a `Gdal` wrapper for the given API reference. + pub(crate) fn new(api: &'static GdalApi) -> Self { + Self { api } + } + + // -- Info ---------------------------------------------------------------- + + /// Return the name of the loaded GDAL library. + pub fn name(&self) -> &str { + self.api.name() + } + + /// Fetch GDAL version information for a standard request key. + pub fn version_info(&self, request: &str) -> String { Review Comment: These are all a bit light on the documentation (maybe they could link to the underlying implementation's docs which I seem to remember have a bit more depth?). Also OK to defer since this is mostly intended to be for internal usage. ########## c/sedona-gdal/src/gdal.rs: ########## @@ -0,0 +1,237 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +//! High-level convenience wrapper around [`GdalApi`]. +//! +//! [`Gdal`] bundles a `&'static GdalApi` reference and exposes ergonomic +//! methods that delegate to the lower-level constructors and free functions +//! scattered across the crate, eliminating the need to pass `api` explicitly +//! at every call site. + +use crate::config; +use crate::dataset::Dataset; +use crate::driver::{Driver, DriverManager}; +use crate::errors::Result; +use crate::gdal_api::GdalApi; +use crate::gdal_dyn_bindgen::{GDALOpenFlags, OGRFieldType}; +use crate::mem::create_mem_dataset; +use crate::raster::polygonize::{polygonize, PolygonizeOptions}; +use crate::raster::rasterband::RasterBand; +use crate::raster::rasterize::{rasterize, RasterizeOptions}; +use crate::raster::rasterize_affine::rasterize_affine; +use crate::raster::types::DatasetOptions; +use crate::raster::types::GdalDataType; +use crate::spatial_ref::SpatialRef; +use crate::vector::feature::FieldDefn; +use crate::vector::geometry::Geometry; +use crate::vector::layer::Layer; +use crate::vrt::VrtDataset; +use crate::vsi; + +/// High-level convenience wrapper around [`GdalApi`]. +/// +/// Stores a `&'static GdalApi` reference and provides ergonomic methods that +/// delegate to the various constructors and free functions in the crate. +pub struct Gdal { + api: &'static GdalApi, +} + +impl Gdal { Review Comment: I do like the idea of a high-level object that dishes out other safe high-level objects, although putting it all here involves repeating some documentation (or omitting it) and I'm not sure it's needed given that the constructors it refers to are pub anyway. Reexporting the structs or functions that are part of the intended external API might be another way to expose the API without loosing the docs? ########## c/sedona-gdal/src/dataset.rs: ########## @@ -300,11 +300,11 @@ impl Dataset { /// /// # Safety /// - /// `data_ptr` must point to valid band data that outlives this dataset. + /// `data_ptr` must point to valid mutable band data that outlives this dataset. pub unsafe fn add_band_with_data( &self, data_type: RustGdalDataType, - data_ptr: *const u8, + data_ptr: *mut u8, Review Comment: Is this actually required to be mutable? (This seems like a strange constraint, and also one that will result in it not being particularly useful to us) ########## c/sedona-gdal/src/mem.rs: ########## @@ -0,0 +1,441 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +//! High-level builder for creating in-memory (MEM) GDAL datasets. +//! +//! [`MemDatasetBuilder`] provides a fluent, type-safe API for constructing GDAL MEM +//! datasets with zero-copy band attachment, optional geo-transform, projection, and +//! per-band nodata values. + +use crate::dataset::Dataset; +use crate::errors::Result; +use crate::gdal::Gdal; +use crate::gdal_api::{call_gdal_api, GdalApi}; +use crate::gdal_dyn_bindgen::CE_Failure; +use crate::raster::types::GdalDataType; + +/// Nodata value for a raster band. +/// +/// GDAL has three separate APIs for setting nodata depending on the band data type: +/// - [`f64`] for most types (UInt8 through Float64, excluding Int64/UInt64) +/// - [`i64`] for Int64 bands +/// - [`u64`] for UInt64 bands +/// +/// This enum encapsulates the three nodata value representations exposed by GDAL. +#[derive(Debug, Clone, Copy, PartialEq)] +pub enum Nodata { + F64(f64), + I64(i64), + U64(u64), +} + +/// A band specification for [`MemDatasetBuilder`]. +struct MemBand { + data_type: GdalDataType, + data_ptr: *mut u8, + pixel_offset: Option<i64>, + line_offset: Option<i64>, + nodata: Option<Nodata>, +} + +/// A builder for constructing in-memory (MEM) GDAL datasets. +/// +/// This creates datasets using `MEMDataset::Create` (bypassing GDAL's open-dataset-list +/// mutex for better concurrency) and attaches bands via `GDALAddBand` with `DATAPOINTER` +/// options for zero-copy operation. +/// +/// # Safety +/// +/// All `add_band*` methods are `unsafe` because the caller must ensure that the +/// provided data pointers remain valid for the lifetime of the built [`Dataset`], +/// satisfy the alignment requirements of the band data type, and refer to writable +/// memory if GDAL may write through the attached `DATAPOINTER` band. +pub struct MemDatasetBuilder { + width: usize, + height: usize, + n_owned_bands: usize, + owned_bands_data_type: Option<GdalDataType>, + bands: Vec<MemBand>, + geo_transform: Option<[f64; 6]>, + projection: Option<String>, +} + +impl MemDatasetBuilder { + /// Create a builder for a MEM dataset with the given dimensions. + pub fn new(width: usize, height: usize) -> Self { + Self { + width, + height, + n_owned_bands: 0, + owned_bands_data_type: None, + bands: Vec::new(), + geo_transform: None, + projection: None, + } + } + + /// Create a builder for a MEM dataset with GDAL-owned bands. + pub fn new_with_owned_bands( + width: usize, + height: usize, + n_owned_bands: usize, + owned_bands_data_type: GdalDataType, + ) -> Self { + Self { + width, + height, + n_owned_bands, + owned_bands_data_type: Some(owned_bands_data_type), + bands: Vec::new(), + geo_transform: None, + projection: None, + } + } + + /// Create a MEM dataset with GDAL-owned bands. + /// This is a safe shortcut for `new_with_owned_bands(...).build(gdal)`. + pub fn create( + gdal: &Gdal, + width: usize, + height: usize, + n_owned_bands: usize, + owned_bands_data_type: GdalDataType, + ) -> Result<Dataset> { + // SAFETY: `new_with_owned_bands` creates a builder with zero external bands, + // so no data pointers need to outlive the dataset. + unsafe { + Self::new_with_owned_bands(width, height, n_owned_bands, owned_bands_data_type) + .build(gdal) + } + } + + /// Add a zero-copy band from a raw data pointer. + /// Use default contiguous row-major pixel and line offsets. + /// + /// # Safety + /// + /// The caller must ensure `data_ptr` points to a valid buffer of at least + /// `height * width * data_type.byte_size()` bytes, is properly aligned for + /// `data_type`, and outlives the built [`Dataset`]. + pub unsafe fn add_band(self, data_type: GdalDataType, data_ptr: *mut u8) -> Self { + self.add_band_with_options(data_type, data_ptr, None, None, None) + } Review Comment: Are you sure the data pointer has to be mutable? (This would preclude backing a GDAL mem dataset with Arrow data?) ########## c/sedona-gdal/src/mem.rs: ########## @@ -0,0 +1,441 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +//! High-level builder for creating in-memory (MEM) GDAL datasets. +//! +//! [`MemDatasetBuilder`] provides a fluent, type-safe API for constructing GDAL MEM +//! datasets with zero-copy band attachment, optional geo-transform, projection, and +//! per-band nodata values. + +use crate::dataset::Dataset; +use crate::errors::Result; +use crate::gdal::Gdal; +use crate::gdal_api::{call_gdal_api, GdalApi}; +use crate::gdal_dyn_bindgen::CE_Failure; +use crate::raster::types::GdalDataType; + +/// Nodata value for a raster band. +/// +/// GDAL has three separate APIs for setting nodata depending on the band data type: +/// - [`f64`] for most types (UInt8 through Float64, excluding Int64/UInt64) +/// - [`i64`] for Int64 bands +/// - [`u64`] for UInt64 bands +/// +/// This enum encapsulates the three nodata value representations exposed by GDAL. +#[derive(Debug, Clone, Copy, PartialEq)] +pub enum Nodata { + F64(f64), + I64(i64), + U64(u64), +} + +/// A band specification for [`MemDatasetBuilder`]. +struct MemBand { + data_type: GdalDataType, + data_ptr: *mut u8, + pixel_offset: Option<i64>, + line_offset: Option<i64>, + nodata: Option<Nodata>, +} + +/// A builder for constructing in-memory (MEM) GDAL datasets. +/// +/// This creates datasets using `MEMDataset::Create` (bypassing GDAL's open-dataset-list +/// mutex for better concurrency) and attaches bands via `GDALAddBand` with `DATAPOINTER` +/// options for zero-copy operation. +/// +/// # Safety +/// +/// All `add_band*` methods are `unsafe` because the caller must ensure that the +/// provided data pointers remain valid for the lifetime of the built [`Dataset`], +/// satisfy the alignment requirements of the band data type, and refer to writable +/// memory if GDAL may write through the attached `DATAPOINTER` band. +pub struct MemDatasetBuilder { + width: usize, + height: usize, + n_owned_bands: usize, + owned_bands_data_type: Option<GdalDataType>, + bands: Vec<MemBand>, + geo_transform: Option<[f64; 6]>, + projection: Option<String>, +} + +impl MemDatasetBuilder { + /// Create a builder for a MEM dataset with the given dimensions. + pub fn new(width: usize, height: usize) -> Self { + Self { + width, + height, + n_owned_bands: 0, + owned_bands_data_type: None, + bands: Vec::new(), + geo_transform: None, + projection: None, + } + } + + /// Create a builder for a MEM dataset with GDAL-owned bands. + pub fn new_with_owned_bands( + width: usize, + height: usize, + n_owned_bands: usize, + owned_bands_data_type: GdalDataType, + ) -> Self { + Self { + width, + height, + n_owned_bands, + owned_bands_data_type: Some(owned_bands_data_type), + bands: Vec::new(), + geo_transform: None, + projection: None, + } + } + + /// Create a MEM dataset with GDAL-owned bands. + /// This is a safe shortcut for `new_with_owned_bands(...).build(gdal)`. + pub fn create( + gdal: &Gdal, + width: usize, + height: usize, + n_owned_bands: usize, + owned_bands_data_type: GdalDataType, + ) -> Result<Dataset> { + // SAFETY: `new_with_owned_bands` creates a builder with zero external bands, + // so no data pointers need to outlive the dataset. + unsafe { + Self::new_with_owned_bands(width, height, n_owned_bands, owned_bands_data_type) + .build(gdal) + } + } + + /// Add a zero-copy band from a raw data pointer. + /// Use default contiguous row-major pixel and line offsets. + /// + /// # Safety + /// + /// The caller must ensure `data_ptr` points to a valid buffer of at least + /// `height * width * data_type.byte_size()` bytes, is properly aligned for + /// `data_type`, and outlives the built [`Dataset`]. + pub unsafe fn add_band(self, data_type: GdalDataType, data_ptr: *mut u8) -> Self { + self.add_band_with_options(data_type, data_ptr, None, None, None) + } + + /// Add a zero-copy band with custom offsets and optional nodata. + /// + /// # Safety + /// + /// The caller must ensure `data_ptr` points to a valid buffer of sufficient size + /// for the given dimensions and offsets, is properly aligned for `data_type`, and + /// outlives the built [`Dataset`]. + pub unsafe fn add_band_with_options( + mut self, + data_type: GdalDataType, + data_ptr: *mut u8, + pixel_offset: Option<i64>, + line_offset: Option<i64>, + nodata: Option<Nodata>, + ) -> Self { + self.bands.push(MemBand { + data_type, + data_ptr, + pixel_offset, + line_offset, + nodata, + }); + self + } Review Comment: This is so cool ########## c/sedona-gdal/src/mem.rs: ########## @@ -0,0 +1,441 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +//! High-level builder for creating in-memory (MEM) GDAL datasets. +//! +//! [`MemDatasetBuilder`] provides a fluent, type-safe API for constructing GDAL MEM +//! datasets with zero-copy band attachment, optional geo-transform, projection, and +//! per-band nodata values. + +use crate::dataset::Dataset; +use crate::errors::Result; +use crate::gdal::Gdal; +use crate::gdal_api::{call_gdal_api, GdalApi}; +use crate::gdal_dyn_bindgen::CE_Failure; +use crate::raster::types::GdalDataType; + +/// Nodata value for a raster band. +/// +/// GDAL has three separate APIs for setting nodata depending on the band data type: +/// - [`f64`] for most types (UInt8 through Float64, excluding Int64/UInt64) +/// - [`i64`] for Int64 bands +/// - [`u64`] for UInt64 bands +/// +/// This enum encapsulates the three nodata value representations exposed by GDAL. +#[derive(Debug, Clone, Copy, PartialEq)] +pub enum Nodata { + F64(f64), + I64(i64), + U64(u64), +} + +/// A band specification for [`MemDatasetBuilder`]. +struct MemBand { + data_type: GdalDataType, + data_ptr: *mut u8, + pixel_offset: Option<i64>, + line_offset: Option<i64>, + nodata: Option<Nodata>, +} + +/// A builder for constructing in-memory (MEM) GDAL datasets. +/// +/// This creates datasets using `MEMDataset::Create` (bypassing GDAL's open-dataset-list +/// mutex for better concurrency) and attaches bands via `GDALAddBand` with `DATAPOINTER` +/// options for zero-copy operation. +/// +/// # Safety +/// +/// All `add_band*` methods are `unsafe` because the caller must ensure that the +/// provided data pointers remain valid for the lifetime of the built [`Dataset`], +/// satisfy the alignment requirements of the band data type, and refer to writable +/// memory if GDAL may write through the attached `DATAPOINTER` band. +pub struct MemDatasetBuilder { + width: usize, + height: usize, + n_owned_bands: usize, + owned_bands_data_type: Option<GdalDataType>, + bands: Vec<MemBand>, + geo_transform: Option<[f64; 6]>, + projection: Option<String>, +} + +impl MemDatasetBuilder { + /// Create a builder for a MEM dataset with the given dimensions. + pub fn new(width: usize, height: usize) -> Self { + Self { + width, + height, + n_owned_bands: 0, + owned_bands_data_type: None, + bands: Vec::new(), + geo_transform: None, + projection: None, + } + } + + /// Create a builder for a MEM dataset with GDAL-owned bands. + pub fn new_with_owned_bands( + width: usize, + height: usize, + n_owned_bands: usize, + owned_bands_data_type: GdalDataType, + ) -> Self { + Self { + width, + height, + n_owned_bands, + owned_bands_data_type: Some(owned_bands_data_type), + bands: Vec::new(), + geo_transform: None, + projection: None, + } + } + + /// Create a MEM dataset with GDAL-owned bands. + /// This is a safe shortcut for `new_with_owned_bands(...).build(gdal)`. + pub fn create( + gdal: &Gdal, + width: usize, + height: usize, + n_owned_bands: usize, + owned_bands_data_type: GdalDataType, + ) -> Result<Dataset> { + // SAFETY: `new_with_owned_bands` creates a builder with zero external bands, + // so no data pointers need to outlive the dataset. + unsafe { + Self::new_with_owned_bands(width, height, n_owned_bands, owned_bands_data_type) + .build(gdal) + } + } + + /// Add a zero-copy band from a raw data pointer. + /// Use default contiguous row-major pixel and line offsets. + /// + /// # Safety + /// + /// The caller must ensure `data_ptr` points to a valid buffer of at least + /// `height * width * data_type.byte_size()` bytes, is properly aligned for + /// `data_type`, and outlives the built [`Dataset`]. + pub unsafe fn add_band(self, data_type: GdalDataType, data_ptr: *mut u8) -> Self { + self.add_band_with_options(data_type, data_ptr, None, None, None) + } + + /// Add a zero-copy band with custom offsets and optional nodata. + /// + /// # Safety + /// + /// The caller must ensure `data_ptr` points to a valid buffer of sufficient size + /// for the given dimensions and offsets, is properly aligned for `data_type`, and + /// outlives the built [`Dataset`]. + pub unsafe fn add_band_with_options( + mut self, + data_type: GdalDataType, + data_ptr: *mut u8, + pixel_offset: Option<i64>, + line_offset: Option<i64>, + nodata: Option<Nodata>, + ) -> Self { + self.bands.push(MemBand { + data_type, + data_ptr, + pixel_offset, + line_offset, + nodata, + }); + self + } + + /// Set the dataset geotransform coefficients. + pub fn geo_transform(mut self, gt: [f64; 6]) -> Self { + self.geo_transform = Some(gt); + self + } + + /// Set the dataset projection definition string. + pub fn projection(mut self, wkt: impl Into<String>) -> Self { + self.projection = Some(wkt.into()); Review Comment: Is this actually WKT or can this by anything? (It may be less error prone to accept anything or an explicit spatialref object) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
