paleolimbot commented on code in PR #749:
URL: https://github.com/apache/sedona-db/pull/749#discussion_r3121100458
##########
rust/sedona-schema/src/raster.rs:
##########
@@ -16,34 +16,33 @@
// under the License.
use arrow_schema::{DataType, Field, FieldRef, Fields};
-/// Schema for storing raster data in Apache Arrow format.
-/// Utilizing nested structs and lists to represent raster metadata and bands.
+/// Schema for storing N-dimensional raster data in Apache Arrow format.
+///
+/// Each raster has a CRS, an affine transform, explicit spatial dimension names
+/// (`x_dim`, `y_dim`), and a list of bands. Each band is an N-D chunk with named
+/// dimensions, a shape, and optional strides for zero-copy slicing.
+///
+/// Legacy 2D rasters are represented as bands with `dim_names=["y","x"]` and
+/// `shape=[height, width]`.
#[derive(Debug, PartialEq, Clone)]
pub struct RasterSchema;
+
impl RasterSchema {
/// Returns the top-level fields for the raster schema structure.
pub fn fields() -> Fields {
Fields::from(vec![
- Field::new(column::METADATA, Self::metadata_type(), false),
- Field::new(column::CRS, Self::crs_type(), true), // Optional: may be inferred from data
+ Field::new(column::CRS, Self::crs_type(), true),
+ Field::new(column::TRANSFORM, Self::transform_type(), false),
+ Field::new(column::X_DIM, DataType::Utf8View, false),
+ Field::new(column::Y_DIM, DataType::Utf8View, false),
Field::new(column::BANDS, Self::bands_type(), true),
])
}
- /// Raster metadata schema
- pub fn metadata_type() -> DataType {
- DataType::Struct(Fields::from(vec![
- // Raster dimensions
- Field::new(column::WIDTH, DataType::UInt64, false),
- Field::new(column::HEIGHT, DataType::UInt64, false),
- // Geospatial transformation parameters
- Field::new(column::UPPERLEFT_X, DataType::Float64, false),
- Field::new(column::UPPERLEFT_Y, DataType::Float64, false),
- Field::new(column::SCALE_X, DataType::Float64, false),
- Field::new(column::SCALE_Y, DataType::Float64, false),
- Field::new(column::SKEW_X, DataType::Float64, false),
- Field::new(column::SKEW_Y, DataType::Float64, false),
- ]))
+ /// Affine transform schema — 6-element GDAL GeoTransform:
+ /// `[origin_x, scale_x, skew_x, origin_y, skew_y, scale_y]`
+ pub fn transform_type() -> DataType {
+ DataType::List(FieldRef::new(Field::new("item", DataType::Float64, false)))
}
Review Comment:
> Sorry I'm not clear what you want me to do.
- Change `Field::new(column::X_DIM, DataType::Utf8View, false),
Field::new(column::Y_DIM, DataType::Utf8View, false)` to
`Field::new(column::DIMS, DataType::new_list(DataType::Utf8View), false)`. This
would be of length 2 for now, and you don't have to change your Rust wrapper.
- Your Arrow representation of the transform is already Z-ready (it would
have 12 values instead of 6 in the future where Z is supported).
- Add `Field::new(column::SHAPE, DataType::new_list(DataType::Int64),
false)`, which is the number of values in each of `column::DIMS`, in the same
order. I may be reading it wrong but I think you currently have to peek into
the first band for this. I think it is clearer to keep the spatial source of
truth at the raster level and then validate every band against that (the
transform and the height/width are very closely related to each other and I
think having them at the same level makes sense). Also, having a zero-band
raster is surprisingly useful (e.g., GDAL uses this as a way to specify a
target grid to use for the result of an operation like warp).
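
To make the last point concrete, here is a minimal sketch of raster-level `dims`/`shape` with per-band validation. It uses plain Rust structs rather than the Arrow schema, and `RasterGrid`, `Band`, `Raster`, `validate`, and `pixel_to_world` are hypothetical names for illustration, not anything in the PR. It assumes each band may carry extra non-spatial dimensions (e.g. `"time"`) but must contain every raster-level dimension with the same size.

```rust
/// Hypothetical raster-level grid: spatial dimension names, their sizes
/// (in the same order), and the 6-element GDAL-style transform from the diff:
/// [origin_x, scale_x, skew_x, origin_y, skew_y, scale_y].
#[derive(Debug, Clone, PartialEq)]
pub struct RasterGrid {
    pub dims: Vec<String>,
    pub shape: Vec<i64>,
    pub transform: [f64; 6],
}

/// Hypothetical N-D band with named dimensions and a shape.
#[derive(Debug, Clone, PartialEq)]
pub struct Band {
    pub dim_names: Vec<String>,
    pub shape: Vec<i64>,
}

/// Hypothetical raster: the grid is the spatial source of truth; bands may be empty.
#[derive(Debug, Clone, PartialEq)]
pub struct Raster {
    pub grid: RasterGrid,
    pub bands: Vec<Band>,
}

impl RasterGrid {
    /// Map a (col, row) pixel coordinate to world coordinates using the transform.
    pub fn pixel_to_world(&self, col: f64, row: f64) -> (f64, f64) {
        let [origin_x, scale_x, skew_x, origin_y, skew_y, scale_y] = self.transform;
        (
            origin_x + col * scale_x + row * skew_x,
            origin_y + col * skew_y + row * scale_y,
        )
    }
}

/// Check every band against the raster-level dims/shape.
/// A raster with zero bands passes as long as the grid itself is consistent.
pub fn validate(raster: &Raster) -> Result<(), String> {
    let grid = &raster.grid;
    if grid.dims.len() != grid.shape.len() {
        return Err(format!(
            "raster has {} dims but {} shape entries",
            grid.dims.len(),
            grid.shape.len()
        ));
    }
    for (i, band) in raster.bands.iter().enumerate() {
        if band.dim_names.len() != band.shape.len() {
            return Err(format!("band {}: dim_names and shape lengths differ", i));
        }
        // Each raster-level dimension must appear in the band with the same size.
        for (dim, &size) in grid.dims.iter().zip(&grid.shape) {
            match band.dim_names.iter().position(|n| n == dim) {
                Some(pos) if band.shape[pos] == size => {}
                Some(pos) => {
                    return Err(format!(
                        "band {}: dimension {} has size {} but raster expects {}",
                        i, dim, band.shape[pos], size
                    ))
                }
                None => return Err(format!("band {}: missing dimension {}", i, dim)),
            }
        }
    }
    Ok(())
}
```

With this layout, a `Raster` whose `bands` is empty still describes a complete target grid (dimension names, shape, and transform), and nothing has to peek into the first band to learn the height and width.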