Re: [PR] feat!: Support enums and tuples in `SchemaAwareSerializer` and implement `SchemaAwareDeserializer` [avro-rs]

via GitHub Tue, 31 Mar 2026 06:42:45 -0700


martin-g commented on code in PR #512:
URL: https://github.com/apache/avro-rs/pull/512#discussion_r3015908596



##########
avro/src/reader/single_object.rs:
##########
@@ -315,11 +354,11 @@ mod tests {
     fn avro_rs_164_generic_reader_alternate_header() -> TestResult {
         let schema_uuid = 
Uuid::parse_str("b2f1cf00-0434-013e-439a-125eb8485a5f")?;
         let header_builder = GlueSchemaUuidHeader::from_uuid(schema_uuid);
-        let generic_reader = 
GenericSingleObjectReader::new_with_header_builder(
-            TestSingleObjectReader::get_schema(),
-            header_builder,
-        )
-        .expect("failed to build reader");
+        let generic_reader = GenericSingleObjectReader::builder()
+            .schema(TestSingleObjectReader::get_schema())
+            .header(header_builder.build_header())
+            .build()
+            .expect("failed to build reader");
         let data_to_read: Vec<u8> = vec![
             3, 0, 178, 241, 207, 0, 4, 52, 1, 62, 67, 154, 18, 94, 184, 72, 
90, 95,
         ];

Review Comment:
   The `matches!(read_result, Err(Details::ReadBytes(_)));` below should 
probably be wrapped with `assert!(...)`
   GitHub UI does not allow me to add a suggestion



##########
avro/src/reader/block.rs:
##########
@@ -205,6 +215,44 @@ impl<'r, R: Read> Block<'r, R> {
         Ok(Some(item))
     }
 
+    pub(super) fn read_next_deser<T: DeserializeOwned>(
+        &mut self,
+        read_schema: Option<&Schema>,

Review Comment:
   ```suggestion
           reader_schema: Option<&Schema>,
   ```
   for consistency 



##########
avro/src/schema/union.rs:
##########
@@ -74,7 +81,116 @@ impl UnionSchema {
         &self.schemas
     }
 
-    /// Returns true if the any of the variants of this `UnionSchema` is 
`Null`.
+    /// Get the variant at the given index.
+    pub fn get_variant(&self, index: usize) -> Result<&Schema, Error> {
+        self.schemas.get(index).ok_or_else(|| {
+            Details::GetUnionVariant {
+                index: index as i64,
+                num_variants: self.schemas.len(),
+            }
+            .into()
+        })
+    }
+
+    /// Get the index of the provided [`SchemaKind`].
+    ///
+    /// The schema must not be a logical type, as only the base type are saved 
in the lookup index.
+    pub(crate) fn index_of_schema_kind(&self, kind: SchemaKind) -> 
Option<usize> {
+        self.variant_index.get(&kind).copied()
+    }
+
+    /// Get the index and schema for the provided name.
+    ///
+    /// Will use `names` to resolve references.
+    pub(crate) fn find_named_schema<'s>(
+        &'s self,
+        name: &str,
+        names: &'s HashMap<Name, impl Borrow<Schema>>,
+    ) -> Result<Option<(usize, &'s Schema)>, Error> {
+        for index in self.named_index.iter().copied() {
+            let schema = &self.schemas[index];
+            if let Some(schema_name) = schema.name()
+                && schema_name.name() == name

Review Comment:
   Shouldn't this use the fully qualified names ?



##########
avro/src/serde/with.rs:
##########
@@ -503,11 +502,323 @@ pub mod slice_opt {
     }
 }
 
+/// (De)serialize [`BigDecimal`] as a [`Schema::BigDecimal`] instead of a 
[`Schema::String`].
+///
+/// This module is intended to be used through the Serde `with` attribute.
+///
+/// Use [`apache_avro::serde::bigdecimal_opt`] for optional big decimals 
values.
+///
+/// When used with different serialization formats, this will write bytes.
+///
+/// See usage with below example:
+/// ```
+/// # use apache_avro::{AvroSchema, BigDecimal};
+/// # use serde::{Deserialize, Serialize};
+/// #[derive(AvroSchema, Serialize, Deserialize)]
+/// struct StructWithBigDecimal {
+///     #[avro(with)]
+///     #[serde(with = "apache_avro::serde::bigdecimal")]
+///     decimal: BigDecimal,
+/// }
+/// ```
+///
+/// [`BigDecimal`]: ::bigdecimal::BigDecimal
+/// [`Schema::BigDecimal`]: crate::Schema::BigDecimal
+/// [`Schema::String`]: crate::Schema::String
+/// [`apache_avro::serde::bigdecimal_opt`]: bigdecimal_opt
+pub mod bigdecimal {
+    use std::collections::HashSet;
+
+    use bigdecimal::BigDecimal;
+    use serde::{Deserializer, Serializer, de::Error as _, ser::Error as _};
+
+    use crate::{
+        Schema,
+        bigdecimal::{big_decimal_as_bytes, deserialize_big_decimal},
+        schema::{Name, NamespaceRef, RecordField},
+        serde::with::BytesType,
+    };
+
+    /// Returns [`Schema::BigDecimal`]
+    pub fn get_schema_in_ctxt(_: &mut HashSet<Name>, _: NamespaceRef) -> 
Schema {
+        Schema::BigDecimal
+    }
+
+    /// Returns `None`
+    pub fn get_record_fields_in_ctxt(
+        _: &mut HashSet<Name>,
+        _: NamespaceRef,
+    ) -> Option<Vec<RecordField>> {
+        None
+    }
+
+    pub fn serialize<S>(decimal: &BigDecimal, serializer: S) -> Result<S::Ok, 
S::Error>
+    where
+        S: Serializer,
+    {
+        let _guard = super::BytesTypeGuard::set(BytesType::Bytes);
+        let decimal_bytes = 
big_decimal_as_bytes(decimal).map_err(S::Error::custom)?;
+        serde_bytes::serialize(&decimal_bytes, serializer)
+    }
+
+    pub fn deserialize<'de, D>(deserializer: D) -> Result<BigDecimal, D::Error>
+    where
+        D: Deserializer<'de>,
+    {
+        let _bytes_guard = super::BytesTypeGuard::set(BytesType::Bytes);
+        let _guard = super::BorrowedGuard::set(true);
+        // We don't use &'de [u8] here as the deserializer doesn't support that
+        let bytes: Vec<u8> = serde_bytes::deserialize(deserializer)?;
+
+        deserialize_big_decimal(&bytes).map_err(D::Error::custom)
+    }
+}
+
+/// (De)serialize [`Option<BigDecimal>`] as a `Schema::Union(Schema::Null, 
Schema::BigDecimal)` instead of a `Schema::Union(Schema::Null, Schema::String)`.
+///
+/// This module is intended to be used through the Serde `with` attribute.
+///
+/// Use [`apache_avro::serde::bigdecimal`] for non-optional big decimals 
values.
+///
+/// When used with different serialization formats, this will write bytes.
+///
+/// See usage with below example:
+/// ```
+/// # use apache_avro::{AvroSchema, BigDecimal};
+/// # use serde::{Deserialize, Serialize};
+/// #[derive(AvroSchema, Serialize, Deserialize)]
+/// struct StructWithBigDecimal {
+///     #[avro(with)]
+///     #[serde(with = "apache_avro::serde::bigdecimal_opt")]
+///     decimal: Option<BigDecimal>,
+/// }
+/// ```
+///
+/// [`Option<BigDecimal>`]: ::bigdecimal::BigDecimal
+/// [`apache_avro::serde::bigdecimal`]: bigdecimal
+pub mod bigdecimal_opt {
+    use std::collections::HashSet;
+
+    use bigdecimal::BigDecimal;
+    use serde::{Deserializer, Serializer, de::Error as _, ser::Error as _};
+
+    use crate::{
+        Schema,
+        bigdecimal::{big_decimal_as_bytes, deserialize_big_decimal},
+        schema::{Name, NamespaceRef, RecordField, UnionSchema},
+        serde::with::BytesType,
+    };
+
+    /// Returns `Schema::Union(Schema::Null, Schema::BigDecimal)`
+    pub fn get_schema_in_ctxt(_: &mut HashSet<Name>, _: NamespaceRef) -> 
Schema {
+        Schema::Union(
+            UnionSchema::new(vec![Schema::Null, Schema::BigDecimal])
+                .unwrap_or_else(|_| unreachable!("This is a valid union")),
+        )
+    }
+
+    /// Returns `None`
+    pub fn get_record_fields_in_ctxt(
+        _: &mut HashSet<Name>,
+        _: NamespaceRef,
+    ) -> Option<Vec<RecordField>> {
+        None
+    }
+
+    pub fn serialize<S>(decimal: &Option<BigDecimal>, serializer: S) -> 
Result<S::Ok, S::Error>
+    where
+        S: Serializer,
+    {
+        let _guard = super::BytesTypeGuard::set(BytesType::Bytes);
+        if let Some(decimal) = decimal {
+            let decimal_bytes = 
big_decimal_as_bytes(decimal).map_err(S::Error::custom)?;
+            serde_bytes::serialize(&Some(decimal_bytes), serializer)
+        } else {
+            serde_bytes::serialize(&None::<Vec<u8>>, serializer)
+        }
+    }
+
+    pub fn deserialize<'de, D>(deserializer: D) -> Result<Option<BigDecimal>, 
D::Error>
+    where
+        D: Deserializer<'de>,
+    {
+        let _bytes_guard = super::BytesTypeGuard::set(BytesType::Bytes);
+        let _guard = super::BorrowedGuard::set(true);
+        let bytes: Option<Vec<u8>> = serde_bytes::deserialize(deserializer)?;
+        if let Some(bytes) = bytes {
+            deserialize_big_decimal(&bytes)
+                .map(Some)
+                .map_err(D::Error::custom)
+        } else {
+            Ok(None)
+        }
+    }
+}
+
+/// (De)serialize an Rust array (`[T; N]`) as an Avro [`Schema::Array`].
+///
+/// This module is intended to be used through the Serde `with` attribute.
+///
+/// Use [`apache_avro::serde::array_opt`] for optional array values.
+///
+/// See usage with below example:
+/// ```
+/// # use apache_avro::AvroSchema;
+/// # use serde::{Deserialize, Serialize};
+/// #[derive(AvroSchema, Serialize, Deserialize)]
+/// struct StructWithArray {
+///     #[avro(with = apache_avro::serde::array::get_schema_in_ctxt::<i32>)]
+///     #[serde(with = "apache_avro::serde::array")]
+///     array: [i32; 10],
+/// }
+/// ```
+///
+/// [`apache_avro::serde::array_opt`]: array_opt
+/// [`Schema::Array`]: crate::schema::Schema::Array
+pub mod array {
+    use std::collections::HashSet;
+
+    use serde::{Deserialize, Deserializer, Serialize, Serializer, de::Error as 
_};
+
+    use crate::{
+        AvroSchemaComponent, Schema,
+        schema::{Name, NamespaceRef, RecordField},
+    };
+
+    /// Returns `Schema::Array(T::get_schema_in_ctxt())`
+    pub fn get_schema_in_ctxt<T: AvroSchemaComponent>(
+        named_schemas: &mut HashSet<Name>,
+        enclosing_namespace: NamespaceRef,
+    ) -> Schema {
+        Schema::array(T::get_schema_in_ctxt(named_schemas, 
enclosing_namespace)).build()
+    }
+
+    /// Returns `None`
+    pub fn get_record_fields_in_ctxt(
+        _: &mut HashSet<Name>,
+        _: NamespaceRef,
+    ) -> Option<Vec<RecordField>> {
+        None
+    }
+
+    pub fn serialize<const N: usize, S, T>(value: &[T; N], serializer: S) -> 
Result<S::Ok, S::Error>
+    where
+        S: Serializer,
+        T: Serialize,
+    {
+        value.as_slice().serialize(serializer)
+    }
+
+    pub fn deserialize<'de, const N: usize, D, T>(deserializer: D) -> 
Result<[T; N], D::Error>
+    where
+        D: Deserializer<'de>,
+        T: Deserialize<'de>,
+    {
+        let bytes = <Vec<T> as Deserialize>::deserialize(deserializer)?;
+        bytes.try_into().map_err(|v: Vec<T>| {
+            D::Error::custom(format!(
+                "Deserialized array has length {} which does not match array 
length of {N}",
+                v.len()
+            ))
+        })
+    }
+}
+
+/// (De)serialize an optional Rust array (`Option<[T; N]>`) as an Avro 
`Schema::Union([Schema::Null, Schema::Array])`.
+///
+/// This module is intended to be used through the Serde `with` attribute.
+///
+/// Use [`apache_avro::serde::array`] for non-optional array values.
+///
+/// When used with different serialization formats, this is equivalent to 
[`serde_bytes`].

Review Comment:
   Is this copy/pasted ? It sounds unrelated here.



##########
avro/src/reader/single_object.rs:
##########
@@ -68,6 +79,26 @@ impl GenericSingleObjectReader {
             Err(io_error) => Err(Details::ReadHeader(io_error).into()),
         }
     }
+
+    pub fn read_deser<T: DeserializeOwned>(&self, reader: &mut impl Read) -> 
AvroResult<T> {
+        let mut header = vec![0; self.expected_header.len()];

Review Comment:
   The header reading and validation logic is the same in `read_value()` and 
`read_deser()`. It could be extracted to a helper function



##########
avro/src/documentation/serde_data_model_to_avro.rs:
##########
@@ -0,0 +1,116 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! # Mapping the Serde data model to the Avro data model
+//!
+//! When manually mapping Rust types to an Avro schema, or the reverse, it is 
important to understand
+//! how the different data models are mapped. When mapping from an Avro schema 
to Rust types,
+//! see [the documentation for the reverse](super::avro_data_model_to_serde).
+//!
+//! Only the schemas generated by the [`AvroSchema`] derive and the mapping as 
defined here are
+//! supported. Any other behavior might change in a minor version.
+//!
+//! The following list is based on [the data model defined by 
Serde](https://serde.rs/data-model.html):
+//! - **14 primitive types**
+//!     - `bool` => [`Schema::Boolean`]
+//!     - `i8`, `i16`, `i32`, `u8`, `u16` => [`Schema::Int`]
+//!     - `i64`, `u32` => [`Schema::Long`]
+//!     - `u64` => [`Schema::Fixed`]`(name: "u64", size: 8)`
+//!         - This is not a `Schema::Long` as that is a signed number of 
maximum 64 bits.
+//!     - `i128` => [`Schema::Fixed`]`(name: "i128", size: 16)`
+//!     - `u128` => [`Schema::Fixed`]`(name: "u128", size: 16)`
+//!     - `f32` => [`Schema::Float`]
+//!     - `f64` => [`Schema::Double`]
+//!     - `char` => [`Schema::String`]
+//!         - Only one character allowed, deserializer will return an error 
for strings with more than one character.
+//! - **string** => [`Schema::String`]
+//! - **byte array** => [`Schema::Bytes`] or [`Schema::Fixed`]
+//! - **option** => [`Schema::Union([Schema::Null, 
_])`](crate::schema::Schema::Union)
+//! - **unit** => [`Schema::Null`]
+//! - **unit struct** => [`Schema::Record`] with the unqualified name equal to 
the name of the struct and zero fields
+//! - **unit variant** => See [Enums](#enums)
+//! - **newtype struct** => [`Schema::Record`] with the unqualified name equal 
to the name of the struct and one field
+//! - **newtype variant** => See [Enums](#enums)
+//! - **seq** => [`Schema::Array`]
+//! - **tuple**
+//!     - For tuples with one element, the schema will be the schema of the 
only element
+//!     - For tuples with more than one element, the schema will be a 
[`Schema::Record`] with as many fields as there are elements.
+//!       The schema must have the attribute `org.apache.avro.rust.tuple` with 
the value set to `true`.
+//!     - **Note:** Serde (de)serializes arrays (`[T; N]`) as tuples. To 
(de)serialize an array as a
+//!       [`Schema::Array`] use [`apache_avro::serde::array`].
+//! - **tuple_struct** => [`Schema::Record`] with the unqualified name equal 
to the name of the struct and as many fields as there are elements
+//!     - **Note:** Tuple structs with 0 or 1 elements will also be 
(de)serialized as a [`Schema::Record`]. This
+//!       is different from normal tuples.
+//! - **tuple_variant** => See [Enums](#enums)
+//! - **map** => [`Schema::Map`]
+//!     - **Note:** the key type of the map will be (de)serialized as a 
[`Schema::String`]
+//! - **struct** => [`Schema::Record`]
+//! - **struct_variant** => See [Enums](#enums)
+//!
+//! ## Enums
+//!
+//! ### Externally tagged
+//! This is the default enum representation for Serde. It can be mapped in 
three ways to the Avro data model.
+//! For all three options it is important that the enum index matches the Avro 
index.
+//! - As a [`Schema::Enum`], this is only possible for enums with only unit 
variants.
+//! - As a [`Schema::Union`] with a [`Schema::Record`] for every variant:
+//!     - **unit_variant** => [`Schema::Record`] named as the variant and with 
no fields.
+//!     - **newtype_variant** => [`Schema::Record`] named as the variant and 
with one field.
+//!       The schema must have the attribute 
`org.apache.avro.rust.union_of_records` with the value set to `true`.
+//!     - **tuple_variant** => [`Schema::Record`] named as the variant and 
with as many fields as there are element.

Review Comment:
   ```suggestion
   //!     - **tuple_variant** => [`Schema::Record`] named as the variant and 
with as many fields as there are elements.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] feat!: Support enums and tuples in `SchemaAwareSerializer` and implement `SchemaAwareDeserializer` [avro-rs]

Reply via email to