anentropic opened a new issue, #1337:
URL: https://github.com/apache/iceberg-python/issues/1337

   ### Apache Iceberg version
   
   0.8.0 (latest release)
   
   ### Please describe the bug 🐞
   
   It looks like `transform` is intended to be an optional field (?):
   
   ```python
   class SortField(IcebergBaseModel):
       """Sort order field.
   
       Args:
         source_id (int): Source column id from the table’s schema.
         transform (str): Transform that is used to produce values to be sorted on from the source column.
                          This is the same transform as described in partition transforms.
         direction (SortDirection): Sort direction, that can only be either asc or desc.
         null_order (NullOrder): Null order that describes the order of null values when sorted. Can only be either nulls-first or nulls-last.
       """
   
       def __init__(
           self,
           source_id: Optional[int] = None,
           transform: Optional[Union[Transform[Any, Any], Callable[[IcebergType], Transform[Any, Any]]]] = None,
           direction: Optional[SortDirection] = None,
           null_order: Optional[NullOrder] = None,
           **data: Any,
       ):
           if source_id is not None:
               data["source-id"] = source_id
           if transform is not None:
               data["transform"] = transform
           if direction is not None:
               data["direction"] = direction
           if null_order is not None:
               data["null-order"] = null_order
           super().__init__(**data)
   ```
   
   But if I omit `transform`, as in `SortField(source_id=field.field_id)`, or pass `None` explicitly, as in `SortField(source_id=field.field_id, transform=None)`, then I get a pydantic validation error:
   
   ```
   ValidationError: 1 validation error for SortField
   transform
     Field required [type=missing, input_value={'source-id': 4, 'directi...: NullOrder.NULLS_FIRST}, input_type=dict]
       For further information visit https://errors.pydantic.dev/2.9/v/missing
   ```
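
A minimal sketch of the same failure mode without pyiceberg, assuming only pydantic v2. `ToySortField` and `missing_fields` are hypothetical stand-ins, not library code. The `__init__` drops `None` arguments the same way `SortField.__init__` does, so the field is absent by the time pydantic validates it:

```python
from typing import Any, Optional

from pydantic import BaseModel, Field, ValidationError


class ToySortField(BaseModel):
    """Hypothetical stand-in for SortField, reduced to two fields."""

    source_id: int = Field()
    transform: str = Field()  # no default, so pydantic treats it as required

    def __init__(
        self,
        source_id: Optional[int] = None,
        transform: Optional[str] = None,
        **data: Any,
    ):
        # None values are never forwarded, so an omitted transform and an
        # explicit transform=None look identical to the validator.
        if source_id is not None:
            data["source_id"] = source_id
        if transform is not None:
            data["transform"] = transform
        super().__init__(**data)


def missing_fields(**kwargs: Any) -> list:
    """Return the names of fields pydantic reports as missing."""
    try:
        ToySortField(**kwargs)
    except ValidationError as exc:
        return [err["loc"][0] for err in exc.errors() if err["type"] == "missing"]
    return []
```

Here both `missing_fields(source_id=4)` and `missing_fields(source_id=4, transform=None)` report `transform` as missing, which matches the error above.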
   
   `SortField(source_id=field.field_id, transform=IdentityTransform())` works.
   
   `SortField(source_id=field.field_id, transform=IDENTITY)` also works, but type checkers don't like it.
   
   I think both problems stem from here:
   ```python
       transform: Annotated[  # type: ignore
           Transform,
           BeforeValidator(parse_transform),
           PlainSerializer(lambda c: str(c), return_type=str),  # pylint: disable=W0108
           WithJsonSchema({"type": "string"}, mode="serialization"),
       ] = Field()
   ```
   The type annotation doesn't make it `Optional`, and `Field()` supplies no default.
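
As a side note, in pydantic v2 it is the missing default, rather than the annotation alone, that makes a field required. A small sketch, assuming pydantic v2 (`Demo` and its field names are made up for illustration):

```python
from typing import Optional

from pydantic import BaseModel, Field


class Demo(BaseModel):
    """Hypothetical model comparing three ways of declaring a field."""

    required_plain: str = Field()
    required_optional: Optional[str] = Field()  # Optional, but still required
    defaulted: Optional[str] = Field(default=None)  # only this one may be omitted


# FieldInfo.is_required() reports whether a value must be supplied.
required = [name for name, info in Demo.model_fields.items() if info.is_required()]
```

So making `transform` truly optional would need a default value as well, and which default is appropriate for a sort field is a design decision this sketch does not try to answer.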
   
   And `BeforeValidator(parse_transform)` uses `parse_transform` to turn the `IDENTITY` string constant into `IdentityTransform()`, so the type you pass doesn't match the annotation.
   
   For the latter problem, pydantic documents an approach at https://docs.pydantic.dev/2.0/usage/types/custom/#handling-third-party-types that would allow passing a string constant that is converted to an instance of the annotated `Transform` type.
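
A rough sketch of that approach, assuming pydantic v2. Everything prefixed `Toy` or `_Toy` is a hypothetical stand-in, and `parse_toy_transform` only imitates what `parse_transform` is described as doing above. The marker class in `Annotated` supplies the core schema, so the field accepts either a string or an instance and always stores an instance:

```python
from typing import Annotated, Any

from pydantic import BaseModel, GetCoreSchemaHandler
from pydantic_core import core_schema


class ToyTransform:
    """Hypothetical stand-in for pyiceberg's Transform."""


class ToyIdentity(ToyTransform):
    def __str__(self) -> str:
        return "identity"


def parse_toy_transform(value: Any) -> ToyTransform:
    """Accept an instance as-is, or convert the known string constant."""
    if isinstance(value, ToyTransform):
        return value
    if value == "identity":
        return ToyIdentity()
    raise ValueError(f"unknown transform: {value!r}")


class _ToyTransformSchema:
    """Marker used inside Annotated to tell pydantic how to validate and dump."""

    @classmethod
    def __get_pydantic_core_schema__(
        cls, source_type: Any, handler: GetCoreSchemaHandler
    ) -> core_schema.CoreSchema:
        return core_schema.no_info_plain_validator_function(
            parse_toy_transform,
            serialization=core_schema.plain_serializer_function_ser_schema(
                lambda t: str(t), return_schema=core_schema.str_schema()
            ),
        )


class ToySortField(BaseModel):
    source_id: int
    transform: Annotated[ToyTransform, _ToyTransformSchema]


from_string = ToySortField(source_id=4, transform="identity")
from_instance = ToySortField(source_id=4, transform=ToyIdentity())
```

This only demonstrates the runtime conversion. What a static type checker accepts would still depend on the signature declared on `__init__`.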


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
