DinGo4DEV opened a new issue, #2002:
URL: https://github.com/apache/iceberg-python/issues/2002

   ### Apache Iceberg version
   
   0.9.0 (latest release)
   
   ### Please describe the bug 🐞
   
   ## Description
   When a `UUIDType` column is partitioned with `BucketTransform`, table 
operations such as upsert fail. The partition value appears to change from 
`int` to `str`, so the Avro encoder receives a string where it expects an 
integer.
   
   ## Steps to Reproduce
   1. Create a table with UUIDType column
   2. Configure the table to use BucketTransform on that column for partitioning
   3. Attempt to upsert data into the table
   
   ## Current Behavior
   The operation fails with a `TypeError` because the Avro encoder applies 
integer bit operations to a string value.
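
The failure can be reproduced in isolation. The sketch below copies the zigzag expression from the last frame of the stack trace into a standalone function (the names `zigzag` and `try_encode` are made up for illustration and are not pyiceberg's API) and feeds it an `int` and then a `str`:

```python
def zigzag(integer):
    # Same expression as the last frame of the traceback (encoder.py, write_int).
    return (integer << 1) ^ (integer >> 63)


def try_encode(value):
    """Return the zigzag-encoded value, or the name of the exception raised."""
    try:
        return zigzag(value)
    except TypeError as exc:
        return type(exc).__name__


int_result = try_encode(1)    # an int bucket value encodes normally
str_result = try_encode("1")  # a str bucket value cannot be bit-shifted
print(int_result, str_result)
```

The string case fails on the first `<<`, which matches where the traceback ends.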
   
   ## Expected Behavior
   The operation should handle `UUIDType` columns that are partitioned with 
`BucketTransform`. The UUID bucket partition value should be the integer `1`, 
not the string `"1"`.
   
   ## Error Stack Trace
   ```python
   Traceback (most recent call last):
       File "test_upsert.py", line 248, in <module>
           result = table.upsert(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
       File ".venv\Lib\site-packages\pyiceberg\table\__init__.py", line 1216, in upsert
           tx.append(rows_to_insert)
       File ".venv\Lib\site-packages\pyiceberg\table\__init__.py", line 470, in append
           with self._append_snapshot_producer(snapshot_properties) as append_files:
       File ".venv\Lib\site-packages\pyiceberg\table\update\__init__.py", line 71, in __exit__
           self.commit()
       File ".venv\Lib\site-packages\pyiceberg\table\update\__init__.py", line 67, in commit
           self._transaction._apply(*self._commit())
                                     ^^^^^^^^^^^^^^
       File ".venv\Lib\site-packages\pyiceberg\table\update\snapshot.py", line 242, in _commit
           new_manifests = self._manifests()
                           ^^^^^^^^^^^^^^^^^
       File ".venv\Lib\site-packages\pyiceberg\table\update\snapshot.py", line 201, in _manifests
           return self._process_manifests(added_manifests.result() + delete_manifests.result() + existing_manifests.result())
                                          ^^^^^^^^^^^^^^^^^^^^^^^^
       File "~\Python312\Lib\concurrent\futures\_base.py", line 456, in result
           return self.__get_result()
                  ^^^^^^^^^^^^^^^^^^^
       File "~\Python312\Lib\concurrent\futures\_base.py", line 401, in __get_result
           raise self._exception
       File "~\Python312\Lib\concurrent\futures\thread.py", line 58, in run
           result = self.fn(*self.args, **self.kwargs)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
       File ".venv\Lib\site-packages\pyiceberg\table\update\snapshot.py", line 159, in _write_added_manifest
           writer.add(
       File ".venv\Lib\site-packages\pyiceberg\manifest.py", line 847, in add
           self.add_entry(self._reused_entry_wrapper._wrap_append(self._snapshot_id, None, entry.data_file))
       File ".venv\Lib\site-packages\pyiceberg\manifest.py", line 840, in add_entry
           self._writer.write_block([self.prepare_entry(entry)])
       File ".venv\Lib\site-packages\pyiceberg\avro\file.py", line 281, in write_block
           self.writer.write(block_content_encoder, obj)
           writer.write(encoder, val[pos] if pos is not None else None)
       File ".venv\Lib\site-packages\pyiceberg\avro\writer.py", line 176, in write
           writer.write(encoder, val[pos] if pos is not None else None)
           writer.write(encoder, val[pos] if pos is not None else None)
       File ".venv\Lib\site-packages\pyiceberg\avro\writer.py", line 176, in write
           writer.write(encoder, val[pos] if pos is not None else None)
       File ".venv\Lib\site-packages\pyiceberg\avro\writer.py", line 66, in write
           encoder.write_int(val)
       File ".venv\Lib\site-packages\pyiceberg\avro\encoder.py", line 45, in write_int
           datum = (integer << 1) ^ (integer >> 63)
   ```
   
   ## Potential Fix
   The issue appears to be in the type handling of the 
`partition_record_value` function, which runs when a `PartitionKey` is 
initialized with a `PartitionFieldValue`.
   
   
https://github.com/apache/iceberg-python/blob/996a7ba4dbf4afdb3d46689f1715206b1c355f2a/pyiceberg/partitioning.py#L385-L406
   
   I would add a Union type for `value` so that it can hold the **transformed** value.
   
https://github.com/apache/iceberg-python/blob/996a7ba4dbf4afdb3d46689f1715206b1c355f2a/pyiceberg/partitioning.py#L469-L471
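
A minimal sketch of the proposed change, assuming the conversion for UUID-typed source columns currently stringifies whatever it receives (as the `"1"` versus `1` symptom suggests). The function names below are hypothetical stand-ins and not pyiceberg's actual code. The point is only that widening `value` to a Union lets an already-bucketed `int` pass through unchanged:

```python
import uuid
from typing import Optional, Union


def uuid_record_value_current(value):
    # Assumed current behaviour: every non-None value is stringified,
    # including a bucket number that has already been computed.
    return None if value is None else str(value)


def uuid_record_value_proposed(
    value: Optional[Union[uuid.UUID, int]],
) -> Optional[Union[str, int]]:
    # An int here is a transformed (bucketed) value; keep it as an int
    # so the Avro int writer receives the type it expects.
    if value is None or isinstance(value, int):
        return value
    return str(value)


raw = uuid.UUID("12345678-1234-5678-1234-567812345678")
print(repr(uuid_record_value_current(1)))   # bucket number turned into a str
print(repr(uuid_record_value_proposed(1)))  # bucket number kept as an int
print(uuid_record_value_proposed(raw))      # untransformed UUID still becomes a str
```

Whether the check belongs in the UUID conversion itself or earlier, where the `PartitionKey` is built, is a design choice for the actual patch.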
   
   
   ### Willingness to contribute
   
   - [x] I can contribute a fix for this bug independently
   - [ ] I would be willing to contribute a fix for this bug with guidance from 
the Iceberg community
   - [ ] I cannot contribute a fix for this bug at this time


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

