andersbogsnes opened a new issue, #1987:
URL: https://github.com/apache/iceberg-python/issues/1987

   ### Apache Iceberg version
   
   0.9.0 (latest release)
   
   ### Please describe the bug 🐞
   
   (I'm on 0.9.1 but the dropdown is missing that one)
   
   Hi, I added partition transforms to an Iceberg table, and now inserting 
data raises `ModuleNotFoundError: No module named 'pyiceberg_core'`. The full 
traceback is below.
   
   For reference, I install pyiceberg as `pyiceberg[snappy,s3fs]`. In 
pyproject.toml, pyiceberg_core is listed as an optional dependency, but judging 
by the traceback, `.append` now relies on it when the table has a bucket 
partition transform.
   
   ```python
   ---------------------------------------------------------------------------
   ModuleNotFoundError                       Traceback (most recent call last)
   Cell In[7], line 1
   ----> 1 house_prices_t.append(df.to_arrow().cast(house_prices_schema.as_arrow()))
   
   File /usr/local/lib/python3.12/site-packages/pyiceberg/table/__init__.py:1229, in Table.append(self, df, snapshot_properties)
      1221 """
      1222 Shorthand API for appending a PyArrow table to the table.
      1223 
      (...)   1226     snapshot_properties: Custom properties to be added to the snapshot summary
      1227 """
      1228 with self.transaction() as tx:
   -> 1229     tx.append(df=df, snapshot_properties=snapshot_properties)
   
   File /usr/local/lib/python3.12/site-packages/pyiceberg/table/__init__.py:473, in Transaction.append(self, df, snapshot_properties)
       470 with self._append_snapshot_producer(snapshot_properties) as append_files:
       471     # skip writing data files if the dataframe is empty
       472     if df.shape[0] > 0:
   --> 473         data_files = list(
       474             _dataframe_to_data_files(
       475                 table_metadata=self.table_metadata, write_uuid=append_files.commit_uuid, df=df, io=self._table.io
       476             )
       477         )
       478         for data_file in data_files:
       479             append_files.append_data_file(data_file)
   
   File /usr/local/lib/python3.12/site-packages/pyiceberg/io/pyarrow.py:2601, in _dataframe_to_data_files(table_metadata, df, io, write_uuid, counter)
      2590     yield from write_file(
      2591         io=io,
      2592         table_metadata=table_metadata,
      (...)   2598         ),
      2599     )
      2600 else:
   -> 2601     partitions = _determine_partitions(spec=table_metadata.spec(), schema=table_metadata.schema(), arrow_table=df)
      2602     yield from write_file(
      2603         io=io,
      2604         table_metadata=table_metadata,
      (...)   2617         ),
      2618     )
   
   File /usr/local/lib/python3.12/site-packages/pyiceberg/io/pyarrow.py:2648, in _determine_partitions(spec, schema, arrow_table)
      2645 for partition, name in zip(spec.fields, partition_fields):
      2646     source_field = schema.find_field(partition.source_id)
      2647     arrow_table = arrow_table.append_column(
   -> 2648         name, partition.transform.pyarrow_transform(source_field.field_type)(arrow_table[source_field.name])
      2649     )
      2651 unique_partition_fields = arrow_table.select(partition_fields).group_by(partition_fields).aggregate([])
      2653 table_partitions = []
   
   File /usr/local/lib/python3.12/site-packages/pyiceberg/transforms.py:360, in BucketTransform.pyarrow_transform(self, source)
       359 def pyarrow_transform(self, source: IcebergType) -> "Callable[[pa.Array], pa.Array]":
   --> 360     from pyiceberg_core import transform as pyiceberg_core_transform
       362     return self._pyiceberg_transform_wrapper(pyiceberg_core_transform.bucket, self._num_buckets)
   
   ModuleNotFoundError: No module named 'pyiceberg_core'
   ```
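
For anyone trying to confirm the same situation, here is a minimal sketch that checks whether the module named in the traceback can be found in the current environment, without importing pyiceberg at all. The helper name `has_module` is hypothetical and only uses the standard library. Installing the separate `pyiceberg-core` package is the likely workaround, but that is an assumption and not something confirmed in this report.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    # find_spec returns None when no finder can locate a top-level module.
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # `pyiceberg_core` is the module named in the traceback above.
    if has_module("pyiceberg_core"):
        print("pyiceberg_core is installed")
    else:
        # Assumption: on this version, appending to a table with a bucket
        # transform then fails as shown in the traceback.
        print("pyiceberg_core is missing")
```

Running this in the same kernel or virtual environment as the failing `.append` call shows whether the optional dependency was pulled in by the chosen extras.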
   
   ### Willingness to contribute
   
   - [ ] I can contribute a fix for this bug independently
   - [ ] I would be willing to contribute a fix for this bug with guidance from 
the Iceberg community
   - [ ] I cannot contribute a fix for this bug at this time


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

