andersbogsnes opened a new issue, #1987: URL: https://github.com/apache/iceberg-python/issues/1987
### Apache Iceberg version

0.9.0 (latest release)

### Please describe the bug 🐞

(I'm on 0.9.1 but the dropdown is missing that one)

Hi, I was trying to add partition transforms to an Iceberg table, but I get a `ModuleNotFoundError: No module named 'pyiceberg_core'` when I try to insert data after updating the transforms. The full traceback is below.

For reference, I install pyiceberg as `pyiceberg[snappy,s3fs]`. Looking through the pyproject.toml, pyiceberg_core is listed as an optional dependency, but I'm guessing it's now being relied on in the `.append` method.

```python
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Cell In[7], line 1
----> 1 house_prices_t.append(df.to_arrow().cast(house_prices_schema.as_arrow()))

File /usr/local/lib/python3.12/site-packages/pyiceberg/table/__init__.py:1229, in Table.append(self, df, snapshot_properties)
   1221 """
   1222 Shorthand API for appending a PyArrow table to the table.
   1223
   (...)
   1226     snapshot_properties: Custom properties to be added to the snapshot summary
   1227 """
   1228 with self.transaction() as tx:
-> 1229     tx.append(df=df, snapshot_properties=snapshot_properties)

File /usr/local/lib/python3.12/site-packages/pyiceberg/table/__init__.py:473, in Transaction.append(self, df, snapshot_properties)
    470 with self._append_snapshot_producer(snapshot_properties) as append_files:
    471     # skip writing data files if the dataframe is empty
    472     if df.shape[0] > 0:
--> 473         data_files = list(
    474             _dataframe_to_data_files(
    475                 table_metadata=self.table_metadata, write_uuid=append_files.commit_uuid, df=df, io=self._table.io
    476             )
    477         )
    478         for data_file in data_files:
    479             append_files.append_data_file(data_file)

File /usr/local/lib/python3.12/site-packages/pyiceberg/io/pyarrow.py:2601, in _dataframe_to_data_files(table_metadata, df, io, write_uuid, counter)
   2590     yield from write_file(
   2591         io=io,
   2592         table_metadata=table_metadata,
   (...)
   2598         ),
   2599     )
   2600 else:
-> 2601     partitions = _determine_partitions(spec=table_metadata.spec(), schema=table_metadata.schema(), arrow_table=df)
   2602     yield from write_file(
   2603         io=io,
   2604         table_metadata=table_metadata,
   (...)
   2617         ),
   2618     )

File /usr/local/lib/python3.12/site-packages/pyiceberg/io/pyarrow.py:2648, in _determine_partitions(spec, schema, arrow_table)
   2645 for partition, name in zip(spec.fields, partition_fields):
   2646     source_field = schema.find_field(partition.source_id)
   2647     arrow_table = arrow_table.append_column(
-> 2648         name, partition.transform.pyarrow_transform(source_field.field_type)(arrow_table[source_field.name])
   2649     )
   2651 unique_partition_fields = arrow_table.select(partition_fields).group_by(partition_fields).aggregate([])
   2653 table_partitions = []

File /usr/local/lib/python3.12/site-packages/pyiceberg/transforms.py:360, in BucketTransform.pyarrow_transform(self, source)
    359 def pyarrow_transform(self, source: IcebergType) -> "Callable[[pa.Array], pa.Array]":
--> 360     from pyiceberg_core import transform as pyiceberg_core_transform
    362     return self._pyiceberg_transform_wrapper(pyiceberg_core_transform.bucket, self._num_buckets)

ModuleNotFoundError: No module named 'pyiceberg_core'
```

### Willingness to contribute

- [ ] I can contribute a fix for this bug independently
- [ ] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time
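
### Possible stopgap (sketch)

The traceback shows `BucketTransform.pyarrow_transform` importing `pyiceberg_core` lazily, so the missing module only surfaces once a partitioned write is attempted. The sketch below is a stopgap and not a fix for the dependency declaration itself: it checks for the module up front and fails with a clearer message. The helper names `has_module` and `require_module` are hypothetical and are not part of pyiceberg. The hint to install the `pyiceberg-core` distribution assumes that this is the package providing `pyiceberg_core`, so verify it against the pyproject.toml of your pyiceberg version.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module called `name` is importable here."""
    # find_spec returns None for a top-level module that is not installed.
    return importlib.util.find_spec(name) is not None


def require_module(name: str, hint: str = "") -> None:
    """Raise a descriptive error early if the optional module is missing."""
    if not has_module(name):
        message = f"Optional module {name!r} is not installed."
        if hint:
            message = f"{message} {hint}"
        raise RuntimeError(message)
```

Calling `require_module("pyiceberg_core", hint="Try installing the pyiceberg-core package.")` before `house_prices_t.append(...)` would then report the missing dependency before any snapshot producer is opened, instead of failing deep inside `_determine_partitions`.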