jqin61 opened a new issue, #348: URL: https://github.com/apache/iceberg-python/issues/348
### Apache Iceberg version

main (development)

### Please describe the bug 🐞

Hi, I noticed that the existing `append` and `overwrite` functions break if any column of the input Arrow table consists only of nulls. For example:

```
import pyarrow as pa
from pyiceberg.catalog import load_catalog

year_data = pa.array([None], type=pa.int64())
n_legs_data = pa.array([4], type=pa.int64())  # Assuming legs are always non-null and integer
animals_data = pa.array(["Flamingo"], type=pa.string())

# Create an arrow table with one column full of nulls
arrow_table_nulls = pa.Table.from_arrays([year_data, n_legs_data, animals_data], names=['year', 'n_legs', 'animals'])

catalog = load_catalog("lacus", **properties)  # `properties` is the catalog configuration (not shown)
iceberg_table_nulls = catalog.load_table('test.append_arrow_with_nulls')
iceberg_table_nulls.append(arrow_table_nulls)
```

This breaks during metadata collection for the written Parquet file. The error looks like this:

```
...
/lib/python3.10/site-packages/pyiceberg/io/pyarrow.py in fill_parquet_file_metadata(data_file, parquet_metadata, stats_columns, parquet_column_mapping)
   1689
   1690     for k, agg in col_aggs.items():
-> 1691         _min = agg.min_as_bytes()
   1692         if _min is not None:
   1693             lower_bounds[k] = _min

/lib/python3.10/site-packages/pyiceberg/io/pyarrow.py in min_as_bytes(self)
   1341
   1342     def min_as_bytes(self) -> bytes:
-> 1343         return self.serialize(
   1344             self.current_min
   1345             if self.trunc_length is None

/lib/python3.10/site-packages/pyiceberg/io/pyarrow.py in serialize(self, value)
   1332
   1333     def serialize(self, value: Any) -> bytes:
-> 1334         return to_bytes(self.primitive_type, value)
   1335
   1336     def update_min(self, val: Any) -> None:

/lib/python3.10/functools.py in wrapper(*args, **kw)
    887                             '1 positional argument')
    888
--> 889         return dispatch(args[0].__class__)(*args, **kw)
    890
    891     funcname = getattr(func, '__name__', 'singledispatch function')

/lib/python3.10/site-packages/pyiceberg/conversions.py in _(_, value)
    183 @to_bytes.register(LongType)
    184 def _(_: PrimitiveType, value: int) -> bytes:
--> 185     return _LONG_STRUCT.pack(value)
    186
    187
error: required argument is not an integer
```
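The last frame of the traceback can probably be reproduced without Iceberg at all: packing `None` with an integer `struct` format raises the same `struct.error`. The sketch below is only an illustration of that failure mode, not the library's code. `LONG_STRUCT` is an assumed stand-in for the `_LONG_STRUCT` seen in the traceback (taken here to be a little-endian 64-bit integer format), and `serialize_long` is a hypothetical helper showing one possible guard: return no bound when the aggregated min is `None`, which is what an all-null column would produce.

```python
import struct
from typing import Optional

# Assumption: stand-in for the _LONG_STRUCT in the traceback (little-endian int64).
LONG_STRUCT = struct.Struct("<q")


def pack_fails_on_none() -> bool:
    """Return True if packing None raises struct.error, as in the traceback."""
    try:
        LONG_STRUCT.pack(None)
    except struct.error:
        return True
    return False


def serialize_long(value: Optional[int]) -> Optional[bytes]:
    """Hypothetical guard: skip serialization when no min/max was observed."""
    if value is None:
        # An all-null column has no min or max, so there is no bound to write.
        return None
    return LONG_STRUCT.pack(value)


if __name__ == "__main__":
    print(pack_fails_on_none())
    print(serialize_long(None))
    print(serialize_long(4))
```

With a guard like this, the existing `if _min is not None:` check in `fill_parquet_file_metadata` would simply skip the bound for the all-null column. Whether the check belongs in the serializer or in the aggregator is a design choice left open here.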