jqin61 opened a new issue, #348:
URL: https://github.com/apache/iceberg-python/issues/348

   ### Apache Iceberg version
   
   main (development)
   
   ### Please describe the bug 🐞
   
   Hi, I noticed that the existing append and overwrite functions break if any column of the input Arrow table consists only of nulls. For example:
   
   ```
   import pyarrow as pa
   from pyiceberg.catalog import load_catalog
   
   year_data = pa.array([None], type=pa.int64())  # every value in this column is null
   n_legs_data = pa.array([4], type=pa.int64())  # assuming legs are always non-null integers
   animals_data = pa.array(["Flamingo"], type=pa.string())
   
   # Create an arrow table with one column full of nulls
   arrow_table_nulls = pa.Table.from_arrays(
       [year_data, n_legs_data, animals_data], names=['year', 'n_legs', 'animals']
   )
   
   # `properties` holds the catalog configuration (not shown here)
   catalog = load_catalog("lacus", **properties)
   iceberg_table_nulls = catalog.load_table('test.append_arrow_with_nulls')
   iceberg_table_nulls.append(arrow_table_nulls)
   ```
   This breaks during metadata collection for the written Parquet file, when the column statistics are serialized to bytes.
   
   The error looks like this:
   ```
   ...
   
   /lib/python3.10/site-packages/pyiceberg/io/pyarrow.py in fill_parquet_file_metadata(data_file, parquet_metadata, stats_columns, parquet_column_mapping)
      1689 
      1690     for k, agg in col_aggs.items():
   -> 1691         _min = agg.min_as_bytes()
      1692         if _min is not None:
      1693             lower_bounds[k] = _min
   
   /lib/python3.10/site-packages/pyiceberg/io/pyarrow.py in min_as_bytes(self)
      1341 
      1342     def min_as_bytes(self) -> bytes:
   -> 1343         return self.serialize(
      1344             self.current_min
      1345             if self.trunc_length is None
   
   /lib/python3.10/site-packages/pyiceberg/io/pyarrow.py in serialize(self, value)
      1332 
      1333     def serialize(self, value: Any) -> bytes:
   -> 1334         return to_bytes(self.primitive_type, value)
      1335 
      1336     def update_min(self, val: Any) -> None:
   
   /lib/python3.10/functools.py in wrapper(*args, **kw)
       887                             '1 positional argument')
       888 
   --> 889         return dispatch(args[0].__class__)(*args, **kw)
       890 
       891     funcname = getattr(func, '__name__', 'singledispatch function')
   
   /lib/python3.10/site-packages/pyiceberg/conversions.py in _(_, value)
       183 @to_bytes.register(LongType)
       184 def _(_: PrimitiveType, value: int) -> bytes:
   --> 185     return _LONG_STRUCT.pack(value)
       186 
       187 
   
   error: required argument is not an integer
   ```
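
Going by the traceback, the failure seems to come down to `None` reaching `_LONG_STRUCT.pack`: for a column with only nulls there is no minimum to serialize. Below is a minimal standalone sketch of that step and of one possible guard. It does not import pyiceberg. The struct format `"<q"` (8-byte little-endian signed integer) is my assumption about how `_LONG_STRUCT` is defined, and `long_to_bytes` and `min_as_bytes_or_none` are hypothetical helper names used only for illustration, not the library's API.

```python
import struct
from typing import Optional

# Assumption: mirrors the `_LONG_STRUCT` name seen in the traceback,
# defined here as an 8-byte little-endian signed integer.
_LONG_STRUCT = struct.Struct("<q")


def long_to_bytes(value: int) -> bytes:
    """Hypothetical stand-in for the LongType branch of `to_bytes`."""
    return _LONG_STRUCT.pack(value)


def min_as_bytes_or_none(current_min: Optional[int]) -> Optional[bytes]:
    """Hypothetical guard: skip serialization when no minimum was collected."""
    if current_min is None:
        # An all-null column has no min/max, so there is nothing to serialize.
        return None
    return long_to_bytes(current_min)


if __name__ == "__main__":
    # Reproduces the failure mode: packing None raises struct.error.
    try:
        long_to_bytes(None)  # type: ignore[arg-type]
    except struct.error as exc:
        print(f"struct.error: {exc}")

    # With the guard, an all-null column simply yields no lower bound.
    print(min_as_bytes_or_none(None))
    print(min_as_bytes_or_none(4))
```

Since `fill_parquet_file_metadata` already checks `if _min is not None` before storing the bound, returning `None` from the serialization step for an all-null column might be enough, but I have not verified this against the rest of the code.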



